
Hadoop on Mesos

Overview

To run Hadoop on Mesos you need to add the hadoop-mesos-0.1.0.jar library to your Hadoop distribution (any distribution that uses protobuf > 2.5.0) and set some new configuration properties. Read on for details.

The pom.xml included is configured and tested against CDH5 and MRv1. Hadoop on Mesos does not currently support YARN (and MRv2).

Prerequisites

To use the metrics feature (which uses the CodaHale Metrics library), you need to install libsnappy. The snappy-java package also includes a bundled version of libsnappyjava.

Build

You can build hadoop-mesos-0.1.0.jar using Maven:

mvn package

If successful, the JAR will be at target/hadoop-mesos-0.1.0.jar.

NOTE: If you want to build against a different version of Mesos than the default you'll need to update mesos-version in pom.xml.

We plan to provide prebuilt JARs at http://repository.apache.org in the near future!

Package

You'll need to download an existing Hadoop distribution. For this guide, we'll use CDH5. First grab the tar archive and extract it.

wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.2.0.tar.gz
...
tar zxf hadoop-2.5.0-cdh5.2.0.tar.gz

Take note, the extracted directory is hadoop-2.5.0-cdh5.2.0.

Now copy hadoop-mesos-0.1.0.jar into the share/hadoop/common/lib folder.

cp /path/to/hadoop-mesos-0.1.0.jar hadoop-2.5.0-cdh5.2.0/share/hadoop/common/lib/
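
A small guard around the copy can catch a wrong path early. The following is only a sketch: the function name install_mesos_jar is hypothetical, and it assumes the JAR and the extracted distribution are laid out as described above.

```shell
# Hypothetical helper: copy the Hadoop on Mesos JAR into a distribution,
# failing early if either path is wrong.
install_mesos_jar() {
  jar="$1"        # e.g. /path/to/hadoop-mesos-0.1.0.jar
  dist="$2"       # e.g. hadoop-2.5.0-cdh5.2.0
  lib_dir="$dist/share/hadoop/common/lib"

  if [ ! -f "$jar" ]; then
    echo "JAR not found: $jar" >&2
    return 1
  fi
  if [ ! -d "$lib_dir" ]; then
    echo "library directory not found: $lib_dir" >&2
    return 1
  fi
  cp "$jar" "$lib_dir/"
}
```

It would be called as install_mesos_jar /path/to/hadoop-mesos-0.1.0.jar hadoop-2.5.0-cdh5.2.0 from the directory that contains the extracted distribution.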

Since CDH5 includes both MRv1 and MRv2 (YARN) and is configured for YARN by default, we need to update the symlinks to point to the MRv1 directories.

cd hadoop-2.5.0-cdh5.2.0

mv bin bin-mapreduce2
mv examples examples-mapreduce2 
ln -s bin-mapreduce1 bin
ln -s examples-mapreduce1 examples

pushd etc
mv hadoop hadoop-mapreduce2
ln -s hadoop-mapreduce1 hadoop
popd

pushd share/hadoop
rm mapreduce
ln -s mapreduce1 mapreduce
popd
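
Before packaging, it may be worth confirming that all four links point at the MRv1 directories. The sketch below is one way to do that. The function names check_link and verify_mrv1_links are hypothetical, and it assumes a readlink command is available (as it is on Linux and OS X).

```shell
# Hypothetical helper: confirm that a symlink points at the expected target.
check_link() {
  link="$1"
  expected="$2"
  if [ ! -L "$link" ]; then
    echo "not a symlink: $link" >&2
    return 1
  fi
  actual=$(readlink "$link")
  if [ "$actual" != "$expected" ]; then
    echo "$link points to $actual, expected $expected" >&2
    return 1
  fi
}

# Check the four links created above, relative to the distribution root.
verify_mrv1_links() {
  dist="$1"
  check_link "$dist/bin" bin-mapreduce1 &&
  check_link "$dist/examples" examples-mapreduce1 &&
  check_link "$dist/etc/hadoop" hadoop-mapreduce1 &&
  check_link "$dist/share/hadoop/mapreduce" mapreduce1
}
```

Running verify_mrv1_links . from inside the hadoop-2.5.0-cdh5.2.0 directory returns a non-zero status and names the first link that is wrong.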

That's it! You now have a Hadoop on Mesos distribution!

Upload

You'll want to upload your Hadoop on Mesos distribution somewhere that Mesos can access in order to launch each TaskTracker. For example, if you're already running HDFS:

$ tar czf hadoop-2.5.0-cdh5.2.0.tar.gz hadoop-2.5.0-cdh5.2.0
$ hadoop fs -put hadoop-2.5.0-cdh5.2.0.tar.gz /hadoop-2.5.0-cdh5.2.0.tar.gz

Consider any permissions issues with your upload location (e.g., on HDFS you'll probably want to make the file world readable).
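
On HDFS, one way to make the uploaded file world readable is the chmod subcommand of hadoop fs. The wrapper below is a minimal sketch. The function name make_world_readable is hypothetical, and mode 644 is an assumption about what suits your cluster.

```shell
# Hypothetical helper: make an uploaded file readable by every user on HDFS.
# Mode 644 is owner read/write, group and others read-only.
make_world_readable() {
  hadoop fs -chmod 644 "$1"
}
```

For the archive uploaded above, this would be make_world_readable /hadoop-2.5.0-cdh5.2.0.tar.gz, and hadoop fs -ls / can then be used to inspect the resulting permissions.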

Now you'll need to configure your JobTracker to launch each TaskTracker on Mesos!

Configure

Along with the normal configuration properties you might want to set to launch a JobTracker, you'll need to set some Mesos-specific ones too.

Here are the mandatory configuration properties for conf/mapred-site.xml (initialized to values representative of running in pseudo-distributed operation):

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.MesosScheduler</value>
</property>
<property>
  <name>mapred.mesos.taskScheduler</name>
  <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
</property>
<property>
  <name>mapred.mesos.master</name>
  <value>localhost:5050</value>
</property>
<property>
  <name>mapred.mesos.executor.uri</name>
  <value>hdfs://localhost:9000/hadoop-2.5.0-cdh5.2.0.tar.gz</value>
</property>
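
A quick text search can confirm that none of the mandatory properties were left out of the file. This is only a sketch: the function name check_mesos_config is hypothetical, and it matches the name elements as plain text rather than parsing the XML, so it will not notice malformed markup or empty values.

```shell
# Hypothetical helper: report any mandatory property missing from a
# mapred-site.xml file. This is a plain text match, not an XML parse.
check_mesos_config() {
  conf_file="$1"
  missing=0
  for name in \
    mapred.job.tracker \
    mapred.jobtracker.taskScheduler \
    mapred.mesos.taskScheduler \
    mapred.mesos.master \
    mapred.mesos.executor.uri
  do
    if ! grep -qF "<name>$name</name>" "$conf_file"; then
      echo "missing property: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_mesos_config conf/mapred-site.xml returns a non-zero status and lists every property it could not find.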

More details on configuration properties can be found here.

Start

Now you can start the JobTracker, but you'll need to include the path to the Mesos native library.

On Linux:

$ MESOS_NATIVE_LIBRARY=/path/to/libmesos.so hadoop jobtracker

And on OS X:

$ MESOS_NATIVE_LIBRARY=/path/to/libmesos.dylib hadoop jobtracker
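
If the same start script has to work on both platforms, the library file name can be chosen from the output of uname -s. The sketch below illustrates the idea. Both function names are hypothetical, and the directory holding the library is passed in by the caller because its location depends on how Mesos was installed.

```shell
# Hypothetical helper: print the Mesos native library file name for a
# given kernel name, as reported by `uname -s`.
mesos_lib_name() {
  case "$1" in
    Linux)  echo "libmesos.so" ;;
    Darwin) echo "libmesos.dylib" ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: start the JobTracker with the library found in
# the directory passed as the first argument.
start_jobtracker() {
  lib_dir="$1"
  lib_name=$(mesos_lib_name "$(uname -s)") || return 1
  MESOS_NATIVE_LIBRARY="$lib_dir/$lib_name" hadoop jobtracker
}
```

With this in place, start_jobtracker /path/to would run the same command as the two platform-specific lines above.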

NOTE: You do not need to worry about distributing your Hadoop configuration! All of the configuration properties read by the JobTracker, along with any necessary TaskTracker-specific overrides, will be serialized and passed to each TaskTracker on startup.

Containers

As of Mesos 0.19.0, you can specify a container to be used when isolating a task on a Mesos slave. If you use this container mechanism, you can configure the Hadoop JobTracker to send a custom container image and a set of options using two new JobConf options.

This is purely opt-in, so omitting these JobConf options will cause no ContainerInfo to be sent to Mesos. Also, if you don't use these options, there's no requirement to use version 0.19.0 of the Mesos native library.

This feature can be especially useful if your Hadoop jobs have software dependencies on the slaves themselves, as using a container can isolate these dependencies from other users of a Mesos cluster.

It's important to note that the container image you use must already have the Mesos native library installed.

<property>
  <name>mapred.mesos.container.image</name>
  <value>docker:///ubuntu</value>
</property>
<property>
  <name>mapred.mesos.container.options</name>
  <value>-v,/foo/bar:/bar</value>
</property>
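
The options value in the example looks like a comma-separated list in which each item becomes one argument. That reading is an assumption, since this README does not spell out how the value is parsed. Under that assumption, the hypothetical helper below previews how a value would break apart before you put it in the configuration.

```shell
# Hypothetical helper: print each comma-separated option on its own line.
# Assumes the value is a comma-separated list, which this README does not
# state explicitly.
split_container_options() {
  printf '%s\n' "$1" | tr ',' '\n'
}
```

For the value shown above, split_container_options "-v,/foo/bar:/bar" prints -v on the first line and /foo/bar:/bar on the second.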

Please email [email protected] with questions!

