  • Stars: 2,196
  • Rank 21,004 (Top 0.5 %)
  • Language: Shell
  • Created over 8 years ago
  • Updated 10 months ago

Repository Details

Apache Hadoop docker image

Changes

Version 2.0.0 introduces the wait_for_it script for cluster startup.

Hadoop Docker

Supported Hadoop Versions

See the repository branches for the supported Hadoop versions.

Quick Start

To deploy an example HDFS cluster, run:

  docker-compose up

Run example wordcount job:

  make wordcount

Or deploy in swarm:

  docker stack deploy -c docker-compose-v3.yml hadoop

docker-compose creates a docker network, which you can find by running docker network list (e.g. dockerhadoop_default).

Run docker network inspect on the network (e.g. dockerhadoop_default) to find the IP the hadoop interfaces are published on. Access these interfaces with the following URLs:

  • Namenode: http://<dockerhadoop_IP_address>:9870/dfshealth.html#tab-overview
  • History server: http://<dockerhadoop_IP_address>:8188/applicationhistory
  • Datanode: http://<dockerhadoop_IP_address>:9864/
  • Nodemanager: http://<dockerhadoop_IP_address>:8042/node
  • Resource manager: http://<dockerhadoop_IP_address>:8088/
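The URL list above can be sketched as a small lookup function. This is only an illustration: the function name hadoop_ui_url and the service keys are hypothetical, and 172.18.0.2 in the usage comment is a made-up address. It assumes the address may carry a subnet suffix such as /16, which docker network inspect typically prints, and strips it.

```shell
# Hypothetical helper: print the web UI URL for one service, given the IP
# address found with `docker network inspect`.
# The service keys below are illustrative names chosen for this sketch.
hadoop_ui_url() {
  local service ip
  service="$1"
  ip="${2%%/*}"   # drop a trailing subnet suffix such as /16, if present
  case "$service" in
    namenode)        printf 'http://%s:9870/dfshealth.html#tab-overview\n' "$ip" ;;
    historyserver)   printf 'http://%s:8188/applicationhistory\n' "$ip" ;;
    datanode)        printf 'http://%s:9864/\n' "$ip" ;;
    nodemanager)     printf 'http://%s:8042/node\n' "$ip" ;;
    resourcemanager) printf 'http://%s:8088/\n' "$ip" ;;
    *) echo "unknown service: $service" >&2; return 1 ;;
  esac
}

# Example (address is illustrative):
#   hadoop_ui_url namenode 172.18.0.2/16
```

Keeping the ports and paths in one case statement means a change to a published port only needs to be made in one place.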

Configure Environment Variables

The configuration parameters can be specified in the hadoop.env file or as environment variables for specific services (e.g. namenode, datanode, etc.):

  CORE_CONF_fs_defaultFS=hdfs://namenode:8020

CORE_CONF corresponds to core-site.xml. fs_defaultFS=hdfs://namenode:8020 will be transformed into:

  <property><name>fs.defaultFS</name><value>hdfs://namenode:8020</value></property>

To define a dash inside a configuration parameter, use a triple underscore, such as YARN_CONF_yarn_log___aggregation___enable=true (yarn-site.xml), which becomes:

  <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
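The naming rule described above can be sketched as a short shell function. This is an illustration under stated assumptions, not the code in base/entrypoint.sh: the function name env_to_property is hypothetical, and it implements only the two rules given here (a triple underscore becomes a dash, a single underscore becomes a dot).

```shell
# Hypothetical helper: turn one NAME=value assignment into a Hadoop
# <property> element, following the rules described in this README.
#   $1: the assignment, e.g. CORE_CONF_fs_defaultFS=hdfs://namenode:8020
#   $2: the prefix to strip, e.g. CORE_CONF
env_to_property() {
  local key value name
  key="${1%%=*}"          # text before the first '='
  value="${1#*=}"         # text after the first '='
  name="${key#"$2"_}"     # remove the prefix and its trailing underscore
  # Replace triple underscores first, then the remaining single underscores.
  name="$(printf '%s' "$name" | sed -e 's/___/-/g' -e 's/_/./g')"
  printf '<property><name>%s</name><value>%s</value></property>\n' "$name" "$value"
}

# Example:
#   env_to_property 'YARN_CONF_yarn_log___aggregation___enable=true' YARN_CONF
```

The order of the two substitutions matters: if single underscores were replaced first, the triple underscore would turn into three dots instead of a dash.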

The available configurations are:

  • /etc/hadoop/core-site.xml CORE_CONF
  • /etc/hadoop/hdfs-site.xml HDFS_CONF
  • /etc/hadoop/yarn-site.xml YARN_CONF
  • /etc/hadoop/httpfs-site.xml HTTPFS_CONF
  • /etc/hadoop/kms-site.xml KMS_CONF
  • /etc/hadoop/mapred-site.xml MAPRED_CONF
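The prefix-to-file mapping listed above can be written as a lookup, which is handy when checking where a given variable will end up. The function name conf_file_for_var is hypothetical and the sketch simply mirrors the list; it does not read or modify any file.

```shell
# Hypothetical helper: print the configuration file that a variable name
# maps to, based on its prefix. Returns non-zero for an unknown prefix.
conf_file_for_var() {
  case "$1" in
    CORE_CONF_*)   echo /etc/hadoop/core-site.xml ;;
    HDFS_CONF_*)   echo /etc/hadoop/hdfs-site.xml ;;
    YARN_CONF_*)   echo /etc/hadoop/yarn-site.xml ;;
    HTTPFS_CONF_*) echo /etc/hadoop/httpfs-site.xml ;;
    KMS_CONF_*)    echo /etc/hadoop/kms-site.xml ;;
    MAPRED_CONF_*) echo /etc/hadoop/mapred-site.xml ;;
    *) echo "no configuration file for: $1" >&2; return 1 ;;
  esac
}

# Example:
#   conf_file_for_var CORE_CONF_fs_defaultFS
```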

If you need to extend some other configuration file, refer to the base/entrypoint.sh bash script.

More Repositories

  1. docker-spark (Shell, 2,036 stars): Apache Spark docker image
  2. docker-hive (Shell, 1,020 stars)
  3. docker-hadoop-spark-workbench (Makefile, 688 stars): [EXPERIMENTAL] This repo includes deployment instructions for running HDFS/Spark inside docker containers. Also includes spark-notebook and HDFS FileBrowser.
  4. docker-hbase (Makefile, 246 stars)
  5. docker-flink (Shell, 191 stars): Apache Flink docker image
  6. README (83 stars): General README for the Big Data Europe project's sources
  7. demo-spark-sensor-data (Java, 33 stars): Demo Spark application to transform data gathered on sensors for a heatmap application
  8. docker-kafka (Shell, 31 stars)
  9. docker-hive-metastore-postgresql (TSQL, 30 stars): Postgresql configured to work as metastore for Hive.
  10. app-bde-pipeline (Elixir, 26 stars): Bootstrap a pipeline on the BDE platform
  11. docker-zeppelin (Makefile, 25 stars)
  12. docker-hdfs-filebrowser (Mako, 11 stars): A docker image for HDFS FileBrowser. Cloudera Hue with FileBrowser only.
  13. docker-spark-notebook (Makefile, 10 stars): Spark Notebook docker image
  14. docker-flume (Python, 8 stars)
  15. docker-zookeeper (Shell, 8 stars): [DEPRECATED]
  16. docker-elasticsearch (Shell, 8 stars): Start Elasticsearch instance, initiate an index and submit the index schema (mappings)
  17. app-bdi-ide (Common Lisp, 7 stars)
  18. WorkFlow-Builder (Elixir, 7 stars): Application to build and export Big Data pipelines
  19. demo-integrator-ui (Shell, 6 stars): Showcase the demo for integrator UI with Hadoop, HDFS browser, Spark, Flink, Strabon, Sextant, Solr.
  20. docker-ontario (5 stars): Ontario: Ontology-based Architecture for Semantic Data Lakes
  21. app-integrator-ui (JavaScript, 5 stars): Wrapping user interface for embedding pipeline component interfaces
  22. app-stack-builder (Common Lisp, 4 stars): Application which helps in the construction of docker-compose.yml files
  23. mu-init-daemon-service (Ruby, 4 stars): Microservice to report the progress of a service's initialization process
  24. docker-event-detection (Shell, 4 stars)
  25. docker-strabon (Shell, 4 stars)
  26. pilot-sc6-cycle2 (Shell, 3 stars)
  27. mu-swarm-admin-service (Python, 3 stars): A microservice that allows BDE pipelines to be managed through a graph database
  28. app-http-logger (Shell, 3 stars): Logging system to observe running containers, inspect their traffic and make it available for visualization in ElasticSearch
  29. graph-acl-basics (Common Lisp, 2 stars): Testing environment for graph-based ACL using the Mu Query Rewriter
  30. docker-postgres (Shell, 2 stars): Dockerized postgres
  31. vagrant-mesos-multinode (Shell, 2 stars): [DEPRECATED] Boot Mesos with Vagrant
  32. pilot-sc7-change-detector (Java, 2 stars)
  33. mu-query-rewriter (Scheme, 2 stars)
  34. app-swarm-ui (Common Lisp, 2 stars): Swarm User Interface based on docker-compose, mu.semte.ch and EmberJS
  35. ember-stack-builder-frontend (JavaScript, 2 stars): Frontend for the Stack Builder
  36. demo-d3js-with-sparqlendpoint (JavaScript, 2 stars)
  37. docker-nginx-proxy-with-css (CSS, 2 stars): Nginx proxy topping pages with a BDE CSS style
  38. docker-elk-stack (2 stars): ELK stack Dockers for BDE pipelines
  39. docker-4store (Shell, 2 stars)
  40. mu-event-query-service (Python, 1 star): Microservice to query a DB for docker container events and return information in json format.
  41. pilot-sc2-cycle1 (Scala, 1 star)
  42. mu-swarm-admin-proxy (1 star): The entrypoint of all pipelines
  43. docker-solr (1 star)
  44. WorkFlow-Monitor (JavaScript, 1 star): Ember frontend to monitor a BDE pipeline
  45. mu-swarm-logger-service (Python, 1 star): Writes docker logs into the triplestore and/or into files
  46. docker-kafkasail (1 star)
  47. vagrant-hadoop-singlenode (Shell, 1 star): [DEPRECATED] Boot Hadoop with Vagrant
  48. docker-geotriples-ws (1 star)
  49. mu-docker-stats (Python, 1 star): Microservice to fetch statistics data about the running containers to show it in the frontend for visual feedback.
  50. mu-query-rewriter-sandbox (JavaScript, 1 star): A sandbox application that allows people to check the query rewriter
  51. pilot-sc7-geotriples (Java, 1 star)
  52. mu-pipeline-service (Common Lisp, 1 star): Provides resources to describe a Big Data pipeline in mu.semte.ch
  53. mu-har-transformation-service (Python, 1 star): Transforms each pcap file in a given directory into .har files (json) and pushes them into an ELK instance
  54. docker-kibana (1 star): Extended Kibana docker image with several plugins installed by default