big-data-europe/docker-hadoop-spark-workbench

Stars: 688
Rank: 65,712 (Top 2%)
Language: Makefile
Created: over 8 years ago
Updated: about 4 years ago

[EXPERIMENTAL] This repo includes deployment instructions for running HDFS/Spark inside Docker containers. It also includes spark-notebook and HDFS FileBrowser.

How to use HDFS/Spark Workbench

To start an HDFS/Spark Workbench:

    docker-compose up -d

docker-compose cannot be used to scale up spark-workers; for a distributed setup, see the swarm folder.

Starting workbench with Hive support

Run the following commands one at a time. Before running the next command, check that the services started by the previous one are running correctly (with docker logs servicename).

    docker-compose -f docker-compose-hive.yml up -d namenode hive-metastore-postgresql
    docker-compose -f docker-compose-hive.yml up -d datanode hive-metastore
    docker-compose -f docker-compose-hive.yml up -d hive-server
    docker-compose -f docker-compose-hive.yml up -d spark-master spark-worker spark-notebook hue
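
The check between steps can be scripted. The following is a minimal sketch, not part of the repository: `wait_for_log` is a hypothetical helper that polls `docker logs` for a container until a given pattern appears or the attempts run out. The container name and the log pattern you pass in are assumptions that depend on your setup; verify them with `docker ps` and by reading the logs once by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): poll `docker logs` until a
# pattern appears, or give up after a number of attempts.
# Usage: wait_for_log CONTAINER PATTERN [ATTEMPTS] [DELAY_SECONDS]
wait_for_log() {
    container=$1
    pattern=$2
    attempts=${3:-30}
    delay=${4:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # docker logs emits container stdout and stderr separately; merge both.
        if docker logs "$container" 2>&1 | grep -q -- "$pattern"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timed out waiting for '$pattern' in logs of $container" >&2
    return 1
}
```

A call such as `wait_for_log CONTAINER "PATTERN" && docker-compose -f docker-compose-hive.yml up -d datanode hive-metastore` would then start the next step only once the pattern has been seen. Matching a log line is only a rough readiness signal, so an actual look at the logs is still worthwhile when something fails.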

Interfaces

Namenode: http://localhost:50070
Datanode: http://localhost:50075
Spark-master: http://localhost:8080
Spark-notebook: http://localhost:9001
Hue (HDFS Filebrowser): http://localhost:8088/home
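
To check quickly whether these interfaces respond, the list above can be turned into a small script. This is a sketch under assumptions: `workbench_urls` and `check_workbench` are hypothetical helper names, the ports are the ones listed above, and `curl` is assumed to be installed on the machine running the check.

```shell
#!/bin/sh
# Print "name url" pairs for the web interfaces listed above.
# HOST defaults to localhost; pass the Docker host IP when running remotely.
workbench_urls() {
    host=${1:-localhost}
    cat <<EOF
namenode http://$host:50070
datanode http://$host:50075
spark-master http://$host:8080
spark-notebook http://$host:9001
hue http://$host:8088/home
EOF
}

# Report the HTTP status code returned by each interface.
# curl typically reports 000 when no response was received.
check_workbench() {
    workbench_urls "$1" | while read -r name url; do
        code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")
        echo "$name $url -> HTTP $code"
    done
}
```

Running `check_workbench` with no argument probes localhost; `check_workbench docker-host-ip` probes a remote Docker host instead.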

Important

When opening Hue, you might encounter a "NoReverseMatch: u'about' is not a registered namespace" error after login. I disabled the 'about' page (which is the default one) because it caused the docker container to hang. To access Hue when you get this error, append /home to your URI: http://docker-host-ip:8088/home

Docs

Motivation behind the repo and an example of its usage: see the BDE2020 Blog.

Count Example for Spark Notebooks

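    import org.apache.spark.sql.SparkSession

    // Reuse the active Spark session if one exists, otherwise create one.
    val spark = SparkSession
      .builder()
      .appName("Simple Count Example")
      .getOrCreate()

    // Read /data.csv as a Dataset[String] (one element per line) and count the lines.
    val tf = spark.read.textFile("/data.csv")
    tf.count()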
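
The count example reads `/data.csv`, so that file has to exist before the notebook cell runs. Assuming the path resolves to HDFS, one way to upload a local file is sketched below. `put_to_hdfs` is a hypothetical helper, and both the container name `namenode` and the availability of the `hdfs` command on that container's PATH are assumptions to verify against your own setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): stream a local file into HDFS
# through a running container that has the `hdfs` CLI on its PATH.
# Usage: put_to_hdfs LOCAL_FILE HDFS_PATH [CONTAINER]
put_to_hdfs() {
    src=$1
    dest=$2
    container=${3:-namenode}   # assumed container name; check `docker ps`
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    # `hdfs dfs -put - DEST` reads the data from stdin, so nothing has to be
    # copied into the container first.
    docker exec -i "$container" hdfs dfs -put - "$dest" < "$src"
}
```

For example, `put_to_hdfs ./data.csv /data.csv` would upload a local `data.csv` to the HDFS root. The upload fails if the destination already exists.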

Maintainer

Ivan Ermilov @earthquakesan

Note: this repository was part of the BDE H2020 EU project and is no longer actively maintained by the project participants.

docker-hadoop

Apache Hadoop docker image

docker-spark

Apache Spark docker image

docker-hive

docker-hbase

docker-flink

Apache Flink docker image

README

General README for the Big Data Europe project's sources

demo-spark-sensor-data

Demo Spark application to transform data gathered on sensors for a heatmap application

docker-kafka

docker-hive-metastore-postgresql

Postgresql configured to work as metastore for Hive.

app-bde-pipeline

Bootstrap a pipeline on the BDE platform

docker-zeppelin

docker-hdfs-filebrowser

A docker image for HDFS FileBrowser. Cloudera Hue with FileBrowser only.

docker-spark-notebook

Spark Notebook docker image

docker-flume

docker-zookeeper

docker-elasticsearch

Start Elasticsearch instance, initiate an index and submit the index schema (mappings)

app-bdi-ide

WorkFlow-Builder

Application to build and export Big Data pipelines

demo-integrator-ui

Showcase the demo for integrator UI with Hadoop, HDFS browser, Spark, Flink, Strabon, Sextant, Solr.

docker-ontario

Ontario: Ontology-based Architecture for Semantic Data Lakes

app-integrator-ui

Wrapping user interface for embedding pipeline component interfaces

app-stack-builder

Application which helps in the construction of docker-compose.yml files

mu-init-daemon-service

Microservice to report the progress of a service's initialization process

docker-event-detection

docker-strabon

pilot-sc6-cycle2

mu-swarm-admin-service

A microservice that allows BDE pipelines to be managed through a graph database

app-http-logger

Logging system to observe running containers, inspect their traffic and make it available for visualization in ElasticSearch

graph-acl-basics

Testing environment for graph-based ACL using the Mu Query Rewriter

docker-postgres

Dockerized postgres

vagrant-mesos-multinode

[DEPRECATED] Boot Mesos with Vagrant

pilot-sc7-change-detector

mu-query-rewriter

app-swarm-ui

Swarm User Interface based on docker-compose, mu.semte.ch and EmberJS

ember-stack-builder-frontend

Frontend for the Stack Builder

demo-d3js-with-sparqlendpoint

docker-nginx-proxy-with-css

Nginx proxy topping pages with a BDE CSS style

docker-elk-stack

ELK stack Dockers for BDE pipelines

docker-4store

mu-event-query-service

Microservice to query a DB for docker container events and return information in json format.

pilot-sc2-cycle1

mu-swarm-admin-proxy

The entrypoint of all pipelines

docker-solr

WorkFlow-Monitor

Ember frontend to monitor a BDE pipeline

mu-swarm-logger-service

Writes docker logs into the triplestore and/or into files

docker-kafkasail

vagrant-hadoop-singlenode

[DEPRECATED] Boot Hadoop with Vagrant

docker-geotriples-ws

mu-docker-stats

Microservice to fetch statistics data about the running containers to show it in the frontend for visual feedback.

mu-query-rewriter-sandbox

A sandbox application that allows people to check the query rewriter

pilot-sc7-geotriples

mu-pipeline-service

Provides resources to describe a Big Data pipeline in mu.semte.ch

mu-har-transformation-service

Transforms each pcap file in a given directory into .har files (json) and pushes them into an ELK instance

docker-kibana

Extended Kibana docker image with several plugins installed by default