  • License
    MIT License
  • Created over 5 years ago
  • Updated over 3 years ago

Repository Details

One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager)

One Click Deploy: Kafka Spark Streaming with Zeppelin UI

This repository contains a docker-compose stack with Kafka and Spark Streaming, together with monitoring with Kafka Manager and a Grafana Dashboard. The networking is set up so Kafka brokers can be accessed from the host.

It also comes with a producer-consumer example using a small subset of the US Census adult income prediction dataset.

High level features:

  • Monitoring with Grafana
  • Zeppelin UI
  • Kafka access from the host
  • Multiple Spark interpreters

Detail Summary

Container      Image                    Tag         Accessible at
zookeeper      wurstmeister/zookeeper   latest      172.25.0.11:2181
kafka1         wurstmeister/kafka       2.12-2.2.0  172.25.0.12:9092 (port 8080 for JMX metrics)
kafka2         wurstmeister/kafka       2.12-2.2.0  172.25.0.13:9092 (port 8080 for JMX metrics)
kafka_manager  hlebalbau/kafka_manager  1.3.3.18    172.25.0.14:9000
prometheus     prom/prometheus          v2.8.1      172.25.0.15:9090
grafana        grafana/grafana          6.1.1       172.25.0.16:3000
zeppelin       apache/zeppelin          0.8.1       172.25.0.19:8080
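
Once the stack is up, a quick way to confirm that the addresses above are reachable from the host is a plain TCP connection check. The following is a minimal sketch using only the Python standard library; the function names and the service labels in the dictionary are illustrative and not part of the repository. It only tests that a port accepts connections, not that the service behind it is healthy.

```python
import socket

# Addresses copied from the table above; the dictionary keys are
# illustrative labels, not necessarily the container names.
ENDPOINTS = {
    "zookeeper": ("172.25.0.11", 2181),
    "kafka (broker 1)": ("172.25.0.12", 9092),
    "kafka (broker 2)": ("172.25.0.13", 9092),
    "kafka_manager": ("172.25.0.14", 9000),
    "prometheus": ("172.25.0.15", 9090),
    "grafana": ("172.25.0.16", 3000),
    "zeppelin": ("172.25.0.19", 8080),
}


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable addresses.
        return False


def report(endpoints=ENDPOINTS, timeout=1.0):
    """Map each service label to True/False depending on reachability."""
    return {
        name: is_reachable(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

Calling report() returns a dictionary such as {"zookeeper": True, ...}; any False entry points at a container that is not running or a network that is not routable from the host.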

Quickstart

The easiest way to understand the setup is by diving into it and interacting with it.

Running Docker Compose

To start the stack, run the following command from the repository folder:

docker-compose up -d

This runs the stack in detached mode. To follow the logs, run:

docker-compose logs -f -t --tail=10

To see the memory and CPU usage (which comes in handy to ensure docker has enough memory) use:

docker stats

Accessing the notebook

You can access the default notebook by going to http://172.25.0.19:8080/#/notebook/2EAB941ZD. Now we can start running the cells.

1) Setup

Install the kafka-python dependency

2) Producer

We have an interpreter called %producer.pyspark that we'll be able to run in parallel.

Load our example dummy dataset

We have made available a 1000-row version of the US Census adult income prediction dataset.

Start the stream of rows

We now pick one row at random and send it using our kafka-python producer. The topic will be created automatically if it doesn't exist, provided that auto.create.topics.enable is set to true.
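
The notebook cell itself is not reproduced here, so the following is only a sketch of what such a producer loop could look like with kafka-python. The helper names, the JSON encoding of rows, and the delay parameter are assumptions made for illustration; the broker addresses come from the table in the Detail Summary.

```python
import json
import random
import time


def encode_row(row):
    """Serialise one dataset row (a dict) to UTF-8 JSON bytes.

    JSON is an assumed wire format; the notebook may encode rows differently.
    """
    return json.dumps(row).encode("utf-8")


def stream_random_rows(producer, topic, rows, count, delay_s=0.0, rng=random):
    """Send `count` randomly chosen rows to `topic`, one message per row.

    `producer` is any object with a kafka-python style send(topic, value)
    method, which keeps this loop testable without a running broker.
    """
    for _ in range(count):
        producer.send(topic, encode_row(rng.choice(rows)))
        if delay_s:
            time.sleep(delay_s)


def make_producer(bootstrap_servers=("172.25.0.12:9092", "172.25.0.13:9092")):
    """Build a kafka-python producer pointed at the two brokers in this stack."""
    # Imported lazily so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer

    return KafkaProducer(bootstrap_servers=list(bootstrap_servers))
```

With the stack running, something like stream_random_rows(make_producer(), "default_topic", rows, count=100, delay_s=0.5) would publish one random row every half second, where rows is the list of dataset rows loaded in the previous step.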

3) Consumer

We now use the %consumer.pyspark interpreter to run our pyspark job in parallel to the producer.

Connect to the stream and print

Now we can run the Spark streaming job to connect to the topic and listen for data. The job processes the stream in 2-second windows and prints the ID and "label" of every row received within each window.
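
As a rough illustration of such a job, the sketch below uses the DStream API with a 2-second batch interval. It rests on several assumptions that are not confirmed by this README: that messages are JSON objects, that the fields are called "id" and "label", and that the interpreter runs Spark 2.x with the Kafka 0.8 streaming connector available (the pyspark.streaming.kafka module was removed in Spark 3). The function names are illustrative.

```python
import json


def id_and_label(message_value, id_field="id", label_field="label"):
    """Parse one JSON message and return its (id, label) pair.

    The field names are assumed; missing fields come back as None.
    """
    row = json.loads(message_value)
    return row.get(id_field), row.get(label_field)


def start_stream(sc, topic="default_topic",
                 brokers="172.25.0.12:9092,172.25.0.13:9092",
                 batch_seconds=2):
    """Start a streaming job that prints (id, label) for each 2-second batch."""
    # Lazy imports: these need Spark 2.x plus the Kafka 0.8 connector.
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    ssc = StreamingContext(sc, batch_seconds)
    stream = KafkaUtils.createDirectStream(
        ssc, [topic], {"metadata.broker.list": brokers}
    )
    # Each record is a (key, value) pair; only the value carries the row.
    stream.map(lambda kv: id_and_label(kv[1])).pprint()
    ssc.start()
    return ssc
```

In a %consumer.pyspark cell, calling start_stream(sc) with the interpreter's SparkContext would start printing batches; calling stop() on the returned context with stopSparkContext=False ends the job while keeping the interpreter usable.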

4) Monitor Kafka

We can now use Kafka Manager to inspect the current Kafka setup.

Setup Kafka Manager

Kafka Manager needs to be configured before first use. To do this, open http://172.25.0.14:9000/addCluster and fill in the following two fields:

  • Cluster name: Kafka
  • Zookeeper hosts: 172.25.0.11:2181

Optionally, you can also tick the following:

  • Enable JMX Polling
  • Poll consumer information

Access the topic information

If your cluster was named "Kafka", then you can go to http://172.25.0.14:9000/clusters/Kafka/topics/default_topic, where you will be able to see the partition offsets. Given that the topic was created automatically, it will have only 1 partition.

Visualise metrics in Grafana

Finally, you can open the default Kafka dashboard in Grafana (username is "admin" and password is "password") at http://172.25.0.16:3000/d/xyAGlzgWz/kafka?orgId=1
