  • Stars: 120
  • Rank: 286,749 (Top 6 %)
  • Language: Jupyter Notebook
  • License: Apache License 2.0
  • Created about 3 years ago
  • Updated about 1 year ago

Repository Details

A Series of Notebooks on how to start with Kafka and Python

Python Jupyter Notebooks for Apache Kafka®

This is a series of Jupyter Notebooks on how to start with Apache Kafka® and Python. You can try these notebooks in order to learn the basic concepts of Apache Kafka in an environment containing markdown text, media and executable code on the same page.

The notebooks are based on a managed Apache Kafka instance created on Aiven's website, but they can also be customised to work with any Apache Kafka instance running locally with SSL authentication. Aiven offers $300 of free credit, which you can redeem by creating an account on Aiven's website.

If you have any questions or suggestions for improving the notebooks, please open an issue. Contributions are welcome!

Start JupyterLab on Docker

You can access the notebooks via JupyterLab. This example runs JupyterLab on Docker.

  1. Clone the repository.
  2. Open a terminal.
  3. Go to the folder where the repository has been cloned.
  4. Run the following command:
docker run --rm -p 8888:8888 \
  -e JUPYTER_ENABLE_LAB=yes  \
  -v "$PWD":/home/jovyan/work \
  jupyter/datascience-notebook

A folder named work appears at the top left of JupyterLab; the notebooks are listed under it.

Notebook Overview

This repository contains the following notebooks.

Notebook Details

The notebooks are organised by Apache Kafka functionality.

Create Managed Apache Kafka and PostgreSQL instances with Aiven.io

The 00 - Aiven Setup.ipynb notebook downloads Aiven's command line interface and creates an Apache Kafka instance and a PostgreSQL instance.

Replace <INSERT_TOKEN_HERE> and <INSERT_EMAIL_HERE> with a valid token and email address created on Aiven's website. The notebook creates the instances and stores all the required connection credentials locally.
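
As a rough sketch of how locally stored credentials could be handed to a Python client, the snippet below builds the SSL keyword arguments accepted by the kafka-python clients. The choice of kafka-python, the folder name, the file names (ca.pem, service.cert, service.key) and the host are all assumptions for illustration; check the notebook for the locations it actually writes to.

```python
from pathlib import Path


def ssl_client_config(bootstrap_servers: str, cert_folder: str) -> dict:
    """Build SSL keyword arguments for a kafka-python client.

    Assumption: the CA certificate, access certificate and access key
    were saved in `cert_folder` under the file names used below.
    """
    folder = Path(cert_folder)
    return {
        "bootstrap_servers": bootstrap_servers,
        "security_protocol": "SSL",
        "ssl_cafile": str(folder / "ca.pem"),
        "ssl_certfile": str(folder / "service.cert"),
        "ssl_keyfile": str(folder / "service.key"),
    }


# Hypothetical host, port and folder, for illustration only.
config = ssl_client_config("kafka-example.aivencloud.com:12345", "kafkacerts")
print(config["security_protocol"])
```

The resulting dictionary can be unpacked into a client constructor, for example KafkaProducer(**config), once the paths point at real files.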

Produce and Read Messages with Apache Kafka

01 - Producer.ipynb creates a Python Apache Kafka producer and produces the first messages. After the first message is produced, open the 02 - Consumer.ipynb notebook and place it alongside the producer notebook.
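
A minimal sketch of what producing a first message can look like, assuming the kafka-python library (the notebook may be written differently). The function names, the topic argument and the message fields are hypothetical. The client import sits inside the function because it needs the package installed and a reachable broker; the serializer can be tried on its own.

```python
import json


def to_json_bytes(data) -> bytes:
    """Serialize a Python object to UTF-8 encoded JSON."""
    return json.dumps(data).encode("utf-8")


def produce_first_messages(client_config: dict, topic: str, messages: list) -> None:
    """Send each (key, value) pair to `topic` and wait for delivery.

    `client_config` holds connection settings such as bootstrap_servers
    and the SSL options.
    """
    from kafka import KafkaProducer  # assumption: kafka-python is installed

    producer = KafkaProducer(
        key_serializer=to_json_bytes,
        value_serializer=to_json_bytes,
        **client_config,
    )
    for key, value in messages:
        producer.send(topic, key=key, value=value)
    producer.flush()  # block until all buffered messages have been sent
    producer.close()


# Hypothetical message, for illustration only.
first_message = ({"id": 1}, {"name": "Alice", "pizza": "Margherita"})
print(to_json_bytes(first_message[1]))
```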

02 - Consumer.ipynb reads from the topic that 01 - Producer.ipynb wrote to. However, it only reads messages produced after it attaches to Apache Kafka; it does not go back through the topic's history.

If you want to read the messages created with 01 - Producer.ipynb, you need to run the last code block of 02 - Consumer.ipynb before producing any messages in 01 - Producer.ipynb. This behaviour is Apache Kafka's default and can be changed by adding the line 'auto.offset.reset'='earliest' to the consumer properties.
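
The offset setting described above could be sketched as follows, assuming the kafka-python library, which spells the property as the keyword argument auto_offset_reset. The helper names and the group id are hypothetical. Note that this setting only takes effect when the consumer group has no committed offset for the topic.

```python
import json


def from_json_bytes(raw: bytes):
    """Decode a UTF-8 encoded JSON message value."""
    return json.loads(raw.decode("utf-8"))


def consumer_settings(group_id: str, read_from_start: bool = False) -> dict:
    """Consumer keyword arguments; 'latest' mirrors Apache Kafka's default."""
    return {
        "group_id": group_id,
        "auto_offset_reset": "earliest" if read_from_start else "latest",
        "value_deserializer": from_json_bytes,
    }


def consume(client_config: dict, topic: str, settings: dict) -> None:
    """Print every message received from `topic` (runs until interrupted)."""
    from kafka import KafkaConsumer  # assumption: kafka-python is installed

    consumer = KafkaConsumer(topic, **client_config, **settings)
    for message in consumer:
        print(message.partition, message.offset, message.value)


# Hypothetical group id, for illustration only.
settings = consumer_settings("my-first-group", read_from_start=True)
print(settings["auto_offset_reset"])
```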

Understanding Apache Kafka Partitions

Partitions in Apache Kafka are a way to divide the messages belonging to the same topic into sub-logs.

  • 03 - 00 - Partition Producer.ipynb creates a topic with two partitions using KafkaAdmin and sends a message to each partition. We can then open both 03 - 01 - Consumer - Partition 0.ipynb and 03 - 02 - Consumer - Partition 1.ipynb, which read messages from Partition 0 and Partition 1 respectively.
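
To make the idea of sub-logs concrete, here is a toy in-memory model, not the broker's implementation and not the notebooks' code. The class and topic names are made up. Each partition is an independent append-only list with its own offsets, and a reader attached to one partition never sees messages from the other.

```python
class ToyTopic:
    """In-memory stand-in for a topic: one append-only list per partition."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, partition: int, message: str) -> int:
        """Append to one partition and return the message's offset there."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition: int, from_offset: int = 0) -> list:
        """Return the messages of a single partition, in append order."""
        return self.partitions[partition][from_offset:]


topic = ToyTopic("toy-partitioned-topic", num_partitions=2)
topic.send(0, "message for partition 0")
topic.send(1, "message for partition 1")
offset = topic.send(1, "second message for partition 1")

print(topic.read(0))
print(topic.read(1))
```

Against a real cluster, and assuming kafka-python, the equivalent steps would likely go through KafkaAdminClient.create_topics with a NewTopic, the partition argument of KafkaProducer.send, and KafkaConsumer.assign with a TopicPartition.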

New Consumer Group

Messages in Apache Kafka are not deleted when they are read by a consumer, so they remain available for other consumers to read. 04 - New Consumer Group.ipynb creates a new consumer that belongs to a new consumer group and reads from the topic that 01 - Producer.ipynb wrote to. We can now check, by sending a message from the 01 - Producer.ipynb notebook, that it is received in both 02 - Consumer.ipynb and 04 - New Consumer Group.ipynb.
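
The behaviour described above can be illustrated with a small in-memory model, offered as a sketch rather than a description of how Apache Kafka stores offsets. The class and group names are hypothetical. The log keeps every message, and each consumer group only tracks its own read position.

```python
class ToyLog:
    """Append-only log where every consumer group keeps its own offset."""

    def __init__(self) -> None:
        self.messages = []
        self.group_offsets = {}

    def append(self, message: str) -> None:
        self.messages.append(message)

    def join(self, group_id: str) -> None:
        """Register a group at the end of the log (like the 'latest' default)."""
        self.group_offsets.setdefault(group_id, len(self.messages))

    def poll(self, group_id: str) -> list:
        """Return unread messages for this group and advance only its offset."""
        start = self.group_offsets[group_id]
        batch = self.messages[start:]
        self.group_offsets[group_id] = len(self.messages)
        return batch


log = ToyLog()
log.join("first-group")
log.join("new-group")
log.append("hello")

first = log.poll("first-group")
second = log.poll("new-group")
print(first, second, len(log.messages))
```

Both groups receive the same message, and the message is still in the log afterwards.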

Kafka Connect

Apache Kafka Connect® is a prebuilt framework that makes it easy to integrate Apache Kafka with existing data sources or sinks. Aiven provides Kafka Connect as a managed service, making the integration a matter of a single config file. 05 - Kafka Connect.ipynb creates a new Kafka topic containing messages with both schema and payload, and then pushes them to a PostgreSQL database via Apache Kafka Connect.
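
A message "with both schema and payload" usually means the JSON envelope that Kafka Connect's JSON converter expects when schemas are enabled. The sketch below builds such an envelope; the helper name, the field names and the values are hypothetical, and the notebook's actual messages may differ.

```python
import json


def with_schema(payload: dict, field_types: dict) -> dict:
    """Wrap `payload` in a schema/payload envelope.

    `field_types` maps each field name to a Kafka Connect type name
    such as "string" or "int32".
    """
    return {
        "schema": {
            "type": "struct",
            "fields": [
                {"field": name, "type": type_name, "optional": False}
                for name, type_name in field_types.items()
            ],
        },
        "payload": payload,
    }


# Hypothetical record, for illustration only.
message = with_schema(
    {"name": "Alice", "pizzas": 2},
    {"name": "string", "pizzas": "int32"},
)
print(json.dumps(message, indent=2))
```

Carrying the schema with each message lets a sink connector work out column names and types without a separate schema registry, at the cost of larger messages.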

Delete Aiven Services

Once you're done, you can delete all the services created on Aiven's website by executing the code in ON - Aiven - Delete Services.ipynb.

Keep Reading

We maintain some other resources that you may also find useful:

License

This project is licensed under the Apache License, Version 2.0.

Apache Kafka is either a registered trademark or trademark of the Apache Software Foundation in the United States and/or other countries. Aiven has no affiliation with and is not endorsed by The Apache Software Foundation.
