awesome-kafka
This list is for anyone who wants to learn about Apache Kafka but does not have a starting point.
You can help by sending Pull Requests to add more information.
If you're not inclined to make PRs, you can tweet me at @infoslack.
Table of Contents
Articles
- http://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/ - Apache Kafka for Beginners
- http://sysadvent.blogspot.com.br/2014/12/day-4-introduction-to-kafka.html - Introduction to Kafka
- https://medium.com/@ruurtjan/understanding-kafka-with-factorio-74e8fc9bf181 - Understanding Kafka with Factorio
- http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple - Introducing Kafka Streams: Stream Processing Made Simple
- https://medium.com/swlh/apache-kafka-in-a-nutshell-5782b01d9ffb - Apache Kafka in a Nutshell
- http://www.confluent.io/blog/apache-kafka-samza-and-the-unix-philosophy-of-distributed-data - Apache Kafka, Samza, and the Unix Philosophy of Distributed Data
- http://www.confluent.io/blog/using-logs-to-build-a-solid-data-infrastructure-or-why-dual-writes-are-a-bad-idea/ - Using logs to build a solid data infrastructure (or: why dual writes are a bad idea)
- http://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/ - Bottled Water: Real-time integration of PostgreSQL and Kafka
- http://www.confluent.io/blog/deploying-apache-kafka-on-aws-elastic-block-store-ebs - Deploying Apache Kafka on AWS Elastic Block Store (EBS)
- http://www.confluent.io/blog/how-we-monitor-and-run-kafka-at-scale-signalfx - How We Monitor and Run Kafka At Scale
- https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/ - Monitoring Kafka performance metrics
- http://oobaloo.co.uk/kafka-for-uswitchs-event-pipeline - Kafka for uSwitch's Event Pipeline
- http://blog.infochimps.com/2012/10/30/next-gen-real-time-streaming-storm-kafka-integration/ - Next Gen Real-time Streaming with Storm-Kafka Integration
- https://medium.com/netflix-techblog/search?q=kafka - All Netflix Tech Blog articles tagged Kafka
- https://engineering.linkedin.com/blog/topic/kafka - All LinkedIn Engineering Blog articles tagged Kafka
- https://medium.com/@stephane.maarek/how-to-use-apache-kafka-to-transform-a-batch-pipeline-into-a-real-time-one-831b48a6ad85 - How to use Apache Kafka to transform a batch pipeline into a real-time one
- https://dev.to/barryosull/event-sourcing-what-it-is-and-why-its-awesome - Event Sourcing: What it is and why it's awesome
- https://www.confluent.io/blog/getting-started-with-rust-and-kafka - Using Kafka with Rust
- https://medium.com/@gayathrisiva8608/apache-kafka-is-a-distributed-streaming-system-that-can-publish-and-subscribe-a-stream-of-records-12af92e2d491 - Apache Kafka-Intro
- https://www.loginradius.com/engineering/blog/stream-processing-using-kafka/ - Introduction to Stream Processing using Kafka Streams
- https://jaehyeon.me/blog/2022-03-07-schema-registry-part1/ - Use External Schema Registry With MSK Connect – Part 1 Local Development
- https://jaehyeon.me/blog/2022-04-03-schema-registry-part2/ - Use External Schema Registry With MSK Connect – Part 2 MSK Deployment
- https://jaehyeon.me/blog/2023-01-10-kafka-consumer-seek-offsets/ - How to Configure Kafka Consumers to Seek Offsets by Timestamp
- https://jaehyeon.me/blog/2023-02-08-simplify-streaming-ingestion-redshift/ - Simplify Streaming Ingestion on AWS – Part 1 MSK and Redshift
- https://jaehyeon.me/blog/2023-03-14-simplify-streaming-ingestion-athena/ - Simplify Streaming Ingestion on AWS – Part 2 MSK and Athena
- https://jaehyeon.me/blog/2023-04-12-integrate-glue-schema-registry/ - Integrate Glue Schema Registry With Your Python Kafka App
- https://jaehyeon.me/blog/2021-12-05-datalake-demo-part1/ - Data Lake Demo Using Change Data Capture (CDC) on AWS – Part 1 Local Development
- https://jaehyeon.me/blog/2021-12-12-datalake-demo-part2/ - Data Lake Demo Using Change Data Capture (CDC) on AWS – Part 2 Implement CDC
- https://jaehyeon.me/blog/2021-12-19-datalake-demo-part3/ - Data Lake Demo Using Change Data Capture (CDC) on AWS – Part 3 Implement Data Lake
Books
- http://www.confluent.io/making-sense-of-stream-processing-ebook - Making sense of stream processing
- http://shop.oreilly.com/product/0636920032175.do - Designing Data-Intensive Applications
- http://apachekafkabook.com/ - Effective Kafka - A Hands-On Guide to Building Robust and Scalable Event-Driven Applications
- https://www.confluent.io/wp-content/uploads/confluent-kafka-definitive-guide-complete.pdf - Kafka: The Definitive Guide
- https://www.manning.com/books/kafka-in-action - Kafka in Action: A fast-paced introduction to every aspect of working with Kafka you need to really reap its benefits.
- https://www.manning.com/books/kafka-streams-in-action - Kafka Streams in Action: By the end of the book, you'll be ready to use Kafka Streams in your projects to reap the benefits of the insight your data holds quickly and easily.
- https://www.manning.com/books/kafka-streams-in-action-second-edition - Kafka Streams in Action, Second Edition: The book that teaches you to implement stream processing within the Kafka platform.
Courses
- Udemy Apache Kafka Series - Learning Apache Kafka for Beginners - All Levels
- Udemy Apache Kafka Series - Kafka Connect Hands-on Learning - Intermediate
- Udemy Apache Kafka Series - Kafka Streams for Data Processing - Intermediate
- Udemy Apache Kafka Series - Kafka Cluster Setup & Administration - Expert
- Udemy Apache Kafka Series - Confluent Schema Registry & REST Proxy - Intermediate
Papers
- http://www.vldb.org/pvldb/vol8/p1654-wang.pdf - Building a Replicated Logging System with Apache Kafka
- http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf - Kafka: a Distributed Messaging System for Log Processing
- http://sites.computer.org/debull/A12june/pipeline.pdf - Building LinkedIn’s Real-time Activity Data Pipeline
- https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying - The Log: What every software engineer should know about real-time data's unifying abstraction
Presentations
- http://www.slideshare.net/charmalloc/developing-with-the-go-client-for-apache-kafka - Developing with the Go client for Apache Kafka
- http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign - Apache Kafka 0.8 basic training - Verisign
- http://www.slideshare.net/charmalloc/developingwithapachekafka-29910685 - Developing Real-Time Data Pipelines with Apache Kafka
- http://www.slideshare.net/AmazonWebServices/infrastructure-at-scale-apache-kafka-twitter-storm-elastic-search-arc303-aws-reinvent-2013 - Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303) | AWS re:Invent 2013
- http://www.slideshare.net/charmalloc/real-timestreamingdata-pipelinesapachekafka - Real-time streaming and data pipelines with Apache Kafka
- http://www.slideshare.net/Hadoop_Summit/building-a-realtime-data-pipeline-apache-kafka-at-linkedin - Building a Real-time Data Pipeline: Apache Kafka at LinkedIn
- http://www.slideshare.net/junrao/kafka-replication-apachecon2013 - Kafka replication apachecon_2013
- http://www.slideshare.net/mumrah/kafka-talk-tri-hug - Introduction and Overview of Apache Kafka
Talks
- https://www.youtube.com/watch?v=qc33qMUvR7c - Introduction to Apache Kafka by Joe Stein
- https://www.youtube.com/watch?v=9RMOc0SwRro - Apache Kafka and the Next 700 Stream Processing Systems by Jay Kreps
- https://martin.kleppmann.com/2015/05/27/logs-for-data-infrastructure.html - Transcript of Martin Kleppmann (LinkedIn) - Using Logs To Build a Solid Data Infrastructure
- https://www.youtube.com/watch?v=aJuo_bLSW6s - I ♥ Logs: Apache Kafka and Real-Time Data Integration
- https://www.youtube.com/watch?v=InAKDEk7H0M - Apache Kafka: Real-time Streaming and Data Pipelines with Apache Kafka by Joe Stein
- https://vimeo.com/63040812 - AJUG - Apache Kafka - Chris Curtin
- https://www.youtube.com/playlist?list=PLkz1SCf5iB4enAR00Z46JwY9GGkaS2NON - Getting started guide on Apache Kafka by Learning Journal
- https://youtu.be/qoeuufklgUE - The basics of what a KStream, KTable, and GlobalKTable are
Tools
- Firehose - Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.
- Dagger - Dagger is an easy-to-use SQL framework built on top of Apache Flink for stateful processing of Kafka data.
- Raccoon - Raccoon is a high-throughput, low-latency service to collect events in real-time from your web, mobile apps, and services using multiple network protocols and publish them to Kafka.
- Stencil - Stencil is a Protobuf schema registry for Kafka that enables you to create, maintain and consume Protobuf messages and APIs dynamically, efficiently, and reliably.
- Meteor - Meteor is a scalable, easy-to-use, extensible framework that collects metadata from different cloud providers and on-prem sources and publishes it to Kafka.
- Logit.io - Logit.io allows you to send logs and metrics from Kafka for centralised monitoring, alerting and analysis.
- Zilla - An API gateway built for event-driven architectures and streaming that supports standard protocols such as HTTP, SSE, gRPC, MQTT, and the native Kafka protocol.
- https://github.com/kafka-ops/kafka-topology-builder - Gitops and Automation for Apache Kafka
- Strimzi - Operator for deploying and running Apache Kafka on Kubernetes and OpenShift
- kafkacat - Generic CLI non-JVM Apache Kafka producer and consumer
- connectctl - Manage kafka connect connectors easily
- Kafdrop - Web UI for browsing Kafka topics and consumer groups
- Kokpit - Basic Kafka client that supports publish/subscribe and runs as a Windows application
- Strimzi Kafka CLI - A CLI for Strimzi Kafka Operator
- Kowl - A modern Kafka WebUI written in Go & React
- UI for Apache Kafka - A modern Kafka WebUI written in Java & React
- Franz - A native macOS GUI client for Kafka
Docker Compose
- Kafka Cluster Kraft Mode - DockerCompose - A working Kafka cluster in KRaft mode using docker-compose