  • Language
    Java
  • License
    Apache License 2.0
  • Created over 6 years ago
  • Updated 11 months ago

Repository Details

Demo applications and code examples for Apache Kafka's Streams API.

Kafka Streams Examples

This project contains code examples that demonstrate how to implement real-time applications and event-driven microservices using the Streams API of Apache Kafka, also known as Kafka Streams.

For more information, take a look at the latest Confluent documentation on the Kafka Streams API, notably the Developer Guide.


This repository has several branches to help you find the correct code examples for the version of Apache Kafka and/or Confluent Platform that you are using. See Version Compatibility Matrix below for details.

There are three kinds of examples:

  • Examples under src/main/: These examples are short and concise. Also, you can interactively test-drive these examples, e.g. against a local Kafka cluster. If you want to actually run these examples, then you must first install and run Apache Kafka and friends, which we describe in section Packaging and running the examples. Each example also states its exact requirements and instructions at the very top.
  • Examples under src/test/: These examples should test applications under src/main/. Unit tests with TopologyTestDriver test the stream logic without external system dependencies. The integration tests use embedded Kafka clusters, feed input data to them (using the standard Kafka producer client), process the data using Kafka Streams, and finally read and verify the output results (using the standard Kafka consumer client). These examples are also a good starting point to learn how to implement your own end-to-end integration tests.
  • Ready-to-run Docker Examples: These examples are already built and containerized.

Additional examples may be found under src/main/.

| Application Name | Concepts used | Java 8+ | Java 7+ | Scala |
| --- | --- | --- | --- | --- |
| WordCount | DSL, aggregation, stateful | Java 8+ example | | Scala Example |
| MapFunction | DSL, stateless transformations, map() | Java 8+ example | | Scala Example |
| SessionWindows | Sessionization of user events, user behavior analysis | | Java 7+ example | |
| GlobalKTable | join() between KStream and GlobalKTable | Java 8+ example | | |
| GlobalStore | "join" between KStream and GlobalStore | Java 8+ example | | |
| PageViewRegion | join() between KStream and KTable | Java 8+ example | Java 7+ example | |
| PageViewRegionGenericAvro | Working with data in Generic Avro format | Java 8+ example | Java 7+ example | |
| WikipediaFeedSpecificAvro | Working with data in Specific Avro format | Java 8+ example | Java 7+ example | |
| SecureKafkaStreams | Secure, encryption, client authentication | | Java 7+ example | |
| Sum | DSL, stateful transformations, reduce() | Java 8+ example | | |
| WordCountInteractiveQueries | Interactive Queries, REST, RPC | Java 8+ example | | |
| KafkaMusic | Interactive Queries, State Stores, REST API | Java 8+ example | | |
| ApplicationReset | Application Reset Tool kafka-streams-application-reset | Java 8+ example | | |
| Microservice | Microservice ecosystem, state stores, dynamic routing, joins, filtering, branching, stateful operations | Java 8+ example | | |
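
The SessionWindows entry above refers to sessionization of user events. As a rough, library-free illustration of that idea, the sketch below splits one user's event timestamps into sessions whenever the pause between two consecutive events exceeds an inactivity gap. It is plain Java, not the Kafka Streams session window API, and the class name SessionizerSketch and the 30-minute gap are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of this repository or of Kafka Streams.
final class SessionizerSketch {

    // Splits one user's event timestamps (epoch millis) into sessions.
    // A new session starts when the pause since the previous event exceeds gapMs.
    static List<List<Long>> sessionize(List<Long> timestamps, long gapMs) {
        List<Long> sorted = new ArrayList<>(timestamps);
        Collections.sort(sorted);

        List<List<Long>> sessions = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sorted) {
            if (!current.isEmpty() && ts - current.get(current.size() - 1) > gapMs) {
                sessions.add(current);          // close the running session
                current = new ArrayList<>();    // and open a new one
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            sessions.add(current);
        }
        return sessions;
    }

    public static void main(String[] args) {
        long gapMs = 30 * 60 * 1000L; // assumed inactivity gap: 30 minutes
        List<Long> events = Arrays.asList(0L, 60_000L, 3_600_000L, 3_660_000L);
        System.out.println(sessionize(events, gapMs));
    }
}
```

In a real streaming application the sessions are maintained incrementally per key as events arrive, possibly out of order; the batch-style loop here only shows the grouping rule.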

The stream processing of Kafka Streams can be unit tested with the TopologyTestDriver from the org.apache.kafka:kafka-streams-test-utils artifact. The test driver allows you to write sample input into your processing topology and validate its output.

See the documentation at Testing Streams Code.

We also provide several integration tests, which demonstrate end-to-end data pipelines. Here, we spawn embedded Kafka clusters and the Confluent Schema Registry, feed input data to them (using the standard Kafka producer client), process the data using Kafka Streams, and finally read and verify the output results (using the standard Kafka consumer client).

Additional examples may be found under src/test/.

Tip: Run mvn test to launch the tests.

| Integration Test Name | Concepts used | Java 8+ | Java 7+ | Scala |
| --- | --- | --- | --- | --- |
| WordCount | DSL, aggregation, stateful | Java 8+ Example | | Scala Example |
| WordCountInteractiveQueries | Interactive Queries, REST, RPC | | Java 7+ Example | |
| Aggregate | DSL, groupBy(), aggregate() | Java 8+ Example | | Scala Example |
| CustomStreamTableJoin | DSL, Processor API, Transformers | Java 8+ Example | | |
| EventDeduplication | DSL, Processor API, Transformers | Java 8+ Example | | |
| GlobalKTable | DSL, global state | | Java 7+ Example | |
| GlobalStore | DSL, global state, Transformers | | Java 7+ Example | |
| HandlingCorruptedInputRecords | DSL, flatMap() | Java 8+ Example | | |
| KafkaMusic (Interactive Queries) | Interactive Queries, State Stores, REST API | | Java 7+ Example | |
| MapFunction | DSL, stateless transformations, map() | Java 8+ Example | | |
| MixAndMatch DSL + Processor API | Integrating DSL and Processor API | Java 8+ Example | | |
| PassThrough | DSL, stream(), to() | | Java 7+ Example | |
| PoisonPill | DSL, flatMap() | Java 8+ Example | | |
| ProbabilisticCounting*** | DSL, Processor API, custom state stores | | | Scala Example |
| Reduce (Concatenate) | DSL, groupByKey(), reduce() | Java 8+ Example | | Scala Example |
| SessionWindows | DSL, windowed aggregation, sessionization | | Java 7+ Example | |
| StatesStoresDSL | DSL, Processor API, Transformers | Java 8+ Example | | |
| StreamToStreamJoin | DSL, join() between KStream and KStream | | Java 7+ Example | |
| StreamToTableJoin | DSL, join() between KStream and KTable | | Java 7+ Example | Scala Example |
| Sum | DSL, aggregation, stateful, reduce() | Java 8+ Example | | |
| TableToTableJoin | DSL, join() between KTable and KTable | | Java 7+ Example | |
| UserCountsPerRegion | DSL, aggregation, stateful, count() | Java 8+ Example | | |
| ValidateStateWithInteractiveQueries | Interactive Queries for validating state | Java 8+ Example | | |
| GenericAvro | Working with data in Generic Avro format | | Java 7+ Example | Scala Example |
| SpecificAvro | Working with data in Specific Avro format | | Java 7+ Example | Scala Example |

***demonstrates how to probabilistically count items in an input stream by implementing a custom state store (CMSStore) that is backed by a Count-Min Sketch data structure (with the CMS implementation of Twitter Algebird)
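
To make the Count-Min Sketch idea in the note above more concrete, here is a small, simplified version in plain Java. It is neither the Algebird implementation nor the repository's CMSStore; the class name, the hash mixing, and the fixed seeds are assumptions chosen for illustration. Each item increments one counter per row, and the estimate is the minimum of those counters, so it can overestimate on hash collisions but never underestimates.

```java
// Hypothetical, simplified Count-Min Sketch for illustration only.
// Not the Twitter Algebird implementation and not this repository's CMSStore.
final class CountMinSketchExample {
    private final long[][] counts; // depth rows, each with `width` counters
    private final int[] seeds;     // one hash seed per row
    private final int width;

    CountMinSketchExample(int depth, int width) {
        this.counts = new long[depth][width];
        this.seeds = new int[depth];
        this.width = width;
        for (int row = 0; row < depth; row++) {
            seeds[row] = 31 * (row + 1) + 17; // arbitrary fixed seeds (assumption)
        }
    }

    // Maps an item to a counter index within the given row.
    private int bucket(String item, int row) {
        int h = item.hashCode() * seeds[row] + row;
        h ^= (h >>> 16);
        return Math.floorMod(h, width);
    }

    // Records one occurrence of the item in every row.
    void add(String item) {
        for (int row = 0; row < counts.length; row++) {
            counts[row][bucket(item, row)]++;
        }
    }

    // Estimated count: the smallest counter the item maps to across all rows.
    long estimate(String item) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < counts.length; row++) {
            min = Math.min(min, counts[row][bucket(item, row)]);
        }
        return min;
    }

    public static void main(String[] args) {
        CountMinSketchExample cms = new CountMinSketchExample(4, 1024);
        cms.add("kafka");
        cms.add("kafka");
        cms.add("streams");
        System.out.println(cms.estimate("kafka"));
    }
}
```

The memory footprint is fixed at depth × width counters regardless of how many distinct items appear, which is the reason such a structure is attractive as a state store for high-cardinality streams.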

This containerized example launches:

The Kafka Music application demonstrates how to build a simple music charts application that continuously computes, in real-time, the latest charts such as the latest Top 5 songs per music genre. It exposes its latest processing results -- the latest charts -- through a REST API built on Kafka's Interactive Queries feature. The application's input data is in Avro format, hence the use of Confluent Schema Registry, and comes from two sources: a stream of play events (think: "song X was played") and a stream of song metadata ("song X was written by artist Y").
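
The chart computation described above combines two inputs: play events and song metadata. The following plain-Java sketch shows that logic in batch form: count plays per song, look up each song's genre, and keep the top N songs per genre. It is not the code of the Kafka Music application, which computes these results continuously with Kafka Streams; the class MusicChartsSketch, the Song fields, and the tie-breaking by name are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical batch-style sketch; the real application processes streams continuously.
final class MusicChartsSketch {

    // Minimal stand-in for song metadata; field names are illustrative.
    static final class Song {
        final long id;
        final String name;
        final String genre;

        Song(long id, String name, String genre) {
            this.id = id;
            this.name = name;
            this.genre = genre;
        }
    }

    // playEvents: ids of played songs. songs: metadata keyed by song id.
    // Returns, per genre, up to n song names ordered by play count (descending),
    // with ties broken alphabetically by name.
    static Map<String, List<String>> topNPerGenre(List<Long> playEvents,
                                                  Map<Long, Song> songs,
                                                  int n) {
        // Step 1: count plays per song id.
        Map<Long, Long> playCounts = new HashMap<>();
        for (Long songId : playEvents) {
            playCounts.merge(songId, 1L, Long::sum);
        }

        // Step 2: join counts with metadata and group songs by genre.
        Map<String, List<Song>> byGenre = new TreeMap<>();
        for (Long songId : playCounts.keySet()) {
            Song song = songs.get(songId);
            if (song == null) {
                continue; // no metadata for this id: drop it, like an inner join
            }
            byGenre.computeIfAbsent(song.genre, g -> new ArrayList<>()).add(song);
        }

        // Step 3: rank songs within each genre and keep the first n.
        Map<String, List<String>> charts = new TreeMap<>();
        for (Map.Entry<String, List<Song>> entry : byGenre.entrySet()) {
            List<Song> ranked = entry.getValue();
            ranked.sort((a, b) -> {
                int byCount = Long.compare(playCounts.get(b.id), playCounts.get(a.id));
                return byCount != 0 ? byCount : a.name.compareTo(b.name);
            });
            List<String> top = new ArrayList<>();
            for (Song s : ranked.subList(0, Math.min(n, ranked.size()))) {
                top.add(s.name);
            }
            charts.put(entry.getKey(), top);
        }
        return charts;
    }
}
```

Calling topNPerGenre(playEvents, songs, 5) would produce the "Top 5 songs per genre" view for a fixed batch of play events.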

You can find detailed documentation at https://docs.confluent.io/current/streams/kafka-streams-examples/docs/index.html.

For additional examples that showcase Kafka Streams applications within an event streaming platform, please refer to the examples GitHub repository.

The code in this repository requires Apache Kafka 0.10+ because from this point onwards Kafka includes its Kafka Streams library. See Version Compatibility Matrix for further details, as different branches of this repository may have different Kafka requirements.

For the master branch: To build a development version, you typically need the latest trunk version of Apache Kafka (cf. kafka.version in pom.xml for details). The following instructions will build and locally install the latest trunk Kafka version:

$ git clone git@github.com:apache/kafka.git
$ cd kafka
$ git checkout trunk

# Now build and install Kafka locally
$ ./gradlew clean && ./gradlewAll install

The code in this repository requires Confluent Schema Registry. See Version Compatibility Matrix for further details, as different branches of this repository have different Confluent Platform requirements.

For the master branch: To build a development version, you typically need the latest master version of Confluent Platform's Schema Registry (cf. confluent.version in pom.xml, which is set by the upstream Confluent Common project). The following instructions will build and locally install the latest master Schema Registry version, which includes building its dependencies such as Confluent Common and Confluent Rest Utils. Please read the Schema Registry README for details.

$ git clone https://github.com/confluentinc/common.git
$ cd common
$ git checkout master

# Build and install common locally
$ mvn -DskipTests=true clean install

$ git clone https://github.com/confluentinc/rest-utils.git
$ cd rest-utils
$ git checkout master

# Build and install rest-utils locally
$ mvn -DskipTests=true clean install

$ git clone https://github.com/confluentinc/schema-registry.git
$ cd schema-registry
$ git checkout master

# Now build and install schema-registry locally
$ mvn -DskipTests=true clean install

Also, each example states its exact requirements at the very top.

If you are using an IDE and import the project you might end up with a "missing import / class not found" error. Some Avro classes are generated from schema files and IDEs sometimes do not generate these classes automatically. To resolve this error, manually run:

$ mvn -DskipTests=true compile

If you are using Eclipse, you can also right-click on the pom.xml file and choose Run As > Maven generate-sources.

Some code examples require Java 8+, primarily because of the usage of lambda expressions.

IntelliJ IDEA users:

  • Open File > Project structure
  • Select "Project" on the left.
    • Set "Project SDK" to Java 1.8.
    • Set "Project language level" to "8 - Lambdas, type annotations, etc."

Scala is required only for the Scala examples in this repository. If you are a Java developer you can safely ignore this section.

If you want to experiment with the Scala examples in this repository, you need a version of Scala that supports Java 8 and SAM / Java lambda (e.g. Scala 2.11 with -Xexperimental compiler flag, or 2.12).

If you are compiling with Java 9+, you'll need to have Scala version 2.12+ to be compatible with the Java version.

The instructions in this section are only needed if you want to interactively test-drive the application examples under src/main/.

Tip: If you only want to run the integration tests (mvn test), then you do not need to package or install anything -- just run mvn test. These tests launch embedded Kafka clusters.

The first step is to install and run a Kafka cluster, which must consist of at least one Kafka broker as well as at least one ZooKeeper instance. Some examples may also require a running instance of Confluent Schema Registry. The Confluent Platform Quickstart guide provides the full details.

In a nutshell:

# Ensure you have downloaded and installed Confluent Platform as per the Quickstart instructions above.

# Start ZooKeeper
$ ./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties

# In a separate terminal, start Kafka broker
$ ./bin/kafka-server-start ./etc/kafka/server.properties

# In a separate terminal, start Confluent Schema Registry
$ ./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties

# Again, please refer to the Confluent Platform Quickstart for details such as
# how to download Confluent Platform, how to stop the above three services, etc.

The next step is to create a standalone jar ("fat jar") of the application examples:

# Create a standalone jar ("fat jar")
$ mvn clean package

# >>> Creates target/kafka-streams-examples-7.8.0-0-standalone.jar

Tip: If needed, you can disable the test suite during packaging, for example to speed up the packaging or to lower JVM memory usage:

$ mvn -DskipTests=true clean package

You can now run the application examples as follows:

# Run an example application from the standalone jar. Here: `WordCountLambdaExample`
$ java -cp target/kafka-streams-examples-7.8.0-0-standalone.jar \
  io.confluent.examples.streams.WordCountLambdaExample

The application will try to read from the specified input topic (in the above example, streams-plaintext-input), execute the processing logic, and then try to write back to the specified output topic (in the above example, streams-wordcount-output). To observe the expected output stream, start a console producer to send messages into the input topic and a console consumer to continuously read from the output topic. More details on how to run the examples can be found in the Javadoc of each example.
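
The read, process, write flow described above can be pictured without a Kafka cluster. The sketch below is plain Java rather than the Kafka Streams DSL, and it assumes (this README does not say so) that the word count lowercases each line and splits it on non-word characters. For every input line it returns the updated running count of each word, loosely analogous to the count updates you would watch arriving on the output topic. The class name WordCountSketch is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, library-free illustration of a stateful word count.
final class WordCountSketch {
    // Running counts; stands in for the state a streaming application would keep.
    private final Map<String, Long> counts = new HashMap<>();

    // Processes one input line and returns one (word, newCount) update per word occurrence.
    List<Map.Entry<String, Long>> process(String line) {
        List<Map.Entry<String, Long>> updates = new ArrayList<>();
        // Assumed tokenization: lowercase, then split on runs of non-word characters.
        for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            Long newCount = counts.merge(word, 1L, Long::sum);
            updates.add(new SimpleEntry<>(word, newCount));
        }
        return updates;
    }

    public static void main(String[] args) {
        WordCountSketch wordCount = new WordCountSketch();
        System.out.println(wordCount.process("hello kafka streams"));
        System.out.println(wordCount.process("hello again"));
    }
}
```

Because the counts live in the object's state, the second call reports "hello" with a count of 2, which is the stateful behavior the WordCount entries in the tables refer to.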

If you want to turn on log4j while running your example application, you can edit the log4j.properties file and then execute as follows:

# Run an example application from the standalone jar. Here: `WordCountLambdaExample`
$ java -cp target/kafka-streams-examples-7.8.0-0-standalone.jar \
  -Dlog4j.configuration=file:src/main/resources/log4j.properties \
  io.confluent.examples.streams.WordCountLambdaExample

Keep in mind that the machine on which you run the command above must have access to the Kafka/ZooKeeper clusters you configured in the code examples. By default, the code examples assume the Kafka cluster is accessible via localhost:9092 (aka Kafka's bootstrap.servers parameter) and the ZooKeeper ensemble via localhost:2181. You can override the default bootstrap.servers parameter through a command line argument.
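
A minimal sketch of how such an override can work, using only java.util.Properties: fall back to localhost:9092 unless an address is given on the command line. That the first argument carries the broker address is an assumption made for this illustration, as are the class and method names; check the Javadoc of each example for the arguments it actually accepts.

```java
import java.util.Properties;

// Hypothetical helper; the examples in this repository may parse their arguments differently.
final class BootstrapServersConfigSketch {
    static final String DEFAULT_BOOTSTRAP_SERVERS = "localhost:9092";

    // Builds client configuration, taking the broker address from args[0] if present (assumption).
    static Properties buildConfig(String[] args) {
        String bootstrapServers = args.length > 0 ? args[0] : DEFAULT_BOOTSTRAP_SERVERS;
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildConfig(args).getProperty("bootstrap.servers"));
    }
}
```

With this pattern, appending an address such as broker1:9092 after the class name on the java command line would replace the default.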

This project uses the standard Maven lifecycle and commands such as:

$ mvn compile # This also generates Java classes from the Avro schemas
$ mvn test    # Runs unit and integration tests
$ mvn package # Packages the application examples into a standalone jar

Version Compatibility Matrix

| Branch (this repo) | Confluent Platform | Apache Kafka |
| --- | --- | --- |
| 5.4.x* | 5.4.0-SNAPSHOT | 2.4.0-SNAPSHOT |
| 5.3.0-post | 5.3.0 | 2.3.0 |
| 5.2.2-post | 5.2.2 | 2.2.1 |
| 5.2.1-post | 5.2.1 | 2.2.1 |
| 5.1.0-post | 5.1.0 | 2.1.0 |
| 5.0.0-post | 5.0.0 | 2.0.0 |
| 4.1.0-post | 4.1.0 | 1.1.0 |
| 4.0.0-post | 4.0.0 | 1.0.0 |
| 3.3.0-post | 3.3.0 | 0.11.0 |

*You must manually build the 2.4 version of Apache Kafka and the 5.4.x version of Confluent Platform. See instructions above.

The master branch of this repository represents active development, and may require additional steps on your side to make it compile. Check this README as well as pom.xml for any such information.

License

Usage of this image is subject to the license terms of the software contained within. Please refer to Confluent's Docker images documentation reference for further information. The software to extend and build the custom Docker images is available under the Apache 2.0 License.
