  • Stars: 1,521
  • Rank: 30,543 (Top 0.7 %)
  • Language: C
  • License: Apache License 2.0
  • Created over 9 years ago
  • Updated about 1 year ago

Change data capture from PostgreSQL into Kafka

NOTE: Bottled Water is unmaintained

Please note that Bottled Water is no longer being actively developed.

Bottled Water pioneered change data capture from PostgreSQL into Kafka using the logical decoding API, but now other projects have adopted the technique and are continuing to develop it. Please see the following resources for supported solutions for CDC from PostgreSQL into Kafka:

The remaining README is kept here for historical interest.

Bottled Water for PostgreSQL

How do you export water from your country? Well, you first go to your reservoir, pump out all the water, and fill it into bottles. You then go to the streams of clear mountain water flowing into the reservoir, and tap into them, filling the fresh water into bottles as it arrives. Then you ship those bottles all around the world.

How do you export data from your database? Well, you first take a consistent snapshot of your entire database, and encode it in a language-independent format. You then look at the stream of transactions writing to your database, and parse the transaction log, encoding the inserts/updates/deletes into the same language-independent format as they happen. Then you take that data and ship it to your other systems: build search indexes, update caches, load it into a data warehouse, calculate analytics, monitor it for fraud, and so on.

How it works

Bottled Water uses the logical decoding feature (introduced in PostgreSQL 9.4) to extract a consistent snapshot and a continuous stream of change events from a database. The data is extracted at a row level, and encoded using Avro. A client program connects to your database, extracts this data, and relays it to Kafka (you could also integrate it with other systems if you wish, but Kafka is pretty awesome).

Key features of Bottled Water are:

  • Works with any PostgreSQL database (version 9.4 or later). There are no restrictions on your database schema.
  • No schema changes are required, no triggers or additional tables. (However, you do need to be able to install a PostgreSQL extension on the database server. More on this below.)
  • Negligible impact on database performance.
  • Transactionally consistent output. That means writes appear only when they are committed to the database (writes by aborted transactions are discarded), and they appear in the same order as they were committed (no race conditions).
  • Fault-tolerant: does not lose data, even if processes crash, machines die, the network is interrupted, etc.
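
To make the "transactionally consistent output" guarantee concrete, here is a toy model in Python. It is only an illustration of the observable behaviour, not how Bottled Water is implemented (the real ordering work is done by PostgreSQL's logical decoding). The event tuples and the function name committed_stream are hypothetical.

```python
from collections import defaultdict


def committed_stream(events):
    """Yield row writes only once their transaction commits, in commit order.

    `events` is a hypothetical list of tuples:
      ("write", xid, row), ("commit", xid) or ("abort", xid).
    """
    pending = defaultdict(list)  # xid -> writes buffered so far
    for event in events:
        kind, xid = event[0], event[1]
        if kind == "write":
            pending[xid].append(event[2])
        elif kind == "commit":
            # Release the buffered writes of this transaction all at once.
            yield from pending.pop(xid, [])
        elif kind == "abort":
            # Writes by aborted transactions are discarded.
            pending.pop(xid, None)


events = [
    ("write", 1, "a1"),
    ("write", 2, "b1"),
    ("write", 1, "a2"),
    ("abort", 1),
    ("write", 3, "c1"),
    ("commit", 3),
    ("commit", 2),
]
print(list(committed_stream(events)))  # ['c1', 'b1']
```

Transaction 1 aborted, so its writes never appear; transaction 3 committed before transaction 2, so its write comes first even though transaction 2 wrote earlier.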

Quickstart

There are several possible ways of installing and trying Bottled Water:

Running in Docker

The easiest way to try Bottled Water is to use the Docker images we have prepared. You need at least 2GB of memory to run this demo, so if you're running inside a virtual machine (such as Boot2docker on a Mac), please check that it is big enough.

First, install:

  • Docker, which is used to run the individual services/containers, and
  • docker-compose, which is used to orchestrate the interaction between services.

After the prerequisite applications are installed, you need to build the Docker containers for Bottled Water and Postgres:

$ make docker-compose

Once the build process finishes, set up some required environment variables, then start up Postgres, Kafka and the Confluent schema registry by running docker-compose as follows:

$ export KAFKA_ADVERTISED_HOST_NAME=$(docker run --rm debian:jessie ip route | awk '/^default via / { print $3 }') \
         KAFKA_LOG_CLEANUP_POLICY=compact \
         KAFKA_AUTO_CREATE_TOPICS_ENABLE=true
$ docker-compose up -d kafka schema-registry postgres

The postgres-bw image extends the official Postgres docker image and adds Bottled Water support. However, before Bottled Water can be used, it first needs to be enabled. To do this, start a psql shell for the Postgres database:

$ docker-compose run --rm psql

When the prompt appears, enable the bottledwater extension, and create a table with some test data, for example:

create extension bottledwater;
create table test (id serial primary key, value text);
insert into test (value) values('hello world!');

You can keep the psql terminal open, and run the following in a new terminal.

The next step is to start the Bottled Water client, which relays data from Postgres to Kafka. You start it like this:

$ docker-compose up -d bottledwater-avro

You can run docker-compose logs bottledwater-avro to see what it's doing. Now Bottled Water has taken the snapshot, and continues to watch Postgres for any data changes. You can see the data that has been extracted from Postgres by consuming from Kafka (the topic name test must match up with the name of the table you created earlier):

$ docker-compose run --rm kafka-avro-console-consumer \
    --from-beginning --property print.key=true --topic test

This should print out the contents of the test table in JSON format (key/value separated by tab). Now go back to the psql terminal, and change some data — insert, update or delete some rows in the test table. You should see the changes swiftly appear in the Kafka consumer terminal.

When you're done testing, you can destroy the cluster and its associated data volumes with:

$ docker-compose stop
$ docker-compose rm -vf

Building from source

Compiling Bottled Water is just a matter of:

make && make install

For that to work, you need the following dependencies installed:

  • PostgreSQL 9.5 development libraries (PGXS and libpq). (Homebrew: brew install postgresql; Ubuntu: sudo apt-get install postgresql-server-dev-9.5 libpq-dev)
  • libsnappy, a dependency of Avro. (Homebrew: brew install snappy; Ubuntu: sudo apt-get install libsnappy-dev)
  • avro-c (1.8.0 or later), the C implementation of Avro. (Homebrew: brew install avro-c; others: build from source)
  • Jansson, a JSON parser. (Homebrew: brew install jansson; Ubuntu: sudo apt-get install libjansson-dev)
  • libcurl, an HTTP client. (Homebrew: brew install curl; Ubuntu: sudo apt-get install libcurl4-openssl-dev)
  • librdkafka (0.9.1 or later), a Kafka client. (Ubuntu universe: sudo apt-get install librdkafka-dev, but see known gotchas; others: build from source)

You can see the Dockerfile for building the quickstart images as an example of building Bottled Water and its dependencies on Debian.

If you get errors about Package libsnappy was not found in the pkg-config search path, and you have Snappy installed, you may need to create /usr/local/lib/pkgconfig/libsnappy.pc with contents something like the following (be sure to check which version of libsnappy is installed in your system):

Name: libsnappy
Description: Snappy is a compression library
Version: 1.1.2
URL: https://google.github.io/snappy/
Libs: -L/usr/local/lib -lsnappy
Cflags: -I/usr/local/include

Configuration

The make install command above installs an extension into the Postgres installation on your machine, which does all the work of encoding change data into Avro. There's then a separate client program which connects to Postgres, fetches the data, and pushes it to Kafka.

To configure Bottled Water, you need to set the following in postgresql.conf (if you're using Homebrew, you can probably find it in /usr/local/var/postgres; on Debian or Ubuntu, it's probably under /etc/postgresql):

wal_level = logical
max_wal_senders = 8
wal_keep_segments = 4
max_replication_slots = 4

You'll also need to give yourself the replication privileges for the database. You can do this by adding the following to pg_hba.conf (in the same directory, replacing <user> with your login username):

local   replication     <user>                 trust
host    replication     <user>  127.0.0.1/32   trust
host    replication     <user>  ::1/128        trust

Restart Postgres for the changes to take effect. Next, enable the Postgres extension that make install installed previously. Start psql -h localhost and run:

create extension bottledwater;

That should be all the setup on the Postgres side. Next, make sure you're running Kafka and the Confluent schema registry, for example by following the quickstart.

Assuming that everything is running on the default ports on localhost, you can start Bottled Water as follows:

./kafka/bottledwater --postgres=postgres://localhost

The first time this runs, it will create a replication slot called bottledwater, take a consistent snapshot of your database, and send it to Kafka. (You can change the name of the replication slot with the --slot command line flag.) When the snapshot is complete, it switches to consuming the replication stream.

If the slot already exists, the tool assumes that no snapshot is needed, and simply resumes the replication stream where it last left off.

In some scenarios, if you only care about streaming ongoing changes (and not replicating the existing database contents into Kafka), you may want to skip the snapshot - e.g. to avoid the performance overhead of taking the snapshot, or because you are repointing Bottled Water at a newly promoted replica. In that case, you can pass --skip-snapshot at the command line. (This option is ignored if the replication slot already exists.)
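
The startup behaviour described above can be summarised as a small decision function. This is a sketch of the documented behaviour only; the function name and the step labels are invented for illustration and do not correspond to identifiers in the Bottled Water source.

```python
def startup_plan(slot_exists, skip_snapshot=False):
    """Return the steps the client takes on startup, per the description above."""
    if slot_exists:
        # An existing slot means no snapshot is needed; --skip-snapshot is ignored.
        return ["resume_stream"]
    if skip_snapshot:
        return ["create_slot", "stream"]
    return ["create_slot", "snapshot", "stream"]


print(startup_plan(slot_exists=False))                      # first run
print(startup_plan(slot_exists=False, skip_snapshot=True))  # changes only
print(startup_plan(slot_exists=True, skip_snapshot=True))   # flag ignored
```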

When you no longer want to run Bottled Water, you have to drop its replication slot (otherwise you'll eventually run out of disk space, as the open replication slot prevents the WAL from getting garbage-collected). You can do this by opening psql again and running:

select pg_drop_replication_slot('bottledwater');

Error handling

If Bottled Water encounters an error - such as failure to communicate with Kafka or the Schema Registry - its default behaviour is for the client to exit, halting the flow of data into Kafka. This may seem like an odd default, but since Postgres will retain and replay the logical replication stream until Bottled Water acknowledges it, it ensures that:

  • it will never miss an update (every update made in Postgres will eventually be written to Kafka) - provided that whatever caused the error is resolved externally (e.g. restoring connectivity to Kafka) and the Bottled Water client is then restarted;

  • it will never write corrupted data to Kafka: e.g. if unable to obtain a schema id from the Schema Registry for the current update, rather than writing to Kafka without a schema id (which would leave consumers unable to parse the update), it will wait until the problem is resolved.

However, in some scenarios, exiting on the first error may not be desirable:

  • if there is an error publishing for one table (e.g. if Kafka is configured not to autocreate topics and the corresponding topic has not been explicitly created), you may not want to halt updates for all other tables.

  • if the reason for the error cannot be resolved quickly (e.g. Kafka misconfiguration or prolonged outage), Bottled Water may threaten the stability of the Postgres server. This is because Postgres will store WAL on disk for all updates made since Bottled Water last acknowledged (i.e. successfully published) an update. If Postgres has a high write throughput, Bottled Water being unavailable may cause the disk on the Postgres server to fill up, likely causing Postgres to crash.

To support these scenarios, Bottled Water supports an alternative error handling policy where it will simply log that the error occurred and drop the update it was attempting to process, acknowledging the update so that Postgres can stop retaining WAL. This policy can be enabled by passing --on-error=log on the command line. Note that in this mode Bottled Water can no longer guarantee that it will never miss an update.
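
The trade-off between the two policies can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the client's actual code: PublishError, process_updates and the publish callback are hypothetical names, and acknowledgement is modelled as appending to a list.

```python
class PublishError(Exception):
    """Stand-in for a failure to publish to Kafka or reach the Schema Registry."""


def process_updates(updates, publish, on_error="exit"):
    """Return (acknowledged, dropped) for a sequence of updates."""
    if on_error not in ("exit", "log"):
        raise ValueError("on_error must be 'exit' or 'log'")
    acknowledged, dropped = [], []
    for update in updates:
        try:
            publish(update)
        except PublishError:
            if on_error == "exit":
                # Stop without acknowledging: Postgres retains the WAL and
                # replays this update once the client is restarted.
                break
            # on_error == "log": the update is lost, but WAL can be released.
            dropped.append(update)
        acknowledged.append(update)
    return acknowledged, dropped


def flaky_publish(update):
    if update == "u2":
        raise PublishError(update)


print(process_updates(["u1", "u2", "u3"], flaky_publish, on_error="exit"))
print(process_updates(["u1", "u2", "u3"], flaky_publish, on_error="log"))
```

With the exit policy, nothing after the failure is acknowledged, so nothing is lost but the backlog grows on the Postgres side. With the log policy, every update is acknowledged, including the one that never reached Kafka.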

Consuming data

Bottled Water creates one Kafka topic per database table, with the same name as the table. The messages in the topic use the table's primary key (or replica identity index, if set) as key, and the entire table row as value. With inserts and updates, the message value is the new contents of the row. With deletes, the message value is null, which allows Kafka's log compaction to garbage-collect deleted values.

If a table doesn't have a primary key or replica identity index, Bottled Water will complain and refuse to start. You can override this with the --allow-unkeyed option. Any inserts and updates to tables without a primary key or replica identity will be sent to Kafka as messages without a key. Deletes from such tables are not sent to Kafka.
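
To see why a null value for deletes matters, here is a deliberately simplified model of log compaction in Python. Real Kafka compaction works segment by segment and keeps delete markers around for a configurable time; this sketch only shows the end state. The rows reuse the test table from the quickstart, and the function name compact is made up for the example.

```python
def compact(log):
    """Keep the latest value per key; drop keys whose latest value is None."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return {key: value for key, value in latest.items() if value is not None}


# (key, value) pairs as a consumer would see them: key = primary key,
# value = whole row, or None for a delete.
log = [
    (1, {"id": 1, "value": "hello world!"}),  # insert
    (2, {"id": 2, "value": "second row"}),    # insert
    (1, {"id": 1, "value": "hello again"}),   # update
    (2, None),                                # delete
]
print(compact(log))  # {1: {'id': 1, 'value': 'hello again'}}
```

A consumer that rebuilds table state from the topic can apply the same rule: overwrite on a non-null value, remove the key on a null value.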

Messages are written to Kafka by default in a binary Avro encoding, which is efficient, but not human-readable. To view the contents of a Kafka topic, you can use the Avro console consumer:

./bin/kafka-avro-console-consumer --topic test --zookeeper localhost:2181 \
    --property print.key=true

Output formats

Bottled Water currently supports writing messages to Kafka in one of two output formats: Avro, or JSON. The output format is configured via the --output-format command-line switch.

Avro is recommended for large scale use, since it uses a much more efficient binary encoding for messages, defines rules for schema evolution, and is able to faithfully represent a wide range of column types. Avro output requires an instance of the Confluent Schema Registry to be running, and consumers will need to query the schema registry in order to decode messages.

JSON is ideal for evaluation and prototyping, or integration with languages without good Avro library support. JSON is human readable, and widely supported among programming languages. JSON output does not require a schema registry.

Topic names

For each table being streamed, Bottled Water publishes messages to a corresponding Kafka topic. The naming convention for topics is [topic_prefix].[postgres_schema_name].table_name:

  • table_name is the name of the table in Postgres.
  • postgres_schema_name is the name of the Postgres schema the table belongs to; this is omitted if the schema is "public" (the default schema under the default Postgres configuration). N.B. this requires the avro-c library to be at least version 1.8.0.
  • topic_prefix is omitted by default, but may be configured via the --topic-prefix command-line option. A prefix is useful:
      • to prevent name collisions with other topics, if the Kafka broker is also being used for other purposes besides Bottled Water.
      • if you want to stream several databases into the same broker, using a separate Bottled Water instance with a different prefix for each database.
      • to make it easier for a Kafka consumer to consume updates from all Postgres tables, by using a topic regex that matches the prefix.

For example:

  • with no prefix configured, a table named "users" in the public (default) schema would be streamed to a topic named "users".
  • with --topic-prefix=bottledwater, a table named "transactions" in the "point-of-sale" schema would be streamed to a topic named "bottledwater.point-of-sale.transactions".

(Support for namespaces in Kafka has been proposed that would replace this sort of ad-hoc prefixing, but it's still under discussion.)
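
The naming convention can be expressed as a short function, which may be handy on the consumer side for working out which topic to subscribe to. This is a sketch of the rule as documented here; topic_name is a hypothetical helper, not part of Bottled Water.

```python
def topic_name(table_name, schema_name="public", topic_prefix=None):
    """Build [topic_prefix].[postgres_schema_name].table_name."""
    parts = []
    if topic_prefix:
        parts.append(topic_prefix)
    if schema_name != "public":
        # The default "public" schema is left out of the topic name.
        parts.append(schema_name)
    parts.append(table_name)
    return ".".join(parts)


print(topic_name("users"))
# users
print(topic_name("transactions", schema_name="point-of-sale", topic_prefix="bottledwater"))
# bottledwater.point-of-sale.transactions
```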

Known gotchas with older dependencies

It is recommended to compile Bottled Water against the versions of librdkafka and avro-c specified above. However, Bottled Water may work with older versions, with degraded functionality.

At time of writing, the librdkafka-dev packages in the official Ubuntu repositories (for all releases up to 15.10) contain a release prior to 0.8.6. This means if you are building on Ubuntu, building librdkafka from source is recommended, until an updated librdkafka package is available.

librdkafka < 0.8.6: empty messages sent for deletes

As noted above, Bottled Water sends a null message to represent a row deletion. librdkafka only added support for null messages in release 0.8.6. If Bottled Water is compiled against a version of librdkafka prior to 0.8.6, deletes will instead be represented by empty messages, i.e. a message whose payload is an empty byte sequence. This means that Kafka will not garbage-collect deleted values on log compaction, and also may confuse consumers that expect all non-null message payloads to begin with a header.

librdkafka < 0.9.0: messages partitioned randomly

librdkafka 0.9.0+ provides a "consistent partitioner", which assigns messages to partitions based on the hash of the key. Bottled Water takes advantage of this to ensure that all inserts, updates and deletes for a given key get sent to the same partition.

If Bottled Water is compiled against a version of librdkafka prior to 0.9.0, messages will instead be assigned randomly to partitions. If the topic corresponding to a given table has more than one partition, this will lead to incorrect log compaction behaviour (e.g. if the initial insert for row 42 goes to partition 0, then a subsequent delete for row 42 goes to partition 1, then log compaction will be unable to garbage-collect the insert). It will also break any consumer relying on seeing all updates relating to a given key (e.g. for a stream-table join).
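
The difference between the two behaviours can be sketched in a few lines of Python. CRC32 is used here purely as an example hash; this README does not say which hash function librdkafka uses, so treat that choice, and both function names, as assumptions.

```python
import random
import zlib


def consistent_partition(key: bytes, num_partitions: int) -> int:
    """Hash-based assignment: the same key always maps to the same partition."""
    return zlib.crc32(key) % num_partitions


def random_partition(key: bytes, num_partitions: int, rng: random.Random) -> int:
    """Random assignment: the key is ignored entirely."""
    return rng.randrange(num_partitions)


key = b"42"

# The insert and the later delete for row 42 land on the same partition,
# so log compaction can match them up.
insert_partition = consistent_partition(key, 4)
delete_partition = consistent_partition(key, 4)
print(insert_partition == delete_partition)  # True

# With random assignment, messages for one key are scattered.
rng = random.Random(0)
spread = {random_partition(key, 4, rng) for _ in range(100)}
print(len(spread) > 1)  # True
```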

avro-c < 1.8.0: schema omitted from topic name

Bottled Water encodes the Postgres schema to which tables belong in the Avro schema namespace. Support for accessing the schema namespace was added to the Avro C library in version 1.8.0, so prior releases do not have access to this information.

If Bottled Water is compiled against a version of avro-c prior to 1.8.0, the schema will be omitted from the Kafka topic name. This means that tables with the same names in different schemas will have changes streamed to the same topic.

Command-line options

This serves as a reference for the various command-line options accepted by the Bottled Water client, annotated with links to the relevant areas of documentation. If this disagrees with the output of bottledwater --help, then --help is correct (and please file a pull request to update this reference!).

  • -d, --postgres=postgres://user:pass@host:port/dbname (required): Connection string or URI of the PostgreSQL server.

  • -s, --slot=slotname (default: bottledwater): Name of replication slot. The slot is automatically created on first use.

  • -b, --broker=host1[:port1],host2[:port2]... (default: localhost:9092): Comma-separated list of Kafka broker hosts/ports.

  • -r, --schema-registry=http://hostname:port (default: http://localhost:8081): URL of the service where Avro schemas are registered. (Used only for --output-format=avro. Omit when --output-format=json.)

  • -f, --output-format=[avro|json] (default: avro): How to encode the messages for writing to Kafka. See discussion of output formats.

  • -u, --allow-unkeyed: Allow export of tables that don't have a primary key. This is disallowed by default, because updates and deletes need a primary key to identify their row.

  • -p, --topic-prefix=prefix: String to prepend to all topic names. e.g. with --topic-prefix=postgres, updates from table "users" will be written to topic "postgres.users".

  • -e, --on-error=[log|exit] (default: exit): What to do in case of a transient error, such as failure to publish to Kafka. See discussion of error handling.

  • -x, --skip-snapshot: Skip taking a consistent snapshot of the existing database contents and just start streaming any new updates. (Ignored if the replication slot already exists.)

  • -C, --kafka-config property=value: Set global configuration property for Kafka producer (see librdkafka docs).

  • -T, --topic-config property=value: Set topic configuration property for Kafka producer (see librdkafka docs).

  • --config-help: Print the list of Kafka configuration properties.

  • -h, --help: Print this help text.
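
If you launch the client from a script, the long-form options listed above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not something shipped with Bottled Water; it only uses flags from this reference, leaves out the -C and -T options, and assumes the binary path ./kafka/bottledwater from the configuration section.

```python
def bottledwater_argv(postgres, *, slot=None, broker=None, schema_registry=None,
                      output_format=None, topic_prefix=None, on_error=None,
                      allow_unkeyed=False, skip_snapshot=False,
                      binary="./kafka/bottledwater"):
    """Assemble an argument list for the Bottled Water client.

    Options left at None/False are omitted, so the client's own defaults apply.
    """
    argv = [binary, f"--postgres={postgres}"]
    valued = [
        ("--slot", slot),
        ("--broker", broker),
        ("--schema-registry", schema_registry),
        ("--output-format", output_format),
        ("--topic-prefix", topic_prefix),
        ("--on-error", on_error),
    ]
    argv += [f"{flag}={value}" for flag, value in valued if value is not None]
    if allow_unkeyed:
        argv.append("--allow-unkeyed")
    if skip_snapshot:
        argv.append("--skip-snapshot")
    return argv


print(" ".join(bottledwater_argv("postgres://localhost",
                                 output_format="json",
                                 topic_prefix="postgres")))
# ./kafka/bottledwater --postgres=postgres://localhost --output-format=json --topic-prefix=postgres
```

The resulting list can be passed directly to subprocess.run, which avoids shell quoting issues with connection strings.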

Developing

If you want to work on the Bottled Water codebase, the Docker setup is a good place to start.

Bottled Water ships with a test suite that verifies basic functionality, documents supported Postgres types and tests message publishing semantics. The test suite also relies on Docker and Docker Compose. To run it:

  1. Install Docker and Docker Compose (see Docker setup)
  2. Install Ruby 2.2.4 (see ruby-lang.org) (required to run the tests)
  3. Install Bundler: gem install bundler
  4. Build the Docker images: make docker-compose
  5. Run the tests: make test

If submitting a pull request, particularly one that adds new functionality, it is highly encouraged to include tests that exercise the changed code!

Status

Bottled Water has been tested on a variety of use cases and Postgres schemas, and is believed to be fairly stable. In particular, because of its design, it is unlikely to corrupt the data in Postgres. However, it has not yet been run on large production databases, or for long periods of time, so proceed with caution if you intend to use it in production. See this discussion about production readiness, and GitHub issues for a list of known issues.

Bug reports and pull requests welcome.

Note that Bottled Water has nothing to do with Sparkling Water, a machine learning engine for Spark.

License

Copyright 2015 Confluent, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License in the enclosed file called LICENSE.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

See CONTRIBUTORS.md for a list of contributors.
