kafka-connect-datagen

Connector that generates data for demos

Overview

kafka-connect-datagen is a Kafka Connect connector that generates mock data for development and testing; it is not suitable for production scenarios. It is available on Confluent Hub.

Versions

There are multiple released versions of this connector, starting with 0.1.0. The instructions below use version 0.4.0 as an example, but you can substitute any of the other released versions. In fact, unless specified otherwise, we recommend using the latest released version to get all of the features and bug fixes.

Usage

You can either install a released version of kafka-connect-datagen from Confluent Hub or build it from source. You can then run the connector in a local Confluent Platform installation or in a Docker container.

Install the connector from Confluent Hub to a local Confluent Platform

You can install the kafka-connect-datagen connector from Confluent Hub using the Confluent Hub client.

To install a specific release version you can run:

confluent-hub install confluentinc/kafka-connect-datagen:0.4.0

or to install the latest released version:

confluent-hub install confluentinc/kafka-connect-datagen:latest

Build connector from latest code

Alternatively, you can build and install the kafka-connect-datagen connector from the latest code. Here we use v0.4.0 to reference the git tag for the 0.4.0 release, but the same pattern works for all released versions.

git checkout v0.4.0
mvn clean package
confluent-hub install target/components/packages/confluentinc-kafka-connect-datagen-0.4.0.zip

Run connector in local install

Here is an example of how to run the kafka-connect-datagen on a local Confluent Platform after it's been installed. Configuration details are provided below.

confluent local start connect
confluent local config datagen-pageviews -- -d config/connector_pageviews.config
confluent local status connectors
confluent local consume test1 --value-format avro --max-messages 5 --property print.key=true --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer --from-beginning
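
For reference, a connector configuration file passed with -d has the following general shape. This is an illustrative sketch only, not necessarily the exact contents of config/connector_pageviews.config; the topic name and interval values here are assumptions:

{
  "name": "datagen-pageviews",
  "config": {
    "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
    "kafka.topic": "pageviews",
    "quickstart": "pageviews",
    "max.interval": "100",
    "iterations": "-1",
    "tasks.max": "1"
  }
}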

Install the connector from Confluent Hub into a Kafka Connect based Docker image

A Docker image based on Kafka Connect with the kafka-connect-datagen plugin is already available on Docker Hub, ready for you to use.
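
For example, to pull a released image (the tag is the aggregate version number described below; the tags actually available on Docker Hub may vary):

docker pull cnfldemos/kafka-connect-datagen:0.4.0-6.1.0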

If you want to build a local copy of the Docker image with kafka-connect-datagen, this project provides a Dockerfile that you can reference.

You can create a Docker image packaged with the locally built source by running (for example with the 5.5.0 version of Confluent Platform):

make build-docker-from-local CP_VERSION=5.5.0

This will build the connector from source and create a local image with an aggregate version number. The aggregate version number is the kafka-connect-datagen connector version number and the Confluent Platform version number separated by a -. The local kafka-connect-datagen version number is defined in the pom.xml file, and the Confluent Platform version is defined in the Makefile. An example aggregate version number is 0.4.0-6.1.0.

Alternatively, you can install the kafka-connect-datagen connector from Confluent Hub into a Docker image by running:

make build-docker-from-released CP_VERSION=5.5.0

The Makefile contains some default variables that affect the version numbers of both the installed kafka-connect-datagen and the base Confluent Platform. The variables are located near the top of the Makefile with the following names and current default values:

CP_VERSION ?= 6.1.0

KAFKA_CONNECT_DATAGEN_VERSION ?= 0.4.0

These values can be overridden with variable declarations before the make command. For example:

KAFKA_CONNECT_DATAGEN_VERSION=0.3.2 make build-docker-from-released

Run connector in Docker Compose

Here is an example of how to run kafka-connect-datagen with the provided docker-compose.yml file. If you wish to use a different Docker image tag, be sure to modify the docker-compose.yml file accordingly.

docker-compose up -d --build
curl -X POST -H "Content-Type: application/json" --data @config/connector_pageviews.config http://localhost:8083/connectors
docker-compose exec connect kafka-console-consumer --topic pageviews --bootstrap-server kafka:29092  --property print.key=true --max-messages 5 --from-beginning
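
You can verify that the connector was created and is running by querying the Kafka Connect REST API (this assumes the connector name in config/connector_pageviews.config is datagen-pageviews):

curl http://localhost:8083/connectors
curl http://localhost:8083/connectors/datagen-pageviews/status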

Configuration

Generic Kafka Connect Parameters

See all Kafka Connect configuration parameters.

kafka-connect-datagen Specific Parameters

Parameter       | Description                                                              | Default
kafka.topic     | Topic to write to                                                        |
max.interval    | Max interval between messages (ms)                                       | 500
iterations      | Number of messages to send from each task, or less than 1 for unlimited | -1
schema.string   | The literal JSON-encoded Avro schema to use. Cannot be set with schema.filename or quickstart |
schema.filename | Filename of schema to use. Cannot be set with schema.string or quickstart |
schema.keyfield | Name of field to use as the message key                                  |
quickstart      | Name of quickstart to use. Cannot be set with schema.string or schema.filename |
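
Putting these together, here is a minimal sketch of creating a connector instance through the Kafka Connect REST API; the connector name, topic, and parameter values are illustrative:

curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "datagen-users",
  "config": {
    "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
    "kafka.topic": "users",
    "quickstart": "users",
    "max.interval": "1000",
    "iterations": "-1",
    "tasks.max": "1"
  }
}'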

Sample configurations

See the config folder for sample configurations.

Supported data formats

Kafka Connect supports converters, which can be used to convert record key and value formats when reading from and writing to Kafka. As of the 5.5 release, Confluent Platform packages Avro, JSON, and Protobuf converters (earlier versions package only the Avro converter).

For an example of using the Protobuf converter with kafka-connect-datagen, see this example configuration. Take note of the required SetSchemaMetadata transformation, which addresses a compatibility issue between the schema names used by kafka-connect-datagen and Protobuf. See the "Schema names are not compatible with Protobuf" issue for details.
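
A sketch of the relevant configuration entries follows; the transform alias and schema name below are assumptions for illustration, so consult the example configuration linked above for the exact settings:

...
"value.converter": "io.confluent.connect.protobuf.ProtobufConverter",
"value.converter.schema.registry.url": "http://localhost:8081",
"transforms": "SetSchemaName",
"transforms.SetSchemaName.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
"transforms.SetSchemaName.schema.name": "com.example.Pageview"
...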

Use a bundled schema specification

There are a few quickstart schema specifications bundled with kafka-connect-datagen; they are listed in this directory. To use one of these bundled schemas, refer to this mapping and set the quickstart parameter in the configuration file to the associated name. For example:

...
"quickstart": "users",
...

Define a new schema specification

You can also define your own schema specification if you want to customize the fields and their values to be more domain-specific or to match what your application expects. Under the hood, kafka-connect-datagen uses Avro Random Generator, so the only constraint on your own schema specification is that it is compatible with Avro Random Generator. To define your own schema:

  1. Create your own schema file /path/to/your_schema.avsc that is compatible with Avro Random Generator
  2. In the connector configuration, remove the configuration parameter quickstart and add the parameters schema.filename (which should be the absolute path) and schema.keyfield:
...
"schema.filename": "/path/to/your_schema.avsc",
"schema.keyfield": "<field representing the key>",
...

The custom schema can be used at runtime; it is not necessary to recompile the connector.
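
As an illustration, a small schema file compatible with Avro Random Generator might look like the following; the field names and arg.properties values here are assumptions, so consult the Avro Random Generator documentation for the full set of supported properties:

{
  "namespace": "com.example",
  "name": "orders",
  "type": "record",
  "fields": [
    {"name": "orderid", "type": {"type": "long", "arg.properties": {"iteration": {"start": 1}}}},
    {"name": "amount", "type": {"type": "double", "arg.properties": {"range": {"min": 1.0, "max": 100.0}}}},
    {"name": "region", "type": {"type": "string", "arg.properties": {"options": ["us-east", "us-west", "eu-central"]}}}
  ]
}

With this file saved as /path/to/orders.avsc, the connector would be configured with "schema.filename": "/path/to/orders.avsc" and, for example, "schema.keyfield": "orderid".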

Record keys

You can control the keys that the connector publishes with its records via the schema.keyfield property. If it's set, the connector will look for a field with that name in the top-level Avro records that it generates, and use the value and schema of that field for the key of the message that it publishes to Kafka.

Keys can be any type (string, int, record, etc.) and can also be nullable. If no schema.keyfield is provided, the key will be null with an optional string schema.
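
For example, to key messages by a string field and serialize the key as a plain string (the field name userid is illustrative, and the key converter must match the key field's type):

...
"schema.keyfield": "userid",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
...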

Confusion about schemas and Avro

To define the set of "rules" for the mock data, kafka-connect-datagen uses Avro Random Generator. The configuration parameters quickstart or schema.filename specify the Avro schema, or the set of "rules", which declares a list of primitives or more complex data types, length of data, and other properties about the generated mock data. Examples of these schema files are listed in this directory.

Do not confuse the above terminology with Avro and schemas used in a different context as described below. The Avro schemas for generating mock data are independent of (1) the format of the data produced to Kafka and (2) the schema in Confluent Schema Registry.

  1. The format of data produced to Kafka may or may not be Avro. To define the format of the data produced to Kafka, you must set the format type in your connector configuration. The connector configuration parameters can be defined for the key or value. For example, to produce messages to Kafka where the message value format is Avro, set the value.converter and value.converter.schema.registry.url parameters:
...
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081",
...

Or to produce messages to Kafka where the message value format is JSON, set the value.converter parameter:

...
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
...
  2. The schema in Confluent Schema Registry declares the record fields and their types, and is used by Kafka clients when they are configured to produce or consume Avro data. As an example, consider the following "rule" in the schema specification to generate a field userid:
...
{"name": "userid", "type": {
    "type": "string",
    "arg.properties": {
        "regex": "User_[1-9]{0,1}"
    }
}},
...

If you are using Avro format for producing data to Kafka, here is the corresponding field in the registered schema in Confluent Schema Registry:

{"name": "userid", "type": ["null", "string"], "default": null},

If you are not using Avro format for producing data to Kafka, there will be no schema in Confluent Schema Registry.

Utility Headers

The Datagen connector captures details about each record's generation in that record's headers. The following headers are populated:

Header Key        | Header Value
task.generation   | Task generation number (starts at 0, incremented each time the task restarts)
task.id           | Task id number (0 up to tasks.max - 1)
current.iteration | Record iteration number (starts at 0, incremented each time a record is generated)
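
You can inspect these headers with the console consumer; recent Kafka versions support the print.headers property (shown here against the Docker Compose setup from above):

docker-compose exec connect kafka-console-consumer --topic pageviews --bootstrap-server kafka:29092 --property print.headers=true --max-messages 3 --from-beginning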

Publishing Docker Images

Note: The following instructions are only relevant if you are an administrator of this repository and have push access to the https://hub.docker.com/r/cnfldemos/kafka-connect-datagen/ repository. The local Docker daemon must be logged into a proper Docker Hub account.

To release new versions of the Docker images to Docker Hub (https://hub.docker.com/r/cnfldemos/kafka-connect-datagen/ and https://hub.docker.com/r/cnfldemos/cp-server-connect-operator-with-datagen), use the respective targets in the Makefile.

The Makefile contains some default variables that affect the version numbers of both the installed kafka-connect-datagen and the base Confluent Platform. The variables are located near the top of the Makefile with the following names and current default values:

CP_VERSION ?= 6.1.0
KAFKA_CONNECT_DATAGEN_VERSION ?= 0.4.0
OPERATOR_VERSION ?= 0 # Operator is a 'rev' version appended at the end of the CP version, like so: 5.5.0.0

To publish the https://hub.docker.com/r/cnfldemos/kafka-connect-datagen/ image:

make push-from-released

To override the CP version and the kafka-connect-datagen version, you can run something similar to:

CP_VERSION=5.5.0 KAFKA_CONNECT_DATAGEN_VERSION=0.1.4 make publish-cp-kafka-connect-confluenthub

To override the CP version and the Operator version, which may be necessary if Operator releases a patch version, you could run something similar to:

CP_VERSION=5.5.0 OPERATOR_VERSION=1 KAFKA_CONNECT_DATAGEN_VERSION=0.1.4 make push-cp-server-connect-operator-from-released

which would result in a Docker image tagged as cp-server-connect-operator-datagen:0.1.4-5.5.0.1 and pushed to Docker Hub.
