• Stars
    star
    124
  • Rank 286,014 (Top 6 %)
  • Language
    Scala
  • Created over 9 years ago
  • Updated over 6 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Apache Spark and Apache Kafka integration example

Apache Spark and Apache Kafka integration example

Build Status Coverage Status

This example shows how to send processing results from Spark Streaming to Apache Kafka in reliable way. The example follows Spark convention for integration with external data sinks:

// import implicit conversions
import org.mkuthan.spark.KafkaDStreamSink._

// send dstream to Kafka
dstream.sendToKafka(kafkaProducerConfig, topic)

Features

  • KafkaDStreamSink for sending streaming results to Apache Kafka in reliable way.
  • Stream processing fail fast, if the results could not be sent to Apache Kafka.
  • Stream processing is blocked (back pressure), if the Kafka producer is too slow.
  • Stream processing results are flushed explicitly from Kafka producer internal buffer.
  • Kafka producer is shared by all tasks on single JVM (see KafkaProducerFactory).
  • Kafka producer is properly closed when Spark executor is shutdown (see KafkaProducerFactory).
  • Twitter Bijection is used for encoding/decoding KafkaPayload from/into String or Avro.

Quickstart guide

Download latest Apache Kafka distribution and un-tar it.

Start ZooKeeper server:

./bin/zookeeper-server-start.sh config/zookeeper.properties

Start Kafka server:

./bin/kafka-server-start.sh config/server.properties

Create input topic:

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic input

Create output topic:

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic output

Start Kafka producer:

./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic input

Start Kafka consumer:

./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic output

Run example application:

sbt "runMain example.WordCountJob"

Publish a few words on input topic using Kafka console producer and check the processing result on output topic using Kafka console producer.

References

More Repositories

1

example-spark

Spark, Spark Streaming and Spark SQL unit testing strategies
Scala
213
star
2

example-ddd-cqrs-server

Example DDD/CQRS based on Implementing Domain Driven Design book written by Vaughn Vernon
Java
40
star
3

garmin-workouts

Command line tool for managing Garmin workouts.
Python
38
star
4

example-spring

Example Spring project
Java
38
star
5

example-axon

Example Axon Framework based implementation based on CQRS Journey essay.
Java
16
star
6

example-kafkastreams

Kafka Streams DSL vs Processor API
Scala
16
star
7

example-flink-kafka

Example Flink and Kafka integration project
Shell
14
star
8

example-jbehave

Example JBehave and Spring Framework setup.
Java
13
star
9

example-beam

Playground for Apache Beam and Scio experiments, driven by real-world use cases.
Scala
8
star
10

example-akka-http

Example Akka HTTP application
Shell
6
star
11

stream-processing

Learn how to develop and test stateful streaming and batch data pipelines
Scala
6
star
12

gcp-dataflow-tampermonkey

Tampermonkey script for GCP Dataflow console with enhanced view for finding job bottlenecks
JavaScript
5
star
13

raspberry-ansible

My Raspberry Pi installation at home.
Jinja
4
star
14

example-jpa

JPA best practices, tips and tricks.
Java
3
star
15

m4enterprise

Maven For Enterprise is an example how to configure and manage Maven projects in the enterprise environment.
Java
3
star
16

mkuthan.github.io

Blog, presentations and other public content published on https://mkuthan.github.io
HTML
3
star
17

example-netty-kafka

Example Http (Netty) producer for Apache Kafka
Java
3
star
18

design-nestedset

Nested Set design pattern for representing hierarchies in relational databases.
Java
3
star
19

design-fsm

Finite State Machine design pattern for simplified workflows in enterprise applications.
Java
2
star
20

example-jms

JMS best practices, tips and tricks
Java
1
star
21

kata-beam

https://beam.apache.org/blog/2019/05/30/beam-kata-release.html
Java
1
star
22

minecraft-gcp

Minecraft Server automation on GCP
HCL
1
star
23

example-avro-parquet

Example Apache Avro schemas with Parquet compatibility tests
Shell
1
star
24

design-best-effort-1pc

Best Effort Single Phase Commit
Java
1
star
25

example-spark-ml

Machine learning, feature engineering examples using Spark ML
Jupyter Notebook
1
star