• Stars: 215
• Rank: 182,881 (Top 4 %)
• Language: Python
• License: Apache License 2.0
• Created about 10 years ago
• Updated almost 2 years ago

Repository Details

High performance Kafka consumer for InfluxDB. Supports collectd message formats.

Kafka-InfluxDB

A Kafka consumer for InfluxDB written in Python.
Supports InfluxDB 0.9.x and up. For InfluxDB 0.8.x support, check out the 0.3.0 tag.

⚠️ The project should work as expected and bug fixes are very welcome, but activity on new functionality is quite low. For newer projects I recommend vector instead, which is both faster and more versatile.

Use cases

Kafka serves as a buffer for your metric data during periods of high load.
It is also useful for sending metrics from offshore data centers with unreliable connections to your monitoring backend.

Quickstart

For a quick test, run kafka-influxdb inside a container alongside Kafka and InfluxDB. Some sample messages are generated automatically on startup (using kafkacat).

Python 2:

make
docker exec -it kafkainfluxdb
python -m kafka_influxdb -c config_example.yaml -s

Python 3:

make RUNTIME=py3
docker exec -it kafkainfluxdb
python -m kafka_influxdb -c config_example.yaml -s

PyPy 5.x:

make RUNTIME=pypy
docker exec -it kafkainfluxdb
pypy3 -m kafka_influxdb -c config_example.yaml -s --kafka_reader=kafka_influxdb.reader.kafka_python

(Note the one additional flag: --kafka_reader=kafka_influxdb.reader.kafka_python. PyPy is incompatible with the confluent-kafka consumer, which is a C extension built on librdkafka. We therefore use the kafka_python library here, which is compatible with PyPy but a bit slower.)

Docker:

docker run mre0/kafka-influxdb

or simply

make run

Installation

pip install kafka_influxdb
kafka_influxdb -c config_example.yaml

Contributing

If you'd like to contribute, please create a pull request with your change.
Please run the tests with make test before you submit the pull request.
If you're unsure whether a change will be accepted, you can create an issue first to discuss it.
You can also look at the existing issues for inspiration.

Thanks for contributing!

Performance

The following graph shows the number of messages/s read from Kafka for various Python versions and Kafka consumer plugins.
The test ran against a Kafka topic with 10 partitions and five message brokers. As you can see, the best performance is achieved on Python 3 using the -O flag for bytecode optimization in combination with the confluent-kafka reader (the default setup). Note that encoding and sending the data to InfluxDB might lower this maximum performance, although you should still see a significant performance boost compared to Logstash.

[Figure: benchmark results]

Benchmark

For a quick benchmark, you can start a complete kafkacat -> Kafka -> kafka_influxdb -> InfluxDB setup with the following command:

make

This will immediately start reading messages from Kafka and writing them into InfluxDB. To see the output, you can use the InfluxDB CLI:

docker exec -it docker_influxdb_1 bash # Double check your container name
influx
use metrics
show measurements
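
The read-and-write loop that this benchmark exercises can be pictured as a batching consumer. The following is only a minimal sketch, not the project's implementation: the consume function and its flush callback are hypothetical names, and the batch size mirrors the --buffer_size option (default: 1000) listed under Configuration.

```python
from typing import Callable, Iterable, List


def consume(messages: Iterable[str],
            flush: Callable[[List[str]], None],
            buffer_size: int = 1000) -> int:
    """Collect messages and hand them to `flush` in batches.

    Returns the number of flush calls made.
    """
    buffer: List[str] = []
    flushes = 0
    for message in messages:
        buffer.append(message)
        if len(buffer) >= buffer_size:
            flush(buffer)
            flushes += 1
            buffer = []  # start a fresh batch; the flushed list stays intact
    if buffer:
        # Write whatever is left once the input ends.
        flush(buffer)
        flushes += 1
    return flushes


if __name__ == "__main__":
    batches: List[List[str]] = []
    count = consume((f"metric {i}" for i in range(2500)), batches.append)
    print(count, [len(batch) for batch in batches])
```

Writing in batches keeps the number of requests to the backend low. A real Kafka stream does not end, so the final flush here only matters for finite inputs such as tests.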

Supported formats

You can write a custom encoder to support any input and output format (even fancy things like Protobuf). Look at the examples inside the encoder directory to get started. The following formats are officially supported:

Input formats

Collectd Graphite ASCII format:

mydatacenter.myhost.load.load.shortterm 0.45 1436357630

Collectd JSON format:

[{
    "values":[
       0.6
    ],
    "dstypes":[
       "gauge"
    ],
    "dsnames":[
       "value"
    ],
    "time":1444745144.824,
    "interval":10.000,
    "host":"xx.example.internal",
    "plugin":"cpu",
    "plugin_instance":"1",
    "type":"percent",
    "type_instance":"system"
 }]

Output formats

InfluxDB line protocol format:

load_load_shortterm,datacenter=mydatacenter,host=myhost value="0.45" 1436357630
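
To make the mapping between the plain-text sample line under Input formats and the output line above concrete, here is a rough sketch of such a conversion. It is not the project's encoder. The function name graphite_to_line_protocol is hypothetical, and the assumed layout (datacenter first, host second, remaining path segments joined with underscores) is inferred only from the two sample lines.

```python
def graphite_to_line_protocol(line: str) -> str:
    """Convert '<datacenter>.<host>.<metric path> <value> <timestamp>'."""
    path, value, timestamp = line.split()
    # Assumed layout: first segment is the datacenter, second is the host.
    datacenter, host, *metric_parts = path.split(".")
    if not metric_parts:
        raise ValueError(f"metric path too short: {path!r}")
    measurement = "_".join(metric_parts)
    # The value is quoted to mirror the sample output line.
    return (f'{measurement},datacenter={datacenter},host={host} '
            f'value="{value}" {timestamp}')


if __name__ == "__main__":
    sample = "mydatacenter.myhost.load.load.shortterm 0.45 1436357630"
    print(graphite_to_line_protocol(sample))
```

Note that in InfluxDB line protocol a double-quoted field value is stored as a string. The sketch keeps the quotes only to reproduce the sample.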

Custom encoders

If you are writing your custom encoder and you want to run it using the official docker image, you can simply mount it in the container:

docker run -v `pwd`/config.yaml:/usr/src/app/config.yaml -v `pwd`/myencoder.py:/usr/src/app/myencoder.py mre0/kafka-influxdb --encoder=myencoder

Another possibility is to create a custom Docker image that contains your encoder, for example:

FROM mre0/kafka-influxdb

ADD myencoder.py /usr/src/app/myencoder.py
ADD config.yaml /usr/src/app/

CMD python -m kafka_influxdb -c config.yaml -v --encoder=myencoder
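
As a starting point, a myencoder.py like the one mounted above might look roughly like the sketch below. It parses the collectd JSON sample from the Input formats section. The class name Encoder, the encode(msg) signature, and the layout of the resulting measurement name are all assumptions made for illustration. Check the examples inside the encoder directory for the interface the tool actually expects.

```python
import json
from typing import List


class Encoder(object):
    """Hypothetical myencoder.py: collectd JSON in, line protocol out."""

    def encode(self, msg: bytes) -> List[str]:
        measurements = []
        for entry in json.loads(msg.decode("utf-8")):
            # Build the measurement name from the non-empty collectd fields.
            name_parts = [entry.get(key) for key in
                          ("plugin", "plugin_instance", "type", "type_instance")]
            name = "_".join(part for part in name_parts if part)
            timestamp = int(entry["time"])  # truncate to whole seconds
            # One output line per data source in the collectd entry.
            for dsname, value in zip(entry["dsnames"], entry["values"]):
                measurements.append(
                    f'{name},host={entry["host"]} {dsname}={value} {timestamp}')
        return measurements


if __name__ == "__main__":
    sample = (b'[{"values":[0.6],"dstypes":["gauge"],"dsnames":["value"],'
              b'"time":1444745144.824,"interval":10.000,'
              b'"host":"xx.example.internal","plugin":"cpu",'
              b'"plugin_instance":"1","type":"percent",'
              b'"type_instance":"system"}]')
    for line in Encoder().encode(sample):
        print(line)
```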

Configuration

Take a look at config_example.yaml to see how to create a config file. You can override the settings from the command line. The following parameters are allowed:

| Option | Description |
| --- | --- |
| -h, --help | Show help message and exit |
| --kafka_host KAFKA_HOST | Hostname or IP of Kafka message broker (default: localhost) |
| --kafka_port KAFKA_PORT | Port of Kafka message broker (default: 9092) |
| --kafka_topic KAFKA_TOPIC | Topic for metrics (default: my_topic) |
| --kafka_group KAFKA_GROUP | Kafka consumer group (default: my_group) |
| --kafka_reader KAFKA_READER | Kafka client library to use (kafka_python or confluent) (default: kafka_influxdb.reader.confluent) |
| --influxdb_host INFLUXDB_HOST | InfluxDB hostname or IP (default: localhost) |
| --influxdb_port INFLUXDB_PORT | InfluxDB API port (default: 8086) |
| --influxdb_user INFLUXDB_USER | InfluxDB username (default: root) |
| --influxdb_password INFLUXDB_PASSWORD | InfluxDB password (default: root) |
| --influxdb_dbname INFLUXDB_DBNAME | InfluxDB database to write metrics into (default: metrics) |
| --influxdb_use_ssl | Use SSL connection for InfluxDB (default: False) |
| --influxdb_verify_ssl | Verify the SSL certificate before connecting (default: False) |
| --influxdb_timeout INFLUXDB_TIMEOUT | Max number of seconds to establish a connection to InfluxDB (default: 5) |
| --influxdb_use_udp | Use UDP connection for InfluxDB (default: False) |
| --influxdb_retention_policy INFLUXDB_RETENTION_POLICY | Retention policy for incoming metrics (default: autogen) |
| --influxdb_time_precision INFLUXDB_TIME_PRECISION | Precision of incoming metrics. Can be one of 's', 'm', 'ms', 'u' (default: s) |
| --encoder ENCODER | Input encoder which converts an incoming message to dictionary (default: collectd_graphite_encoder) |
| --buffer_size BUFFER_SIZE | Maximum number of messages that will be collected before flushing to the backend (default: 1000) |
| -c CONFIGFILE, --configfile CONFIGFILE | Configfile path (default: None) |
| -s, --statistics | Show performance statistics (default: True) |
| -v, --verbose | Set verbosity level. Increase verbosity by adding a v: -v -vv -vvv (default: 0) |
| --version | Show version |
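
The precedence described here (built-in defaults, then the config file, then command-line flags) can be sketched as follows. This is an illustrative assumption about how such layering is commonly done, not the project's actual code. The resolve_settings helper is hypothetical, only four of the options are modelled, and the config file is represented by a plain dictionary to keep the example free of third-party dependencies.

```python
import argparse
from typing import Any, Dict, List, Optional

# A subset of the documented defaults, for illustration only.
DEFAULTS: Dict[str, Any] = {
    "kafka_host": "localhost",
    "kafka_port": 9092,
    "influxdb_dbname": "metrics",
    "buffer_size": 1000,
}


def resolve_settings(file_settings: Dict[str, Any],
                     argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Merge defaults, config-file values, and explicit CLI flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--kafka_host")
    parser.add_argument("--kafka_port", type=int)
    parser.add_argument("--influxdb_dbname")
    parser.add_argument("--buffer_size", type=int)
    args = parser.parse_args(argv)
    # Flags left out parse as None, so only explicit flags override.
    cli = {key: value for key, value in vars(args).items() if value is not None}
    # Later dictionaries win: defaults < config file < command line.
    return {**DEFAULTS, **file_settings, **cli}


if __name__ == "__main__":
    from_file = {"kafka_host": "kafka01", "buffer_size": 500}
    print(resolve_settings(from_file, ["--buffer_size", "2000"]))
```

Here the host comes from the file, the buffer size from the command line, and everything else from the defaults.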

Comparison with other tools

Logstash has a Kafka input plugin and an InfluxDB output plugin, and this combination supports InfluxDB 0.9+. We've achieved a message throughput of around 5000 messages/second with that setup. Check out the configuration at docker/logstash/config.conf. You can run the benchmark yourself:

make RUNTIME=logstash
docker exec -it logstash
logstash -f config.conf

Please send a Pull Request if you know of other tools that can be mentioned here.
