
Fancy stream processing made operationally mundane

Benthos


Benthos is a high performance and resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and perform hydration, enrichments, transformations and filters on payloads.

It comes with a powerful mapping language, is easy to deploy and monitor, and ready to drop into your pipeline either as a static binary, docker image, or serverless function, making it cloud native as heck.

Benthos is declarative, with stream pipelines defined in as few as a single config file, allowing you to specify connectors and a list of processing stages:

input:
  gcp_pubsub:
    project: foo
    subscription: bar

pipeline:
  processors:
    - mapping: |
        root.message = this
        root.meta.link_count = this.links.length()
        root.user.age = this.user.age.number()

output:
  redis_streams:
    url: tcp://TODO:6379
    stream: baz
    max_in_flight: 20
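The mapping processor in the config above is written in Benthos' own mapping language (Bloblang). As a rough illustration of what those three lines do to a single JSON document, here is a plain-Go approximation using encoding/json. It is not Bloblang and does not reproduce its error handling. The helper names (applyMapping, toNumber) and the sample input are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// applyMapping approximates the three-line mapping from the config above
// with plain Go. It is NOT Bloblang; it only mirrors the same field logic.
func applyMapping(in []byte) ([]byte, error) {
	var this map[string]any
	if err := json.Unmarshal(in, &this); err != nil {
		return nil, err
	}

	// root.meta.link_count = this.links.length()
	links, ok := this["links"].([]any)
	if !ok {
		return nil, fmt.Errorf("links is not an array")
	}

	// root.user.age = this.user.age.number()
	user, ok := this["user"].(map[string]any)
	if !ok {
		return nil, fmt.Errorf("user is not an object")
	}
	age, err := toNumber(user["age"])
	if err != nil {
		return nil, err
	}

	root := map[string]any{
		"message": this, // root.message = this
		"meta":    map[string]any{"link_count": len(links)},
		"user":    map[string]any{"age": age},
	}
	return json.Marshal(root)
}

// toNumber accepts a JSON number or a numeric string, similar in spirit
// to the number() method used in the mapping.
func toNumber(v any) (float64, error) {
	switch t := v.(type) {
	case float64:
		return t, nil
	case string:
		return strconv.ParseFloat(t, 64)
	default:
		return 0, fmt.Errorf("cannot convert %T to number", v)
	}
}

func main() {
	out, err := applyMapping([]byte(`{"links":["a","b"],"user":{"age":"42"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Note how the age arrives as the string "42" and leaves as the number 42, which is the point of the number() call in the original mapping.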

Delivery Guarantees

Delivery guarantees can be a dodgy subject. Benthos processes and acknowledges messages using an in-process transaction model with no need for any disk persisted state, so when connecting to at-least-once sources and sinks it's able to guarantee at-least-once delivery even in the event of crashes, disk corruption, or other unexpected server faults.

This behaviour is the default and free of caveats, which also makes deploying and scaling Benthos much simpler.

Supported Sources & Sinks

AWS (DynamoDB, Kinesis, S3, SQS, SNS), Azure (Blob storage, Queue storage, Table storage), GCP (Pub/Sub, Cloud storage, BigQuery), Kafka, NATS (JetStream, Streaming), NSQ, MQTT, AMQP 0.91 (RabbitMQ), AMQP 1, Redis (streams, list, pubsub, hashes), Cassandra, Elasticsearch, HDFS, HTTP (server and client, including websockets), MongoDB, SQL (MySQL, PostgreSQL, ClickHouse, MSSQL), and, you know what, just check the documentation site to see them all; they don't fit in a README.

Connectors are being added constantly; if something you want is missing, open an issue.

Documentation

If you want to dive fully into Benthos then don't waste your time in this dump; check out the documentation site.

For guidance on how to configure more advanced stream processing concepts such as stream joins, enrichment workflows, etc., check out the cookbooks section.

For guidance on building your own custom plugins in Go, check out the public APIs.

Visual Interface

Do you like looking at stuff? Get angry and smash things when you're forced to read? If you're looking for a visual interface for Benthos check out Benthos Studio, it's a config builder, linter, and deployment management solution all baked into a single application.

Install

Grab a binary for your OS from the GitHub releases page. Or use this script:

curl -Lsf https://www.benthos.dev/sh/install | bash

Or pull the docker image:

docker pull ghcr.io/benthosdev/benthos

Benthos can also be installed via Homebrew:

brew install benthos

For more information check out the getting started guide.

Run

benthos -c ./config.yaml

Or, with docker:

# Using a config file
docker run --rm -v /path/to/your/config.yaml:/benthos.yaml ghcr.io/benthosdev/benthos

# Using a series of -s flags
docker run --rm -p 4195:4195 ghcr.io/benthosdev/benthos \
  -s "input.type=http_server" \
  -s "output.type=kafka" \
  -s "output.kafka.addresses=kafka-server:9092" \
  -s "output.kafka.topic=benthos_topic"
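Each -s flag is a dot-separated path into the config tree plus a value, so the four flags above describe the same nested structure a YAML config would. The sketch that follows is a hypothetical illustration of that mapping, not Benthos' own flag parser: setPath is an invented helper, and every value is kept as a plain string.

```go
package main

import (
	"fmt"
	"strings"
)

// setPath is a hypothetical illustration of how an override such as
// "output.kafka.topic=benthos_topic" maps onto a nested config tree.
// It is not Benthos' parser and stores every value as a string.
func setPath(conf map[string]any, assignment string) error {
	path, value, found := strings.Cut(assignment, "=")
	if !found {
		return fmt.Errorf("missing '=' in %q", assignment)
	}
	keys := strings.Split(path, ".")
	node := conf
	// Walk (and create, if needed) every intermediate object.
	for _, k := range keys[:len(keys)-1] {
		child, ok := node[k].(map[string]any)
		if !ok {
			child = map[string]any{}
			node[k] = child
		}
		node = child
	}
	node[keys[len(keys)-1]] = value
	return nil
}

func main() {
	conf := map[string]any{}
	for _, s := range []string{
		"input.type=http_server",
		"output.type=kafka",
		"output.kafka.addresses=kafka-server:9092",
		"output.kafka.topic=benthos_topic",
	} {
		if err := setPath(conf, s); err != nil {
			panic(err)
		}
	}
	fmt.Println(conf)
}
```

Only the first "=" splits path from value, so values such as kafka-server:9092 that contain other punctuation are kept intact.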

Monitoring

Health Checks

Benthos serves two HTTP endpoints for health checks:

  • /ping can be used as a liveness probe as it always returns a 200.
  • /ready can be used as a readiness probe as it serves a 200 only when both the input and output are connected, otherwise a 503 is returned.

Metrics

Benthos exposes lots of metrics to Statsd, Prometheus, a JSON HTTP endpoint, and more.

Tracing

Benthos also emits OpenTelemetry tracing events, which can be used to visualise the processors within a pipeline.

Configuration

Benthos provides lots of tools for making configuration discovery, debugging and organisation easy. You can read about them in the configuration section of the documentation site.

Build

Build with Go (any currently supported version):

git clone git@github.com:benthosdev/benthos
cd benthos
make

Lint

Benthos uses golangci-lint for linting, which you can install with:

curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin

And then run it with make lint.

Plugins

It's pretty easy to write your own custom plugins for Benthos in Go. For information check out the API docs, and for inspiration there's an example repo demonstrating a variety of plugin implementations.

Extra Plugins

By default Benthos does not build with components that require linking to external libraries, such as the zmq4 input and outputs. If you wish to build Benthos locally with these dependencies then set the build tag x_benthos_extra:

# With go
go install -tags "x_benthos_extra" github.com/benthosdev/benthos/v4/cmd/benthos@latest

# Using make
make TAGS=x_benthos_extra

Note that this tag may change or be broken out into granular tags for individual components outside of major version releases. If you attempt a build and these dependencies are not present you'll see error messages such as ld: library not found for -lzmq.

Docker Builds

There's a multi-stage Dockerfile for creating a Benthos docker image which results in a minimal image from scratch. You can build it with:

make docker

Then use the image:

docker run --rm \
	-v /path/to/your/benthos.yaml:/config.yaml \
	-v /tmp/data:/data \
	-p 4195:4195 \
	benthos -c /config.yaml

Contributing

Contributions are welcome; please read the guidelines, come and chat (links are on the community page), and watch your back.
