• This repository has been archived on 18/Sep/2023
  • Stars: 100
  • Rank 340,703 (Top 7 %)
  • Language: Clojure
  • License: Eclipse Public Li...
  • Created over 10 years ago
  • Updated about 5 years ago

Repository Details

Safely archive data from Apache Kafka to S3 with no Hadoop dependencies :)

Thor wades rivers while the rest of the æsir ride across the bridge Bifröst, as described in Grímnismál.

bifrost

Archive Kafka data safely to either Amazon's S3 or Microsoft's Azure Blob Storage.

bifrost is a daemon that connects to Kafka, consumes all topics and persists them to the cloud for long-term storage and analysis. It uses the baldr file format to store Kafka messages. baldr-files are gzipped before they are written to disk.

There are other services for persisting messages from Kafka to S3, such as secor from Pinterest. The main difference is that, whereas they rely on Hadoop sequence files for storage, bifrost uses the baldr format. Consumers of the persisted messages therefore do not need the often very large libraries required to read Hadoop sequence files. The baldr file format follows a minimal design that allows for easy and quick streaming with a very small code footprint. It does not allow for arbitrary indexing.

Usage

bifrost can be run directly from a checkout of the project using Leiningen. The app requires some basic configuration: ZooKeeper settings to connect to Kafka, and credentials for the chosen cloud storage, where the baldr-files are stored. The project contains an example configuration in etc/config.edn.example.

 $ lein run -- --config ./etc/config.edn
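
The command above expects a config file at ./etc/config.edn. As a minimal sketch, assuming you want to start from the bundled example, the helper below copies etc/config.edn.example into place without overwriting an existing file. The function name init_config is hypothetical and not part of bifrost.

```shell
# Hypothetical helper (not part of bifrost): create a working config from
# the bundled example, but never overwrite a config that already exists.
init_config() {
  example="${1:-./etc/config.edn.example}"
  target="${2:-./etc/config.edn}"
  if [ -e "$target" ]; then
    echo "keeping existing $target"
  elif [ -f "$example" ]; then
    cp "$example" "$target"
    echo "created $target from $example"
  else
    echo "missing $example" >&2
    return 1
  fi
}
```

After running init_config from the project root, edit the copied file to fill in your own ZooKeeper and cloud storage settings before starting the app.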

To run the app in production, we recommend building an uberjar and running that on the app server.

$ lein uberjar
$ java -jar target/*-standalone.jar --config /opt/uswitch/bifrost/etc/config.edn

The Java temp-dir is used for storing baldr-files locally before uploading them. Files are removed upon successful upload and on program exit. To change the temp-directory, override the java.io.tmpdir system property.
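
Because baldr-files are staged in the temp directory, it may help to confirm that the directory exists and is writable before starting the JVM. The following is a sketch only; the function name tmpdir_flag is hypothetical, and the directory you pass is your own choice.

```shell
# Hypothetical helper (not part of bifrost): ensure the temp directory
# exists and is writable, then print the JVM flag that selects it.
tmpdir_flag() {
  dir="$1"
  mkdir -p "$dir" || return 1
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir" >&2
    return 1
  fi
  printf '%s\n' "-Djava.io.tmpdir=$dir"
}
```

The printed flag can then be passed to java ahead of -jar, for example via "$(tmpdir_flag /mnt/bifrost-tmp)".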

Logging is done through logback. To configure logback, set the logback.configurationFile system property. The logback configuration is only respected for uberjars. slf4j is used in development.
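
If you do not yet have a logback configuration, the sketch below writes a minimal one that logs at INFO level to the console. The appender and pattern follow the standard logback console setup; the function name write_logback_config and the default path are assumptions made for illustration, not something bifrost ships.

```shell
# Hypothetical helper (not part of bifrost): write a minimal logback
# configuration that sends INFO-level logs to the console.
write_logback_config() {
  target="${1:-./logback.xml}"
  mkdir -p "$(dirname "$target")" || return 1
  # Quoted heredoc delimiter: the XML is written exactly as shown.
  cat > "$target" <<'EOF'
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="STDOUT" />
  </root>
</configuration>
EOF
}
```

Point -Dlogback.configurationFile at the file this writes when launching the uberjar.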

Here's a complete example of configuring and running an uberjar in production.

$ java -Djava.io.tmpdir=/mnt/bifrost-tmp \
       -Dlogback.configurationFile=/opt/uswitch/bifrost/etc/logback.xml \
       -server \
       -jar /opt/uswitch/bifrost/lib/*-standalone.jar \
       --config /opt/uswitch/bifrost/etc/config.edn

License

Copyright © 2014 uSwitch

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.
