  • Stars: 830
  • Rank: 54,934 (Top 2%)
  • Language: Python
  • License: MIT License
  • Created over 8 years ago
  • Updated over 4 years ago

Repository Details

Service to replicate InfluxDB data for high availability

InfluxDB Relay

This project adds a basic high availability layer to InfluxDB. With the right architecture and disaster recovery processes, this achieves a highly available setup.

NOTE: influxdb-relay must be built with Go 1.5+

Usage

To build from source and run:

$ # Install influxdb-relay to your $GOPATH/bin
$ go get -u github.com/influxdata/influxdb-relay
$ # Edit your configuration file
$ cp $GOPATH/src/github.com/influxdata/influxdb-relay/sample.toml ./relay.toml
$ vim relay.toml
$ # Start relay!
$ $GOPATH/bin/influxdb-relay -config relay.toml

Configuration

[[http]]
# Name of the HTTP server, used for display purposes only.
name = "example-http"

# TCP address to bind to, for HTTP server.
bind-addr = "127.0.0.1:9096"

# Enable HTTPS requests.
ssl-combined-pem = "/etc/ssl/influxdb-relay.pem"

# Array of InfluxDB instances to use as backends for Relay.
output = [
    # name: name of the backend, used for display purposes only.
    # location: full URL of the /write endpoint of the backend
    # timeout: Go-parseable time duration. Fail writes if incomplete in this time.
    # skip-tls-verification: skip verification for HTTPS location. WARNING: it's insecure. Don't use in production.
    { name="local1", location="http://127.0.0.1:8086/write", timeout="10s" },
    { name="local2", location="http://127.0.0.1:7086/write", timeout="10s" },
]

[[udp]]
# Name of the UDP server, used for display purposes only.
name = "example-udp"

# UDP address to bind to.
bind-addr = "127.0.0.1:9096"

# Socket buffer size for incoming connections.
read-buffer = 0 # default

# Precision to use for timestamps
precision = "n" # Can be n, u, ms, s, m, h

# Array of InfluxDB instances to use as backends for Relay.
output = [
    # name: name of the backend, used for display purposes only.
    # location: host and port of backend.
    # mtu: maximum output payload size
    { name="local1", location="127.0.0.1:8089", mtu=512 },
    { name="local2", location="127.0.0.1:7089", mtu=1024 },
]
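
With a configuration like the one above, a client talks to the relay's bind-addr instead of to InfluxDB directly. The following is a minimal sketch of what such a write could look like from Python. It assumes the relay accepts the same /write?db=... query string as InfluxDB 1.x; the database name `mydb`, the sample point, and the helper name `build_write_request` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(relay_addr, database, lines):
    """Build (but do not send) an HTTP write request aimed at the relay."""
    query = urlencode({"db": database})
    url = f"http://{relay_addr}/write?{query}"
    # Line protocol is newline-delimited, one point per line.
    body = "\n".join(lines).encode("utf-8")
    return Request(url, data=body, method="POST")


req = build_write_request(
    "127.0.0.1:9096",                  # bind-addr of the [[http]] section
    "mydb",                            # hypothetical database name
    ["cpu,host=server01 value=0.64"],  # hypothetical point
)
print(req.get_method(), req.full_url)
# prints: POST http://127.0.0.1:9096/write?db=mydb
```

Passing `req` to `urllib.request.urlopen` would actually send it, which requires a running relay and backends.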

Description

The architecture is fairly simple and consists of a load balancer, two or more InfluxDB Relay processes and two or more InfluxDB processes. The load balancer should point UDP traffic and HTTP POST requests with the path /write to the two relays while pointing GET requests with the path /query to the two InfluxDB servers.
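
The routing rule the load balancer has to apply can be written down as a small decision function. This is only an illustrative sketch of the rule stated above, not load balancer configuration; the function name `route` and the tier labels are made up for the example.

```python
def route(protocol, method="", path=""):
    """Return which tier should receive a request: 'relay' or 'influxdb'."""
    if protocol == "udp":
        return "relay"  # all UDP traffic goes to the relays
    if protocol == "http":
        base_path = path.split("?", 1)[0]  # ignore the query string
        if method == "POST" and base_path == "/write":
            return "relay"
        if method == "GET" and base_path == "/query":
            return "influxdb"
    return None  # not covered by the rule described above


print(route("http", "POST", "/write?db=mydb"))  # relay
print(route("http", "GET", "/query?q=SHOW+DATABASES"))  # influxdb
print(route("udp"))  # relay
```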

The setup should look like this:

        ┌─────────────────┐
        │writes & queries │
        └─────────────────┘
                 │
                 ▼
         ┌───────────────┐
         │               │
┌────────│ Load Balancer │─────────┐
│        │               │         │
│        └──────┬─┬──────┘         │
│               │ │                │
│               │ │                │
│        ┌──────┘ └────────┐       │
│        │ ┌─────────────┐ │       │┌──────┐
│        │ │/write or UDP│ │       ││/query│
│        ▼ └─────────────┘ ▼       │└──────┘
│  ┌──────────┐      ┌──────────┐  │
│  │ InfluxDB │      │ InfluxDB │  │
│  │ Relay    │      │ Relay    │  │
│  └──┬────┬──┘      └────┬──┬──┘  │
│     │    │              │  │     │
│     │  ┌─┼──────────────┘  │     │
│     │  │ └──────────────┐  │     │
│     ▼  ▼                ▼  ▼     │
│  ┌──────────┐      ┌──────────┐  │
│  │          │      │          │  │
└─▶│ InfluxDB │      │ InfluxDB │◀─┘
   │          │      │          │
   └──────────┘      └──────────┘

The relay listens for HTTP or UDP writes and forwards the data to each InfluxDB server via the HTTP write or UDP endpoint, as appropriate. If the write is sent via HTTP, the relay returns a success response as soon as one of the InfluxDB servers returns a success. If any InfluxDB server returns a 4xx response, that response is returned to the client immediately. If all servers return a 5xx, a 5xx is returned to the client. If some but not all servers return a 5xx, that error is not returned to the client, so you should monitor each instance's logs for 5xx errors.
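
The HTTP response rules above can be modelled as a function over the backend status codes in the order they arrive. This is a sketch of the described behaviour, not the relay's actual code. Two things are assumptions: that arrival order decides between a success and a 4xx, and that the 5xx sent back when every backend fails is a 503.

```python
def relay_response(backend_statuses):
    """Status the client sees, given backend statuses in arrival order."""
    for status in backend_statuses:
        if 200 <= status < 300:
            return status  # first success is returned right away
        if 400 <= status < 500:
            return status  # a 4xx is returned immediately
        # Anything else (a 5xx here): keep waiting for the other backends.
    return 503  # every backend failed; the exact 5xx code is an assumption


print(relay_response([500, 204]))  # 204: one backend failing is hidden
print(relay_response([500, 400]))  # 400
print(relay_response([500, 502]))  # 503
```

The first call shows why per-instance log monitoring matters: the client sees a success even though one backend returned a 5xx.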

With this setup a failure of one Relay or one InfluxDB can be sustained while still taking writes and serving queries. However, the recovery process might require operator intervention.

Buffering

The relay can be configured to buffer failed requests for HTTP backends. The intent of this logic is to reduce the number of failures during short outages or periodic network issues.

This retry logic is NOT sufficient for long periods of downtime, as all data is buffered in RAM.

Buffering has the following configuration options (configured per HTTP backend):

  • buffer-size-mb -- An upper limit on how much point data to keep in memory (in MB)
  • max-batch-kb -- A maximum size on the aggregated batches that will be submitted (in KB)
  • max-delay-interval -- the max delay between retry attempts per backend. The initial retry delay is 500ms and is doubled after every failure.
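
To make the effect of max-delay-interval concrete, the sketch below lists the delays that result from starting at 500ms, doubling after every failure, and capping at the configured maximum. The function name and the 5 second maximum are illustrative choices, not values from the relay.

```python
def retry_delays(max_delay_s, failures, initial_s=0.5):
    """Delay before each retry after consecutive failures, in seconds."""
    delays = []
    delay = initial_s
    for _ in range(failures):
        delays.append(min(delay, max_delay_s))  # never exceed the maximum
        delay *= 2                              # doubled after every failure
    return delays


print(retry_delays(max_delay_s=5.0, failures=6))
# prints: [0.5, 1.0, 2.0, 4.0, 5.0, 5.0]
```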

If the buffer is full, requests are dropped and an error is logged. If a request makes it into the buffer, it is retried until it succeeds.

Retries are serialized to a single backend. In addition, writes will be aggregated and batched as long as the body of the request will be less than max-batch-kb. If buffered requests succeed, there is no delay between subsequent attempts.
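
The interaction between buffer-size-mb and max-batch-kb can be illustrated with a toy buffer. This is a simplified sketch and not the relay's implementation: the class name `RetryBuffer`, the use of 1024-based units, the inclusive batch boundary, and joining bodies with a newline are all assumptions made for the example.

```python
class RetryBuffer:
    """Toy in-memory buffer: bounded total size, bodies grouped into batches."""

    def __init__(self, buffer_size_mb, max_batch_kb):
        self.max_bytes = buffer_size_mb * 1024 * 1024
        self.max_batch_bytes = max_batch_kb * 1024
        self.size = 0
        self.batches = []  # each batch is a list of request bodies (bytes)

    def add(self, body):
        """Queue a failed request body; return False if it had to be dropped."""
        if self.size + len(body) > self.max_bytes:
            return False  # buffer full: the request is dropped
        last = self.batches[-1] if self.batches else None
        if last is not None and sum(map(len, last)) + len(body) <= self.max_batch_bytes:
            last.append(body)  # aggregate into the current batch
        else:
            self.batches.append([body])  # start a new batch
        self.size += len(body)
        return True

    def pop_batch(self):
        """Remove and return the oldest batch as one newline-joined body."""
        batch = self.batches.pop(0)
        self.size -= sum(map(len, batch))
        return b"\n".join(batch)


buf = RetryBuffer(buffer_size_mb=1, max_batch_kb=1)
for _ in range(3):
    buf.add(b"x" * 400)  # three 400-byte bodies against a 1 KB batch limit
print([len(batch) for batch in buf.batches])  # [2, 1]
```

Batches leave the buffer oldest first, which matches the statement that data is written to the recovered server in the order it was received.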

If the relay stays alive the entire duration of a downed backend server without filling that server's allocated buffer, and the relay can stay online until the entire buffer is flushed, it would mean that no operator intervention would be required to "recover" the data. The data will simply be batched together and written out to the recovered server in the order it was received.

NOTE: The limits for buffering are not hard limits on the memory usage of the application, and there will be additional overhead that would be much more challenging to account for. The limits listed are just for the amount of point line protocol (including any added timestamps, if applicable). Factors such as small incoming batch sizes and a smaller max batch size will increase the overhead in the buffer. There is also the general application memory overhead to account for. This means that a machine with 2GB of memory should not have buffers that sum up to almost 2GB.

Recovery

InfluxDB organizes its data on disk into logical blocks of time called shards. We can use this to create a hot recovery process with zero downtime.

The length of time that a shard represents in InfluxDB is typically 1 hour, 1 day, or 7 days, depending on the retention duration, but it can be set explicitly when creating the retention policy. For the sake of our example, let's assume shard durations of 1 day.

Let's say one of the InfluxDB servers goes down for an hour on 2016-03-10. Once midnight UTC rolls over, all InfluxDB processes are now writing data to the shard for 2016-03-11 and the file(s) for 2016-03-10 have gone cold for writes. We can then restore things using these steps:

  1. Tell the load balancer to stop sending query traffic to the server that was down (this should be done as soon as an outage is detected, to prevent partial or inconsistent query results).
  2. Create a backup of the 2016-03-10 shard from a server that was up the entire day.
  3. Restore the backup of the shard from the good server to the server that had downtime.
  4. Tell the load balancer to resume sending queries to the previously downed server.

During this entire process the Relays should be sending current writes to all servers, including the one with downtime.
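
The recovery procedure depends on knowing which shard covers the outage and when that shard stops receiving writes. The sketch below computes that window for the 1-day shard duration used in the example. It assumes shard boundaries are aligned to multiples of the shard duration counted from the Unix epoch in UTC, which gives midnight boundaries for 1-day shards; other durations may be aligned differently. The outage time is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def shard_window(ts, shard_duration=timedelta(days=1)):
    """Return the [start, end) window of the shard that holds timestamp ts."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    index = (ts - epoch) // shard_duration  # whole shard durations since epoch
    start = epoch + index * shard_duration
    return start, start + shard_duration


outage = datetime(2016, 3, 10, 14, 30, tzinfo=timezone.utc)  # hypothetical
start, end = shard_window(outage)
print(start.isoformat(), end.isoformat())
# prints: 2016-03-10T00:00:00+00:00 2016-03-11T00:00:00+00:00
```

Once the current time is past `end`, the shard has gone cold for writes and is safe to back up from the healthy server.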

Sharding

It's possible to add another layer on top of this kind of setup to shard data. Depending on your needs you could shard on the measurement name or a specific tag like customer_id. The sharding layer would have to service both queries and writes.

As this relay does not handle queries, it will not implement any sharding logic. Any sharding would have to be done externally to the relay.
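
As an illustration of what such an external sharding layer might do on the write path, the sketch below picks a shard from either the measurement name or a tag such as customer_id. Everything here is hypothetical: the function names, the CRC32-based hashing, and the naive line protocol parsing, which ignores escaped commas and spaces.

```python
import zlib


def shard_key(line, tag=None):
    """Extract the measurement name, or the value of `tag`, from one line."""
    series = line.split(" ", 1)[0]  # "measurement,tag1=v1,tag2=v2"
    measurement, *tags = series.split(",")
    if tag is None:
        return measurement
    for pair in tags:
        name, _, value = pair.partition("=")
        if name == tag:
            return value
    raise KeyError(f"tag {tag!r} not present in {line!r}")


def pick_shard(key, shard_count):
    """Map a key to a shard index with a stable, process-independent hash."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


line = "cpu,customer_id=42,host=server01 value=0.64"
print(shard_key(line), shard_key(line, tag="customer_id"))  # cpu 42
print(pick_shard(shard_key(line, tag="customer_id"), 4))
```

A query router would need the same mapping so that reads for a given key reach the shard that holds its data.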

Caveats

While influxdb-relay does provide some level of high availability, there are a few scenarios that need to be accounted for:

  • influxdb-relay will not relay the /query endpoint, and this includes schema modification (create database, DROPs, etc). This means that databases must be created before points are written to the backends.
  • Continuous queries will still only write their results locally. If a server goes down, the continuous query will have to be backfilled after the data has been recovered for that instance.
  • Overwriting points is potentially unpredictable. For example, given servers A and B, if B is down, and point X is written (we'll call the value X1) just before B comes back online, that write is queued behind every other write that occurred while B was offline. Once B is back online, the first buffered write succeeds, and all new writes are now allowed to pass-through. At this point (before X1 is written to B), X is written again (with value X2 this time) to both A and B. When the relay reaches the end of B's buffered writes, it will write X (with value X1) to B... At this point A now has X2, but B has X1.
    • It is probably best to avoid re-writing points (if possible). Otherwise, please be aware that overwriting the same field for a given point can lead to data differences.
    • This could potentially be mitigated by waiting for the buffer to flush before allowing new writes to pass through again.
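
The overwrite scenario can be replayed with two dictionaries standing in for servers A and B. This is a toy simulation of the sequence described above, not relay code; the extra point Y is a hypothetical earlier write that sits ahead of X1 in B's buffer.

```python
a, b = {}, {}   # point name -> stored value on servers A and B
b_buffer = []   # writes queued for B while it is offline


def write(point, value, b_online):
    """Relay one write: A always gets it; B gets it directly or via the buffer."""
    a[point] = value
    if b_online:
        b[point] = value
    else:
        b_buffer.append((point, value))


write("Y", "Y1", b_online=False)  # earlier write made while B is down
write("X", "X1", b_online=False)  # X1 queued behind it, just before B returns

# B comes back: the first buffered write succeeds, and new writes pass through.
point, value = b_buffer.pop(0)
b[point] = value
write("X", "X2", b_online=True)   # X2 lands on A and B before X1 is flushed

# The relay now drains the rest of B's buffer, overwriting X2 with X1.
while b_buffer:
    point, value = b_buffer.pop(0)
    b[point] = value

print(a["X"], b["X"])  # X2 X1
```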

Building

The recommended method for building influxdb-relay is to use Docker and the included Dockerfile_build_ubuntu64 Dockerfile, which includes all of the necessary dependencies.

To build the docker image, you can run:

docker build -f Dockerfile_build_ubuntu64 -t influxdb-relay-builder:latest .

And then to build the project:

docker run --rm -v $(pwd):/root/go/src/github.com/influxdata/influxdb-relay influxdb-relay-builder

This should immediately call the included build.py build script and leave any build output in the ./build directory.

NOTE: By default, builds target AMD64 Linux (since the container is running AMD64 Linux). To change the target platform or architecture, use the --platform and --arch CLI options.

To see a list of available build commands, append --help to the command above:

docker run -v $(pwd):/root/go/src/github.com/influxdata/influxdb-relay influxdb-relay-builder --help

Packages

To build system packages for Linux (deb, rpm, etc), use the --package option:

docker run -v $(pwd):/root/go/src/github.com/influxdata/influxdb-relay influxdb-relay-builder --package

To build packages for other platforms or architectures, use the --platform and --arch options. For example, to build an amd64 package for Mac OS X, use the options --package --platform darwin.
