• This repository has been archived on 18/Feb/2021
  • Language
    C
  • License
    Other

Statsrelay

(This project is deprecated and not maintained.)

Statsrelay is a consistent-hashing relay for statsd and carbon metrics.

License

MIT License Copyright (c) 2016 Uber Technologies, Inc.

Build

Dependencies:

  • automake
  • pkg-config
  • libev (>= 4.11)
  • libyaml

On Debian/Ubuntu:

apt-get install automake pkg-config libev-dev libyaml-dev

./autogen.sh
./configure
make clean
make
make check
make install

Use

Usage: statsrelay [options]
  -h, --help                   Display this message
  -v, --verbose                Write log messages to stderr in addition to
                               syslog
  -l, --log-level              Set the logging level to DEBUG, INFO, WARN, or ERROR
                               (default: INFO)
  -c, --config=filename        Use the given hashring config file
                               (default: /etc/statsrelay.yaml)
  -t, --check-config=filename  Check the config syntax
                               (default: /etc/statsrelay.yaml)
  --version                    Print the version

statsrelay --config=/path/to/statsrelay.yaml

This process runs in the foreground. If you need to daemonize it, use start-stop-daemon, daemontools, supervisord, upstart, systemd, or your preferred service watchdog.

By default statsrelay binds to 127.0.0.1:8125 for statsd proxying, and it binds to 127.0.0.1:2003 for carbon proxying.

For each line that statsrelay receives in the statsd format "statname.foo.bar:1|c\n", the key will be hashed to determine which backend server the stat will be relayed to. If no connection to that backend is open, the line is queued and a connection attempt is started. Once a connection is established, all queued metrics are relayed to the backend and the queue is emptied. If the backend connection fails, the queue persists in memory and the connection will be retried after one second. Any stats received for that backend during the retry window are added to the queue.

Each backend has its own send queue. If a send queue reaches max-send-queue bytes (default: 128MB) in size, new incoming stats are dropped until a backend connection is successful and the queue begins to drain.
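The queueing rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not statsrelay's actual C implementation: the SendQueue class and its method names are hypothetical, and treating "reaches max-send-queue" as "the new line would push the queue past the limit" is an assumption.

```python
from collections import deque


class SendQueue:
    """Per-backend, byte-bounded send queue (illustrative sketch only)."""

    def __init__(self, max_bytes=128 * 1024 * 1024):
        self.max_bytes = max_bytes      # mirrors the 128MB default
        self.lines = deque()
        self.queued_bytes = 0
        self.dropped_lines = 0

    def enqueue(self, line: bytes) -> bool:
        # Assumption: a line is dropped when it would exceed the limit.
        if self.queued_bytes + len(line) > self.max_bytes:
            self.dropped_lines += 1
            return False
        self.lines.append(line)
        self.queued_bytes += len(line)
        return True

    def drain(self, send) -> int:
        """Send queued lines oldest first via send(line); return count sent."""
        sent = 0
        while self.lines:
            line = self.lines[0]
            send(line)                  # if this raises, the line stays queued
            self.lines.popleft()
            self.queued_bytes -= len(line)
            sent += 1
        return sent
```

Once drain() has emptied the queue, enqueue() accepts lines again, which matches the "dropped until the queue begins to drain" behavior described above.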

All log messages are sent to syslog with the INFO priority.

Upon SIGHUP, the config file will be reloaded and all backend connections closed. Note that any stats in the send queue at the time of SIGHUP will be dropped.

If SIGINT or SIGTERM is caught, all connections are closed, send queues are dropped, and memory is freed. statsrelay exits with return code 0 if all went well.

To retrieve server statistics, connect to TCP port 8125 and send the string "status" followed by a newline '\n' character. The end of the status output is denoted by two consecutive newlines "\n\n".

Status example:

$ echo status | nc localhost 8125

global bytes_recv_udp gauge 0
global bytes_recv_tcp gauge 41
global total_connections gauge 1
global last_reload timestamp 0
global malformed_lines gauge 0
backend:127.0.0.2:8127:tcp bytes_queued gauge 27
backend:127.0.0.2:8127:tcp bytes_sent gauge 27
backend:127.0.0.2:8127:tcp relayed_lines gauge 3
backend:127.0.0.2:8127:tcp dropped_lines gauge 0
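
For monitoring scripts, the status output can be read programmatically. The following is a minimal sketch that assumes every status line has exactly four whitespace-separated fields with an integer value, as in the sample above; the function names parse_status and fetch_status are hypothetical and not part of statsrelay.

```python
import socket


def parse_status(text):
    """Parse status output into {(scope, counter): (type, value)}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        scope, name, kind, value = parts
        stats[(scope, name)] = (kind, int(value))
    return stats


def fetch_status(host="127.0.0.1", port=8125, timeout=5.0):
    """Send "status\\n" over TCP and read until the "\\n\\n" terminator."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        data = b""
        while not data.endswith(b"\n\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break  # server closed the connection
            data += chunk
    return data.decode("ascii", errors="replace")
```

With a running relay, parse_status(fetch_status()) would return a dictionary keyed by pairs such as ("global", "malformed_lines").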

Config Options

There are a few options you can use to control the behavior of statsrelay, which can be set in /etc/statsrelay.yaml. Here is a minimal config:

carbon:
  bind: 127.0.0.1:9085
  tcp_cork: true
  validate: true
  shard_map:
    0: 10.10.10.10:9085

Besides the bind and shard_map settings (explained elsewhere), here is what the boolean options do:

  • tcp_cork enables the TCP_CORK option on TCP sockets; it's on by default and in many cases will significantly decrease the number of small TCP packets that statsrelay emits (at a small penalty of up to 200ms latency in some cases).
  • validate tries to validate incoming data before forwarding it to statsd or carbon; it's on by default.

Scaling With Virtual Shards

Statsrelay implements a virtual sharding scheme, which allows you to easily scale your statsd and carbon backends by reassigning virtual shards to actual statsd/carbon instances or servers. This technique also applies to alternative statsd implementations like statsite.

Consider the following simplified example with this config file:

statsd:
  bind: 127.0.0.1:8125
  validate: true
  shard_map:
    0: 10.0.0.1:9000
    1: 10.0.0.1:9000
    2: 10.0.0.1:9001
    3: 10.0.0.1:9001
    4: 10.0.0.2:9000
    5: 10.0.0.2:9000
    6: 10.0.0.2:9001
    7: 10.0.0.2:9001
carbon:
  ...

In this file we've defined two actual backend hosts (10.0.0.1 and 10.0.0.2). Each of these hosts is running two statsd instances, one on port 9000 and one on port 9001 (this is a good way to scale statsd, since statsd and alternative implementations like statsite are typically single threaded). In a real setup, you'd likely be running more statsd instances on each server, and you'd likely have more repeated lines to assign more virtual shards to each statsd instance. At Uber we use 4096 virtual shards, with a much smaller number of actual backend instances.

Internally statsrelay assigns a zero-indexed virtual shard to each line in the file; so 10.0.0.1:9000 has virtual shards 0 and 1, 10.0.0.1:9001 has virtual shards 2 and 3, and so on.
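
The lookup from a metric line to a backend can be sketched as follows. This is a sketch under stated assumptions: CRC32 is used only as a stand-in hash (the hash function statsrelay actually uses is not described here, so shard numbers from this code will not match the real relay), and the helper names are hypothetical.

```python
import zlib

# The eight-entry shard_map from the example config, indexed by shard id.
SHARD_MAP = [
    "10.0.0.1:9000", "10.0.0.1:9000",
    "10.0.0.1:9001", "10.0.0.1:9001",
    "10.0.0.2:9000", "10.0.0.2:9000",
    "10.0.0.2:9001", "10.0.0.2:9001",
]


def statsd_key(line):
    """Return the metric name: everything before the first ':'."""
    return line.split(":", 1)[0]


def shard_for(key, num_shards):
    # Stand-in hash, NOT statsrelay's real one; any stable hash works
    # for illustrating the key -> virtual shard step.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def backend_for(line, shard_map=SHARD_MAP):
    """Map a statsd line to the backend that owns its virtual shard."""
    return shard_map[shard_for(statsd_key(line), len(shard_map))]
```

Because the shard id depends only on the key and the number of shards, reassigning a shard to a different backend changes where its keys go without disturbing any other shard.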

Let's say that the backend server 10.0.0.1 has become overloaded, and we want to add a new server to the configuration. We might do that like this:

statsd:
  bind: 127.0.0.1:8125
  validate: true
  shard_map:
    0: 10.0.0.1:9000
    1: 10.0.0.3:9000
    2: 10.0.0.1:9001
    3: 10.0.0.3:9001
    4: 10.0.0.2:9000
    5: 10.0.0.2:9000
    6: 10.0.0.2:9001
    7: 10.0.0.2:9001
carbon:
  ...

In the new configuration we've moved one of the two virtual shards for 10.0.0.1:9000 to 10.0.0.3:9000, and we've moved one of the two virtual shards for 10.0.0.1:9001 to 10.0.0.3:9001. In other words, we've reassigned the mapping for virtual shard 1 and virtual shard 3. Note that when you do this, you should always keep the total number of virtual shards the same, so you probably want to pick a large number of virtual shards to start (say, 1024 virtual shards, meaning the shard_map should have 1024 lines). With that many shards, the config file will contain many duplicated backend lines.
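
When reviewing a config change like this one, it can help to compute which virtual shards moved. A minimal sketch, assuming both shard maps have already been loaded into lists indexed by shard id (moved_shards is a hypothetical helper, not a statsrelay tool):

```python
def moved_shards(old_map, new_map):
    """Return {shard_id: (old_backend, new_backend)} for reassigned shards."""
    if len(old_map) != len(new_map):
        raise ValueError("shard count must not change between configs")
    return {
        shard: (old, new)
        for shard, (old, new) in enumerate(zip(old_map, new_map))
        if old != new
    }


OLD = ["10.0.0.1:9000", "10.0.0.1:9000", "10.0.0.1:9001", "10.0.0.1:9001",
       "10.0.0.2:9000", "10.0.0.2:9000", "10.0.0.2:9001", "10.0.0.2:9001"]
NEW = ["10.0.0.1:9000", "10.0.0.3:9000", "10.0.0.1:9001", "10.0.0.3:9001",
       "10.0.0.2:9000", "10.0.0.2:9000", "10.0.0.2:9001", "10.0.0.2:9001"]

print(moved_shards(OLD, NEW))
# {1: ('10.0.0.1:9000', '10.0.0.3:9000'), 3: ('10.0.0.1:9001', '10.0.0.3:9001')}
```

The length check enforces the rule that the number of virtual shards stays constant across config changes.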

To do optimal shard assignment, you'll want to write a program that looks at the CPU usage of your shards and figures out the optimal distribution of shards. How you do that is up to you, but a good technique is to start by generating a statsrelay config that has many virtual shards evenly assigned, and then periodically have a script that finds which actual backends are overloaded and reassigns some of the virtual shards on those hosts to less loaded hosts (or to new hosts).
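
One possible shape for such a script is sketched below. It is deliberately naive and rests on assumptions: the load dictionary (backend address to a load figure such as CPU percent) is a hypothetical input you would collect yourself, and moving a fixed number of shards from the hottest backend to the coolest one is just one simple policy, not a recommendation from the statsrelay authors.

```python
def even_shard_map(backends, num_shards):
    """Assign shards to backends in contiguous, near-equal blocks."""
    return [backends[i * len(backends) // num_shards]
            for i in range(num_shards)]


def rebalance_step(shard_map, load, shards_to_move=1):
    """Move shards from the most loaded backend to the least loaded one.

    load maps backend -> load figure (hypothetical input, e.g. CPU percent).
    Returns a new list; the input shard map is left untouched.
    """
    hot = max(load, key=load.get)
    cold = min(load, key=load.get)
    new_map = list(shard_map)
    if hot == cold:
        return new_map
    moved = 0
    for shard, backend in enumerate(new_map):
        if backend == hot and moved < shards_to_move:
            new_map[shard] = cold
            moved += 1
    return new_map
```

A new, empty host can be introduced by adding it to load with a value of 0, so that it becomes the target of the next step. The total number of shards never changes, only their assignment.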

If you don't initially assign enough virtual shards and then later expand to more, everything will work, but data migration for carbon will be a bit trickier; see below.

A Note On Carbon Scaling

Statsrelay can do relaying for carbon lines just like statsd. The strategy for scaling carbon using virtual shards is exactly the same. One important difference, however, is that when you move a carbon shard you'll want to move the associated whisper files as well. You can do this using the stathasher binary that is built by statsrelay. By pointing that command at your statsrelay config, you can send it key names on stdin and have the virtual shard ids printed to stdout.

Using this technique you can script the reassignment of whisper files. The general idea is to walk the filesystem and gather all of the unique keys stored in carbon backends on a host. You can then get an idea for how expensive each virtual shard is based on the storage space, number of whisper files, and possibly I/O metrics for each virtual shard. By gathering the weights for each virtual shard on a host, you can figure out the optimal way to redistribute the mapping of virtual shards to actual carbon backends.
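
The weight-gathering step might look like the sketch below. It assumes you have already collected (key, size in bytes) pairs by walking the whisper tree, and that shard_of is some callable returning the virtual shard id for a key, for example a lookup table built from stathasher output. Both shard_weights and shard_of are hypothetical names used for illustration.

```python
from collections import defaultdict


def shard_weights(key_sizes, shard_of):
    """Sum bytes and file counts per virtual shard.

    key_sizes: iterable of (metric_key, size_in_bytes) pairs.
    shard_of:  callable mapping a metric key to its virtual shard id
               (hypothetical; e.g. backed by stathasher results).
    """
    weights = defaultdict(lambda: {"bytes": 0, "files": 0})
    for key, size in key_sizes:
        entry = weights[shard_of(key)]
        entry["bytes"] += size
        entry["files"] += 1
    return dict(weights)
```

I/O metrics, if you have them per key, could be accumulated the same way by adding another field to each entry.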

When you reassign carbon shards, you will usually want to migrate the associated whisper files too. This ensures that you retain historical data, and that graphite gets the right answer if it queries multiple carbon backends. Use the stathasher binary described above to identify the files belonging to a moved virtual shard, then rsync those files to the new host. Take care to delete the old whisper files on the old host once the migration is complete.
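
Selecting the files to rsync can be sketched as follows. The sketch assumes the usual carbon layout in which metric foo.bar.baz is stored at <root>/foo/bar/baz.wsp, and it again takes a hypothetical shard_of callable (for example, a lookup built from stathasher output) instead of invoking stathasher itself.

```python
import os


def key_from_whisper_path(path, root):
    """Turn <root>/foo/bar/baz.wsp into the metric key foo.bar.baz."""
    rel = os.path.relpath(path, root)
    stem, ext = os.path.splitext(rel)
    if ext != ".wsp":
        raise ValueError(f"not a whisper file: {path}")
    return stem.replace(os.sep, ".")


def files_to_migrate(paths, root, shard_of, moved):
    """Yield (path, new_backend) for files whose shard was reassigned.

    moved maps shard id -> new backend address.
    shard_of is a hypothetical key -> shard id callable.
    """
    for path in paths:
        shard = shard_of(key_from_whisper_path(path, root))
        if shard in moved:
            yield path, moved[shard]
```

The resulting (path, backend) pairs can be grouped by backend and handed to rsync, after which the source files can be removed from the old host.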
