• This repository has been archived on 14/Nov/2019
  • Stars: 1,185
  • Rank: 37,834 (Top 0.8 %)
  • Language: C
  • License: MIT License
  • Created almost 9 years ago
  • Updated over 4 years ago

Repository Details

A Statsd-compatible metrics aggregator

Brubeck (unmaintained)

Brubeck is a statsd-compatible stats aggregator written in C. Brubeck is currently unmaintained.

List of known maintained forks

What is statsd?

Statsd is a metrics aggregator for Graphite (and other data storage backends). This technical documentation assumes working knowledge of what statsd is and how it works; please read the statsd documentation for more details.

Statsd is a good idea, and if you're using Graphite for metrics collection in your infrastructure, you probably want a statsd-compatible aggregator in front of it.

Tradeoffs

  • Brubeck is missing many of the features of the original StatsD. We've only implemented what we felt was necessary for our metrics stack.

  • Brubeck only runs on Linux. It won't even build on Mac OS X.

  • Some of the performance features require a (moderately) recent version of the kernel that you may not have.

Building

Brubeck has the following dependencies:

  • A Turing-complete computing device running a modern version of the Linux kernel (the kernel needs to be at least 2.6.33 in order to use the recvmmsg syscall, which receives multiple messages per call)

  • A compiler for the C programming language

  • Jansson (libjansson-dev on Debian) to load the configuration (version 2.5+ is required)

  • OpenSSL (libcrypto) if you're building StatsD-Secure support

  • libmicrohttpd (libmicrohttpd-dev) to have an internal HTTP stats endpoint. Build with BRUBECK_NO_HTTP to disable this.

Build brubeck by typing:

./script/bootstrap

Other operating systems or kernels can probably build Brubeck too. More specifically, Brubeck has been seen to work under FreeBSD and OpenBSD, but this is not supported.

Supported Metric Types

Brubeck supports most of the metric types from statsd and many other implementations.

  • g - Gauges
  • c - Meters
  • C - Counters
  • h - Histograms
  • ms - Timers (in milliseconds)

Client-sent sampling rates are ignored.

Visit the statsd docs for more information on metric types.
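
As a rough illustration of what a client sends, the sketch below formats one line per metric using the usual statsd `name:value|type` wire format and the type codes listed above. The helper name `format_metric` and the metric names are made up for this example; they are not part of Brubeck.

```python
# Type codes taken from the list of supported metric types above.
METRIC_TYPES = {"g", "c", "C", "h", "ms"}


def format_metric(name: str, value, metric_type: str) -> str:
    """Return one statsd line, e.g. 'app.requests:1|c'."""
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


if __name__ == "__main__":
    # Hypothetical metric names, one gauge and one timer.
    print(format_metric("app.queue_depth", 42, "g"))
    print(format_metric("app.response_time", 320, "ms"))
```

No sampling-rate suffix is produced here, since client-sent sampling rates are ignored anyway.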

Interfacing

There are several ways to interact with a running Brubeck daemon.

Signals

Brubeck answers to the following signals:

  • SIGINT, SIGTERM: shutdown cleanly
  • SIGHUP: reopen the log files (in case you're using logrotate or an equivalent)
  • SIGUSR2: dump a newline-separated list of all the metrics currently aggregated by the daemon and their types.
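
If you drive the daemon from a script rather than from a shell, the signals above can be sent with the standard library. This is only a sketch: the action names and the `signal_daemon` helper are invented for the example, and how you find the daemon's pid depends on your setup.

```python
import os
import signal

# Action-to-signal mapping, following the list above.
ACTIONS = {
    "shutdown": signal.SIGTERM,
    "reopen_logs": signal.SIGHUP,
    "dump_metrics": signal.SIGUSR2,
}


def signal_daemon(pid: int, action: str) -> None:
    """Send the signal that corresponds to `action` to process `pid`."""
    os.kill(pid, ACTIONS[action])
```

For example, `signal_daemon(pid, "dump_metrics")` would ask the process with that pid to write its metrics list.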

HTTP Endpoint

If enabled in the config file, Brubeck can provide an HTTP API to poll its status. The following routes are available:

  • GET /ping: return a short JSON payload with the current status of the daemon (just to check it's up)
  • GET /stats: get a large JSON payload with full statistics, including active endpoints and throughputs
  • GET /metric/{{metric_name}}: get the current status of a metric, if it's being aggregated
  • POST /expire/{{metric_name}}: expire a metric that is no longer being reported to stop it from being aggregated to the backend
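
The routes above can be polled with any HTTP client. Here is a minimal sketch using Python's standard library; `BASE_URL` is an assumption (it depends on what you set for `http` in the config), and `build_request` and `send` are hypothetical helper names.

```python
import urllib.request
from urllib.parse import quote

# Assumption: the daemon's HTTP API is reachable at this address.
BASE_URL = "http://127.0.0.1:8080"


def build_request(route: str, metric: str = "", method: str = "GET") -> urllib.request.Request:
    """Build a request for /ping, /stats, /metric/<name> or /expire/<name>."""
    path = f"/{route}"
    if metric:
        # Percent-encode the metric name so it is safe inside a URL path.
        path += "/" + quote(metric, safe="")
    return urllib.request.Request(BASE_URL + path, method=method)


def send(request: urllib.request.Request) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=2) as response:
        return response.read()
```

With a daemon running, `send(build_request("ping"))` should return the short JSON payload, and `send(build_request("expire", "some.metric", method="POST"))` would expire a metric (the metric name here is made up).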

Configuration

The configuration for Brubeck is loaded from a JSON file, passed on the command line.

./brubeck --config=my.config.json

If no configuration file is passed to the daemon, it will load config.default.json, which contains useful defaults for local development/testing.
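
Because the configuration is parsed as JSON, small slips such as a trailing comma or a missing quote can be expected to stop it from loading. The sketch below checks a config for syntax errors before you start the daemon. It uses Python's parser as a stand-in, which may differ from Jansson in edge cases, and `check_config` is a hypothetical helper name.

```python
import json


def check_config(text: str):
    """Parse config text; return (config, None) or (None, error message)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"


if __name__ == "__main__":
    # The trailing comma after 8126 makes this invalid JSON.
    config, error = check_config('{"type": "statsd", "port": 8126,}')
    print(error if error else "config parses cleanly")
```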

The JSON file can contain the following sections:

  • server_name: a string identifying the name for this specific Brubeck instance. This will be used by the daemon when reporting its internal metrics.

  • dumpfile: the path where the metrics list is stored when a dump is triggered (see the Signals section under Interfacing)

  • http: if present, this string sets the listen address and port for the HTTP API

  • backends: an array of the different backends to load. If more than one backend is loaded, Brubeck will function in sharding mode, distributing the aggregation load evenly across all the backends through constant-hashing.

    • carbon: a backend that aggregates data into a Carbon cache. The backend sends all the aggregated data once every frequency seconds. By default the data is sent to the port 2003 of the Carbon cache (plain text protocol), but the pickle wire protocol can be enabled by setting pickle to true and changing the port accordingly.

      {
        "type" : "carbon",
        "address" : "0.0.0.0",
        "port" : 2003,
        "frequency" : 10,
        "pickle" : true
      }
      

      We strongly encourage you to use the pickle wire protocol instead of plaintext, because carbon-relay.py is not very performant and will choke when parsing plaintext under enough load. Pickles are much softer CPU-wise on the Carbon relays, aggregators and caches.

      Hmmmm pickles. Now I'm hungry. Lincoln when's lunch?

  • samplers: an array of the different samplers to load. Samplers run in parallel and gather incoming metrics from the network.

    • statsd: the default statsd-compatible sampler. It listens on a UDP port for metrics packets. You can have more than one statsd sampler on the same daemon, but Brubeck was designed to support a single sampler taking the full metrics load on a single port.

      {
        "type" : "statsd",
        "address" : "0.0.0.0",
        "port" : 8126
      }
      

      The StatsD sampler has the following options (and default values) for performance tuning:

      • "workers" : 4 number of worker threads that will service the StatsD socket endpoint. More threads means emptying the socket faster, but the context switching and cache smashing will affect performance. In general, you can saturate your NIC as long as you have enough worker threads (one per core) and a fast enough CPU. Set this to 1 if you want to run the daemon in event-loop mode. But that'd be silly. This is not Node.

      • "multisock" : false if set to true, Brubeck will use the SO_REUSEPORT flag available since Linux 3.9 to create one socket per worker thread and bind it to the same address/port. The kernel will then round-robin between the threads without forcing them to race for the socket. This improves performance by up to 30%; try benchmarking it if your kernel is recent enough.

      • "multimsg" : 1 if set to greater than one, Brubeck will use the recvmmsg syscall (available since Linux 2.6.33) to read several UDP packets (the specified amount) in a single call and reduce the number of context switches. This doesn't improve performance much with several worker threads, but may have an effect in a limited configuration with only one thread. Make it a power of two for better results. As always, benchmark. YMMV.

    • statsd-secure: like StatsD, but each packet has an HMAC that verifies its integrity. This is hella useful if you're running infrastructure in The Cloud (TM) (C) and you want to send packets back to your VPN without them being tampered with by third parties.

      {
        "type" : "statsd-secure",
        "address" : "0.0.0.0",
        "port" : 9126,
        "max_drift" : 3,
        "hmac_key" : "750c783e6ab0b503eaa86e310a5db738",
        "replay_len" : 8000
      }
      

      The address and port parts are obviously the same as in statsd.

      • max_drift defines the maximum time (in seconds) that packets can be delayed since they were sent from the origin. All metrics come with a timestamp, so metrics that drift more than this value will silently be discarded.

      • hmac_key is the shared HMAC secret. The client sending the metrics must also know this in order to sign them.

      • replay_len is the size of the bloom filter that will be used to prevent replay attacks. We use a rolling bloom filter (one for every drift second), so replay_len should roughly be the amount of unique metrics you expect to receive in a 1s interval.

      NOTE: StatsD-secure doesn't run with multiple worker threads because verifying signatures is already slow enough. Don't use this in performance critical scenarios.

      NOTE: StatsD-secure uses a bloom filter to prevent replay attacks, so a small percentage of metrics will be dropped because of false positives. Take this into consideration.

      NOTE: An HMAC does not encrypt the packets, it just verifies their integrity. If you need to protect the content of the packets from eavesdropping, get those external machines in your VPN.

      NOTE: StatsD-secure may or may not be a good idea. If you have the chance to send all your metrics inside a VPN, I suggest you do that instead.
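
To show how the sections described above nest inside a single file, here is a sketch that assembles a small configuration and renders it as JSON. Only keys mentioned in this document are used. The server name, dump path and `http` value are placeholders (the `address:port` form for `http` is an assumption), and `render_config` is a hypothetical helper.

```python
import json

config = {
    "server_name": "brubeck_example",  # placeholder instance name
    "dumpfile": "./brubeck.dump",      # placeholder path
    "http": "127.0.0.1:8080",          # assumed address:port form
    "backends": [
        # Plain text Carbon protocol on port 2003, flushed every 10 seconds.
        {"type": "carbon", "address": "0.0.0.0", "port": 2003, "frequency": 10}
    ],
    "samplers": [
        # Tuning options shown with the default values documented above.
        {"type": "statsd", "address": "0.0.0.0", "port": 8126,
         "workers": 4, "multisock": False, "multimsg": 1}
    ],
}


def render_config(cfg: dict) -> str:
    """Serialize the configuration as indented JSON."""
    return json.dumps(cfg, indent=2)


if __name__ == "__main__":
    print(render_config(config))
```

Generating the file this way avoids hand-written syntax slips, at the cost of an extra step in your deployment.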

Testing

There are some tests in the test folder for key parts of the system (such as packet parsing, and all concurrent data access); besides that we test the behavior of the daemon live on staging and production systems.

  • Small changes are deployed into production as-is, straight from their feature branch. Deployment happens in 3 seconds for all the Brubeck instances in our infrastructure, so we can roll back into the master branch immediately if something fails.

  • For critical changes, we multiplex a copy of the metrics stream into a Unix domain socket, so we can have two instances of the daemon (old and new) aggregating to the production cluster and a staging cluster, and verify that the metrics flow into the two clusters is equivalent.

  • Benchmarking is performed on real hardware in our datacenter. The daemon is spammed with fake metrics across the network and we ensure that there are no regressions (particularly in the linear scaling between cores for the statsd sampler).
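
For a quick smoke test of your own, fake metrics can be pushed at a sampler over UDP with a few lines of code. This is a sketch and not the tooling described above: `send_fake_metrics` and the metric names are invented, and a Python loop like this is unlikely to saturate a NIC the way a real benchmark would.

```python
import socket


def send_fake_metrics(host: str, port: int, count: int) -> int:
    """Send `count` counter packets over UDP; return how many were sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for i in range(count):
            # Hypothetical metric names, spread over 100 distinct keys.
            line = f"bench.fake.metric_{i % 100}:1|c"
            sock.sendto(line.encode("ascii"), (host, port))
            sent += 1
    return sent


if __name__ == "__main__":
    # 8126 is the statsd sampler port used in the configuration examples.
    print(send_fake_metrics("127.0.0.1", 8126, 1000))
```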

When in doubt, please refer to the part of the MIT license that says "THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED". We use Brubeck in production and have been doing so for years, but we cannot make any promises regarding availability or performance.

FAQ

  • I cannot hit 4 million UDP metrics per second. I want my money back.

Make sure receiver-side scaling is properly configured in your kernel and that IRQs are being serviced by different cores, and that the daemon's threads are not pinned to a specific core. Make sure you're running the daemon on a physical machine and not a cheap cloud VPS. Make sure your NIC has the right drivers and it's not bottlenecking. Install a newer kernel and try running with SO_REUSEPORT.

If nothing works, refunds are available upon request. Just get mad at me on Twitter.
