• This repository has been archived on 19/Jun/2019
  • Stars: 774
  • Rank 56,331 (Top 2 %)
  • Language: Scala
  • License: Apache License 2.0
  • Created over 13 years ago
  • Updated almost 5 years ago

Repository Details

A stats collector & reporter for Scala servers (deprecated)

Ostrich

Ostrich is a library for Scala servers that makes it easy to:

  • load & reload per-environment configuration
  • collect runtime statistics (counters, gauges, metrics, and labels)
  • report those statistics through a simple web interface (optionally with graphs) or into log files
  • interact with the server over HTTP to check build versions or shut it down

The idea is that it should be simple and straightforward, allowing you to plug it in and get started quickly.

Status

This library is deprecated, and users should migrate to Commons Metrics. Please see TwitterServer's migration guide for details.

Building

Use sbt (simple-build-tool) to build:

$ sbt clean update package-dist

The finished jar will be in dist/.

Counters, Gauges, Metrics, and Labels

There are four kinds of statistics that ostrich captures:

  • counters

    A counter is a value that never decreases. Examples might be "widgets_sold" or "births". You just increment the counter each time a countable event happens, and graphing utilities usually graph the deltas over time. To increment a counter, use:

      Stats.incr("births")
    

    or

      Stats.incr("widgets_sold", 5)
    
  • gauges

    A gauge has a single value at any given moment, like "heap_used" or "current_temperature". It's usually a measurement that you only need to take when someone asks. To define a gauge, put this code somewhere in the server initialization:

      Stats.addGauge("current_temperature") { myThermometer.temperature }
    

    A gauge method must always return a double.

  • metrics

    A metric is tracked via distribution, and is usually used for timings, like so:

      Stats.time("translation") {
        document.translate("de", "en")
      }
    

    But you can also add metrics directly:

      Stats.addMetric("query_results", results.size)
    

    Metrics are collected by tracking the count, min, max, mean (average), and a simple bucket-based histogram of the distribution. This distribution can be used to determine median, 90th percentile, etc.

  • labels

    A label is just a key/value pair of strings, usually used to report a subsystem's state, like "boiler=offline". They're set with:

      Stats.setLabel("boiler", "online")
    

    They have no real statistical value, but can be used to raise flags in logging and monitoring.
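
The bucket-based tracking described for metrics can be illustrated with a small standalone sketch. This is not Ostrich's implementation: BucketHistogram, its bucket limits, and the choice to report a bucket's upper limit as the percentile are all assumptions made for illustration, using only the Scala standard library.

```scala
// Hypothetical illustration of a bucket-based metric; not Ostrich's own histogram class.
final class BucketHistogram(bucketLimits: Array[Int]) {
  // One slot per limit, plus an overflow slot for values above the last limit.
  private val buckets = new Array[Long](bucketLimits.length + 1)
  private var n = 0L
  private var sum = 0L
  private var minimum = Int.MaxValue
  private var maximum = Int.MinValue

  def add(value: Int): Unit = {
    // Find the first bucket whose upper limit can hold the value.
    val idx = bucketLimits.indexWhere(value <= _)
    buckets(if (idx < 0) bucketLimits.length else idx) += 1
    n += 1
    sum += value
    minimum = math.min(minimum, value)
    maximum = math.max(maximum, value)
  }

  def count: Long = n
  def min: Int = minimum
  def max: Int = maximum
  def mean: Double = if (n == 0) 0.0 else sum.toDouble / n

  // Approximate percentile: the upper limit of the bucket that contains the
  // requested fraction of samples, capped at the largest value seen.
  def percentile(p: Double): Int =
    if (n == 0) 0
    else {
      val target = math.max(1L, math.ceil(p * n).toLong)
      // cumulative(i) = number of samples in buckets 0..i
      val cumulative = buckets.scanLeft(0L)(_ + _).tail
      val idx = cumulative.indexWhere(_ >= target)
      if (idx >= 0 && idx < bucketLimits.length) math.min(bucketLimits(idx), maximum)
      else maximum
    }
}
```

The percentile is only as precise as the bucket limits allow, which is the usual trade-off for keeping a fixed amount of memory per metric.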

RuntimeEnvironment

If you build with standard-project (https://github.com/twitter/standard-project), RuntimeEnvironment can pull build and environment info out of the build.properties file that's tucked into your jar. Typical use is to pass your server object (or any object from your jar) and any command-line arguments you haven't already parsed:

val runtime = RuntimeEnvironment(this, args)

The command-line argument parsing is optional, and supports only:

  • --version to print out the jar's build info (name, version, build)

  • -f <filename> to specify a config file manually

  • --validate to validate that your config file can be compiled

Your server object is used as the home jar of the build.properties file. Then the classpath is scanned to find that jar's home and the config files that are located nearby.
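
To make the three supported flags concrete, here is a minimal standalone sketch of how such arguments could be parsed. It is not RuntimeEnvironment's own code: CliOptions and CliOptionsParser are hypothetical names, and collecting unrecognized arguments into a list is an assumption of this sketch.

```scala
// Hypothetical model of the three flags listed above; not Ostrich's parser.
final case class CliOptions(
  showVersion: Boolean = false,
  configFile: Option[String] = None,
  validate: Boolean = false,
  unparsed: List[String] = Nil
)

object CliOptionsParser {
  @scala.annotation.tailrec
  def parse(args: List[String], acc: CliOptions = CliOptions()): CliOptions = args match {
    case "--version" :: rest  => parse(rest, acc.copy(showVersion = true))
    case "--validate" :: rest => parse(rest, acc.copy(validate = true))
    case "-f" :: file :: rest => parse(rest, acc.copy(configFile = Some(file)))
    // Assumption: anything else is kept aside rather than rejected.
    case other :: rest        => parse(rest, acc.copy(unparsed = acc.unparsed :+ other))
    case Nil                  => acc
  }
}
```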

Quick Start

A good example server is created by the scala-bootstrapper project here: https://github.com/twitter/scala-bootstrapper

Define a server config class:

class MyServerConfig extends ServerConfig[MyServer] {
  var serverPort: Int = 9999

  def apply(runtime: RuntimeEnvironment) = {
    new MyServer(serverPort)
  }
}

A ServerConfig class contains things you want to configure on your server, as vars, and an apply method that turns a RuntimeEnvironment into your server. ServerConfig is actually a helper for Config that adds logging configuration, sets up the optional admin HTTP server if it was configured, and registers your service with the ServiceTracker so that it will be shut down when the admin port receives a shutdown command.

Next, make a simple config file for development:

import com.twitter.conversions.time._
import com.twitter.logging.config._
import com.twitter.ostrich.admin.config._
import com.example.config._

new MyServerConfig {
  serverPort = 9999
  admin.httpPort = 9900

  loggers = new LoggerConfig {
    level = Level.INFO
    handlers = new ConsoleHandlerConfig()
  }
}

The config file will be evaluated at runtime by this code in your Main class:

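import com.twitter.logging.Logger
import com.twitter.ostrich.admin.RuntimeEnvironment

object Main {
  val log = Logger.get(getClass.getName)

  def main(args: Array[String]): Unit = {
    // Locate the config file from the jar's build info and command-line args.
    val runtime = RuntimeEnvironment(this, args)
    // Evaluate the config file and build the server it describes.
    val server = runtime.loadRuntimeConfig[MyServer]()
    log.info("Starting my server!")
    try {
      server.start()
    } catch {
      case e: Exception =>
        e.printStackTrace()
        log.error(e, "Unexpected exception: %s", e.getMessage)
        // Exit with a non-zero status so callers can detect the failed start.
        System.exit(1)
    }
  }
}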

Your MyServer class should implement the Service interface so it can be started and shut down. The runtime environment will find your config file and evaluate it, returning the MyServer object to you so you can start it. And you're set!

Stats API

The base trait of the stats API is StatsProvider, which defines methods for setting and getting each type of collected stat. The concrete implementation is StatsCollection, which stores them all in Java concurrent hash maps.

To log or report stats, attach a StatsReporter to a StatsCollection. A StatsReporter keeps its own state, and resets that state each time it reports. You can attach multiple StatsReporters to track independent state without affecting the StatsCollection.
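
The idea of a reporter that keeps its own state and resets it on each report can be sketched as follows. This is a simplified standalone model covering counters only, not the real StatsCollection or StatsReporter: CounterCollection and DeltaReporter are hypothetical names.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for a stats collection; tracks counters only.
final class CounterCollection {
  private val counters = new ConcurrentHashMap[String, AtomicLong]()

  def incr(name: String, delta: Long = 1L): Long = {
    counters.putIfAbsent(name, new AtomicLong(0L))
    counters.get(name).addAndGet(delta)
  }

  // Immutable copy of the running totals.
  def snapshot(): Map[String, Long] = {
    val out = Map.newBuilder[String, Long]
    val it = counters.entrySet().iterator()
    while (it.hasNext) {
      val e = it.next()
      out += (e.getKey -> e.getValue.get())
    }
    out.result()
  }
}

// Hypothetical reporter: remembers the totals it last saw and reports only the change.
final class DeltaReporter(collection: CounterCollection) {
  private var last = Map.empty[String, Long]

  def report(): Map[String, Long] = synchronized {
    val now = collection.snapshot()
    val deltas = now.map { case (name, total) => name -> (total - last.getOrElse(name, 0L)) }
    last = now // reset this reporter's baseline; the collection itself is untouched
    deltas
  }
}
```

Because each reporter holds its own baseline, two reporters attached to the same collection can report at different intervals without disturbing each other or the underlying totals.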

The simplest (and most common) pattern is to use the global singleton named Stats, like so:

import com.twitter.ostrich.stats.Stats

Stats.incr("cache_misses")
Stats.time("memcache_timing") {
  memcache.set(key, value)
}

Stat names can be any string, though conventionally they contain only letters, digits, underscore (_), and dash (-), to make reporting easier.

You can immediately see any reported stats on the admin web server, if you've activated it, through the "stats" command:

curl localhost:PPPP/stats.txt

(where PPPP is your configured admin port)

ServiceTracker

The global "shutdown" and "quiesce" commands work by talking to a global ServiceTracker object. This is just a set of running Service objects.

Each Service knows how to start and shut down, so registering a service with the global ServiceTracker will cause it to be shut down when the server as a whole is shut down:

ServiceTracker.register(this)

Some helper classes like BackgroundProcess and PeriodicBackgroundProcess implement Service, so they can be used to build simple background tasks that will be automatically shut down when the server exits.
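
The registry pattern behind this can be modeled in a few lines. The sketch below is a standalone illustration, not Ostrich's ServiceTracker: MiniService, MiniTracker, and FlagService are hypothetical names, and clearing the registry after shutdown is an assumption of the sketch.

```scala
import scala.collection.mutable

// Hypothetical equivalent of a service that can be started and shut down.
trait MiniService {
  def start(): Unit
  def shutdown(): Unit
}

// Hypothetical global registry of running services.
object MiniTracker {
  private val services = mutable.LinkedHashSet.empty[MiniService]

  def register(service: MiniService): Unit = synchronized {
    services += service
    ()
  }

  // Shut down every registered service in registration order, then forget them.
  def shutdown(): Unit = synchronized {
    services.foreach(_.shutdown())
    services.clear()
  }

  def size: Int = synchronized { services.size }
}

// Trivial service used to show the effect of a global shutdown.
final class FlagService extends MiniService {
  @volatile var running = false
  def start(): Unit = { running = true }
  def shutdown(): Unit = { running = false }
}
```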

Admin web service

The easiest way to start the admin service is to construct an AdminServiceConfig with desired configuration, and call apply on it.

To reduce boilerplate in the common case of configuring a server with an admin port and logging, a helper trait called ServerConfig is defined with both:

var loggers: List[LoggerConfig] = Nil
var admin = new AdminServiceConfig()

The apply method on ServerConfig will create and start the admin service if a port is defined, and set up any configured logging.

You can also build an admin service directly from its config:

val adminConfig = new AdminServiceConfig {
  httpPort = 8888
  statsNodes = new StatsConfig {
    reporters = new TimeSeriesCollectorConfig
  }
}
val runtime = RuntimeEnvironment(this, Nil)
val admin = adminConfig()(runtime)

Here admin is an Option[AdminHttpService]. If httpPort isn't set, the admin service won't start and admin will be None; otherwise it holds the running service.

statsNodes can attach a list of reporters to named stats collections. In the above example, a time-series collector is added to the global Stats object. This is used to provide the web graphs described below under "Web graphs".

Web/socket commands

Commands over the admin interface take the form of an HTTP "get" request:

GET /<command>[/<parameters...>][.<type>]

which can be performed using 'curl' or 'wget':

$ curl http://localhost:PPPP/shutdown

The result body may be json or plain-text, depending on <type>. The default is json, but you can ask for text like so:

$ curl http://localhost:PPPP/stats.txt

For simple commands like shutdown, the response body may simply be the JSON encoding of the string "ok". For others like stats, it may be a nested structure.
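
As an illustration of the GET /<command>[/<parameters...>][.<type>] form, here is a small standalone parser for the path part of such a request. It is a sketch, not the admin service's own routing code: AdminCommand and AdminCommandParser are hypothetical names, and query strings are assumed to have been stripped already.

```scala
// Hypothetical parsed form of an admin request path.
final case class AdminCommand(name: String, parameters: List[String], format: String)

object AdminCommandParser {
  // Returns None when the path contains no command. The type defaults to json.
  def parse(path: String): Option[AdminCommand] = {
    val trimmed = path.stripPrefix("/")
    val dot = trimmed.lastIndexOf('.')
    // Only a dot in the final path segment counts as the type suffix.
    val (body, format) =
      if (dot >= 0 && dot > trimmed.lastIndexOf('/'))
        (trimmed.substring(0, dot), trimmed.substring(dot + 1))
      else (trimmed, "json")
    body.split('/').toList.filter(_.nonEmpty) match {
      case name :: params => Some(AdminCommand(name, params, format))
      case Nil            => None
    }
  }
}
```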

The commands are:

  • ping

    Verify that the admin interface is working; server should say "pong" back.

  • reload

    Reload the server config file for any services that support it (most do not).

  • shutdown

    Immediately shut down the server.

  • quiesce

    Close any listening sockets, stop accepting new connections, and shut down the server as soon as the last client connection is done.

  • stats

    Dump server statistics as 4 groups: counters, gauges, metrics, and labels.

    • If the period query parameter is specified (e.g. /stats.json?period=10), a StatsListener is acquired for that time period, and all requests with this period value will receive the same stats values throughout that period.
    • Otherwise, if the namespace argument is provided (e.g. /stats.json?namespace=ganglia), a StatsListener is acquired for that namespace, and each request with this namespace value will reset the stats listener, effectively returning the delta since the prior request with that namespace. (See src/scripts/json_stats_fetcher.rb for an example.)
    • If neither period nor namespace parameters are specified, the main stats object will be fetched, returning non-differential counters and metrics over the lifetime of the process.
  • server_info

    Dump server info (server name, version, build, and git revision).

  • threads

    Dump stack traces and stats about each currently running thread.

  • gc

    Force a garbage collection cycle.
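
The precedence between the period and namespace parameters of the stats command can be written down as a small decision function. This is a standalone sketch rather than the admin service's code: StatsSource and its cases are hypothetical names, and treating a non-numeric or non-positive period as absent is an assumption.

```scala
import scala.util.Try

// Hypothetical names; this models only which source a stats request would read from.
sealed trait StatsSource
final case class PeriodListener(seconds: Int) extends StatsSource
final case class NamespaceListener(namespace: String) extends StatsSource
case object MainStats extends StatsSource

object StatsSourceChooser {
  def choose(query: Map[String, String]): StatsSource = {
    // Assumption: a period that is not a positive integer is treated as absent.
    val period = query.get("period").flatMap(s => Try(s.toInt).toOption).filter(_ > 0)
    (period, query.get("namespace")) match {
      case (Some(seconds), _) => PeriodListener(seconds)    // period wins when both are given
      case (None, Some(ns))   => NamespaceListener(ns)      // delta since the last request for ns
      case (None, None)       => MainStats                  // totals over the process lifetime
    }
  }
}
```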

Web graphs

If TimeSeriesCollector is attached to a stats collection, the web interface will include a small graph server that can be used to look at the last hour of data on collected stats.

The url

http://localhost:PPPP/graph/

(where PPPP is your admin httpPort) will give a list of currently-collected stats, and links to the current hourly graph for each stat. The graphs are generated in javascript using flot.

Profiling

If you're using heapster, you can generate a profile suitable for reading with Google perftools.

Example use:

curl -s 'localhost:9990/pprof/heap?pause=10' >| /tmp/prof

This produces a file that can be read with pprof.

Credits

This started out as several smaller projects that began to overlap so much, we decided to merge them. Major contributors include, in alphabetical order:

  • Alex Payne
  • John Corwin
  • John Kalucki
  • Marius Eriksen
  • Nick Kallen
  • Oliver Gould
  • Pankaj Gupta
  • Robey Pointer
  • Steve Jenson

If you make a significant change, please add your name to the list!

License

This library is released under the Apache Software License, version 2, which should be included with the source in a file named LICENSE.
