• This repository has been archived on 18/Sep/2021

simple, distributed message queue system (inactive)

Kestrel

Kestrel is based on Blaine Cook's "starling" simple, distributed message queue, with added features and bulletproofing, as well as the scalability offered by actors and the JVM.

Each server handles a set of reliable, ordered message queues. When you put a cluster of these servers together, with no cross communication, and pick a server at random whenever you do a set or get, you end up with a reliable, loosely ordered message queue.

In many situations, loose ordering is sufficient. Dropping the requirement on cross communication makes it horizontally scale to infinity and beyond: no multicast, no clustering, no "elections", no coordination at all. No talking! Shhh!
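
To make "loosely ordered" concrete, here is a minimal simulation, not Kestrel code. The `Server` class and `run_cluster` function are hypothetical stand-ins: each server is a strict FIFO queue, and the client picks a server at random for every put and get.

```python
import random
from collections import deque


class Server:
    """One stand-in server: a strictly ordered (FIFO) in-memory queue."""

    def __init__(self):
        self.queue = deque()

    def put(self, item):
        self.queue.append(item)

    def get(self):
        # Return None when this server has nothing to hand out.
        return self.queue.popleft() if self.queue else None


def run_cluster(num_servers, num_items, seed=0):
    """Put items 0..num_items-1, then drain, picking a random server each time."""
    rng = random.Random(seed)
    servers = [Server() for _ in range(num_servers)]
    for item in range(num_items):
        rng.choice(servers).put(item)

    received = []
    while len(received) < num_items:
        item = rng.choice(servers).get()
        if item is not None:  # a miss just means "try another server"
            received.append(item)
    return received


if __name__ == "__main__":
    # Every item arrives exactly once; the order is close to, but usually
    # not exactly, the order in which the items were put.
    print(run_cluster(num_servers=3, num_items=12))
```

With a single server the output is strictly ordered. With several, each server's own items still come out in order, but the interleaving across servers depends on the random picks.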

For more information about what it is and how to use it, check out the included guide.

Kestrel has a mailing list here: [email protected]

Author's address: Robey Pointer <[email protected]>

Status

We've deprecated Kestrel because internally we've shifted our attention to an alternative project based on DistributedLog, and we no longer have the resources to contribute fixes or accept pull requests. While Kestrel is a great solution up to a certain point (simple, fast, durable, and easy to deploy), it hasn't been able to cope with Twitter's massive scale (in terms of number of tenants, QPS, operability, diversity of workloads etc.) or operating environment (an Aurora cluster without persistent storage).

Features

Kestrel is:

  • fast

    It runs on the JVM so it can take advantage of the hard work people have put into Java performance.

  • small

    Currently about 2500 lines of Scala, because it relies on Netty (a rough equivalent of Danger's ziggurat or Ruby's EventMachine) -- and because Scala is extremely expressive.

  • durable

    Queues are stored in memory for speed, but logged into a journal on disk so that servers can be shut down or moved without losing any data.

  • reliable

    A client can ask to "tentatively" fetch an item from a queue, and if that client disconnects from kestrel before confirming ownership of the item, the item is handed to another client. In this way, crashing clients don't cause lost messages.
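
The "tentative" fetch can be pictured with a toy model. This is a sketch only, not Kestrel's implementation: `ReliableQueue` and its method names are made up for illustration, and it assumes an unconfirmed item goes back to the front of the queue.

```python
from collections import deque


class ReliableQueue:
    """Toy queue with tentative reads; not Kestrel's actual implementation."""

    def __init__(self):
        self.items = deque()
        self.open_reads = {}  # client id -> item fetched but not yet confirmed

    def put(self, item):
        self.items.append(item)

    def tentative_get(self, client):
        """Hand an item to `client` without giving it up for good."""
        if client in self.open_reads or not self.items:
            return None
        item = self.items.popleft()
        self.open_reads[client] = item
        return item

    def confirm(self, client):
        """The client took ownership; the item is now gone for good."""
        self.open_reads.pop(client, None)

    def disconnect(self, client):
        """The client vanished; return any unconfirmed item to the head."""
        if client in self.open_reads:
            self.items.appendleft(self.open_reads.pop(client))


if __name__ == "__main__":
    q = ReliableQueue()
    q.put("job-1")
    q.tentative_get("client-a")         # client-a holds job-1 tentatively
    q.disconnect("client-a")            # client-a crashes before confirming
    print(q.tentative_get("client-b"))  # prints job-1
```

The item only disappears once `confirm` is called, so a client that crashes between fetching and confirming cannot lose it.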

Anti-Features

Kestrel is not:

  • strongly ordered

    While each queue is strongly ordered on each machine, a cluster will appear "loosely ordered" because clients pick a machine at random for each operation. The end result should be "mostly fair".

  • transactional

    This is not a database. Item ownership is transferred with acknowledgement, but kestrel does not support grouping multiple operations into an atomic unit.
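
On the wire, acknowledgement is a per-item affair. The sketch below builds the request bytes, assuming the memcache-style text protocol and the `/open`, `/close` and `/abort` get options described in the project's guide. Check docs/guide.md before relying on any of these. The helper names `set_command` and `get_command` are hypothetical.

```python
def set_command(queue, data, expiry=0):
    """Build a memcache-style `set`, which enqueues `data` on `queue`."""
    # The flags field (the first 0) is assumed to be ignored by the server.
    header = f"set {queue} 0 {expiry} {len(data)}\r\n".encode("ascii")
    return header + data + b"\r\n"


def get_command(queue, *options):
    """Build a memcache-style `get` with slash-separated options on the key."""
    key = "/".join([queue, *options])
    return f"get {key}\r\n".encode("ascii")


if __name__ == "__main__":
    print(set_command("spam", b"hello"))   # enqueue one item
    print(get_command("spam", "open"))     # tentative fetch
    print(get_command("spam", "close"))    # acknowledge the previous fetch
    print(get_command("spam", "abort"))    # give the item back
```

Each request touches one item on one queue. There is no command that wraps several of them into a single atomic unit, which is the sense in which kestrel is not transactional.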

Downloading it

The latest release is always on the homepage here:

Or the latest development versions & branches are on github:

Building it

Kestrel requires Java 6 and sbt 0.11.2. Presently some sbt plugins used by kestrel depend on that exact version of sbt. On OS X 10.5, you may have to hard-code an annoying JAVA_HOME to use Java 6:

$ export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home

Building from source is easy:

$ sbt clean update package-dist

Scala libraries and dependencies will be downloaded from Maven repositories the first time you do a build. The finished distribution will be in dist.

Running it

You can run kestrel by hand, in development mode, via:

$ ./dist/kestrel-VERSION/scripts/devel.sh

Like all ostrich-based servers, it uses the "stage" property to determine which config file to load, so devel.sh sets -Dstage=development.

When running it as a server, a startup script is provided in dist/kestrel-VERSION/scripts/kestrel.sh. The script assumes you have daemon, a standard daemonizer for Linux that is also available here for all common Unix platforms.

The created archive kestrel-VERSION.zip can be expanded into a place like /usr/local (or wherever you like) and executed within its own folder as a self-contained package. All dependent jars are included. The current startup script, however, assumes that kestrel has been deployed to /usr/local/kestrel/current (e.g., as if by capistrano), and the startup script loads kestrel from that path.

The default configuration puts logfiles into /var/log/kestrel/ and queue journal files into /var/spool/kestrel/.

The startup script logs extensive GC information to a file named stdout in the log folder. If kestrel has problems starting up (before it can initialize logging), the problem will usually be reported in a file named error in the same folder.

Configuration

Queue configuration is described in detail in docs/guide.md (an operational guide). Scala docs for the config variables are here.

Performance

Several performance tests are included. To run them, first start up a kestrel instance locally.

$ sbt clean update package-dist
$ ./dist/kestrel-VERSION/scripts/devel.sh

Put-many

This test just spams a kestrel server with "put" operations, to see how quickly it can absorb and journal them.

A sample run on a 2010 MacBook Pro:

$ ./dist/kestrel/scripts/load/put-many -n 100000
Put 100000 items of 1024 bytes to localhost:22133 in 1 queues named spam
  using 100 clients.
Finished in 6137 msec (61.4 usec/put throughput).
Transactions: min=71.00; max=472279.00 472160.00 469075.00;
  median=3355.00; average=5494.69 usec
Transactions distribution: 5.00%=485.00 10.00%=1123.00 25.00%=2358.00
  50.00%=3355.00 75.00%=4921.00 90.00%=7291.00 95.00%=9729.00
  99.00%=50929.00 99.90%=384638.00 99.99%=467899.00
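
The "usec/put throughput" figure is simply total wall-clock time divided by the number of items. A small sketch of the arithmetic (the function names are illustrative and not part of the load scripts):

```python
def usec_per_item(total_msec, items):
    """Average wall-clock microseconds per item across the whole run."""
    return total_msec * 1000 / items


def items_per_second(total_msec, items):
    """The same figure expressed as a rate."""
    return items / (total_msec / 1000)


if __name__ == "__main__":
    print(round(usec_per_item(6137, 100_000), 1))  # 61.4
    print(round(items_per_second(6137, 100_000)))  # 16295
```

Note that this is not a per-request latency. With 100 clients in flight at once, each individual put takes far longer (median 3355 usec in the run above) than the 61.4 usec average spacing between completed puts.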

Many-clients

This test has one producer that trickles out one item at a time, and a pile of consumers fighting for each item. It usually takes about as long as the number of items times the delay, but is useful as a validation test to make sure kestrel works as advertised without blowing up.

A sample run on a 2010 MacBook Pro:

$ ./dist/kestrel/scripts/load/many-clients
many-clients: 100 items to localhost using 100 clients, kill rate 0%,
  at 100 msec/item
Received 100 items in 11046 msec.

This test always takes about 11 seconds -- it's a load test instead of a speed test.

Flood

This test starts up one producer and one consumer, and just floods items through kestrel as fast as it can.

A sample run on a 2010 MacBook Pro:

$ ./dist/kestrel/scripts/load/flood
flood: 1 threads each sending 10000 items of 1kB through spam
Finished in 1563 msec (156.3 usec/put throughput).
Consumer(s) spun 0 times in misses.

Packing

This test starts up one producer and one consumer, seeds the queue with a bunch of items to cause it to fall behind, then does cycles of flooding items through the queue, separated by pauses. It's meant to test kestrel's behavior with a queue that's fallen behind and stays behind indefinitely, to make sure the journal files are packed periodically without affecting performance too badly.

A sample run on a 2010 MacBook Pro:

$ ./dist/kestrel/scripts/load/packing -c 10 -q small
packing: 25000 items of 1kB with 1 second pauses
Wrote 25000 items starting at 0.
cycle: 1
Wrote 25000 items starting at 25000.
Read 25000 items in 5279 msec. Consumer spun 0 times in misses.
cycle: 2
Wrote 25000 items starting at 50000.
Read 25000 items in 4931 msec. Consumer spun 0 times in misses.
...
cycle: 10
Wrote 25000 items starting at 250000.
Read 25000 items in 5304 msec. Consumer spun 0 times in misses.
Read 25000 items in 3370 msec. Consumer spun 0 times in misses.

You can see the journals being packed in the kestrel log. Like "many-clients", this test is a load test instead of a speed test.
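
For intuition about what packing buys you, here is a toy model of a journal and its compaction. It is only a sketch: the `("add", item)` / `("remove", None)` record format and the `replay` and `pack` helpers are invented for illustration and are not Kestrel's on-disk format or algorithm.

```python
from collections import deque


def replay(journal):
    """Rebuild the in-memory queue by replaying journal records in order."""
    queue = deque()
    for op, payload in journal:
        if op == "add":
            queue.append(payload)
        elif op == "remove":
            queue.popleft()
    return queue


def pack(journal):
    """Rewrite the journal so it holds only the items still in the queue."""
    return [("add", item) for item in replay(journal)]


if __name__ == "__main__":
    journal = [("add", "a"), ("add", "b"), ("remove", None), ("add", "c")]
    packed = pack(journal)
    print(packed)                             # [('add', 'b'), ('add', 'c')]
    print(replay(packed) == replay(journal))  # True
```

A queue that stays behind keeps appending records, so without periodic packing the journal would grow with every item ever written rather than with the items still waiting.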

Leaky-reader

This test starts a producer and several consumers, with the consumers occasionally "forgetting" to acknowledge an item that they've read. It verifies that the un-acknowledged items are eventually handed off to another consumer.

A sample run:

$ ./dist/kestrel/scripts/load/leaky-reader -n 100000 -t 10
leaky-reader: 10 threads each sending 100000 items through spam
Flushing queues first.
1000
2000
100000
Finished in 40220 msec (40.2 usec/put throughput).
Completed all reads

Like "many-clients", it's just a load test.
