• This repository has been archived on 18/Sep/2021
  • Language
    Scala
  • License
    Other
  • Created about 14 years ago
  • Updated about 7 years ago

Repository Details

A distributed, fault-tolerant graph database

STATUS

Twitter is no longer maintaining this project or responding to issues or PRs.

FlockDB

FlockDB is a distributed graph database for storing adjacency lists, with goals of supporting:

  • a high rate of add/update/remove operations
  • potentially complex set arithmetic queries
  • paging through query result sets containing millions of entries
  • ability to "archive" and later restore archived edges
  • horizontal scaling including replication
  • online data migration
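
To make the set-arithmetic and paging goals concrete, here is a minimal Python sketch. It is not FlockDB's API: the `followers` mapping, the `page` helper, and the offset-style cursor are all hypothetical, chosen only to illustrate intersecting two adjacency lists and walking the result in fixed-size pages.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical in-memory adjacency lists: user ID -> set of follower IDs.
followers: Dict[int, Set[int]] = {
    1: {10, 11, 12, 13},
    2: {11, 12, 13, 14},
}

def page(ids: List[int], cursor: int, count: int) -> Tuple[List[int], Optional[int]]:
    """Return up to `count` IDs starting at index `cursor`, plus the next cursor.

    The next cursor is None once the result set is exhausted.
    """
    chunk = ids[cursor:cursor + count]
    next_cursor = cursor + count if cursor + count < len(ids) else None
    return chunk, next_cursor

# Set arithmetic: users who follow both 1 and 2.
common = sorted(followers[1] & followers[2])

# Page through the result two entries at a time.
pages = []
cursor: Optional[int] = 0
while cursor is not None:
    chunk, cursor = page(common, cursor, 2)
    pages.append(chunk)
```

An offset cursor is the simplest thing that works for a sketch; with result sets of millions of entries, a cursor tied to the sort position would probably be a better fit.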

Non-goals include:

  • multi-hop queries (or graph-walking queries)
  • automatic shard migrations

FlockDB is much simpler than other graph databases such as Neo4j because it tries to solve fewer problems. It scales horizontally and is designed for online, low-latency, high-throughput environments such as websites.

Twitter uses FlockDB to store social graphs (who follows whom, who blocks whom) and secondary indices. As of April 2010, the Twitter FlockDB cluster stores 13+ billion edges and sustains peak traffic of 20k writes/second and 100k reads/second.

It does what?

If, for example, you're storing a social graph (user A follows user B), and it's not necessarily symmetrical (A can follow B without B following A), then FlockDB can store that relationship as an edge: node A points to node B. It stores this edge with a sort position, and in both directions, so that it can answer the question "Who follows A?" as well as "Whom is A following?"

This is called a directed graph. (Technically, FlockDB stores the adjacency lists of a directed graph.) Each edge has a 64-bit source ID, a 64-bit destination ID, a state (normal, removed, archived), and a 32-bit position used for sorting. The edges are stored in both a forward and backward direction, meaning that an edge can be queried based on either the source or destination ID.
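
As a rough illustration of the edge record just described, the following Python sketch models the four fields. The class and field names are hypothetical, and treating the values as unsigned is an assumption: the text gives the bit widths but not the signedness.

```python
from dataclasses import dataclass
from enum import Enum

class State(Enum):
    NORMAL = "normal"
    REMOVED = "removed"
    ARCHIVED = "archived"

@dataclass(frozen=True)
class Edge:
    source_id: int       # 64-bit
    destination_id: int  # 64-bit
    state: State
    position: int        # 32-bit, used only for sorting

    def __post_init__(self) -> None:
        # Assumption: values are unsigned; the text states widths, not signedness.
        for name, value, bits in (
            ("source_id", self.source_id, 64),
            ("destination_id", self.destination_id, 64),
            ("position", self.position, 32),
        ):
            if not 0 <= value < 2 ** bits:
                raise ValueError(f"{name} does not fit in {bits} bits")

edge = Edge(source_id=134, destination_id=90, state=State.NORMAL, position=5)
```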

For example, if node 134 points to node 90, and its sort position is 5, then there are two rows written into the backing store:

forward: 134 -> 90 at position 5
backward: 90 <- 134 at position 5
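
The two rows can be pictured as one write per direction. The sketch below uses plain Python dictionaries as a stand-in for the backing store; the `forward` and `backward` names and the `add_edge` helper are hypothetical and say nothing about how FlockDB lays out its tables.

```python
from collections import defaultdict
from typing import DefaultDict, Dict

# Hypothetical stand-ins for the backing store, one index per direction:
# forward[source][destination] = position
# backward[destination][source] = position
forward: DefaultDict[int, Dict[int, int]] = defaultdict(dict)
backward: DefaultDict[int, Dict[int, int]] = defaultdict(dict)

def add_edge(source: int, destination: int, position: int) -> None:
    """Write the edge once per direction, mirroring the two rows above."""
    forward[source][destination] = position
    backward[destination][source] = position

add_edge(134, 90, 5)

# The same edge is reachable from either endpoint.
points_to = list(forward[134])      # destinations that 134 points to
pointed_at_by = list(backward[90])  # sources that point to 90
```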

If you're storing a social graph, the graph might be called "following", and you might use the current time as the position, so that a listing of followers is in recency order. In that case, if user 134 is Nick, and user 90 is Robey, then FlockDB can store:

forward: Nick follows Robey at 9:54 today
backward: Robey is followed by Nick at 9:54 today

The (source, destination) pair must be unique: only one edge can point from node A to node B, but its position and state may be modified at any time. Position is used only for sorting the results of queries, and state is used to mark edges that have been removed or archived (placed into cold sleep).
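
A small Python sketch of these rules, under stated assumptions: the `edges` dictionary, the `put` and `destinations` helpers, and the choice to list the highest position first are all hypothetical, picked to show that writing the same pair again updates the edge in place and that archiving only flips the state.

```python
from typing import Dict, List, Tuple

# Hypothetical store keyed by (source, destination), so a pair holds one edge.
# Each value is a (state, position) tuple.
edges: Dict[Tuple[int, int], Tuple[str, int]] = {}

def put(source: int, destination: int, state: str, position: int) -> None:
    """Insert the edge, or overwrite its state and position if the pair exists."""
    edges[(source, destination)] = (state, position)

def destinations(source: int) -> List[int]:
    """Destinations of `source` in the normal state, highest position first."""
    rows = [
        (position, dst)
        for (src, dst), (state, position) in edges.items()
        if src == source and state == "normal"
    ]
    return [dst for _, dst in sorted(rows, reverse=True)]

put(134, 90, "normal", 5)
put(134, 91, "normal", 7)
put(134, 90, "normal", 9)  # same pair: position is updated, no second edge
before_archive = destinations(134)

put(134, 91, "archived", 7)  # archiving changes only the state
after_archive = destinations(134)

put(134, 91, "normal", 7)  # restoring brings the edge back into results
after_restore = destinations(134)
```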

Building

In theory, building is as simple as

$ sbt clean update package-dist

but there are some prerequisites. You need:

  • java 1.6
  • sbt 0.7.4
  • thrift 0.5.0

If you haven't used sbt before, this page has a quick setup: http://code.google.com/p/simple-build-tool/wiki/Setup. My ~/bin/sbt looks like this:

#!/bin/bash
java -server -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256m -Xmx1024m -jar `dirname $0`/sbt-launch-0.7.4.jar "$@"

Apache Thrift 0.5.0 is a prerequisite for building the Java stubs of the Thrift IDL. It can't be installed via jar, so you'll need to install it separately before you build. It can be found on the Apache Thrift site: http://thrift.apache.org/. You can find the download for 0.5.0 here: http://archive.apache.org/dist/incubator/thrift/0.5.0-incubating/.

In addition, the tests require a local MySQL instance to be running, and the DB_USERNAME and DB_PASSWORD environment variables to contain its login info. You can skip the tests if you want (but you should feel a pang of guilt):

$ NO_TESTS=1 sbt package-dist
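
If you do run the tests, a quick check that both variables are set can save a failed build. The helper below is a hypothetical convenience script, not part of the FlockDB build; it only inspects the environment and does not try to connect to MySQL.

```python
import os
from typing import List, Mapping

# The two variables the test suite expects, per the paragraph above.
REQUIRED_VARS = ("DB_USERNAME", "DB_PASSWORD")

def missing_db_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_db_vars()
    if missing:
        print("Set these before running the tests: " + ", ".join(missing))
    else:
        print("DB_USERNAME and DB_PASSWORD are set.")
```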

Running

Check out the demo for instructions on how to start up a local development instance of FlockDB. It also shows how to add edges, query them, etc.

Community

Contributors

  • Nick Kallen @nk
  • Robey Pointer @robey
  • John Kalucki @jkalucki
  • Ed Ceaser @asdf
