• Stars
    star
    511
  • Rank 86,473 (Top 2 %)
  • Language
    Scala
  • License
    BSD 2-Clause "Sim...
  • Created over 9 years ago
  • Updated about 6 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Distributed NoSQL Database

Created by Stephen McDonald

CurioDB is a distributed and persistent Redis clone, built with Scala and Akka. Please note that despite the fancy logo, this is a toy project, hence the name "Curio", and any suitability as a drop-in replacement for Redis is purely incidental. :-)

Installation

I've been using sbt to build the project, which you can install on OS X, Linux or Windows. With that done, you just need to clone this repository and run it:

$ git clone git://github.com/stephenmcd/curiodb.git
$ cd curiodb
$ sbt "~re-start --config=path/to/config.file"

You can also build a binary (executable JAR file):

$ sbt assembly
$ ./target/scala-2.11/curiodb-0.0.1 --config=path/to/config.file

Note that the --config=path/to/config.file argument is optional, see the Configuration section below for more detail.

Overview

Why build a Redis clone? Well, I'd been learning Scala and Akka and wanted a nice project I could take them further with. I've used Redis heavily in the past, and Akka gave me some really cool ideas for implementing a clone, based on each key/value pair (or KV pair) in the system being implemented as an actor:

Concurrency

Since each KV pair in the system is an actor, CurioDB will happily use all your CPU cores, so you can run 1 server using 32 cores instead of 32 servers each using 1 core (or use all 1,024 cores of your 32 server cluster, why not). Each actor operates on its own value atomically, so the atomic nature of Redis commands is still present, it just occurs at the individual KV level instead of in the context of an entire running instance of Redis.

Distributed by Default

Since each KV pair in the system is an actor, the interaction between multiple KV pairs works the same way when they're located across the network as it does when they're located on different processes on a single machine. This negates the need for features of Redis like "hash tagging", and allows commands that deal with multiple keys (SUNION, SINTER, MGET, MSET, etc) to operate seamlessly across a cluster.

Virtual Memory

Since each KV pair in the system is an actor, the unit of disk storage is the individual KV pair, not a single instance's entire data set. This makes Redis' abandoned virtual memory feature a lot more feasible. With CurioDB, an actor can simply persist its value to disk after some criteria occurs, and shut itself down until requested again.

Simple Implementation

Scala is concise, you get a lot done with very little code, but that's just the start - CurioDB leverages Akka very heavily, taking care of clustering, concurrency, persistence, and a whole lot more. This means the bulk of CurioDB's code mostly deals with implementing all of the Redis commands, so far weighing in at only a paltry 1,000 lines of Scala! Currently, the majority of commands have been fully implemented, as well as the Redis wire protocol itself, so existing client libraries can be used. Some commands have been purposely omitted where they don't make sense, such as cluster management, and things specific to Redis' storage format.

Pluggable Storage

Since Akka Persistence is used for storage, many strange scenarios become available. Want to use PostgreSQL or Cassandra for storage, with CurioDB as the front-end interface for Redis commands? This should be possible! By default, CurioDB uses Akka's built-in LevelDB storage.

Design

Here's a bad diagram representing one server in the cluster, and the flow of a client sending a command:

  • An outside client sends a command to the server actor (Server.scala). There's at most one per cluster node (which could be used to support load balancing), and at least one per cluster (not all nodes need to listen for outside clients).
  • Upon receiving a new outside client connection, the server actor will create a Client Node actor (System.scala), it's responsible for the life-cycle of a single client connection, as well as parsing the incoming and writing the outgoing protocol, such as the Redis protocol for TCP clients, or JSON for HTTP clients (Server.scala).
  • Key Node actors (System.scala) manage the key space for the entire system, which are distributed across the entire cluster using consistent hashing. A Client Node will forward the command to the matching Key Node for its key.
  • A Key Node is then responsible for creating, removing, and communicating with each KV Node actor, which are the actual actors that store the underlying value for each key, such as a strings, hashes, and sorted sets, and perform the operations on them for each of their respective commands. (Data.scala).
  • The KV Node then sends a response back to the originating Client Node, which returns it to the outside client.

Not diagrammed, but in addition to the above:

  • Some commands require coordination with multiple KV Nodes, in which case a temporary Aggregate actor (Aggregation.scala) is created by the Client Node, which coordinates the results for multiple commands via Key Nodes and KV Nodes in the same way a Client Node does.
  • PubSub is implemented by adding behavior to Key Nodes and Client Nodes, which act as PubSub servers and clients respectively (PubSub.scala).
  • Lua scripting is fully supported (Scripting.scala) thanks to LuaJ, and is implemented similarly to PubSub, where behavior is added to Key Nodes which store and run compiled Lua scripts (via EVALSHA), and Client Nodes which can run uncompiled scripts directly (via EVAL).

Transactions

Distributed transactions are fully supported, both by way of the MULTI and EXEC commands, and for Lua scripts with the EVAL and EVALSHA commands. Transactions are implemented using basic two-phase commit (2PC) with rollback support, multiversion concurrency control (MVCC), and configurable isolation levels.

2PC

A Client Node acts as a transaction coordinator in 2PC parlance. It is responsible for coordinating initial agreement with each Node that will participate in the transaction, aggregating responses for all executed (but uncommitted) commands, and then finally coordinating the commit phase for each participating Node. Given the use of MVCC, performing rollback on errors during a transaction is fully supported, and is the default behavior. This differs however from the way Redis deals with errors, as it does not support transaction rollbacks, therefore in CurioDB the behavior can be configured by changing the curiodb.transactions.on-error setting from rollback to commit, if this level of compatibility with Redis is required.

MVCC

Each KV Node in the system stores multiple versions of its underlying value internally, using a map that contains each transaction's version, as well as the current committed version of the value, or "main" value. When a transaction begins, the main value is copied into the map, stored against its transaction ID, and from that point, all commands received within the transaction will read and write to the transaction version until the transaction is commited, at which point the transaction version is copied back to the main value.

Isolation

Three levels of transaction isolation are available, which can be configured by the curiodb.transactions.isolation setting, to control how a key's value is read during a command:

  • repeatable (default): Inside a transaction, only the transaction's version will be read, otherwise when outside of a transaction, the current committed version will be read.
  • committed: Inside or outside of a transaction, the current committed version will be read.
  • uncommitted: Inside or outside of a transaction, the most recently written version will be read, even if uncommitted.

Note there is no serializable isolation level typically found in SQL databases, since neither Redis nor CurioDB have a notion of range queries.

Configuration

Here are the few configuration settings and their default values that CurioDB implements, along with the large range of settings provided by Akka itself, which both use typesafe-config - consult those projects for detailed information on configuration implementation.

curiodb {

  // Addresses listening for clients.
  listen = [
    "tcp://127.0.0.1:6379"    // TCP server using Redis protocol.
    "http://127.0.0.1:2600"   // HTTP server using JSON.
    "ws://127.0.0.1:6200"     // WebSocket server, also using JSON.
  ]

  // Duration settings (either time value, or "off").
  persist-after = 1 second    // Like "save" in Redis.
  sleep-after   = 10 seconds  // Virtual memory threshold.
  expire-after  = off         // Automatic key expiry.

  transactions {
    timeout   = 3 seconds     // Max time a transaction may take to run.
    isolation = repeatable    // "repeatable", "committed", or "uncommitted".
    on-error  = rollback      // "commit" or "rollback".
  }

  commands {
    timeout  = 1 second       // Max time a command may take to run.
    disabled = [SHUTDOWN]     // List of disabled commands.
    debug    = off            // Print debug info for every command run.
  }

  // Cluster nodes.
  nodes = {
    node1: "tcp://127.0.0.1:9001"
    // node2: "tcp://127.0.0.1:9002"
    // node3: "tcp://127.0.0.1:9003"
  }

  // Current cluster node (from the "nodes" keys above).
  node = node1

}

You can also optionally provide your own configuration file, using the --config=path/to/config.file command-line argument. Your configuration file need only define the values you wish to override. For example, suppose you wanted to only listen over TCP, and disable extra commands:

curiodb.listen = ["tcp://127.0.0.1:3333"]

curiodb.commands.disabled = [SHUTDOWN, DEL, FLUSHDB, FLUSHALL]

The sleep-after and expire-after settings are worth some explanation. Each of these configure a time duration that starts each time a command is run against a key, and elapses if no further commands run against that key within the duration. Once the duration elapses, an action is performed. After the sleep-after duration elapses for a key, it will persist its value to disk, and shut down, essentially going to sleep - this is how virtual memory is implemented. expire-after is similar, but after its duration elapses, the key is deleted entirely, just as if the EXPIRE command was used.

HTTP/WebSocket JSON API

As alluded to in the configuration example above, CurioDB also supports a HTTP/WebSocket JSON API, as well as the same wire protocol that Redis implements over TCP. Commands are issued with HTTP POST requests, or WebSocket messages, containing a JSON Object with a single args key, containing an Array of arguments. Responses are returned as a JSON Object with a single result key.

HTTP:

$ curl -X POST -d '{"args": ["SET", "foo", "bar"]}' http://127.0.0.1:2600
{"result":"OK"}

$ curl -X POST -d '{"args": ["MGET", "foo", "baz"]}' http://127.0.0.1:2600
{"result":["bar",null]}

WebSocket:

var socket = new WebSocket('ws://127.0.0.1:6200');

socket.onmessage = function(response) {
  console.log(JSON.parse(response.data));
};

socket.send(JSON.stringify({args: ["DEL", "foo"]}));

SUBSCRIBE and PSUBSCRIBE commands work as expected over WebSockets, and are also fully supported by the HTTP API, by using chunked transfer encoding to allow a single HTTP connection to receive a stream of published messages over an extended period of time.

In the case of errors such as invalid arguments to a command, WebSocket connections will transmit a JSON Object with a single error key containing the error message, while HTTP requests will return a response with a 400 status, containing the error message in the response body.

Disadvantages compared to Redis

  • I haven't measured it, but it's safe to say memory consumption is much poorer due to the JVM. Somewhat alleviated by the virtual memory feature.
  • It's slower, but not by as much as you'd expect. Without any optimization, it's roughly about half the speed of Redis. See the performance section below.
  • PubSub pattern matching may perform poorly. PubSub channels are distributed throughout the cluster using consistent hashing, which makes pattern matching impossible. To work around this, patterns get stored on every node in the cluster, and the PSUBSCRIBE and PUNSUBSCRIBE commands get broadcast to all of them. This needs rethinking!

Mainly though, Redis is an extremely mature and battle-tested project that's been developed by many over the years, while CurioDB is a one-man hack project worked on over a few months. As much as this document attempts to compare them, they're really not comparable in that light. That said, it's been tons of fun building it, and it has some cool ideas thanks to Akka. I hope others can get something out of it too.

Performance

These are the results of redis-benchmark -q on an early 2014 MacBook Air running OS X 10.9 (the numbers are requests per second):

Benchmark Redis CurioDB %
PING_INLINE 57870.37 46296.29 79%
PING_BULK 55432.37 44326.24 79%
SET 50916.50 33233.63 65%
GET 53078.56 38580.25 72%
INCR 57405.28 33875.34 59%
LPUSH 45977.01 28082.00 61%
LPOP 56369.79 23894.86 42%
SADD 59101.65 25733.40 43%
SPOP 50403.23 33886.82 67%
LRANGE_100 22246.94 11228.38 50%
LRANGE_300 9984.03 6144.77 61%
LRANGE_500 6473.33 4442.67 68%
LRANGE_600 5323.40 3511.11 65%
MSET 34554.25 15547.26 44%

Generated with the bundled benchmark.py script.

Further Reading

These are some articles I published on developing CurioDB:

License

BSD.

More Repositories

1

mezzanine

CMS framework for Django
Python
4,757
star
2

django-socketio

WebSockets for Django
Python
1,318
star
3

cartridge

Ecommerce for Mezzanine
Python
708
star
4

django-forms-builder

Let users build forms in Django admin
Python
689
star
5

drum

Reddit / Hacker News clone for Mezzanine
Python
387
star
6

hot-redis

Rich Python data types for Redis
Python
290
star
7

gnotty

IRC web client and bot framework
Python
159
star
8

django-overextends

Circular template inheritance for Django
Python
108
star
9

two-queues

Benchmarking Redis and ZeroMQ pub-sub, using Python and Go
Go
105
star
10

gunicorn-console

A curses application for managing gunicorn processes.
Python
90
star
11

django-postgres-fuzzycount

Fast / fuzzy PostgreSQL counts for Django
Python
89
star
12

django-email-extras

PGP encrypted / multipart templated emails for Django
Python
75
star
13

filebrowser-safe

File manager for Mezzanine
Python
41
star
14

hg-github

A Mercurial extension for working with GitHub repositories.
Python
40
star
15

vimeo-deck

Synchronize Speaker Deck presentations with Vimeo videos.
JavaScript
28
star
16

gamblor

An online casino app built for Django Dash 2012
JavaScript
26
star
17

grappelli-safe

Admin skin for Mezzanine
CSS
25
star
18

snazzymaps-browser

Android app for searching and browsing Snazzy Maps
Java
24
star
19

grillode

A web-based chat application written in CoffeeScript for Node.js
CoffeeScript
23
star
20

drawnby

Drawn By is a collaborative real-time sketching app built for the 2011 Django Dash.
JavaScript
23
star
21

babbler

A Twitter bot that polls an RSS feed and posts its entries as tweets, with auto-generated hashtags.
Python
21
star
22

tastypie-msgpack

MsgPack support for Django Tastypie.
Python
21
star
23

django-shotgun

Test entire Django sites
Python
17
star
24

linkedout

Build PDF resumes with the LinkedIn API
Ruby
15
star
25

sphinx-me

Wraps your README-only projects in a dynamic Sphinx shell for hosting on http://readthedocs.org
Python
12
star
26

readertray

Cross platform desktop notifications for RSS
Python
9
star
27

indexed-tree-map

JDK's enhanced red-black tree map algorithm to support access by index
Java
9
star
28

otr

Combined GitHub and Bitbucket API.
Ruby
8
star
29

aspchat

Classic ASP chat application
ASP
8
star
30

bitbucket-batch

A tool for bulk updating access to bitbucket.org repos.
Python
8
star
31

grillo

A terminal based chat server and client.
Python
7
star
32

mezzanine.jupo.org

Mezzanine/Cartridge project and demo site
Python
6
star
33

ratemyflight

A web app built for the Django Dash hackathon 2010.
JavaScript
6
star
34

klout-feed

Receive your daily Klout score via RSS.
Ruby
6
star
35

slbuddy

Tool for tracking sales data in Secondlife.com
Python
3
star
36

virtualboxing

Utilities for comparing timings on bulk operations across databases in a distributed environment.
Ruby
3
star
37

cryptopals

Matasano Crypto Challenges
Go
3
star
38

ghetto-life-stream

Parses items for a Google Buzz feed from various sources.
PHP
2
star
39

teamcity-client

Team City HTTP client.
Ruby
2
star
40

sydjango-damm

Code for my SyDjango talk "Django Admin: The Missing Manual"
Python
2
star
41

jquery-squeezebox

Replacement for jquery.ui.accordion to avoid dealing with jquery.ui theming.
JavaScript
2
star
42

lime

JavaScript framework, predates jQuery
JavaScript
1
star
43

cmdsvr

Toy web application server
Python
1
star
44

gedit-ftp-browser

FTP plugin for Gedit
Python
1
star