• Stars
    star
    8,111
  • Rank 4,584 (Top 0.1 %)
  • Language
    Rust
  • License
    Apache License 2.0
  • Created almost 9 years ago
  • Updated about 1 month ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

the champagne of beta embedded databases
key value
buy a coffee for us to convert into databases
documentation
chat about databases with us

sled - it's all downhill from here!!!

An embedded database.

let tree = sled::open("/tmp/welcome-to-sled")?;

// insert and get, similar to std's BTreeMap
let old_value = tree.insert("key", "value")?;

assert_eq!(
  tree.get(&"key")?,
  Some(sled::IVec::from("value")),
);

// range queries
for kv_result in tree.range("key_1".."key_9") {}

// deletion
let old_value = tree.remove(&"key")?;

// atomic compare and swap
tree.compare_and_swap(
  "key",
  Some("current_value"),
  Some("new_value"),
)?;

// block until all operations are stable on disk
// (flush_async also available to get a Future)
tree.flush()?;

If you would like to work with structured data without paying expensive deserialization costs, check out the structured example!

features

  • API similar to a threadsafe BTreeMap<[u8], [u8]>
  • serializable (ACID) transactions for atomically reading and writing to multiple keys in multiple keyspaces.
  • fully atomic single-key operations, including compare and swap
  • zero-copy reads
  • write batches
  • subscribe to changes on key prefixes
  • multiple keyspaces
  • merge operators
  • forward and reverse iterators over ranges of items
  • a crash-safe monotonic ID generator capable of generating 75-125 million unique ID's per second
  • zstd compression (use the compression build feature, disabled by default)
  • cpu-scalable lock-free implementation
  • flash-optimized log-structured storage
  • uses modern b-tree techniques such as prefix encoding and suffix truncation for reducing the storage costs of long keys with shared prefixes. If keys are the same length and sequential then the system can avoid storing 99%+ of the key data in most cases, essentially acting like a learned index

expectations, gotchas, advice

  • Maybe one of the first things that seems weird is the IVec type. This is an inlinable Arced slice that makes some things more efficient.
  • Durability: sled automatically fsyncs every 500ms by default, which can be configured with the flush_every_ms configurable, or you may call flush / flush_async manually after operations.
  • Transactions are optimistic - do not interact with external state or perform IO from within a transaction closure unless it is idempotent.
  • Internal tree node optimizations: sled performs prefix encoding on long keys with similar prefixes that are grouped together in a range, as well as suffix truncation to further reduce the indexing costs of long keys. Nodes will skip potentially expensive length and offset pointers if keys or values are all the same length (tracked separately, don't worry about making keys the same length as values), so it may improve space usage slightly if you use fixed-length keys or values. This also makes it easier to use structured access as well.
  • sled does not support multiple open instances for the time being. Please keep sled open for the duration of your process's lifespan. It's totally safe and often quite convenient to use a global lazy_static sled instance, modulo the normal global variable trade-offs. Every operation is threadsafe, and most are implemented under the hood with lock-free algorithms that avoid blocking in hot paths.

performance

  • LSM tree-like write performance with traditional B+ tree-like read performance
  • over a billion operations in under a minute at 95% read 5% writes on 16 cores on a small dataset
  • measure your own workloads rather than relying on some marketing for contrived workloads

a note on lexicographic ordering and endianness

If you want to store numerical keys in a way that will play nicely with sled's iterators and ordered operations, please remember to store your numerical items in big-endian form. Little endian (the default of many things) will often appear to be doing the right thing until you start working with more than 256 items (more than 1 byte), causing lexicographic ordering of the serialized bytes to diverge from the lexicographic ordering of their deserialized numerical form.

  • Rust integral types have built-in to_be_bytes and from_be_bytes methods.
  • bincode can be configured to store integral types in big-endian form.

interaction with async

If your dataset resides entirely in cache (achievable at startup by setting the cache to a large enough value and performing a full iteration) then all reads and writes are non-blocking and async-friendly, without needing to use Futures or an async runtime.

To asynchronously suspend your async task on the durability of writes, we support the flush_async method, which returns a Future that your async tasks can await the completion of if they require high durability guarantees and you are willing to pay the latency costs of fsync. Note that sled automatically tries to sync all data to disk several times per second in the background without blocking user threads.

We support async subscription to events that happen on key prefixes, because the Subscriber struct implements Future<Output=Option<Event>>:

let sled = sled::open("my_db").unwrap();

let mut sub = sled.watch_prefix("");

sled.insert(b"a", b"a").unwrap();

extreme::run(async move {
    while let Some(event) = (&mut sub).await {
        println!("got event {:?}", event);
    }
});

minimum supported Rust version (MSRV)

We support Rust 1.62 and up.

architecture

lock-free tree on a lock-free pagecache on a lock-free log. the pagecache scatters partial page fragments across the log, rather than rewriting entire pages at a time as B+ trees for spinning disks historically have. on page reads, we concurrently scatter-gather reads across the log to materialize the page from its fragments. check out the architectural outlook for a more detailed overview of where we're at and where we see things going!

philosophy

  1. don't make the user think. the interface should be obvious.
  2. don't surprise users with performance traps.
  3. don't wake up operators. bring reliability techniques from academia into real-world practice.
  4. don't use so much electricity. our data structures should play to modern hardware's strengths.

known issues, warnings

  • if reliability is your primary constraint, use SQLite. sled is beta.
  • if storage price performance is your primary constraint, use RocksDB. sled uses too much space sometimes.
  • if you have a multi-process workload that rarely writes, use LMDB. sled is architected for use with long-running, highly-concurrent workloads such as stateful services or higher-level databases.
  • quite young, should be considered unstable for the time being.
  • the on-disk format is going to change in ways that require manual migrations before the 1.0.0 release!

priorities

  1. A full rewrite of sled's storage subsystem is happening on a modular basis as part of the komora project, in particular the marble storage engine. This will dramatically lower both the disk space usage (space amplification) and garbage collection overhead (write amplification) of sled.
  2. The memory layout of tree nodes is being completely rewritten to reduce fragmentation and eliminate serialization costs.
  3. The merge operator feature will change into a trigger feature that resembles traditional database triggers, allowing state to be modified as part of the same atomic writebatch that triggered it for retaining serializability with reactive semantics.

fund feature development

Like what we're doing? Help us out via GitHub Sponsors!

More Repositories

1

tla-rust

writing correct lock-free and distributed stateful systems in Rust, assisted by TLA+
TLA
1,040
star
2

rio

pure rust io_uring library, built on libc, thread & async friendly, misuse resistant
Rust
934
star
3

extreme

extremely boring async function runner!
Rust
151
star
4

paxos

simple CASPaxos implementation written in rust on top of a simulator for finding bugs quickly
Rust
139
star
5

loghisto

counters and logarithmically bucketed histograms for distributed systems
Go
82
star
6

rasputin

(getting to be a) hard to kill scalable linearizabe store
Rust
76
star
7

seaslug

scraps of a potential language
Rust
36
star
8

rsdb

rust database engineering toolkit
Rust
33
star
9

model

model testing sugar for testing interactions on structures over time
Rust
27
star
10

crack

building blocks and testing tools for reliable systems
Rust
27
star
11

historian

Rust
22
star
12

mesos-rs

Mesos bindings using the v1 HTTP API
Rust
17
star
13

paranoia

Rust
16
star
14

assert_panic_free

Rust
15
star
15

tx

software transactional memory in rust
Rust
13
star
16

sludge

speedy web micro-framework using sled, io_uring and SIMD
Rust
12
star
17

acid-state

rust transactional state library
Rust
11
star
18

quickcheck-tut

Rust
10
star
19

state-guide

overviews of consistency, availability and durability
8
star
20

slides

7
star
21

icefall

eventually consistent location-agnostic serverless and mobile database
7
star
22

deterministic

utilities for imposing determinism on random number generators, thread execution, etc...
Rust
7
star
23

triple

embeddable/standalone rust triple store
Rust
6
star
24

test-idioms

Rust
5
star
25

lathe

performant self-healing distributed micro-framework in rust
4
star
26

dots

Haskell
4
star
27

hardware-effects-rs

A Rust port of Kobzol/hardware-effects demonstrating performance issues
4
star
28

inlinable-box

auto-box things that fit in a usize
Rust
4
star
29

lfsr

simple linear feedback shift registers for test case generation
Rust
4
star
30

programming-in-haskell

exercises, experiments, examples
Haskell
4
star
31

llrb-rs

left-leaning red-black tree
Rust
4
star
32

idgen.go

very fast monotonic ID generator
3
star
33

dist-sys

well-tested flexible distributed systems building blocks
3
star
34

eurorack

patches for the droid eurorack modular sequencer
3
star
35

rust-reactive-log

performant transactional composable persistence
Rust
3
star
36

zk-glove

command line tools for distributed orchestration and discovery
Go
3
star
37

hemlock

transactional distributed store with SSI on mesos
C++
2
star
38

rust-srv

simple portable libresolv-backed SRV record resolution
Rust
2
star
39

rust-futurepool

simple future pool library for predictable concurrency
Rust
2
star
40

tikv-client

TiKV client powered by Tokio
Rust
2
star
41

belasitsa

a tiny green WSGI back end for Mongrel 2
Python
2
star
42

ak47

extremely reliable networked systems toolkit
2
star
43

quark

distributed liveness telemetry
Erlang
2
star
44

determine-disk-block-size

2
star
45

sitecache

transparent distributed caching proxy for distribution of static files in a datacenter
Go
2
star
46

googleapis-rs

mechanically generated rust gRPC bindings for the google APIs
Rust
2
star
47

quick

property testing with shrinking, drop-in compatible with testing/quick
Go
2
star
48

cyborg-training-camp

C
1
star
49

faultcheck

quickcheck + black box fault injection
1
star
50

disruptors

dirsuptor patterns in different languages
1
star
51

id-gen

Distributed system simulation: ID generator
Rust
1
star
52

cavemon

external process instrumentation from the terminal
1
star