  • Stars: 521
  • Rank: 82,030 (Top 2 %)
  • Language: Rust
  • License: Apache License 2.0
  • Created almost 4 years ago
  • Updated 2 months ago

Repository Details

A persistent storage engine for Multi-Raft logs

Raft Engine

Raft Engine is a persistent embedded storage engine with a log-structured design similar to bitcask. It is built for TiKV to store Multi-Raft logs.

Features

  • APIs for storing and retrieving protobuf log entries with consecutive indexes
  • Key-value storage for individual Raft Groups
  • Minimum write amplification
  • Collaborative garbage collection
  • Supports lz4 compression over log entries
  • Supports file system extension

Design

Raft Engine consists of two basic constructs: memtable and log file.

In memory, each Raft Group holds its own memtable, which contains all of its key-value pairs and the file locations of all of its log entries. On storage, user writes are appended sequentially to the active log file, which is rotated periodically to keep it below a configurable size threshold. Different Raft Groups share the same log stream.
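The relationship between per-group memtables and the shared log stream can be sketched as follows. This is a minimal illustration, not Raft Engine's actual data structures: the names `FileLocation`, `MemTable`, and `MemTables`, and their fields, are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Where one log entry lives on storage (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileLocation {
    pub file_seq: u64, // sequence number of the log file
    pub offset: u64,   // byte offset inside that file
    pub len: u64,      // encoded length of the entry
}

/// In-memory state of a single Raft Group: its key-value pairs
/// plus the file location of every log entry it still holds.
#[derive(Debug, Default)]
pub struct MemTable {
    kvs: HashMap<Vec<u8>, Vec<u8>>,
    entries: BTreeMap<u64, FileLocation>, // log index -> location
}

impl MemTable {
    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.kvs.insert(key.to_vec(), value.to_vec());
    }

    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.kvs.get(key).map(|v| v.as_slice())
    }

    pub fn record_entry(&mut self, index: u64, loc: FileLocation) {
        self.entries.insert(index, loc);
    }

    pub fn locate(&self, index: u64) -> Option<FileLocation> {
        self.entries.get(&index).copied()
    }
}

/// Every Raft Group has its own memtable, while the locations
/// they store may all point into the same log files.
#[derive(Debug, Default)]
pub struct MemTables {
    groups: HashMap<u64, MemTable>,
}

impl MemTables {
    /// Returns the memtable of a group, creating it on first use.
    pub fn group_mut(&mut self, raft_group_id: u64) -> &mut MemTable {
        self.groups.entry(raft_group_id).or_default()
    }

    pub fn group(&self, raft_group_id: u64) -> Option<&MemTable> {
        self.groups.get(&raft_group_id)
    }
}
```

In this sketch, two groups can record the same log index at different offsets of the same file, which mirrors the shared log stream described above.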

Write

Similar to RocksDB, Raft Engine provides atomic writes. Users can stash the changes into a log batch before submitting.

The writing of one log batch can be broken down into three steps:

  1. Optionally compress the log entries
  2. Write to log file
  3. Apply to memtable

At step 2, to group concurrent requests, each writing thread must enter a queue. The first in line automatically becomes the queue leader, responsible for writing the entire group to the log file.

Both synchronous and non-synchronous writes are supported. When any write in a group is marked synchronous, the queue leader calls fdatasync() after writing. This guarantees that buffered data is flushed to storage.

After its data is written, each writing thread proceeds to apply its changes to the memtable on its own.
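The leader's part of step 2 can be sketched as follows. This is a simplified, single-threaded model under stated assumptions: the queue and thread handoff are omitted, `WriteRequest`, `LogFile`, and `write_group` are hypothetical names, and the log file is an in-memory buffer whose `sync()` only counts calls where a real implementation would call fdatasync().

```rust
/// One queued write (hypothetical type): an encoded log batch
/// and whether the caller asked for a synchronous write.
pub struct WriteRequest {
    pub payload: Vec<u8>,
    pub sync: bool,
}

/// In-memory stand-in for the active log file.
#[derive(Default)]
pub struct LogFile {
    pub data: Vec<u8>,
    pub sync_calls: usize,
}

impl LogFile {
    /// Appends bytes and returns the offset they were written at.
    fn append(&mut self, bytes: &[u8]) -> u64 {
        let offset = self.data.len() as u64;
        self.data.extend_from_slice(bytes);
        offset
    }

    /// Placeholder for fdatasync(); here it only counts invocations.
    fn sync(&mut self) {
        self.sync_calls += 1;
    }
}

/// What the queue leader does for its group: write every request
/// in order, then sync once if at least one request asked for it.
/// The returned offsets let each writer apply its own changes to
/// the memtable afterwards.
pub fn write_group(file: &mut LogFile, group: &[WriteRequest]) -> Vec<u64> {
    let mut offsets = Vec::with_capacity(group.len());
    let mut need_sync = false;
    for req in group {
        offsets.push(file.append(&req.payload));
        need_sync |= req.sync;
    }
    if need_sync {
        file.sync();
    }
    offsets
}
```

Grouping the writes this way means a single sync call covers every request in the group, rather than one call per synchronous writer.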

Garbage Collection

After changes are applied to the local state machine, the corresponding log entries can be logically compacted from Raft Engine. Because multiple Raft Groups share the same log stream, these truncated logs punch holes in the log files. During garbage collection, Raft Engine scans for these holes and compacts log files to free up storage space. Only at this point are the unneeded log entries physically deleted.
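The gap between logical compaction and physical deletion can be illustrated with a small model. It is a sketch under simplifying assumptions, not Raft Engine's algorithm: it only decides which whole files are no longer referenced and leaves out rewriting partially live files. The names `LiveEntries`, `compact_to`, and `deletable_files` are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Raft Group id -> (log index -> sequence number of the file
/// that holds the entry). Hypothetical stand-in for the memtables.
pub type LiveEntries = HashMap<u64, BTreeMap<u64, u64>>;

/// Logical compaction: the group forgets every entry whose index
/// is below `index`. Nothing is removed from storage here; the
/// forgotten entries simply become holes in their log files.
pub fn compact_to(live: &mut LiveEntries, group: u64, index: u64) {
    if let Some(entries) = live.get_mut(&group) {
        // split_off keeps the keys >= index in the returned map.
        *entries = entries.split_off(&index);
    }
}

/// Physical reclamation: a file can be deleted only when every
/// group has stopped referencing it and all older files. The
/// active file is never deleted.
pub fn deletable_files(live: &LiveEntries, all_files: &[u64], active_file: u64) -> Vec<u64> {
    let oldest_needed = live
        .values()
        .flat_map(|entries| entries.values().copied())
        .min()
        .unwrap_or(active_file);
    let bound = oldest_needed.min(active_file);
    all_files.iter().copied().filter(|&f| f < bound).collect()
}
```

In this model, compacting one group leaves holes in a shared file, but the file stays on storage until the last group referencing it has compacted as well.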

Raft Engine carries out garbage collection in a collaborative manner.

First, its timing is controlled by the user. Raft Engine consolidates and removes its log files only when the user voluntarily calls the purge_expired_files() routine. For reference, TiKV calls it every 10 seconds by default.

Second, it sends useful feedback to the user. Each time the GC routine is called, Raft Engine will examine itself and return a list of Raft Groups that hold particularly old log entries. Those log entries block the GC progress and should be compacted by the user.
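The feedback described above can be pictured with the following sketch. It assumes a hypothetical rule in which a group counts as "holding particularly old log entries" when its oldest live entry sits in a file older than some watermark. The function `groups_blocking_gc` and the watermark parameter are illustrative and are not part of the crate's API.

```rust
use std::collections::{BTreeMap, HashMap};

/// Returns the ids of Raft Groups whose oldest live entry is stored
/// in a file with a sequence number below `watermark_file`.
///
/// `live` maps a group id to (log index -> file sequence number).
/// The result is sorted so that callers get a stable order.
pub fn groups_blocking_gc(
    live: &HashMap<u64, BTreeMap<u64, u64>>,
    watermark_file: u64,
) -> Vec<u64> {
    let mut groups: Vec<u64> = live
        .iter()
        .filter(|(_, entries)| {
            // A group with no live entries cannot block anything.
            entries
                .values()
                .min()
                .map_or(false, |&file| file < watermark_file)
        })
        .map(|(&id, _)| id)
        .collect();
    groups.sort_unstable();
    groups
}
```

A caller would then ask each returned group to compact its log, after which a later GC pass can remove the old files.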

Using this crate

Put this in your Cargo.toml:

[dependencies]
raft-engine = "0.3.0"

Available Cargo features:

  • scripting: Compiles with Rhai. This enables script debugging utilities including unsafe_repair.
  • nightly: Enables nightly-only features including test.
  • internals: Re-exports key components internal to Raft Engine. Enabled when building for docs.rs.
  • failpoints: Enables fail point testing powered by tikv/fail-rs.
  • swap: Uses SwappyAllocator to limit the memory usage of Raft Engine. The memory budget can be configured with "memory-limit". Depends on the nightly feature.

See some basic use cases under the examples directory.

Contributing

Contributions are always welcome! Here are a few tips for making a PR:

  • All commits must be signed off (with git commit -s) to pass the DCO check.
  • Tests are automatically run against the changes. Some of them can be run locally:
# run tests with nightly features
make
# run tests on stable toolchain
make WITH_STABLE_TOOLCHAIN=force
# filter a specific test case
make test EXTRA_CARGO_ARGS=<testname>
  • For changes that might induce performance effects, please quote the targeted benchmark results in the PR description. In addition to micro-benchmarks, there is a standalone stress test tool which you can use to demonstrate the system performance.
cargo +nightly bench --all-features <bench-case-name>
cargo run --release --package stress -- --help

License

Copyright (c) 2017-present, PingCAP, Inc. Released under the Apache 2.0 license. See LICENSE for details.

More Repositories

  1. tikv (Rust, 14,548 stars): Distributed transactional key-value database, originally created to complement TiDB
  2. raft-rs (Rust, 2,785 stars): Raft distributed consensus algorithm implemented in Rust.
  3. grpc-rs (Rust, 1,775 stars): The gRPC library for Rust built on C Core library and futures
  4. pprof-rs (Rust, 1,198 stars): A Rust CPU profiler implemented with the help of backtrace-rs
  5. pd (Go, 1,022 stars): Placement driver for TiKV
  6. rust-prometheus (Rust, 1,017 stars): Prometheus instrumentation library for Rust applications
  7. agatedb (Rust, 793 stars): A persistent key-value storage in Rust.
  8. minitrace-rust (Rust, 659 stars): Extremely fast tracing library for Rust
  9. titan (C++, 466 stars): A RocksDB plugin for key-value separation, inspired by WiscKey.
  10. client-rust (Rust, 368 stars): Rust Client for TiKV.
  11. fail-rs (Rust, 322 stars): Fail points for Rust
  12. rust-rocksdb (Rust, 273 stars): Rust wrapper for RocksDB
  13. client-go (Go, 268 stars): Go client for TiKV
  14. minstant (Rust, 151 stars): Performant time measuring in Rust
  15. yatp (Rust, 129 stars): Yet another thread pool in Rust for both callbacks and futures.
  16. client-java (Java, 106 stars): TiKV Java Client
  17. deep-dive-tikv (HTML, 97 stars): How do we build a distributed, transactional key-value database - TiKV?
  18. rfcs (76 stars): RFCs for changes to TiKV and its ecosystem
  19. auto-tikv (Python, 66 stars): Tool to tune TiKV with ML method
  20. sig-transaction (61 stars): Resources for the transaction SIG
  21. async-speed-limit (Rust, 53 stars): Asynchronously speed-limiting multiple byte streams
  22. minitrace-go (Go, 45 stars): A high-performance timeline tracing library for Golang, used by TiDB
  23. community (43 stars): TiKV community content
  24. client-c (C++, 41 stars): The C++ TiKV client used by TiFlash.
  25. crc64fast (Rust, 40 stars): SIMD accelerated CRC-64-ECMA computation
  26. migration (Go, 34 stars): Migration tools for TiKV, e.g. online bulk load.
  27. tikv-dev-guide (34 stars): The TiKV development/contribution guide
  28. client-py (Rust, 26 stars)
  29. importer (Rust, 20 stars): tikv-importer is a front-end to help ingest a large number of KV pairs into a TiKV cluster
  30. website (HTML, 19 stars): Website for tikv.org
  31. tikv-operator (Go, 19 stars)
  32. protobuf-build (Rust, 17 stars)
  33. client-cpp (Rust, 15 stars): TiKV Client for C++
  34. client-node (Rust, 11 stars)
  35. mur3 (Rust, 10 stars): Rust implementation of MurmurHash3.
  36. copr-test (Go, 9 stars)
  37. mock-tikv (Go, 6 stars): A mocked TiKV server for testing clients written in different languages.
  38. slog-global (Rust, 5 stars): Global loggers for slog-rs. Similar to slog-scope but simpler.
  39. match-template (Rust, 5 stars): match-template is a procedural macro that generates repeated match arms by pattern.
  40. jepsen-test (Clojure, 5 stars)
  41. terraform-tikv-bench (HCL, 4 stars): An Orchestrated TiKV benchmark. Not for production deployment.
  42. skiplist-rs (Rust, 4 stars)
  43. rusoto (Rust, 3 stars): AWS SDK for Rust
  44. client-validator (Go, 3 stars): Provide functional checks for tikv client implementations in different languages.
  45. tracing-active-tree (Rust, 3 stars)
  46. tlaplus-specs (TLA, 3 stars): TiKV TLA+ specifications