• Stars
    star
    214
  • Rank 184,678 (Top 4 %)
  • Language
    Rust
  • License
    Apache License 2.0
  • Created over 2 years ago
  • Updated almost 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A Write Ahead Log (WAL) implementation in Rust

OkayWAL

A write-ahead log (WAL) implementation for Rust.

There's The Great Wall, and then there's this: an okay WAL.

WARNING: This crate is early in development. Please do not use in any production projects until this has been incorporated into Sediment and shipping as part of Nebari. The file format is currently considered unstable.

okaywal forbids unsafe code crate version Live Build Status HTML Coverage Report for main branch Documentation

This crate exposes a WAL that supports:

  • Atomic and Durable writes from multiple threads.
  • Random access for previously written data.
  • Automatic checkpointing to allow reusing disk space and preventing the WAL from growing too large.
  • Interactive recovery process with basic data versioning support.

Basic How-To

WriteAheadLog::recover() is used to create or recover a WAL in a given directory. To open a log, an implementer of LogManager must be provided. This trait is how OkayWAL communicates with your code when recovering or checkpointing a log.

The basic example shows this process with many comments describing how OkayWAL works.

// Open a log using a Checkpointer that echoes the information passed into each
// function that the Checkpointer trait defines.
let log = WriteAheadLog::recover("my-log", LoggingCheckpointer)?;

// Begin writing an entry to the log.
let mut writer = log.begin_entry()?;

// Each entry is one or more chunks of data. Each chunk can be individually
// addressed using its LogPosition.
let record = writer.write_chunk("this is the first entry".as_bytes())?;

// To fully flush all written bytes to disk and make the new entry
// resilient to a crash, the writer must be committed.
writer.commit()?;

Multi-Threaded Writing

Optimized writing to the log from multiple threads is handled automatically. Only one thread may access the active log file at any moment in time. Because the slowest part of writing data to disk is fsync, OkayWAL manages synchronizing multiple writers such that a single fsync call can be made for multiple writes.

This can be demonstrated by running the benchmark suite: cargo bench -p benchmarks:

commit-256B

Label avg min max stddev out%
okaywal-01t 1.001ms 617.5us 7.924ms 557.3us 0.016%
okaywal-02t 1.705ms 617.3us 11.38ms 912.1us 0.006%
okaywal-04t 1.681ms 622.4us 9.688ms 671.4us 0.021%
okaywal-08t 1.805ms 656.5us 13.88ms 1.001ms 0.014%
okaywal-16t 1.741ms 643.2us 7.895ms 796.4us 0.028%

commit-1KB

Label avg min max stddev out%
okaywal-01t 959.3us 621.9us 7.419ms 584.4us 0.012%
okaywal-02t 1.569ms 627.5us 7.986ms 1.007ms 0.028%
okaywal-04t 1.856ms 650.5us 11.14ms 1.087ms 0.017%
okaywal-08t 2.054ms 697.3us 11.04ms 1.066ms 0.021%
okaywal-16t 1.875ms 641.5us 8.193ms 674.6us 0.032%

commit-4KB

Label avg min max stddev out%
okaywal-01t 1.242ms 748.8us 6.902ms 982.4us 0.008%
okaywal-02t 1.767ms 761.9us 8.986ms 902.1us 0.016%
okaywal-04t 2.347ms 787.1us 8.853ms 1.084ms 0.016%
okaywal-08t 2.798ms 810.8us 12.53ms 1.168ms 0.014%
okaywal-16t 2.151ms 840.5us 14.74ms 1.201ms 0.008%

commit-1MB

Label avg min max stddev out%
okaywal-01t 7.018ms 5.601ms 9.865ms 788.2us 0.027%
okaywal-02t 11.06ms 4.281ms 20.14ms 3.521ms 0.000%
okaywal-04t 19.77ms 5.094ms 73.21ms 8.794ms 0.007%
okaywal-08t 25.06ms 2.871ms 97.60ms 17.33ms 0.002%
okaywal-16t 19.01ms 3.480ms 58.85ms 7.195ms 0.014%

These numbers are the time taken for a single thread to perform an atomic and durable write of a given size to the log file. Despite using a single-file approach, we are able to keep average write times very low even with a large number of simultaneous writers.

How OkayWAL works

OkayWAL streams incoming data into "segments". Each segment file is named with the format wal-{id}. The id of a segment file refers to the first EntryId that could appear within the segment file.

Segment files are pre-allocated to the length configured in Configuration::preallocate_bytes. Preallocating files is critical for performance, as overwriting existing bytes in general is less expensive than allocating new bytes on disk.

OkayWAL always has a current segment file. When a new entry is written, it always goes to the current segment file. When an entry is completed, the length of the segment file is checked against Configuration::checkpoint_after_bytes. If enough data has been written to trigger a checkpoint, the file is sent to the checkpointing thread and a new segment file is activated.

Regardless of whether the file is checkpointed, before control returns from committing an entry, the file is fsynced. fsync operations are batched, allowing multiple entries to be written by separate threads during the same fsync operation.

OkayWAL also keeps track of any time a new file is created or a file is renamed. As needed, the directory containing the write-ahead logs is also fsynced to ensure necessary file and directory metadata is fully synchronized. Just like file fsync batching, OkayWAL also automatically batches directory fsyncs across threads.

Checkpointing a segment file (Background Thread)

The checkpointing thread holds a weak reference to the WriteAheadLog data. When a file is received by the thread to checkpoint, it will upgrade the weak reference. If it cannot, the checkpointing thread shuts down gracefully and the recovery process will send the file again for checkpointing the next time the log is opened.

The thread invokes LogManager::checkpoint_to for the file, allowing the LogManager to make any needed changes to persist the data stored in the segment being checkpointed.

After the LogManager finishes, the file is renamed to include -cp as its suffix. Until this step, readers are able to be opened against data stored in the file being checkpointed. Once the file is renamed, new readers will begin returning not found errors.

After the file is renamed, the checkpointer waits for all outstanding readers to finish reading data. The file is then finally recycled by moving it to the inactive files list.

Activating a new segment file

If there are any files in the inactive files list, one is reused. Otherwise, a new file is created and filled with 0's to the configured preallocation length.

The file's name is set to wal-{next EntryId}. For example, a brand new write-ahead log's first segment file will be named wal-1, and the first EntryId written will be 1.

Segment File Format

Each segment file starts with this header:

  • okw: Three byte magic code
  • OkayWAL Version: Single byte version number. Currently 0.
  • Configuration::version_info length: Single byte. The embedded information must be 255 or less bytes long.
  • Embedded Version Info: The bytes of the version info. The previous byte controls how many bytes long this field is.

After this header, the file is a series of entries, each which contain a series of chunks. A byte with a value of 1 signifies a new entry. Any other byte causes the reader to stop reading entries from the file.

The first 8 bytes of the entry are the little-endian representation of its EntryId.

After the EntryId, a series of chunks is expected. A byte with a value of 2 signals that a chunk is next in the file. A byte with a value of 3 signals that this is the end of the current entry being written. Any other byte causes the SegmentReader to return an AbortedEntry result. Any already-read chunks from this entry should be ignored/rolled back by the LogManager.

The first four bytes of a chunk are the data length in little-endian representation. The data for the chunk follows.

Finally, a four-byte CRC-32 ends the chunk.

If a reader does not encounter a new chunk marker (2) or an end-of-entry marker (3), the entry should be considered abandoned and all chunks should be ignored.

Open-source Licenses

This project, like all projects from Khonsu Labs, are open-source. This repository is available under the MIT License or the Apache License 2.0.

To learn more about contributing, please see CONTRIBUTING.md.

More Repositories

1

bonsaidb

A developer-friendly document database that grows with you, written in Rust
Rust
1,018
star
2

cushy

An experimental cross-platform graphical user interface (GUI) crate for Rust.
Rust
462
star
3

nebari

A pure Rust database implementation using an append-only B-Tree file format.
Rust
266
star
4

justjson

An efficient JSON Value parser
Rust
75
star
5

sediment

A low-level MVCC file format for storing blobs.
Rust
63
star
6

kludgine

A kludgey 2d game engine written in Rust atop wgpu
Rust
62
star
7

pot

A concise, self-describing binary format written in Rust for Serde
Rust
53
star
8

budlang

A safe, fast, lightweight embeddable scripting language written in Rust.
Rust
20
star
9

cosmicverge

A systematic, sandbox MMO still in the concept phase. Will be built with Rust atop BonsaiDb and Gooey
HTML
18
star
10

lrumap

A set of safe Least Recently Used (LRU) map/cache types for Rust
Rust
16
star
11

easygpu

A small wrapper around wgpu to make some things simpler
Rust
16
star
12

fabruic

An easy-to-use QUIC-based protocol that supports reliable payload delivery.
Rust
15
star
13

basws

A basis for building an async client-server Websocket protocol
Rust
15
star
14

circulate

Lightweight async PubSub framework
Rust
12
star
15

rsn

Rust
11
star
16

refuse

An easy-to-use, incremental, multi-threaded garbage collector for Rust
Rust
9
star
17

pulldown-cmark-frontmatter

A Frontmatter extractor for Markdown documents built atop pulldown-cmark
Rust
8
star
18

ordered-varint

Variable-length signed and unsigned integer encoding that is byte-orderable for Rust
Rust
8
star
19

transmog

Utilities for serializing with multiple formats of data in Rust.
Rust
8
star
20

figures

A math library specialized for 2d screen graphics.
Rust
8
star
21

delve-rs

A crate search engine powered by Rust and BonsaiDb
Rust
7
star
22

watchable

An observable RwLock-like type for Rust that is compatible with both multi-threaded and async code
Rust
7
star
23

ncog

Rust
7
star
24

alot

A forbid-unsafe, generational slot map with usize-sized IDs
Rust
5
star
25

interner

Rust
5
star
26

nominals

Formatting of nominal identifiers in various languages (ordered list headings)
Rust
5
star
27

muse

Rust
5
star
28

minority-game

A mini-game demo of BonsaiDb + Gooey
Rust
4
star
29

englishid

English formatting for unsigned integer IDs
Rust
4
star
30

rustme

A tool for managing a Rust project's README-like files.
Rust
4
star
31

cityhasher

Rust
4
star
32

custodian

End-user secret management with Rust. Early WIP.
Rust
3
star
33

kempt

A #[forbid_unsafe] ordered collection crate for Rust
Rust
3
star
34

dossier

An artifact server built with BonsaiDb
Rust
3
star
35

gooey-attempt-2

The second attempt at the `gooey` crate for Rust. Never published to crates.io.
Rust
3
star
36

magrathea

A pixel-art-style planet generator
Rust
2
star
37

muse-archive

Virtual instrument synthesizer library for rust built atop cpal
Rust
2
star
38

actionable

An enum-based async framework for building permission-driven APIs
Rust
2
star
39

async-handle

A reference-counted async RwLock for Rust
Rust
2
star
40

easing-function

Rust
2
star
41

pixaxe

A scriptable, indexed pixel art editor written in Rust
Rust
2
star
42

appit

An opinionated wrapper for `winit` utilizing a single-thread-per-window model
Rust
2
star
43

watchfile

An easy-to-use automatic file reloader.
Rust
1
star
44

async-locking-benchmarks

Rust
1
star
45

khonsubase

Khonsubase is a lightweight project management and knowledgebase collaboration tool written in Rust.
HTML
1
star
46

budget-executor

An approach to "throttling" async tasks in Rust using manual instrumentation
Rust
1
star
47

pot-diff

Rust
1
star
48

yew-bulma

Yew wrappers for the Bulma CSS framework
Rust
1
star
49

ncog-archived

A consumer-centric, privacy and safety-forward authentication service written with game developers in mind
CSS
1
star
50

console-thingy

Rust
1
star
51

approximint

A large integer Rust crate using an approximate representation inspired by scientific notation
Rust
1
star
52

FunnyBones

A simple 2D kinematics library for Rust
Rust
1
star
53

cushy-clicker

A proof-of-concept library for creating clicker/incremental games in Rust using Cushy
Rust
1
star
54

gooey-canvas

A Canvas widget for the `Gooey` UI framework
Rust
1
star
55

file-manager

Rust
1
star
56

gooey-attempt-1

The first attempt at the `gooey` crate for Rust. Never published to crates.io.
Rust
1
star