  • Language
    Rust
  • License
    MIT License
  • Created almost 7 years ago
  • Updated almost 4 years ago

♒︎ [WIP] An experimental ~distributed~ commit-log

An experimental distributed streaming platform

Status

Currently working on the foundation of the storage layer.

Found an issue? Feel like contributing? Make sure to check out our contributing guide first.

To know more about component internals, performance and references, please check the architecture internals documentation.

Project Goals

  • Learn
  • Implement a Kinesis-like streaming-service
  • Single binary
  • Easy to Host, Run & Operate

Commands

Available make commands

  • make build - Builds the application with cargo
  • make build_release - Builds the application with cargo, with release optimizations
  • make docker_test_watcher - Runs funzzy on linux over docker-compose
  • make docs - Generates the GitHub Markdown docs (At the moment only mermaid)
  • make format - Formats the code according to cargo
  • make help - Lists the available commands
  • make install - Builds a release version and installs it to your cargo bin path
  • make run - Runs the newly built binary
  • make test - Tests all features

Architecture

At this point, only the foundation of the Storage layer is implemented. The other parts of the architecture picture are there to illustrate future components.

Storage

The storage layer is where the data is persisted for long-term reading.

CommitLog

The main component of the whole system is the commit-log, an abstraction that manages reads and writes to the log by implementing an immutable, append-only, file-backed sequence of "records", or chunks of data/events that are transmitted from producers to consumers.

Records are written to the log by always appending after the last record.

e.g.:

                          current cursor
 segment 0                       ^
 |-------------------------------|
 | record 0  |  record 1  |  ... |  --> time
 |-------------------------------|

In order to manage and scale reads and writes, the commit-log splits groups of records into Segments, writing to a single segment until it reaches a specified size.

Each time a record is written, the segment is trusted to have enough space for the given buffer, then the record is written to the current segment, and the pointer is updated.
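
The write path described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in commit_log/src/lib.rs: the `Segment` and `CommitLog` types, their fields and the `append` method are all hypothetical names, and a `Vec<u8>` stands in for the file-backed storage.

```rust
/// Hypothetical in-memory stand-in for a segment: a bounded byte buffer.
struct Segment {
    data: Vec<u8>,
    max_size: usize,
}

impl Segment {
    fn new(max_size: usize) -> Self {
        Segment { data: Vec::new(), max_size }
    }

    /// True when `len` more bytes still fit in this segment.
    fn fits(&self, len: usize) -> bool {
        self.data.len() + len <= self.max_size
    }
}

/// Hypothetical commit log: an append-only list of segments.
struct CommitLog {
    segments: Vec<Segment>,
    segment_size: usize,
}

impl CommitLog {
    fn new(segment_size: usize) -> Self {
        CommitLog { segments: vec![Segment::new(segment_size)], segment_size }
    }

    /// Appends a record and returns (segment number, position inside that segment).
    fn append(&mut self, record: &[u8]) -> Result<(usize, usize), String> {
        if record.len() > self.segment_size {
            return Err(format!("record of {} bytes exceeds the segment size", record.len()));
        }
        // Rotate to a fresh segment when the current one cannot hold the record.
        if !self.segments.last().unwrap().fits(record.len()) {
            self.segments.push(Segment::new(self.segment_size));
        }
        let number = self.segments.len() - 1;
        let current = &mut self.segments[number];
        let position = current.data.len();
        current.data.extend_from_slice(record);
        Ok((number, position))
    }
}

fn main() {
    // A 16-byte segment holds two 8-byte records; the third forces a rotation.
    let mut log = CommitLog::new(16);
    for record in [b"record-0", b"record-1", b"record-2"] {
        println!("{:?}", log.append(record));
    }
}
```

In this sketch the size check happens in the commit log before the write, which is one way to let the segment be "trusted" to have enough space.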

More info in the commit_log/src/lib.rs file.

Segment

A Segment is a tuple abstraction to manage the Index and Log files.

Every Segment is composed of a log-file and an index, e.g.:

00000000000011812312.log
00000000000011812312.idx
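
File names like the ones above can be produced with zero-padded formatting. The sketch below assumes the name is a 20-digit, zero-padded number shared by both files; what that number represents is not stated here, so `id` and `segment_file_names` are hypothetical.

```rust
/// Builds the log and index file names for a segment.
/// `id` is a hypothetical numeric identifier, zero-padded to 20 digits.
fn segment_file_names(id: u64) -> (String, String) {
    (format!("{:020}.log", id), format!("{:020}.idx", id))
}

fn main() {
    let (log, idx) = segment_file_names(11_812_312);
    println!("{}\n{}", log, idx);
}
```

A width of 20 digits is enough for any `u64`, so fixed-width names sort lexicographically in numeric order.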

The role of the segment is to manage writes to the logfile and ensure the entries can be read later on by doing lookups in the index.

On every write, the segment writes an entry to the index with the record's position and size, in the log-file, for later use.

The segment also manages the size of the log file, preventing further writes once it reaches the specified size.

When a segment is full, the commit log makes sure to rotate to a new one, closing the old one.
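
As a rough illustration of how a write records an index entry and how a later read uses it, here is a minimal in-memory sketch. It is not the code in commit_log/src/segment.rs: the `Segment` type, its fields, and the `write`/`read` methods are hypothetical, and plain vectors replace the log and index files.

```rust
/// Hypothetical in-memory segment: a log buffer plus an index of (position, size).
struct Segment {
    log: Vec<u8>,
    index: Vec<(usize, usize)>,
    max_size: usize,
}

impl Segment {
    fn new(max_size: usize) -> Self {
        Segment { log: Vec::new(), index: Vec::new(), max_size }
    }

    /// Appends a record and returns its entry number, or None when the segment is full.
    fn write(&mut self, record: &[u8]) -> Option<usize> {
        if self.log.len() + record.len() > self.max_size {
            return None; // the caller is expected to rotate to a new segment
        }
        // Remember where the record starts and how long it is.
        self.index.push((self.log.len(), record.len()));
        self.log.extend_from_slice(record);
        Some(self.index.len() - 1)
    }

    /// Looks the entry up in the index, then slices the record out of the log.
    fn read(&self, entry: usize) -> Option<&[u8]> {
        let &(position, size) = self.index.get(entry)?;
        self.log.get(position..position + size)
    }
}

fn main() {
    let mut segment = Segment::new(32);
    let first = segment.write(b"hello").unwrap();
    let second = segment.write(b"world!").unwrap();
    println!("{:?}", segment.read(first).map(String::from_utf8_lossy));
    println!("{:?}", segment.read(second).map(String::from_utf8_lossy));
}
```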

Here is what it looks like on disk (at a high level):

                                                       current cursor
segment 0                                                     ^
|-------------------------------|                             |
| record 0  |  record 1  |  ... | segment 1 (current)         |
|-------------------------------|-----------------------------| --> time
                                |  record 2  | record 3 | ... |
                                |-----------------------------|

Under the hood it is a bit more complex: the segment is responsible for writing the log file to disk, as well as for managing the index file.

More info in the commit_log/src/segment.rs, commit_log/src/segment/index.rs and commit_log/src/segment/log.rs files.

Log file

The log file is a variable-size sequence of bytes that stores the content of the records sent by producers. However, the log itself has no mechanism for locating those records; that is the responsibility of the index.

Once initialized, the log file is truncated to the desired size to reserve both memory and disk space. The same applies to the index.

                         current cursor
                                ^
|-------------------------------|
| record 0  |  record 1  |  ... |----> time
|-------------------------------|
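
Sizing a file up front can be sketched with the standard library alone. The `preallocate` helper and the temporary file name below are hypothetical, and the sketch only sets the file length with `File::set_len`; whether disk blocks are physically reserved depends on the file system, which may create a sparse file.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Creates (or opens) a file and sets its length to `size` bytes up front.
/// Returns the length reported by the file system afterwards.
fn preallocate(path: &Path, size: u64) -> io::Result<u64> {
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(false)
        .open(path)?;
    // `set_len` truncates or zero-extends the file to exactly `size` bytes.
    file.set_len(size)?;
    Ok(file.metadata()?.len())
}

fn main() -> io::Result<()> {
    // 20 MB, matching the segment size used in the performance section.
    let path = std::env::temp_dir().join("preallocate-example.log");
    let len = preallocate(&path, 20 * 1024 * 1024)?;
    println!("{} bytes reserved", len);
    std::fs::remove_file(&path)
}
```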

Neither reads nor writes to the log file directly trigger disk-level actions.

Both operations go through memory-mapped buffers managed by the OS.

More info in the commit_log/src/segment/log.rs file.

Index file

The role of the index is to provide pointers to records in the log file. Each index entry is 20 bytes long: the first 10 bytes hold the offset of the record in the log file, and the remaining 10 bytes hold the size of the record.

e.g.:

                          current cursor
                                 ^
 |-------------------------------|
 | offset-size | offset-size |...|----> time
 |-------------------------------|

There is no separator; the fields are position-based.

e.g.:

00000001000000000020
---------------------
  offset  |  size

* 0000000100 -> offset (first 10 bytes)
* 0000000020 -> size (last 10 bytes)
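
A position-based entry of this shape could be encoded and decoded as sketched below. The sketch assumes each field is stored as 10 zero-padded ASCII decimal digits, as the example suggests; `encode_entry` and `decode_entry` are hypothetical helpers, not functions from commit_log/src/segment/index.rs.

```rust
const FIELD_WIDTH: usize = 10;
const ENTRY_WIDTH: usize = 2 * FIELD_WIDTH;

/// Encodes an entry as two zero-padded, 10-digit decimal fields (assumed encoding).
fn encode_entry(offset: u64, size: u64) -> Option<String> {
    let limit = 10u64.pow(FIELD_WIDTH as u32);
    if offset >= limit || size >= limit {
        return None; // the value would not fit in 10 digits
    }
    Some(format!("{:010}{:010}", offset, size))
}

/// Decodes an entry by position: the first 10 bytes are the offset, the last 10 the size.
fn decode_entry(entry: &str) -> Option<(u64, u64)> {
    if entry.len() != ENTRY_WIDTH || !entry.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let (offset, size) = entry.split_at(FIELD_WIDTH);
    Some((offset.parse::<u64>().ok()?, size.parse::<u64>().ok()?))
}

fn main() {
    let entry = encode_entry(1024, 20).unwrap();
    println!("{} -> {:?}", entry, decode_entry(&entry));
}
```

Because every entry has the same width, the n-th entry starts at byte `n * 20` of the index, so a lookup needs no scanning.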

Neither reads nor writes to the index directly trigger disk-level actions.

Both operations go through memory-mapped buffers managed by the OS.

More info in the commit_log/src/segment/index.rs file.

Performance

These are preliminary and poorly collected results, yet they look interesting:

Storage (Tests are completely offline, no network¹ ...)

  • Setup 1:

OS: macOS Mojave 10.14.4 (18E226)
CPU: 2,5 GHz Intel Core i7
RAM: 16 GB 2133 MHz LPDDR3
HD: 256 GB SSD Storage
---------------
Segment size: 20 MB
Index size: 10 MB
5 GB worth records written in 37.667706s
5 GB worth cold records read in 1.384433s
5 GB worth warm records read in 1.266053s

Per-segment²:

  • ~130 MB/s on write

  • ~3.7 GB/s on cold read (while loading into memory pages)

  • ~4.2 GB/s on warm read (already loaded into memory pages)

  • Setup 2:

OS: macOS Mojave 10.14.5 (18F203)
CPU: 2,9 GHz Intel Core i9
RAM: 32 GB 2400 MHz DDR4
HD: 500 GB SSD Storage
---------------
Segment size: 20 MB
Index size: 10 MB
5 GB worth records written in 26.851791s
5 GB worth cold records read in 141.969ms
5 GB worth warm records read in 124.623ms

Per-segment²:

  • ~187 MB/s on write

  • ~35 GB/s on cold read (while loading into memory pages)

  • ~41 GB/s on warm read (already loaded into memory pages)

  • Setup 3:

OS: macOS Mojave 10.14.5 (18F203)
CPU: 2,9 GHz Intel Core i9
RAM: 32 GB 2400 MHz DDR4
HD: 500 GB SSD Storage
---------------
Segment size: 50 MB
Index size: 20 MB
10 GB worth records written in 54.96796s
10 GB worth cold records read in 437.933ms
10 GB worth warm records read in 310.853ms

Per-segment²:

  • ~181 MB/s on write
  • ~22 GB/s on cold read (while loading into memory pages)
  • ~32 GB/s on warm read (already loaded into memory pages)

Notes:

  • ¹ - Offline: no network overhead is taken into account. The network will be a major source of overhead; however, the focus for now is storage.
  • ² - Per-segment performance. In a comparison with Kinesis/Kafka, this would be the per-shard value. If you were to have 10 shards, you could achieve up to 10x that, limited by external factors, HD/CPU/...
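
The per-segment figures are simply the amount of data divided by the elapsed time. The sketch below reproduces the Setup 2 write figure and the ideal 10-shard scaling from note ²; `throughput_mb_per_s` is a hypothetical helper, and it assumes 1 GB = 1000 MB, which may not be the convention used when the numbers were collected.

```rust
/// Throughput in MB/s, taking 1 GB = 1000 MB (an assumption).
fn throughput_mb_per_s(gigabytes: f64, seconds: f64) -> f64 {
    gigabytes * 1000.0 / seconds
}

fn main() {
    // Setup 2 write: 5 GB in 26.851791 s.
    let per_segment = throughput_mb_per_s(5.0, 26.851791);
    // Ideal upper bound for 10 shards, ignoring HD/CPU contention.
    let ten_shards = 10.0 * per_segment;
    println!("{:.0} MB/s per segment, {:.0} MB/s for 10 shards", per_segment, ten_shards);
}
```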
