• Stars
    star
    134
  • Rank 262,194 (Top 6 %)
  • Language
    Go
  • License
    MIT License
  • Created over 1 year ago
  • Updated 12 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

(educational) build your own disk based KV store in Go

logo

CaskDB - Disk based Log Structured Hash Table Store

made-with-go build codecov MIT license twitter@iavins

architecture

CaskDB is a disk-based, embedded, persistent, key-value store based on the Riak's bitcask paper, written in Go. It is more focused on the educational capabilities than using it in production. The file format is platform, machine, and programming language independent. Say, the database file created from Go on macOS should be compatible with Rust on Windows.

This project aims to help anyone, even a beginner in databases, build a persistent database in a few hours. There are no external dependencies; only the Go standard library is enough.

If you are interested in writing the database yourself, head to the workshop section.

Features

  • Low latency for reads and writes
  • High throughput
  • Easy to back up / restore
  • Simple and easy to understand
  • Store data much larger than the RAM

Limitations

Most of the following limitations are of CaskDB. However, there are some due to design constraints by the Bitcask paper.

  • Single file stores all data, and deleted keys still take up the space
  • CaskDB does not offer range scans
  • CaskDB requires keeping all the keys in the internal memory. With a lot of keys, RAM usage will be high
  • Slow startup time since it needs to load all the keys in memory

Dependencies

CaskDB does not require any external libraries to run. Go standard library is enough.

Installation

go get github.com/avinassh/go-caskdb

Usage

store, _ := NewDiskStore("books.db")
store.Set("othello", "shakespeare")
author := store.Get("othello")

Cask DB (Python)

This project is a Go version of the same project in Python.

Prerequisites

The workshop is for intermediate-advanced programmers. Knowing basics of Go helps, and you can build the database in any language you wish.

Not sure where you stand? You are ready if you have done the following in any language:

  • If you have used a dictionary or hash table data structure
  • Converting an object (class, struct, or dict) to JSON and converting JSON back to the things
  • Open a file to write or read anything. A common task is dumping a dictionary contents to disk and reading back

Workshop

NOTE: I don't have any workshops scheduled shortly. Follow me on Twitter for updates. Drop me an email if you wish to arrange a workshop for your team/company.

CaskDB comes with a full test suite and a wide range of tools to help you write a database quickly. A Github action is present with an automated tests runner. Fork the repo, push the code, and pass the tests!

Throughout the workshop, you will implement the following:

  • Serialiser methods take a bunch of objects and serialise them into bytes. Also, the procedures take a bunch of bytes and deserialise them back to the things.
  • Come up with a data format with a header and data to store the bytes on the disk. The header would contain metadata like timestamp, key size, and value.
  • Store and retrieve data from the disk
  • Read an existing CaskDB file to load all keys

Tasks

  1. Read the paper. Fork this repo and checkout the start-here branch
  2. Implement the fixed-sized header, which can encode timestamp (uint, 4 bytes), key size (uint, 4 bytes), value size (uint, 4 bytes) together
  3. Implement the key, value serialisers, and pass the tests from format_test.go
  4. Figure out how to store the data on disk and the row pointer in the memory. Implement the get/set operations. Tests for the same are in disk_store_test.go
  5. Code from the task #2 and #3 should be enough to read an existing CaskDB file and load the keys into memory

Run make test to run the tests locally. Push the code to Github, and tests will run on different OS: ubuntu, mac, and windows.

Not sure how to proceed? Then check the hints file which contains more details on the tasks and hints.

Hints

  • Not sure how to come up with a file format? Read the comment in the format file

What next?

I often get questions about what is next after the basic implementation. Here are some challenges (with different levels of difficulties)

Level 1:

  • Crash safety: the bitcask paper stores CRC in the row, and while fetching the row back, it verifies the data
  • Key deletion: CaskDB does not have a delete API. Read the paper and implement it
  • Instead of using a hash table, use a data structure like the red-black tree to support range scans
  • CaskDB accepts only strings as keys and values. Make it generic and take other data structures like int or bytes.

Level 2:

  • Hint file to improve the startup time. The paper has more details on it
  • Implement an internal cache which stores some of the key-value pairs. You may explore and experiment with different cache eviction strategies like LRU, LFU, FIFO etc.
  • Split the data into multiple files when the files hit a specific capacity

Level 3:

  • Support for multiple processes
  • Garbage collector: keys which got updated and deleted remain in the file and take up space. Write a garbage collector to remove such stale data
  • Add SQL query engine layer
  • Store JSON in values and explore making CaskDB as a document database like Mongo
  • Make CaskDB distributed by exploring algorithms like raft, paxos, or consistent hashing

Line Count

$ tokei -f format.go disk_store.go
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Go                      2          320          133          168           19
-------------------------------------------------------------------------------
 format.go                          111           35           67            9
 disk_store.go                      209           98          101           10
===============================================================================
 Total                   2          320          133          168           19
===============================================================================

License

The MIT license. Please check LICENSE for more details.

More Repositories

1

rockstar

Makes you a Rockstar C++ Programmer in 2 minutes
Python
4,147
star
2

py-caskdb

(educational) build your own disk based KV store
Python
968
star
3

grpc-errors

A handy guide to gRPC errors
C#
549
star
4

fast-sqlite3-inserts

Some bunch of test scripts to generate a SQLite DB with 1B rows in fastest possible way
Rust
337
star
5

status

HTTP Status for Humans
Python
276
star
6

gg-flip

Highly performant Javascript library to flip the signs
HTML
174
star
7

pytorch-flask-api

Python
166
star
8

pytorch-flask-api-heroku

HTML
160
star
9

haxor

Unofficial Python wrapper for official Hacker News API
Python
153
star
10

slackipy

Automate user invites to your Slack channel!
Python
87
star
11

della

Della is a Django app for managing Secret Santa/Gift Exchange.
Python
48
star
12

Reddit-GoodReads-Bot

Python
32
star
13

little-finger

Clojure
31
star
14

cowin-assist

Telegram Bot to check covid vaccine slot availability on CoWin site
Python
30
star
15

prawoauth2

Helper library to make your life easier using OAuth2 for PRAW
Python
29
star
16

kylo

The FAQ Bot
Python
16
star
17

history-bleed

Check your browser history against Cloud Bleed
JavaScript
13
star
18

score-notify

Displays cricket score as notification. OS X Only.
Python
11
star
19

dev-startups-india

list of Indian startups working on building developer tools
9
star
20

fluvio-go

Rust
8
star
21

saveourcinema

code which powers http://saveourcinema.in
HTML
8
star
22

ares

Slack Moderator
Go
8
star
23

Hocus-Pocus

Simple OS X app which shows hidden files. Written in Swift.
Swift
8
star
24

breast-cancer-prediction

Jupyter Notebook
7
star
25

openshift-tornado-starter

Simple repo to get started on Openshift with Python Tornado
6
star
26

cemetery.io

HTML
5
star
27

sign-flip

JavaScript
5
star
28

isso-openshift

This repo helps you install Isso on Openshift with a single click
Python
5
star
29

brave-browser-hardening

mirror of https://gitlab.com/CHEF-KOCH/brave-browser-hardening
5
star
30

Laozi

Laozi is a Goodreads bot for Telegram
Python
5
star
31

heroku-tornado-starter

Simple repo to get started on Heroku to serve static files using Python Tornado
Python
5
star
32

nightreads

Python
5
star
33

slackipycore

Invite Users to Slack using Python
Python
4
star
34

talkwithme

simple chatroom built using Tornado
Python
4
star
35

unnamed-wip-document-db

a document database (like mongo) written in Python. Uses CaskDB as storage backend.
4
star
36

learning-scraping

A project written in Django, which aims to teach Web Scraping for beginners.
Python
3
star
37

polistats

Google Search Trends of Indian Politics
Python
3
star
38

kekday

Find a Redditor's Cake Day!
HTML
3
star
39

good-human

Python
2
star
40

gh-vue

A simple Vue JS project to display my latest commits from Github
JavaScript
2
star
41

learning-tornado

Python
2
star
42

btree

An in-memory B Tree implementation in Go
Go
2
star
43

tildes-plus

JavaScript
2
star
44

mca_scraper

Python
2
star
45

gofish

Go
2
star
46

little-finger-android

Java
2
star
47

hugo-skyfall

Hugo theme - skyfall
CSS
2
star
48

mvcc-go

Go
2
star
49

reddit-india-books

What does Reddit India likes to read?
Python
2
star
50

skyfall

Clean Pelican theme for http://avi.im/blag
HTML
2
star
51

mkstemp

A Hacker News Clone
Python
2
star
52

offenders

List of Telecos and Companies which break Net Neutrality
2
star
53

Bitbucket-import

Python
1
star
54

monkey

An interpreter for the Monkey Programming Language in Golang
Go
1
star
55

wutong

A naughty Telegram bot
Python
1
star
56

tornado-benchmarks

pointless tornado benchmarks
Python
1
star
57

FoodPin

Swift
1
star
58

rtiman

Kickstarter for RTIs
Python
1
star
59

avinassh.github.io

HTML
1
star
60

bast

Delete your Reddit comment history safely
Go
1
star
61

blag

hosts my blog
CSS
1
star
62

goodreads-export

CSS
1
star
63

little-finger-ios

Swift
1
star
64

inks

Go
1
star
65

Calculator-Swift

Simple Calculator app written using Swift
Swift
1
star
66

honk

Go
1
star
67

ratelimit

Go
1
star
68

heroku-static

Ruby
1
star
69

build-your-regex

Go port of Rob Pike's Regex Engine (from C)
Go
1
star
70

hugo-publish

My Github Actions setup for publishing a Hugo blog
1
star