Log Structured Merge B+ Tree (LSMBT)
This an implementation of two different data structures:
The implementation also leverages a write-ahead log to ensure that data is not lost.
Basic Architecture
When you create a LSMBT 2 files are created: a blank B+ Tree file, and a blank WAL file. An in-memory BTreeMap is also constructed. Each method of the LSMBT is outlined below
Insert (key,value)
When a (key,value) pair is added to the LSMBT the following occurs:
- The (key,value) pair is written to the WAL file.
- The (key,value) pair is added to the in-memory BTree. If the size of the in-memory BTree hits a particular threshold, then
- The in-memory BTree is merged with the on-disk B+Tree to create a new on-disk B+Tree.
- The in-memory BTree and the WAL file are both truncated.
Get Values
Because a key can be associated with a set (no duplicate values per key) of values, the get
method returns a list of values:
- Collect all of the values associated with a given key in the in-memory BTree.
- Collect all of the values associated with a given key in the on-disk B+Tree.
- Return all the unique values
Delete Value
Again, because a key can be associated with a set of values, the value to be removed must be supplied during a delete:
- Remove the value from the in-memory BTree. If it is the only value associated with the key, then remove the key as well.
- Mark the value in the on-disk B+Tree as deleted. (The value isn't actually removed until a compaction occurs.)