Jungle
Embedded key-value storage library, based on a combined index of LSM-tree and copy-on-write (append-only) B+tree. Please refer to our paper.
Jungle is specialized for building replicated state machine of consensus protocols such as Paxos or Raft, by providing chronological ordering and lightweight persistent snapshot. It can be also used for building log store.
Features
- Ordered mapping of key and its value on disk (file system). Both key and value are arbitrary length binary.
- Monotonically increasing sequence number for each key-value modification.
- Point lookup on both key and sequence number.
- Range lookup on both key and sequence number, by using iterator:
- Snapshot isolation: each individual iterator is a snapshot.
- Bi-directional traversal and jump:
prev
,next
,gotoBegin
,gotoEnd
, andseek
.
- Lightweight persistent snapshot, based on sequence number:
- Nearly no overhead for the creation of a snapshot.
- Snapshots are durable; preserved even after process restart.
- Tunable configurations:
- The number of threads for log flushing and compaction.
- Custom size ratio between LSM levels.
- Compaction factor (please refer to the paper).
- Log store mode:
- Ordered mapping of sequence number and value, eliminating key indexing.
- Lightweight log truncation based on sequence number.
Things we DO NOT (and also WILL NOT) support
- Secondary indexing, or SQL-like query:
- Jungle will not understand the contents of value. Value is just a binary from Jungle's point of view.
- Server-client style service, or all other network-involving tasks such as replication:
- Jungle is a library that should be embedded into your process.
Benefits
Compared to other widely used LSM-based key-value storage libraries, benefits of Jungle are as follows:
- Smaller write amplification.
- Jungle will have 4-5 times less write amplification, while providing the similar level of write performance.
- Chronological ordering of key-value pairs
- Along with persistent logical snapshot, this feature is very useful when you use it as a replicated state machine for Paxos or Raft.
How to Build
cmake
:
1. Install - Ubuntu
$ sudo apt-get install cmake
- OSX
$ brew install cmake
2. Build
jungle$ ./prepare.sh -j8
jungle$ mkdir build
jungle$ cd build
jungle/build$ cmake ../
jungle/build$ make
Run unit tests:
jungle/build$ ./runtests.sh
How to Use
Please refer to this document.
Example Implementation
Please refer to examples.
Supported Platforms
- Ubuntu (tested on 14.04, 16.04, and 18.04)
- Centos (tested on 7)
- OSX (tested on 10.13 and 10.14)
Platforms will be supported in the future
- Windows
Contributing to This Project
We welcome contributions. If you find any bugs, potential flaws and edge cases, improvements, new feature suggestions or discussions, please submit issues or pull requests.
Contact
- Jung-Sang Ahn [email protected]
Coding Convention
- Recommended not to exceed 90 characters per line.
- Indent: 4 spaces, K&R (1TBS).
- Class & struct name:
UpperCamelCase
. - Member function and member variable name:
lowerCamelCase
. - Local variable, helper function, and parameter name:
snake_case
.
class MyClass {
public:
void myFunction(int my_parameter) {
int local_var = my_parameter + 1;
if (local_var < myVariable) {
// ...
} else {
// ...
}
}
private:
int myVariable;
};
int helper_function() {
return 0;
}
- Header include order: local to global.
- Header file corresponding to this source file (if applicable).
- Header files in the same project (i.e., Jungle).
- Header files from the other projects.
- C++ system header files.
- C system header files.
- Note: alphabetical order within the same category.
- Example (
my_file.cc
):
#include "my_file.h" // Corresponding header file.
#include "table_file.h" // Header files in the same project.
#include "table_helper.h"
#include "forestdb.h" // Header files from the other projects.
#include <cassert> // C++ header files.
#include <iostream>
#include <vector>
#include <sys/stat.h> // C header files.
#include <sys/types.h>
#include <unistd.h>
License Information
Copyright 2017-2019 eBay Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
3rd Party Code
-
URL: https://github.com/couchbase/forestdb
License: https://github.com/couchbase/forestdb/blob/master/LICENSE
Originally licensed under the Apache 2.0 license. -
URL: https://github.com/stbrumme/crc32
Original Copyright 2011-2016 Stephan Brumme
See Original ZLib License: https://github.com/stbrumme/crc32/blob/master/LICENSE -
URL: https://github.com/greensky00/simple_logger
License: https://github.com/greensky00/simple_logger/blob/master/LICENSE
Originally licensed under the MIT license. -
URL: https://github.com/greensky00/testsuite
License: https://github.com/greensky00/testsuite/blob/master/LICENSE
Originally licensed under the MIT license. -
URL: https://github.com/greensky00/latency-collector
License: https://github.com/greensky00/latency-collector/blob/master/LICENSE
Originally licensed under the MIT license. -
URL: https://github.com/eriwen/lcov-to-cobertura-xml/blob/master/lcov_cobertura/lcov_cobertura.py
License: https://github.com/eriwen/lcov-to-cobertura-xml/blob/master/LICENSE
Copyright 2011-2012 Eric Wendelin
Originally licensed under the Apache 2.0 license. -
URL: https://github.com/bilke/cmake-modules
License: https://github.com/bilke/cmake-modules/blob/master/LICENSE_1_0.txt
Copyright 2012-2017 Lars Bilke
Originally licensed under the BSD license. -
URL: https://github.com/aappleby/smhasher/tree/master/src
Copyright 2016 Austin Appleby
Originally licensed under the MIT license.