• Stars
    star
    372
  • Rank 114,858 (Top 3 %)
  • Language
    C
  • License
    Apache License 2.0
  • Created over 8 years ago
  • Updated 8 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Fast integer compression in C using the StreamVByte codec

streamvbyte

Ubuntu 22.04 CI (GCC 9, 10, 11 and 12, LLVM 12, 13, 14) Ubuntu 20.04 CI (GCC 9.4 and 10, LLVM 10 and 11) macOS 11 CI (LLVM 13, GCC 10, 11, 12) VS16-CI VS17-CI

StreamVByte is a new integer compression technique that applies SIMD instructions (vectorization) to Google's Group Varint approach. The net result is faster than other byte-oriented compression techniques.

The approach is patent-free, the code is available under the Apache License.

It includes fast differential coding.

It assumes a recent Intel processor (most Intel and AMD processors released after 2010) or an ARM processor with NEON instructions (which is almost all of them except for the tiny cores). Big-endian processors are unsupported at this time, but they are getting to be extremely rare.

The code should build using most standard-compliant C99 compilers. The provided makefile expects a Linux-like system. We have a CMake build.

Requirements

  • A C99 compatible compiler (GCC 9 and up, LLVM 10 and up, Visual Studio 2019 and up).
  • We support macOS, Linux and Windows. It should be easy to extend support to FreeBSD and other POSIX systems.

For high performance, you should have either a 64-bit ARM processor or a 64-bit x64 system with SSE 4.1 support. SSE 4.1 was added to Intel processors in 2007 so it is almost certain that your Intel or AMD processor supports it.

Users

This library is used by

Usage

See examples/example.c for an example.

Short code sample:

// suppose that datain is an array of uint32_t integers
size_t compsize = streamvbyte_encode(datain, N, compressedbuffer); // encoding
// here the result is stored in compressedbuffer using compsize bytes
streamvbyte_decode(compressedbuffer, recovdata, N); // decoding (fast)

If the values are sorted, then it might be preferable to use differential coding:

// suppose that datain is an array of uint32_t integers
size_t compsize = streamvbyte_delta_encode(datain, N, compressedbuffer,0); // encoding
// here the result is stored in compressedbuffer using compsize bytes
streamvbyte_delta_decode(compressedbuffer, recovdata, N,0); // decoding (fast)

You have to know how many integers were coded when you decompress. You can store this information along with the compressed stream.

During decoding, the library may read up to STREAMVBYTE_PADDING extra bytes from the input buffer (these bytes are read but never used).

1. Building with CMake:

We expect a recent CMake. Please make sure that your version of CMake is up-to-date or you may need to adapt our instructions.

The cmake build system also offers a libstreamvbyte_static static library (libstreamvbyte_static under linux) in addition to libstreamvbyte shared library (libstreamvbyte.so under linux).

-DCMAKE_INSTALL_PREFIX:PATH=/path/to/install is optional. Defaults to /usr/local{include,lib}

cmake -DCMAKE_BUILD_TYPE=Release \
         -DCMAKE_INSTALL_PREFIX:PATH=/path/to/install \
	 -DSTREAMVBYTE_ENABLE_EXAMPLES=ON \
	 -DSTREAMVBYTE_ENABLE_TESTS=ON -B build

cmake --build build
# run the tests like:
ctest --test-dir build

Installation with CMake

cmake --install build 

Benchmarking with CMake

After building, you may run our benchmark as follows:

./build/test/perf

The benchmarks are not currently built under Windows.

2. Building with Makefile:

  make
  ./unit

Installation with Makefile

You can install the library (as a dynamic library) on your machine if you have root access:

  sudo make install

To uninstall, simply type:

  sudo make uninstall

It is recommended that you try make dyntest before proceeding.

Benchmarking with Makefile

You can try to benchmark the speed in this manner:

  make perf
  ./perf

Make sure to run make test before, as a sanity test.

Signed integers

We do not directly support signed integers, but you can use fast functions to convert signed integers to unsigned integers.

#include "streamvbyte_zigzag.h"

zigzag_encode(mysignedints, myunsignedints, number); // mysignedints => myunsignedints

zigzag_decode(myunsignedints, mysignedints, number); // myunsignedints => mysignedints

Technical posts

Alternative encoding

By default, Stream VByte uses 1, 2, 3 or 4 bytes per integer. In the case where you expect many of your integers to be zero, you might try the streamvbyte_encode_0124 and streamvbyte_decode_0124 which use 0, 1, 2, or 4 bytes per integer.

Stream VByte in other languages

Format Specification

We specify the format as follows.

We do not store how many integers (count) are compressed in the compressed data per se. If you want to store the data stream (e.g., to disk), you need to add this information. It is intentionally left out because, in applications, it is often the case that there are better ways to store this count.

There are two streams:

  • The data starts with an array of "control bytes". There are (count + 3) / 4 of them.
  • Following the array of control bytes, there are data bytes.

We can interpret the control bytes as a sequence of 2-bit words. The first 2-bit word is made of the least significant 2 bits in the first byte, and so forth. There are four 2-bit words written in each byte.

Starting from the first 2-bit word, we have corresponding sequence in the data bytes, written in sequence from the beginning:

  • When the 2-bit word is 00, there is a single data byte.
  • When the 2-bit words is 01, there are two data bytes.
  • When the 2-bit words is 10, there are three data bytes.
  • When the 2-bit words is 11, there are four data bytes.

The data bytes are stored using a little-endian encoding.

Consider the following example:

control bytes: [0x40 0x55 ... ]
data bytes: [0x00 0x64 0xc8 0x2c 0x01 0x90  0x01 0xf4 0x01 0x58 0x02 0xbc 0x02 ...]

The first control byte is 0x40 or the four 2-bit words : 00 00 00 01. The second control byte is 0x55 or the four 2-bit words : 01 01 01 01. Thus the first three values are given by the first three bytes: 0x00, 0x64, 0xc8 (or 0, 100, 200 in base 10). The five next values are stored using two bytes each: 0x2c 0x01, 0x90 0x01, 0xf4 0x01, 0x58 0x02, 0xbc 0x02. As little endian integers, these are to be interpreted as 300, 400, 500, 600, 700.

Thus, to recap, the sequence of integers (0,100,200,300,400,500,600,700) gets encoded as the 15 bytes 0x40 0x55 0x00 0x64 0xc8 0x2c 0x01 0x90 0x01 0xf4 0x01 0x58 0x02 0xbc 0x02.

If the countis not divisible by four, then we include a final partial group where we use zero 2-bit corresponding to no data byte.

Reference

See also

More Repositories

1

FastPFor

The FastPFOR C++ library: Fast integer compression
C++
876
star
2

Code-used-on-Daniel-Lemire-s-blog

This is a repository for the code posted on my blog
C
720
star
3

fast_double_parser

Fast function to parse strings into double (binary64) floating-point values, enforces the RFC 7159 (JSON standard) grammar: 4x faster than strtod
C++
574
star
4

javaewah

A compressed alternative to the Java BitSet class
Java
560
star
5

JavaFastPFOR

A simple integer compression library in Java
Java
505
star
6

simdcomp

A simple C library for compressing lists of integers using binary packing
C
480
star
7

EWAHBoolArray

A compressed bitmap class in C++.
C++
430
star
8

SIMDCompressionAndIntersection

A C++ library to compress and intersect sorted lists of integers using SIMD instructions
C++
420
star
9

fastbase64

SIMD-accelerated base64 codecs
C
388
star
10

FastPriorityQueue.js

a fast heap-based priority queue in JavaScript
JavaScript
328
star
11

fastvalidate-utf-8

header-only library to validate utf-8 strings at high speeds (using SIMD instructions)
C
286
star
12

fastrange

A fast alternative to the modulo reduction
C
272
star
13

fastmod

A C/C++ header file for fast 32-bit division remainders (and divisibility tests) on 64-bit hardware.
C++
264
star
14

clhash

C library implementing the ridiculously fast CLHash hashing function
C
256
star
15

externalsortinginjava

External-Memory Sorting in Java
Java
248
star
16

rollinghashcpp

Rolling Hash C++ Library
C++
185
star
17

FastBitSet.js

Speed-optimized BitSet implementation for modern browsers and JavaScript engines
JavaScript
150
star
18

docker_programming_station

A simple script to help programmers who want to work within Docker
Shell
142
star
19

despacer

C library to remove white space from strings as fast as possible
C
133
star
20

StronglyUniversalStringHashing

Benchmark showing the we can randomly hash strings very quickly with good universality
C++
133
star
21

testingRNG

Testing common random-number generators (RNG)
C++
131
star
22

MaskedVByte

Fast decoder for VByte-compressed integers
C
115
star
23

cbitset

A simple bitset library in C
C
110
star
24

lbimproved

Dynamic Time Warping (DTW) library implementing lower bounds (LB_Keogh, LB_Improved...)
C++
105
star
25

dictionary

High-performance dictionary coding
C++
99
star
26

SIMDxorshift

Fast random number generators: Vectorized (SIMD) version of xorshift128+
C
99
star
27

csharpewah

Compressed bitmaps in C#
C#
82
star
28

fastrand

Fast random number generation in an interval in Python: Up to 10x faster than random.randint.
C
77
star
29

bloofi

Bloofi: A java implementation of multidimensional Bloom filters
Java
75
star
30

rollinghashjava

Rolling hash functions in Java
Java
75
star
31

LittleIntPacker

C library to pack and unpack short arrays of integers as fast as possible
C
70
star
32

simdpcg

Vectorized version of the PCG random number generator
C
68
star
33

TypedFastBitSet.js

Speed-optimized BitSet implementation for modern browsers and JavaScript engines, uses typed arrays
TypeScript
67
star
34

FastIntegerCompression.js

Fast integer compression library in JavaScript
JavaScript
54
star
35

simdprune

Pruning elements in SIMD vectors (i.e., packing left elements)
C
50
star
36

testingmlp

Testing memory-level parallelism
C
49
star
37

FastDifferentialCoding

Fast differential coding functions (using SIMD instructions)
C
49
star
38

multiplatform_simd_recipes

SIMD recipes, for various platforms (collection of code snippets)
C
43
star
39

runningmaxmin

Fast maximum-minimum filters implemented in C++
C++
43
star
40

validateutf8-experiments

Reproducible experimeents on UTF-8 validation using SIMD instructions
C++
40
star
41

fasthashing

A Variable-Length String Hashing Library in C++
C++
40
star
42

fast_division

Simple C++ code to benchmark fast division algorithms
C++
40
star
43

FrameOfReference

C++ library to pack and unpack vectors of integers having a small range of values using a technique called Frame of Reference
C
40
star
44

SwiftBitset

A fast Bitset class in Swift
Swift
36
star
45

FastShuffleExperiments

How fast can we shuffle values?
C++
33
star
46

programmingishard

Long-term book project
32
star
47

StablePriorityQueue.js

A stable heap-based priority queue in JavaScript
JavaScript
27
star
48

fastscancount

Fast implementations of the scancount algorithm: C++ header-only library
C++
24
star
49

SIMDgameoflife

Vectorized (AVX) version of the game of life
C
23
star
50

CMemoryUsage

Measuring memory usage in C and C++
C
23
star
51

CRoaringUnityBuild

Dumps of CRoaring unity builds (for convenience)
C
23
star
52

SwiftWyhash

Fast random number generator in pure Swift
Swift
23
star
53

kodakimagecollection

Set of images made available royalty-free by Kodak
22
star
54

sparsebitmap

A simple sparse bitmap implementation in java
Java
21
star
55

IndexWikipedia

A simple utility to index wikipedia dumps using Lucene.
Java
21
star
56

PiecewiseLinearTimeSeriesApproximation

code from Daniel Lemire, A Better Alternative to Piecewise Linear Time Series Segmentation, SIAM Data Mining 2007, 2007.
C++
21
star
57

Concise

C++ implementation of Concise and WAH compressed bitsets
C++
19
star
58

MemoryLanes

iOS app to test memory-level parallelism
C++
18
star
59

fastrandom

Go
18
star
60

crackingxoroshiro128plus

How to "crack" xoroshiro128+: from two outputs, derive a possible seed
Python
18
star
61

talks

Random material having to do with Daniel Lemire's talks
C++
18
star
62

RealisticTabularDataSets

Some realistic tabular datasets for testing (CSV)
17
star
63

StarSchemaBenchmark

O'Neil et al.'s Star Schema Benchmark: curated code
C
16
star
64

vectorclass

Random number generator for large applications using vector instructions
C++
15
star
65

simple_fastfloat_benchmark

C
15
star
66

simplebitmapbenchmark

Simple benchmark between compressed bitmap libraries in Java
Java
15
star
67

zobristhashing

Zobrist hashing in C
C
14
star
68

BitSliceIndex

Experiments on bit-slice indexing
Java
14
star
69

SIMDIntersections

Vectorized intersections (research code)
C++
14
star
70

microbenchmarks

Private microbenchmarks
Java
13
star
71

backward_multiplication

Multiplying... backward?
C++
12
star
72

pospopcnt_avx512

benchmarking positional population count
Assembly
11
star
73

costofsafety

Quick experiment to see how expensive safety is in C, for research
C
11
star
74

constantdivisionbenchmarks

Benchmarks for constant-division problems (not a library! for research only!)
C++
11
star
75

createfasthash

Code from article http://locklessinc.com/articles/fast_hash/
C
9
star
76

SwiftCallingCHeader

Calling a C header from Swift (example)
Swift
8
star
77

arraylayout

Pat Morin's arraylayout
TeX
8
star
78

WebAssemblyVSJavaScript

Project to compare the performance of WebAssembly with JavaScript
JavaScript
7
star
79

rowreorderingcpplibrary

This is a set of row-reordering algorithms and data compression compression schemes implemented in C++.
C++
7
star
80

ComputerLanguageBenchmark

Captured as http://benchmarksgame.alioth.debian.org is closing
HTML
7
star
81

fastheap

Fast heap data structures in Go (golang)
Go
6
star
82

iosbitmapdecoding

Experimenting with bitmap decoding on ios
C++
6
star
83

notesdecours

Notes de cours
Java
6
star
84

gutenberg-headers

Removing Manually-Generated Boilerplate from Project Gutenberg e-Books
Java
6
star
85

gobitmapbenchmark

A Go project to benchmark various bitmap implementations (this is not a library!)
Go
6
star
86

citationdata

Influential references dataset
6
star
87

simple_simdjson_python_wrapper

Simple use of simdjson from python (for research purposes)
C++
6
star
88

pythonmaxmin

Fast minimum-maximal filter in Python
Python
6
star
89

jstypes

Doing C-like arithmetic and logical operations in JavaScript (full 64-bit support)
JavaScript
5
star
90

hierarchicalbinbuffering

Hierarchical Bin Buffering C++ Library
C++
5
star
91

fast_float_supplemental_tests

Supplemental tests for the fast_float library (credit: Nigel Tao)
C++
5
star
92

HashVSTree

Do hash table use more memory than trees? A case study in Java.
Java
5
star
93

zone_benchmarks

zone-file parsing benchmark
C
5
star
94

DivisionBenchmark

A not-so-exciting benchmark to measure the performance of some division functions
C++
4
star
95

RoaringVSConciseBenchmark

A benchmark comparing roaring and concise
Java
4
star
96

exercices_lucene

Exercises pour s'approprier lucene, le moteur de recherche
Java
4
star
97

roaring_rust_benchmark

Basic benchmark to compare different Roaring bitmap implementations in Rust
Rust
4
star
98

datasciencebook

Python
4
star
99

viewsizeestimation

Unassuming hashing-based view-size estimation techniques
C++
4
star
100

jackson-json-bench

A silly benchmark for Jackson (JSON parser)
Java
4
star