  • Language: C
  • License: Other
  • Created almost 10 years ago
  • Updated 8 months ago


Zstandard - Fast real-time compression algorithm

Zstandard

Zstandard, or zstd for short, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios. It's backed by a very fast entropy stage, provided by the Huff0 and FSE libraries.

Zstandard's format is stable and documented in RFC 8878. Multiple independent implementations are already available. This repository represents the reference implementation, provided as an open-source dual BSD OR GPLv2 licensed C library, and a command line utility producing and decoding .zst, .gz, .xz and .lz4 files. Should your project require another programming language, a list of known ports and bindings is provided on the Zstandard homepage.


Benchmarks

For reference, several fast compression algorithms were tested and compared on a desktop featuring a Core i7-9700K CPU @ 4.9GHz and running Ubuntu 20.04 (Linux ubu20 5.15.0-101-generic), using lzbench, an open-source in-memory benchmark by @inikep compiled with gcc 9.4.0, on the Silesia compression corpus.

    Compressor name          Ratio   Compression   Decompression
    zstd 1.5.6 -1            2.887   510 MB/s      1580 MB/s
    zlib 1.2.11 -1           2.743   95 MB/s       400 MB/s
    brotli 1.0.9 -0          2.702   395 MB/s      430 MB/s
    zstd 1.5.6 --fast=1      2.437   545 MB/s      1890 MB/s
    zstd 1.5.6 --fast=3      2.239   650 MB/s      2000 MB/s
    quicklz 1.5.0 -1         2.238   525 MB/s      750 MB/s
    lzo1x 2.10 -1            2.106   650 MB/s      825 MB/s
    lz4 1.9.4                2.101   700 MB/s      4000 MB/s
    lzf 3.6 -1               2.077   420 MB/s      830 MB/s
    snappy 1.1.9             2.073   530 MB/s      1660 MB/s

The negative compression levels, specified with --fast=#, offer faster compression and decompression speed at the cost of compression ratio.

Zstd can also offer stronger compression ratios at the cost of compression speed. The speed vs. compression trade-off is configurable in small increments. Decompression speed is preserved and remains roughly the same at all settings, a property shared by most LZ compression algorithms, such as zlib or lzma.

The following tests were run on a server running Linux Debian (Linux version 4.14.0-3-amd64) with a Core i7-6700K CPU @ 4.0GHz, using lzbench, an open-source in-memory benchmark by @inikep compiled with gcc 7.3.0, on the Silesia compression corpus.

(Charts: Compression Speed vs Ratio; Decompression Speed)

A few other algorithms can produce higher compression ratios at slower speeds, falling outside of the graph. For a larger picture including slow modes, click on this link.

The case for Small Data compression

The previous charts show results applicable to typical file and stream scenarios (several MB). Small data behaves differently.

The smaller the amount of data to compress, the more difficult it is to compress. This problem is common to all compression algorithms, and the reason is that compression algorithms learn from past data how to compress future data. But at the beginning of a new data set, there is no "past" to build upon.

To address this, Zstd offers a training mode, which can be used to tune the algorithm for a selected type of data. Training Zstandard is achieved by providing it with a few samples (one file per sample). The result of this training is stored in a file called a "dictionary", which must be loaded before compression and decompression. Using this dictionary, the compression ratio achievable on small data improves dramatically.

The following example uses the github-users sample set, created from the GitHub public API. It consists of roughly 10K records weighing about 1KB each.

(Charts: Compression Ratio; Compression Speed; Decompression Speed)

These compression gains are achieved while simultaneously providing faster compression and decompression speeds.

Training works if there is some correlation in a family of small data samples. The more data-specific a dictionary is, the more efficient it is (there is no universal dictionary). Hence, deploying one dictionary per type of data will provide the greatest benefits. Dictionary gains are mostly effective in the first few KB. Then, the compression algorithm will gradually use previously decoded content to better compress the rest of the file.

Dictionary compression How To:

  1. Create the dictionary

    zstd --train FullPathToTrainingSet/* -o dictionaryName

  2. Compress with dictionary

    zstd -D dictionaryName FILE

  3. Decompress with dictionary

    zstd -D dictionaryName --decompress FILE.zst

Build instructions

make is the officially maintained build system of this project. All other build systems are "compatible" and 3rd-party maintained; they may feature small differences in advanced options. When your system allows it, prefer using make to build zstd and libzstd.

Makefile

If your system is compatible with standard make (or gmake), invoking make in the root directory will generate the zstd CLI in the root directory. It will also create libzstd in lib/.

Other available options include:

  • make install : build and install the zstd CLI, library and man pages
  • make check : build and run zstd, and test its behavior on the local platform

The Makefile follows the GNU Standard Makefile conventions, allowing staged install, standard flags, directory variables and command variables.

For advanced use cases, specialized compilation flags which control binary generation are documented in lib/README.md for the libzstd library and in programs/README.md for the zstd CLI.

cmake

A cmake project generator is provided within build/cmake. It can generate Makefiles or other build scripts to create the zstd binary and the libzstd dynamic and static libraries.

By default, CMAKE_BUILD_TYPE is set to Release.

Support for Fat (Universal2) Output

zstd can be built and installed with support for both Apple Silicon (M1/M2) as well as Intel by using CMake's Universal2 support. To perform a Fat/Universal2 build and install use the following commands:

cmake -B build-cmake-debug -S build/cmake -G Ninja -DCMAKE_OSX_ARCHITECTURES="x86_64;x86_64h;arm64"
cd build-cmake-debug
ninja
sudo ninja install

Meson

A Meson project is provided within build/meson. Follow build instructions in that directory.

You can also take a look at the .travis.yml file for an example of how Meson is used to build this project.

Note that the default build type is release.

VCPKG

You can build and install zstd using the vcpkg dependency manager:

git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg integrate install
./vcpkg install zstd

The zstd port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please create an issue or pull request on the vcpkg repository.

Visual Studio (Windows)

In the build directory, you will find additional possibilities:

  • Projects for Visual Studio 2005, 2008 and 2010.
    • VS2010 project is compatible with VS2012, VS2013, VS2015 and VS2017.
  • Automated build scripts for Visual compiler by @KrzysFR, in build/VS_scripts, which will build zstd cli and libzstd library without any need to open Visual Studio solution.

Buck

You can build the zstd binary via buck by executing: buck build programs:zstd from the root of the repo. The output binary will be in buck-out/gen/programs/.

Bazel

You can easily integrate zstd into your Bazel project by using the module hosted on the Bazel Central Repository.

Testing

You can run quick local smoke tests by running make check. If you can't use make, execute the playTests.sh script from the tests directory. Two environment variables, $ZSTD_BIN and $DATAGEN_BIN, are needed for the test script to locate the zstd and datagen binaries. For information on CI testing, please refer to TESTING.md.

Status

Zstandard is currently deployed within Facebook and many other large cloud infrastructures. It is run continuously to compress large amounts of data in multiple formats and use cases. Zstandard is considered safe for production environments.

License

Zstandard is dual-licensed under BSD OR GPLv2.

Contributing

The dev branch is the one where all contributions are merged before reaching release. If you plan to propose a patch, please commit into the dev branch, or its own feature branch. Direct commits to release are not permitted. For more information, please read CONTRIBUTING.
