  • Stars: 544
  • Rank: 81,699 (Top 2 %)
  • Language: Go
  • License: Apache License 2.0
  • Created over 6 years ago
  • Updated 4 months ago

Repository Details

Fast lossless data compression in Go

kanzi

Kanzi is a modern, modular, expandable and efficient lossless data compressor implemented in Go.

  • modern: state-of-the-art algorithms are implemented, and the built-in multi-tasking takes advantage of multi-core CPUs.
  • modular: entropy codec and a combination of transforms can be provided at runtime to best match the kind of data to compress.
  • expandable: clean design with heavy use of interfaces as contracts makes integrating and expanding the code easy. No dependencies.
  • efficient: the code is optimized for efficiency (trade-off between compression ratio and speed).

Unlike most common lossless data compressors, Kanzi uses a variety of compression algorithms and, as a result, supports a wider range of compression ratios. Most common compressors do not take advantage of the many cores and threads available on modern CPUs (what a waste!). Kanzi is multithreaded by design and uses several threads by default to compress blocks concurrently. It is not compatible with standard compression formats. Kanzi is a lossless data compressor, not an archiver. It uses checksums (optional but recommended) to validate data integrity but has no mechanism for data recovery. It also lacks data deduplication across files.

For more details, check https://github.com/flanglet/kanzi-go/wiki.

See how to reuse the code here: https://github.com/flanglet/kanzi-go/wiki/Using-and-extending-the-code

There is a C++ implementation available here: https://github.com/flanglet/kanzi-cpp

There is a Java implementation available here: https://github.com/flanglet/kanzi

Credits

Matt Mahoney, Yann Collet, Jan Ondrus, Yuta Mori, Ilya Muravyov, Neal Burns, Fabian Giesen, Jarek Duda, Ilya Grebnov

Disclaimer

Use at your own risk. Always keep a backup of your files.

Silesia corpus benchmark

i7-7700K @4.20GHz, 32GB RAM, Ubuntu 22.04

go1.19.beta1

Kanzi version 2.1 Go implementation. Block size is 100 MB.

| Compressor | Encoding (sec) | Decoding (sec) | Size (bytes) |
|---|---|---|---|
| Original | | | 211,938,580 |
| Kanzi -l 1 -j 1 | 1.9 | 1.0 | 69,399,477 |
| Kanzi -l 1 -j 6 | 0.7 | 0.4 | 69,399,477 |
| Pigz 2.6 -5 -p6 | 1.0 | 0.7 | 69,170,603 |
| Gzip 1.10 -5 | 4.8 | 1.0 | 69,143,980 |
| Zstd 1.5.3 -2 --long=30 | 0.9 | 0.5 | 68,694,316 |
| Zstd 1.5.3 -2 -T6 --long=30 | 0.4 | 0.3 | 68,694,316 |
| Brotli 1.0.9 -2 --large_window=30 | 1.5 | 0.8 | 68,033,377 |
| Pigz 2.6 -9 -p6 | 3.0 | 0.6 | 67,656,836 |
| Gzip 1.10 -9 | 15.5 | 1.0 | 67,631,990 |
| Brotli 1.0.9 -4 --large_window=30 | 4.1 | 0.7 | 64,267,169 |
| Kanzi -l 2 -j 1 | 3.1 | 1.2 | 63,808,747 |
| Kanzi -l 2 -j 6 | 1.2 | 0.5 | 63,808,747 |
| Zstd 1.5.3 -9 --long=30 | 3.7 | 0.3 | 59,272,590 |
| Zstd 1.5.3 -9 -T6 --long=30 | 2.3 | 0.3 | 59,272,590 |
| Kanzi -l 3 -j 1 | 5.4 | 2.1 | 59,199,795 |
| Kanzi -l 3 -j 6 | 1.8 | 0.8 | 59,199,795 |
| Orz 1.5.0 | 7.7 | 2.0 | 57,564,831 |
| Brotli 1.0.9 -9 --large_window=30 | 36.7 | 0.7 | 56,232,817 |
| Lzma 5.2.2 -3 | 24.1 | 2.6 | 55,743,540 |
| Kanzi -l 4 -j 1 | 9.7 | 5.8 | 54,998,198 |
| Kanzi -l 4 -j 6 | 3.7 | 2.2 | 54,998,198 |
| Bzip2 1.0.6 -9 | 14.9 | 5.2 | 54,506,769 |
| Zstd 1.5.3 -19 --long=30 | 62.0 | 0.3 | 52,828,057 |
| Zstd 1.5.3 -19 -T6 --long=30 | 62.0 | 0.4 | 52,828,057 |
| Kanzi -l 5 -j 1 | 12.2 | 5.4 | 51,760,244 |
| Kanzi -l 5 -j 6 | 4.2 | 2.0 | 51,760,244 |
| Brotli 1.0.9 --large_window=30 | 356.2 | 0.9 | 49,383,136 |
| Lzma 5.2.2 -9 | 65.6 | 2.5 | 48,780,457 |
| Kanzi -l 6 -j 1 | 14.3 | 8.4 | 48,068,000 |
| Kanzi -l 6 -j 6 | 4.8 | 2.8 | 48,068,000 |
| bsc 3.2.3 -b100 -T -t | 8.8 | 6.0 | 46,932,394 |
| bsc 3.2.3 -b100 | 5.4 | 4.9 | 46,932,394 |
| BCM 1.65 -b100 | 15.5 | 21.1 | 46,506,716 |
| Kanzi -l 7 -j 1 | 19.4 | 13.3 | 46,447,003 |
| Kanzi -l 7 -j 6 | 6.8 | 5.1 | 46,447,003 |
| Tangelo 2.4 | 83.2 | 85.9 | 44,862,127 |
| zpaq v7.14 m4 t1 | 107.3 | 112.2 | 42,628,166 |
| zpaq v7.14 m4 t12 | 108.1 | 111.5 | 42,628,166 |
| Kanzi -l 8 -j 1 | 60.7 | 60.8 | 41,821,127 |
| Kanzi -l 8 -j 6 | 24.5 | 20.9 | 41,821,127 |
| Tangelo 2.0 | 302.0 | 310.9 | 41,267,068 |
| Kanzi -l 9 -j 1 | 87.5 | 90.0 | 40,361,391 |
| Kanzi -l 9 -j 6 | 39.2 | 34.5 | 40,361,391 |
| zpaq v7.14 m5 t1 | 343.1 | 352.0 | 39,112,924 |
| zpaq v7.14 m5 t12 | 344.3 | 350.4 | 39,112,924 |

enwik8

i7-7700K @4.20GHz, 32GB RAM, Ubuntu 22.04

go1.19.beta1

Kanzi version 2.1 Go implementation. Block size is 100 MB. 1 thread

| Compressor | Encoding (sec) | Decoding (sec) | Size (bytes) |
|---|---|---|---|
| Original | | | 100,000,000 |
| Kanzi -l 1 -j 1 | 1.10 | 0.42 | 37,969,539 |
| Kanzi -l 2 -j 1 | 1.90 | 0.63 | 30,953,719 |
| Kanzi -l 3 -j 1 | 2.68 | 1.05 | 27,362,969 |
| Kanzi -l 4 -j 1 | 4.73 | 2.82 | 25,670,924 |
| Kanzi -l 5 -j 1 | 4.85 | 1.94 | 22,490,875 |
| Kanzi -l 6 -j 1 | 6.57 | 3.32 | 21,232,300 |
| Kanzi -l 7 -j 1 | 9.44 | 5.98 | 20,935,519 |
| Kanzi -l 8 -j 1 | 23.70 | 23.50 | 19,671,786 |
| Kanzi -l 9 -j 1 | 34.09 | 35.61 | 19,097,946 |

Build

It is recommended to use formal releases (see https://github.com/flanglet/kanzi-go/releases).

Otherwise, to build manually from the latest tag, follow the instructions below:

cd $GOPATH/src

mkdir -p github.com/flanglet; cd github.com/flanglet

git clone https://github.com/flanglet/kanzi-go.git

cd kanzi-go/v2/app

go build Kanzi.go BlockCompressor.go BlockDecompressor.go InfoPrinter.go