  • Language
    Rust
  • License
    The Unlicense
  • Created over 3 years ago
  • Updated 8 months ago

🦀 crabz

Like pigz, but rust.

A cross-platform, fast compression and decompression tool.

Synopsis

This is currently a proof-of-concept CLI tool built on the gzp crate.

Supported formats:

  • Gzip
  • Zlib
  • Mgzip
  • BGZF
  • Raw Deflate
  • Snap
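Most of the formats above can be told apart by their first few bytes. The sketch below is illustrative only: the `Sniffed` enum and `sniff` function are hypothetical and not part of crabz or gzp. It assumes Snap means the Snappy framing format and that Mgzip starts with an ordinary gzip header; raw deflate has no header at all, so it cannot be recognised this way.

```rust
/// Formats this sketch can recognise from leading bytes (illustrative, not a crabz type).
#[derive(Debug, PartialEq, Eq)]
enum Sniffed {
    Bgzf,
    /// Plain gzip. Mgzip is assumed to begin with a normal gzip header, so it lands here too.
    Gzip,
    Zlib,
    Snap,
    /// Includes raw deflate, which has no header to recognise.
    Unknown,
}

/// Heuristically guesses the format from the first bytes of a stream (16+ bytes for BGZF).
fn sniff(header: &[u8]) -> Sniffed {
    // Snappy framing format: stream identifier chunk (type 0xff, 24-bit LE length 6, "sNaPpY").
    if header.starts_with(b"\xff\x06\x00\x00sNaPpY") {
        return Sniffed::Snap;
    }
    // gzip member: ID1 = 0x1f, ID2 = 0x8b, CM = 8 (deflate).
    if header.starts_with(&[0x1f, 0x8b, 8]) {
        // BGZF: FLG.FEXTRA set, XLEN = 6, then a "BC" subfield with SLEN = 2 (bytes 10..16).
        let is_bgzf = header.len() >= 16
            && header[3] & 0x04 != 0
            && header[10..16] == [6u8, 0, b'B', b'C', 2, 0];
        return if is_bgzf { Sniffed::Bgzf } else { Sniffed::Gzip };
    }
    // zlib: CM = 8 (low nibble of CMF), CINFO <= 7, and CMF * 256 + FLG divisible by 31.
    if header.len() >= 2 {
        let (cmf, flg) = (header[0], header[1]);
        let check_ok = (u32::from(cmf) * 256 + u32::from(flg)) % 31 == 0;
        if cmf & 0x0f == 8 && cmf >> 4 <= 7 && check_ok {
            return Sniffed::Zlib;
        }
    }
    Sniffed::Unknown
}
```

Because the zlib test is only a two-byte checksum, arbitrary data can occasionally pass it, so a check like this is a hint rather than proof.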

Install

  • Homebrew / Linuxbrew
brew tap sstadick/crabz
brew install crabz
  • Debian (Ubuntu)
curl -LO https://github.com/sstadick/crabz/releases/download/<latest>/crabz-linux-amd64.deb
sudo dpkg -i crabz-linux-amd64.deb
  • Cargo
cargo install crabz
  • Conda
conda install -c conda-forge crabz

Usage

❯ crabz -h              
Compress and decompress files

USAGE:
    crabz [FLAGS] [OPTIONS] [FILE]

FLAGS:
    -d, --decompress    
            Flag to switch to decompressing inputs. Note: this flag may change in future releases

    -h, --help          
            Prints help information

    -I, --in-place      
            Perform the compression / decompression in place.
            
            **NOTE** this will remove the input file at completion.
    -V, --version       
            Prints version information


OPTIONS:
    -l, --compression-level <compression-level>        
            Compression level [default: 6]

    -p, --compression-threads <compression-threads>
            Number of compression threads to use, or if decompressing a format that allow for multi-threaded
            decompression, the number to use. Note that > 4 threads for decompression doesn't seem to help [default:
            32]
    -f, --format <format>
            The format to use [default: gzip]  [possible values: gzip, bgzf, mgzip,
            zlib, deflate, snap]
    -o, --output <output>                              
            Output path to write to, empty or "-" to write to stdout

    -P, --pin-at <pin-at>                              
            Specify the physical core to pin threads at.
            
            This can provide a significant performance improvement, but has the downside of possibly conflicting with
            other pinned cores. If you are running multiple instances of `crabz` at once you can manually space out the
            pinned cores.
            
            # Example
            - Instance 1 has `-p 4 -P 0` set indicating that it will use 4 cores pinned at 0, 1, 2, 3
            - Instance 2 has `-p 4 -P 4` set indicating that it will use 4 cores pinned at 4, 5, 6, 7

ARGS:
    <FILE>    
            Input file to read from, empty or "-" to read from stdin
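The `-P` example in the help text is simple arithmetic: an instance started with `-p N -P S` uses cores S through S + N - 1, so starting each instance N cores after the previous one keeps their ranges disjoint. A minimal sketch of that bookkeeping follows; `pinned_cores` and `spaced_args` are hypothetical helpers that only model the behaviour the help text describes.

```rust
/// Cores one crabz instance would occupy with `-p threads -P pin_at`, following the
/// `-P` help text (threads pinned at pin_at, pin_at + 1, ...). Illustrative helper.
fn pinned_cores(threads: usize, pin_at: usize) -> Vec<usize> {
    (pin_at..pin_at + threads).collect()
}

/// Hypothetical planner: `-p`/`-P` arguments for `instances` concurrent runs that each use
/// `threads` cores, spaced so no two runs share a core. Returns None if the runs would
/// need more than `total_cores` cores.
fn spaced_args(instances: usize, threads: usize, total_cores: usize) -> Option<Vec<Vec<String>>> {
    if instances * threads > total_cores {
        return None;
    }
    let args = (0..instances)
        .map(|i| {
            // Start each run right after the cores used by the previous run.
            let pin_at = i * threads;
            vec!["-p".to_string(), threads.to_string(), "-P".to_string(), pin_at.to_string()]
        })
        .collect();
    Some(args)
}
```

For example, `spaced_args(2, 4, 16)` yields the two argument sets from the help text, `-p 4 -P 0` and `-p 4 -P 4`.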

Benchmarks

These benchmarks use the data in bench-data concatenated together 100 times. Run them with bash ./benchmark.sh data.txt.

Benchmark system specs: Ubuntu 20, AMD Ryzen 9 3950X 16-core processor, 64 GB DDR4 memory, and a 1 TB NVMe drive.

pigz v2.4 installed via apt on Ubuntu

Takeaways:

  • crabz with zlib backend is pretty much identical to pigz
  • crabz with zlib-ng backend is roughly 30-50% faster than pigz
  • crabz with rust backend is roughly 5-10% faster than pigz

It is already known that zlib-ng is faster than zlib, so none of this is groundbreaking. However, I think crabz has an edge for the following reasons:

  • crabz with the deflate_rust backend uses only Rust code, which is in theory safer and more secure
  • crabz with zlib-ng is easier to install than pigz with a zlib-ng backend
  • crabz supports more formats than pigz
  • crabz is cross-platform and runs on Windows

For block formats like Mgzip and BGZF, crabz uses libdeflater by default, which excels at compressing and decompressing blocks of known size. This makes block compression formats very fast at the cost of a small loss in compression ratio.
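One reason block formats pair well with a whole-buffer library such as libdeflate is that every BGZF block states its own total size in its header (and its uncompressed size in the ISIZE trailer), so a reader can slice out a complete block before decompressing it. The sketch below builds the 18-byte BGZF block header described in the SAM/BAM specification and reads the block length back out. `bgzf_header` and `bgzf_block_len` are hypothetical helpers, not crabz or gzp APIs, and the deflate payload itself is omitted.

```rust
/// Header of one BGZF block: a gzip member header carrying a "BC" extra subfield whose
/// BSIZE value is the total block size minus one. `cdata_len` is the length of the
/// raw-deflate payload. Returns None if the block would not fit in 64 KiB.
/// Illustrative helper, not a crabz or gzp API.
fn bgzf_header(cdata_len: usize) -> Option<[u8; 18]> {
    // Block = 18-byte header + deflate payload + 8-byte trailer (CRC32 and ISIZE).
    let block_size = 18 + cdata_len + 8;
    if block_size - 1 > u16::MAX as usize {
        return None;
    }
    let bsize = (block_size - 1) as u16;
    let mut h = [0u8; 18];
    h[..4].copy_from_slice(&[0x1f, 0x8b, 8, 0x04]); // gzip magic, CM = deflate, FLG = FEXTRA
    // Bytes 4..8 (MTIME) and byte 8 (XFL) stay zero.
    h[9] = 0xff; // OS = unknown
    h[10..12].copy_from_slice(&6u16.to_le_bytes()); // XLEN: one 6-byte subfield
    h[12..14].copy_from_slice(b"BC"); // subfield identifier
    h[14..16].copy_from_slice(&2u16.to_le_bytes()); // SLEN: BSIZE is 2 bytes
    h[16..18].copy_from_slice(&bsize.to_le_bytes()); // BSIZE
    Some(h)
}

/// Total block length a reader must fetch, recovered from BSIZE.
fn bgzf_block_len(header: &[u8; 18]) -> usize {
    usize::from(u16::from_le_bytes([header[16], header[17]])) + 1
}
```

With a 2-byte payload this produces the header of the standard 28-byte BGZF end-of-file marker block.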

See the end of the benchmarks section for a comparison against bgzip.

Since crabz is just a wrapper around the gzp library, the most exciting result of these benchmarks is that gzp, as a library, is on par with best-in-class CLI tools for multi-threaded compression and decompression.

Flate2 zlib-ng backend

Compression

Command Mean [s] Min [s] Max [s] Relative
crabz -p 1 -c 3 < ./data.txt 6.450 ± 0.069 6.328 6.540 16.86 ± 0.24
pigz -p 1 -3 < ./data.txt 11.404 ± 0.152 11.186 11.717 29.81 ± 0.49
crabz -p 2 -c 3 < ./data.txt 3.437 ± 0.017 3.418 3.461 8.98 ± 0.10
pigz -p 2 -3 < ./data.txt 5.868 ± 0.031 5.826 5.927 15.34 ± 0.17
crabz -p 4 -c 3 < ./data.txt 1.741 ± 0.008 1.729 1.752 4.55 ± 0.05
pigz -p 4 -3 < ./data.txt 2.952 ± 0.008 2.939 2.960 7.72 ± 0.08
crabz -p 8 -c 3 < ./data.txt 0.889 ± 0.004 0.882 0.895 2.32 ± 0.02
pigz -p 8 -3 < ./data.txt 1.505 ± 0.008 1.493 1.520 3.93 ± 0.04
crabz -p 16 -c 3 < ./data.txt 0.485 ± 0.014 0.457 0.502 1.27 ± 0.04
pigz -p 16 -3 < ./data.txt 0.775 ± 0.011 0.764 0.797 2.02 ± 0.04
crabz -p 32 -c 3 < ./data.txt 0.383 ± 0.004 0.375 0.388 1.00
pigz -p 32 -3 < ./data.txt 0.699 ± 0.029 0.668 0.770 1.83 ± 0.08
crabz -p 1 -c 6 < ./data.txt 10.367 ± 0.211 10.106 10.642 27.10 ± 0.61
pigz -p 1 -6 < ./data.txt 26.734 ± 0.345 26.234 27.135 69.89 ± 1.12
crabz -p 2 -c 6 < ./data.txt 5.366 ± 0.036 5.299 5.429 14.03 ± 0.16
pigz -p 2 -6 < ./data.txt 13.589 ± 0.083 13.428 13.679 35.52 ± 0.40
crabz -p 4 -c 6 < ./data.txt 2.719 ± 0.021 2.694 2.757 7.11 ± 0.09
pigz -p 4 -6 < ./data.txt 6.887 ± 0.013 6.871 6.916 18.00 ± 0.17
crabz -p 8 -c 6 < ./data.txt 1.381 ± 0.007 1.372 1.397 3.61 ± 0.04
pigz -p 8 -6 < ./data.txt 3.479 ± 0.008 3.463 3.488 9.09 ± 0.09
crabz -p 16 -c 6 < ./data.txt 0.745 ± 0.022 0.727 0.804 1.95 ± 0.06
pigz -p 16 -6 < ./data.txt 1.818 ± 0.036 1.765 1.874 4.75 ± 0.10
crabz -p 32 -c 6 < ./data.txt 0.549 ± 0.006 0.538 0.557 1.44 ± 0.02
pigz -p 32 -6 < ./data.txt 1.187 ± 0.011 1.172 1.210 3.10 ± 0.04
crabz -p 1 -c 9 < ./data.txt 30.114 ± 0.196 29.842 30.420 78.72 ± 0.90
pigz -p 1 -9 < ./data.txt 51.369 ± 0.164 51.246 51.698 134.29 ± 1.33
crabz -p 2 -c 9 < ./data.txt 15.371 ± 0.070 15.202 15.443 40.18 ± 0.42
pigz -p 2 -9 < ./data.txt 26.452 ± 0.085 26.253 26.576 69.15 ± 0.69
crabz -p 4 -c 9 < ./data.txt 7.729 ± 0.022 7.699 7.768 20.20 ± 0.20
pigz -p 4 -9 < ./data.txt 13.365 ± 0.047 13.271 13.449 34.94 ± 0.35
crabz -p 8 -c 9 < ./data.txt 3.901 ± 0.006 3.889 3.910 10.20 ± 0.10
pigz -p 8 -9 < ./data.txt 6.749 ± 0.014 6.737 6.781 17.64 ± 0.17
crabz -p 16 -c 9 < ./data.txt 2.039 ± 0.024 1.997 2.071 5.33 ± 0.08
pigz -p 16 -9 < ./data.txt 3.486 ± 0.054 3.426 3.574 9.11 ± 0.17
crabz -p 32 -c 9 < ./data.txt 1.337 ± 0.072 1.220 1.411 3.49 ± 0.19
pigz -p 32 -9 < ./data.txt 2.203 ± 0.114 2.082 2.378 5.76 ± 0.30
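Each table reports the mean and standard deviation over several runs, and the Relative column divides each mean by the fastest mean in that table (the row marked 1.00). The sketch below reproduces that column, assuming the ± on Relative is propagated in quadrature from the two rows' relative errors; this agrees approximately with the printed values, but the benchmark script itself is not shown here. `Timing` and `relative` are illustrative names.

```rust
/// Mean and standard deviation of one benchmark row, in seconds.
struct Timing {
    mean: f64,
    stddev: f64,
}

/// Relative time of `row` against the fastest row, with the uncertainty propagated in
/// quadrature from both rows' relative errors (assumed convention for the tables).
fn relative(row: &Timing, fastest: &Timing) -> (f64, f64) {
    let ratio = row.mean / fastest.mean;
    let rel_err = ((row.stddev / row.mean).powi(2) + (fastest.stddev / fastest.mean).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}
```

For `crabz -p 1 -c 3` (6.450 ± 0.069 s) against the fastest row (0.383 ± 0.004 s) this gives roughly 16.84 ± 0.25, against the table's 16.86 ± 0.24; the small gap presumably comes from the rounded inputs.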

Decompression

Command Mean [s] Min [s] Max [s] Relative
crabz -d < ./data.3.txt.gz 1.422 ± 0.010 1.411 1.437 1.03 ± 0.02
pigz -d < ./data.3.txt.gz 1.674 ± 0.031 1.621 1.705 1.21 ± 0.03
crabz -d < ./data.6.txt.gz 1.403 ± 0.016 1.389 1.427 1.01 ± 0.02
pigz -d < ./data.6.txt.gz 1.724 ± 0.026 1.697 1.766 1.24 ± 0.02
crabz -d < ./data.9.txt.gz 1.385 ± 0.018 1.359 1.416 1.00
pigz -d < ./data.9.txt.gz 1.745 ± 0.044 1.684 1.797 1.26 ± 0.04

Flate2 zlib backend

Compression

Command Mean [s] Min [s] Max [s] Relative
crabz -p 1 -c 3 < ./data.txt 11.248 ± 0.247 11.085 11.532 20.23 ± 0.45
pigz -p 1 -3 < ./data.txt 11.296 ± 0.171 11.104 11.434 20.32 ± 0.31
crabz -p 2 -c 3 < ./data.txt 5.681 ± 0.040 5.645 5.725 10.22 ± 0.08
pigz -p 2 -3 < ./data.txt 5.926 ± 0.015 5.916 5.944 10.66 ± 0.04
crabz -p 4 -c 3 < ./data.txt 2.891 ± 0.007 2.883 2.895 5.20 ± 0.02
pigz -p 4 -3 < ./data.txt 2.966 ± 0.013 2.955 2.980 5.34 ± 0.03
crabz -p 8 -c 3 < ./data.txt 1.461 ± 0.003 1.459 1.465 2.63 ± 0.01
pigz -p 8 -3 < ./data.txt 1.509 ± 0.004 1.505 1.512 2.71 ± 0.01
crabz -p 16 -c 3 < ./data.txt 0.784 ± 0.010 0.775 0.795 1.41 ± 0.02
pigz -p 16 -3 < ./data.txt 0.772 ± 0.010 0.765 0.784 1.39 ± 0.02
crabz -p 32 -c 3 < ./data.txt 0.556 ± 0.002 0.554 0.557 1.00
pigz -p 32 -3 < ./data.txt 0.743 ± 0.047 0.694 0.786 1.34 ± 0.08
crabz -p 1 -c 6 < ./data.txt 26.366 ± 0.154 26.189 26.469 47.42 ± 0.31
pigz -p 1 -6 < ./data.txt 26.688 ± 0.103 26.579 26.783 48.00 ± 0.23
crabz -p 2 -c 6 < ./data.txt 13.443 ± 0.069 13.400 13.523 24.18 ± 0.14
pigz -p 2 -6 < ./data.txt 13.605 ± 0.059 13.567 13.673 24.47 ± 0.13
crabz -p 4 -c 6 < ./data.txt 6.833 ± 0.005 6.828 6.837 12.29 ± 0.03
pigz -p 4 -6 < ./data.txt 6.866 ± 0.028 6.834 6.884 12.35 ± 0.06
crabz -p 8 -c 6 < ./data.txt 3.446 ± 0.000 3.445 3.446 6.20 ± 0.02
pigz -p 8 -6 < ./data.txt 3.482 ± 0.002 3.480 3.483 6.26 ± 0.02
crabz -p 16 -c 6 < ./data.txt 1.822 ± 0.012 1.813 1.835 3.28 ± 0.02
pigz -p 16 -6 < ./data.txt 1.771 ± 0.004 1.767 1.776 3.19 ± 0.01
crabz -p 32 -c 6 < ./data.txt 1.178 ± 0.008 1.171 1.187 2.12 ± 0.02
pigz -p 32 -6 < ./data.txt 1.184 ± 0.001 1.184 1.185 2.13 ± 0.01
crabz -p 1 -c 9 < ./data.txt 52.122 ± 0.288 51.790 52.293 93.75 ± 0.58
pigz -p 1 -9 < ./data.txt 53.031 ± 0.071 52.951 53.085 95.39 ± 0.29
crabz -p 2 -c 9 < ./data.txt 26.287 ± 0.047 26.249 26.339 47.28 ± 0.15
pigz -p 2 -9 < ./data.txt 26.409 ± 0.238 26.190 26.662 47.50 ± 0.45
crabz -p 4 -c 9 < ./data.txt 13.373 ± 0.051 13.317 13.419 24.05 ± 0.11
pigz -p 4 -9 < ./data.txt 13.414 ± 0.035 13.383 13.451 24.13 ± 0.09
crabz -p 8 -c 9 < ./data.txt 6.733 ± 0.003 6.731 6.736 12.11 ± 0.03
pigz -p 8 -9 < ./data.txt 6.763 ± 0.004 6.761 6.767 12.16 ± 0.03
crabz -p 16 -c 9 < ./data.txt 3.487 ± 0.034 3.450 3.517 6.27 ± 0.06
pigz -p 16 -9 < ./data.txt 3.459 ± 0.021 3.434 3.473 6.22 ± 0.04
crabz -p 32 -c 9 < ./data.txt 2.088 ± 0.008 2.081 2.097 3.76 ± 0.02
pigz -p 32 -9 < ./data.txt 2.107 ± 0.023 2.090 2.133 3.79 ± 0.04

Decompression

Flate2 rust backend

Compression

Command Mean [s] Min [s] Max [s] Relative
crabz -p 1 -c 3 < ./data.txt 10.167 ± 0.164 10.050 10.355 18.57 ± 0.33
pigz -p 1 -3 < ./data.txt 11.338 ± 0.071 11.292 11.420 20.71 ± 0.21
crabz -p 2 -c 3 < ./data.txt 4.912 ± 0.013 4.898 4.920 8.97 ± 0.08
pigz -p 2 -3 < ./data.txt 5.876 ± 0.047 5.826 5.919 10.73 ± 0.12
crabz -p 4 -c 3 < ./data.txt 2.463 ± 0.018 2.447 2.482 4.50 ± 0.05
pigz -p 4 -3 < ./data.txt 2.967 ± 0.008 2.958 2.972 5.42 ± 0.05
crabz -p 8 -c 3 < ./data.txt 1.255 ± 0.005 1.250 1.261 2.29 ± 0.02
pigz -p 8 -3 < ./data.txt 1.509 ± 0.002 1.507 1.511 2.76 ± 0.02
crabz -p 16 -c 3 < ./data.txt 0.705 ± 0.030 0.673 0.731 1.29 ± 0.05
pigz -p 16 -3 < ./data.txt 0.780 ± 0.015 0.768 0.797 1.42 ± 0.03
crabz -p 32 -c 3 < ./data.txt 0.547 ± 0.004 0.544 0.552 1.00
pigz -p 32 -3 < ./data.txt 0.755 ± 0.025 0.726 0.771 1.38 ± 0.05
crabz -p 1 -c 6 < ./data.txt 27.064 ± 0.288 26.863 27.394 49.44 ± 0.66
pigz -p 1 -6 < ./data.txt 27.034 ± 0.090 26.938 27.117 49.38 ± 0.43
crabz -p 2 -c 6 < ./data.txt 12.400 ± 0.083 12.321 12.487 22.65 ± 0.24
pigz -p 2 -6 < ./data.txt 13.619 ± 0.074 13.558 13.702 24.88 ± 0.24
crabz -p 4 -c 6 < ./data.txt 6.279 ± 0.023 6.263 6.305 11.47 ± 0.10
pigz -p 4 -6 < ./data.txt 6.879 ± 0.020 6.867 6.901 12.57 ± 0.11
crabz -p 8 -c 6 < ./data.txt 3.189 ± 0.010 3.178 3.198 5.83 ± 0.05
pigz -p 8 -6 < ./data.txt 3.477 ± 0.007 3.470 3.483 6.35 ± 0.05
crabz -p 16 -c 6 < ./data.txt 1.756 ± 0.015 1.740 1.771 3.21 ± 0.04
pigz -p 16 -6 < ./data.txt 1.799 ± 0.024 1.779 1.827 3.29 ± 0.05
crabz -p 32 -c 6 < ./data.txt 1.192 ± 0.011 1.183 1.205 2.18 ± 0.03
pigz -p 32 -6 < ./data.txt 1.196 ± 0.016 1.183 1.214 2.19 ± 0.03
crabz -p 1 -c 9 < ./data.txt 44.907 ± 0.283 44.585 45.116 82.03 ± 0.84
pigz -p 1 -9 < ./data.txt 53.109 ± 1.049 52.373 54.311 97.02 ± 2.07
crabz -p 2 -c 9 < ./data.txt 19.977 ± 0.159 19.819 20.136 36.49 ± 0.41
pigz -p 2 -9 < ./data.txt 26.562 ± 0.134 26.407 26.643 48.52 ± 0.46
crabz -p 4 -c 9 < ./data.txt 10.397 ± 0.484 10.070 10.953 18.99 ± 0.90
pigz -p 4 -9 < ./data.txt 13.346 ± 0.040 13.300 13.372 24.38 ± 0.21
crabz -p 8 -c 9 < ./data.txt 5.100 ± 0.021 5.076 5.114 9.32 ± 0.08
pigz -p 8 -9 < ./data.txt 6.754 ± 0.016 6.736 6.767 12.34 ± 0.10
crabz -p 16 -c 9 < ./data.txt 2.716 ± 0.014 2.708 2.732 4.96 ± 0.05
pigz -p 16 -9 < ./data.txt 3.444 ± 0.038 3.420 3.487 6.29 ± 0.09
crabz -p 32 -c 9 < ./data.txt 1.747 ± 0.009 1.740 1.758 3.19 ± 0.03
pigz -p 32 -9 < ./data.txt 2.086 ± 0.008 2.077 2.093 3.81 ± 0.03

Decompression

Command Mean [s] Min [s] Max [s] Relative
crabz -d < ./data.3.txt.gz 1.599 ± 0.014 1.573 1.615 1.00
pigz -d < ./data.3.txt.gz 1.696 ± 0.020 1.654 1.725 1.06 ± 0.02
crabz -d < ./data.6.txt.gz 1.615 ± 0.012 1.586 1.626 1.01 ± 0.01
pigz -d < ./data.6.txt.gz 1.760 ± 0.030 1.687 1.797 1.10 ± 0.02
crabz -d < ./data.9.txt.gz 1.613 ± 0.014 1.596 1.641 1.01 ± 0.01
pigz -d < ./data.9.txt.gz 1.767 ± 0.012 1.748 1.787 1.11 ± 0.01

Block Formats with libdeflater

Decompression

Command Mean [s] Min [s] Max [s] Relative
crabz -p 1 -d -f mgzip ./bdata.3.txt.gz > data.txt 1.221 ± 0.164 1.073 1.397 2.32 ± 0.31
pigz -d -c ./bdata.3.txt.gz > data.txt 2.415 ± 0.063 2.347 2.472 4.58 ± 0.14
crabz -p 1 -d -f mgzip ./bdata.6.txt.gz > data.txt 1.256 ± 0.063 1.200 1.325 2.38 ± 0.13
pigz -d -c ./bdata.6.txt.gz > data.txt 2.513 ± 0.052 2.467 2.569 4.77 ± 0.13
crabz -p 1 -d -f mgzip ./bdata.9.txt.gz > data.txt 1.147 ± 0.065 1.094 1.219 2.18 ± 0.13
pigz -d -c ./bdata.9.txt.gz > data.txt 2.394 ± 0.118 2.262 2.488 4.54 ± 0.24
crabz -p 1 -d -f mgzip ./bdata.12.txt.gz > data.txt 1.165 ± 0.074 1.106 1.248 2.21 ± 0.15
pigz -d -c ./bdata.12.txt.gz > data.txt 2.457 ± 0.067 2.408 2.534 4.66 ± 0.15
crabz -p 2 -d -f mgzip ./bdata.3.txt.gz > data.txt 0.634 ± 0.008 0.628 0.642 1.20 ± 0.03
pigz -d -c ./bdata.3.txt.gz > data.txt 2.379 ± 0.012 2.368 2.391 4.51 ± 0.08
crabz -p 2 -d -f mgzip ./bdata.6.txt.gz > data.txt 0.645 ± 0.015 0.629 0.658 1.22 ± 0.03
pigz -d -c ./bdata.6.txt.gz > data.txt 2.438 ± 0.073 2.356 2.497 4.62 ± 0.16
crabz -p 2 -d -f mgzip ./bdata.9.txt.gz > data.txt 0.659 ± 0.015 0.644 0.674 1.25 ± 0.04
pigz -d -c ./bdata.9.txt.gz > data.txt 2.451 ± 0.075 2.400 2.538 4.65 ± 0.16
crabz -p 2 -d -f mgzip ./bdata.12.txt.gz > data.txt 0.656 ± 0.015 0.647 0.673 1.24 ± 0.04
pigz -d -c ./bdata.12.txt.gz > data.txt 2.450 ± 0.045 2.412 2.500 4.65 ± 0.12
crabz -p 4 -d -f mgzip ./bdata.3.txt.gz > data.txt 0.577 ± 0.024 0.554 0.603 1.10 ± 0.05
pigz -d -c ./bdata.3.txt.gz > data.txt 2.459 ± 0.052 2.420 2.518 4.66 ± 0.13
crabz -p 4 -d -f mgzip ./bdata.6.txt.gz > data.txt 0.559 ± 0.024 0.531 0.576 1.06 ± 0.05
pigz -d -c ./bdata.6.txt.gz > data.txt 2.538 ± 0.044 2.502 2.587 4.81 ± 0.12
crabz -p 4 -d -f mgzip ./bdata.9.txt.gz > data.txt 0.552 ± 0.011 0.539 0.560 1.05 ± 0.03
pigz -d -c ./bdata.9.txt.gz > data.txt 2.402 ± 0.018 2.385 2.420 4.56 ± 0.08
crabz -p 4 -d -f mgzip ./bdata.12.txt.gz > data.txt 0.592 ± 0.040 0.546 0.616 1.12 ± 0.08
pigz -d -c ./bdata.12.txt.gz > data.txt 2.525 ± 0.038 2.484 2.558 4.79 ± 0.11
crabz -p 8 -d -f mgzip ./bdata.3.txt.gz > data.txt 0.563 ± 0.013 0.548 0.571 1.07 ± 0.03
pigz -d -c ./bdata.3.txt.gz > data.txt 2.490 ± 0.126 2.369 2.621 4.72 ± 0.25
crabz -p 8 -d -f mgzip ./bdata.6.txt.gz > data.txt 0.552 ± 0.018 0.533 0.569 1.05 ± 0.04
pigz -d -c ./bdata.6.txt.gz > data.txt 2.531 ± 0.115 2.417 2.647 4.80 ± 0.23
crabz -p 8 -d -f mgzip ./bdata.9.txt.gz > data.txt 0.603 ± 0.029 0.583 0.636 1.14 ± 0.06
pigz -d -c ./bdata.9.txt.gz > data.txt 2.483 ± 0.042 2.435 2.515 4.71 ± 0.11
crabz -p 8 -d -f mgzip ./bdata.12.txt.gz > data.txt 0.527 ± 0.009 0.519 0.537 1.00
pigz -d -c ./bdata.12.txt.gz > data.txt 2.524 ± 0.093 2.417 2.583 4.79 ± 0.19
crabz -p 16 -d -f mgzip ./bdata.3.txt.gz > data.txt 0.603 ± 0.058 0.551 0.665 1.14 ± 0.11
pigz -d -c ./bdata.3.txt.gz > data.txt 2.392 ± 0.007 2.384 2.397 4.54 ± 0.08
crabz -p 16 -d -f mgzip ./bdata.6.txt.gz > data.txt 0.611 ± 0.065 0.565 0.686 1.16 ± 0.13
pigz -d -c ./bdata.6.txt.gz > data.txt 2.593 ± 0.148 2.427 2.712 4.92 ± 0.29
crabz -p 16 -d -f mgzip ./bdata.9.txt.gz > data.txt 0.564 ± 0.027 0.541 0.594 1.07 ± 0.05
pigz -d -c ./bdata.9.txt.gz > data.txt 2.426 ± 0.023 2.404 2.450 4.60 ± 0.09
crabz -p 16 -d -f mgzip ./bdata.12.txt.gz > data.txt 0.601 ± 0.020 0.582 0.623 1.14 ± 0.04
pigz -d -c ./bdata.12.txt.gz > data.txt 2.528 ± 0.022 2.507 2.550 4.80 ± 0.09
crabz -p 32 -d -f mgzip ./bdata.3.txt.gz > data.txt 0.595 ± 0.019 0.577 0.614 1.13 ± 0.04
pigz -d -c ./bdata.3.txt.gz > data.txt 2.544 ± 0.107 2.422 2.621 4.83 ± 0.22
crabz -p 32 -d -f mgzip ./bdata.6.txt.gz > data.txt 0.601 ± 0.021 0.586 0.626 1.14 ± 0.05
pigz -d -c ./bdata.6.txt.gz > data.txt 2.519 ± 0.114 2.435 2.649 4.78 ± 0.23
crabz -p 32 -d -f mgzip ./bdata.9.txt.gz > data.txt 0.565 ± 0.023 0.539 0.579 1.07 ± 0.05
pigz -d -c ./bdata.9.txt.gz > data.txt 2.487 ± 0.064 2.415 2.540 4.72 ± 0.15
crabz -p 32 -d -f mgzip ./bdata.12.txt.gz > data.txt 0.557 ± 0.013 0.548 0.571 1.06 ± 0.03
pigz -d -c ./bdata.12.txt.gz > data.txt 2.505 ± 0.105 2.442 2.626 4.75 ± 0.22
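The help text notes that more than 4 threads does not seem to help decompression, and the mgzip rows above show the same plateau: crabz drops from 1.221 s with one thread to 0.577 s with four, then stays at about 0.56 to 0.60 s up to 32 threads. The usual speedup (S = T1 / Tp) and parallel efficiency (E = S / p) calculation makes that concrete. This sketch uses the level-3 crabz rows copied from the table; the function and constant names are illustrative.

```rust
/// (threads, mean seconds) for `crabz -p N -d -f mgzip ./bdata.3.txt.gz`, copied from the table above.
const MGZIP_L3: [(u32, f64); 6] = [(1, 1.221), (2, 0.634), (4, 0.577), (8, 0.563), (16, 0.603), (32, 0.595)];

/// For each run returns (threads, speedup S = T1 / Tp, efficiency E = S / p),
/// taking the first entry as the single-thread baseline.
fn scaling(runs: &[(u32, f64)]) -> Vec<(u32, f64, f64)> {
    let t1 = runs[0].1;
    runs.iter()
        .map(|&(p, tp)| {
            let speedup = t1 / tp;
            (p, speedup, speedup / f64::from(p))
        })
        .collect()
}
```

This gives a speedup of about 2.1 at four threads (efficiency about 0.53) and only marginal change beyond that.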

crabz, pigz, and bgzip

These benchmarks were run on the all_train.csv data found here

Compression

Command Mean [s] Min [s] Max [s] Relative
crabz -p 2 -P 0 -l 2 -f bgzf ./data.txt > ./data.out.txt.gz 15.837 ± 0.137 15.688 15.959 5.52 ± 0.13
bgzip -c -@ 2 -l 2 ./data.txt > ./data.out.txt.gz 19.471 ± 0.178 19.268 19.602 6.78 ± 0.16
crabz -p 2 -P 0 -l 2 -f gzip ./data.txt > ./data.out.txt.gz 19.723 ± 0.632 19.285 20.448 6.87 ± 0.26
pigz -c -p 2 -2 ./data.txt > ./data.out.txt.gz 32.249 ± 0.024 32.226 32.274 11.24 ± 0.24
crabz -p 4 -P 0 -l 2 -f bgzf ./data.txt > ./data.out.txt.gz 8.601 ± 0.538 8.040 9.113 3.00 ± 0.20
bgzip -c -@ 4 -l 2 ./data.txt > ./data.out.txt.gz 10.953 ± 0.033 10.929 10.990 3.82 ± 0.08
crabz -p 4 -P 0 -l 2 -f gzip ./data.txt > ./data.out.txt.gz 10.887 ± 0.584 10.236 11.364 3.79 ± 0.22
pigz -c -p 4 -2 ./data.txt > ./data.out.txt.gz 16.493 ± 0.323 16.257 16.861 5.75 ± 0.17
crabz -p 8 -P 0 -l 2 -f bgzf ./data.txt > ./data.out.txt.gz 5.206 ± 0.372 4.780 5.464 1.81 ± 0.14
bgzip -c -@ 8 -l 2 ./data.txt > ./data.out.txt.gz 6.920 ± 0.033 6.893 6.957 2.41 ± 0.05
crabz -p 8 -P 0 -l 2 -f gzip ./data.txt > ./data.out.txt.gz 5.893 ± 0.135 5.777 6.041 2.05 ± 0.06
pigz -c -p 8 -2 ./data.txt > ./data.out.txt.gz 8.974 ± 0.467 8.553 9.477 3.13 ± 0.18
crabz -p 16 -P 0 -l 2 -f bgzf ./data.txt > ./data.out.txt.gz 2.870 ± 0.061 2.816 2.936 1.00
bgzip -c -@ 16 -l 2 ./data.txt > ./data.out.txt.gz 5.124 ± 0.107 5.040 5.244 1.79 ± 0.05
crabz -p 16 -P 0 -l 2 -f gzip ./data.txt > ./data.out.txt.gz 4.250 ± 0.323 3.933 4.579 1.48 ± 0.12
pigz -c -p 16 -2 ./data.txt > ./data.out.txt.gz 4.767 ± 0.223 4.513 4.933 1.66 ± 0.09
crabz -p 32 -P 0 -l 2 -f bgzf ./data.txt > ./data.out.txt.gz 3.669 ± 0.303 3.320 3.865 1.28 ± 0.11
bgzip -c -@ 32 -l 2 ./data.txt > ./data.out.txt.gz 4.676 ± 0.038 4.632 4.701 1.63 ± 0.04
crabz -p 32 -P 0 -l 2 -f gzip ./data.txt > ./data.out.txt.gz 4.324 ± 0.246 4.143 4.605 1.51 ± 0.09
pigz -c -p 32 -2 ./data.txt > ./data.out.txt.gz 5.854 ± 0.070 5.795 5.931 2.04 ± 0.05
crabz -p 2 -P 0 -l 6 -f bgzf ./data.txt > ./data.out.txt.gz 27.696 ± 0.147 27.593 27.864 9.65 ± 0.21
bgzip -c -@ 2 -l 6 ./data.txt > ./data.out.txt.gz 30.961 ± 0.446 30.446 31.231 10.79 ± 0.28
crabz -p 2 -P 0 -l 6 -f gzip ./data.txt > ./data.out.txt.gz 36.229 ± 0.175 36.092 36.427 12.62 ± 0.27
pigz -c -p 2 -6 ./data.txt > ./data.out.txt.gz 97.175 ± 0.571 96.743 97.823 33.86 ± 0.74
crabz -p 4 -P 0 -l 6 -f bgzf ./data.txt > ./data.out.txt.gz 14.802 ± 0.436 14.316 15.159 5.16 ± 0.19
bgzip -c -@ 4 -l 6 ./data.txt > ./data.out.txt.gz 16.927 ± 0.130 16.789 17.048 5.90 ± 0.13
crabz -p 4 -P 0 -l 6 -f gzip ./data.txt > ./data.out.txt.gz 19.192 ± 0.675 18.629 19.940 6.69 ± 0.27
pigz -c -p 4 -6 ./data.txt > ./data.out.txt.gz 49.305 ± 0.114 49.203 49.429 17.18 ± 0.37
crabz -p 8 -P 0 -l 6 -f bgzf ./data.txt > ./data.out.txt.gz 7.833 ± 0.065 7.784 7.907 2.73 ± 0.06
bgzip -c -@ 8 -l 6 ./data.txt > ./data.out.txt.gz 9.858 ± 0.105 9.739 9.939 3.43 ± 0.08
crabz -p 8 -P 0 -l 6 -f gzip ./data.txt > ./data.out.txt.gz 10.417 ± 0.979 9.626 11.511 3.63 ± 0.35
pigz -c -p 8 -6 ./data.txt > ./data.out.txt.gz 25.276 ± 0.170 25.083 25.404 8.81 ± 0.20
crabz -p 16 -P 0 -l 6 -f bgzf ./data.txt > ./data.out.txt.gz 4.704 ± 0.321 4.337 4.937 1.64 ± 0.12
bgzip -c -@ 16 -l 6 ./data.txt > ./data.out.txt.gz 6.565 ± 0.155 6.429 6.734 2.29 ± 0.07
crabz -p 16 -P 0 -l 6 -f gzip ./data.txt > ./data.out.txt.gz 5.722 ± 0.320 5.530 6.092 1.99 ± 0.12
pigz -c -p 16 -6 ./data.txt > ./data.out.txt.gz 13.673 ± 0.129 13.525 13.762 4.76 ± 0.11
crabz -p 32 -P 0 -l 6 -f bgzf ./data.txt > ./data.out.txt.gz 4.202 ± 0.213 3.957 4.328 1.46 ± 0.08
bgzip -c -@ 32 -l 6 ./data.txt > ./data.out.txt.gz 5.538 ± 0.135 5.395 5.663 1.93 ± 0.06
crabz -p 32 -P 0 -l 6 -f gzip ./data.txt > ./data.out.txt.gz 5.488 ± 0.064 5.423 5.550 1.91 ± 0.05
pigz -c -p 32 -6 ./data.txt > ./data.out.txt.gz 9.079 ± 0.286 8.808 9.379 3.16 ± 0.12
crabz -p 2 -P 0 -l 9 -f bgzf ./data.txt > ./data.out.txt.gz 162.875 ± 0.100 162.778 162.977 56.75 ± 1.20
bgzip -c -@ 2 -l 9 ./data.txt > ./data.out.txt.gz 172.428 ± 0.242 172.207 172.687 60.08 ± 1.27
crabz -p 2 -P 0 -l 9 -f gzip ./data.txt > ./data.out.txt.gz 139.245 ± 0.270 138.974 139.514 48.52 ± 1.03
pigz -c -p 2 -9 ./data.txt > ./data.out.txt.gz 209.645 ± 0.058 209.580 209.691 73.05 ± 1.55
crabz -p 4 -P 0 -l 9 -f bgzf ./data.txt > ./data.out.txt.gz 84.624 ± 0.185 84.414 84.762 29.49 ± 0.63
bgzip -c -@ 4 -l 9 ./data.txt > ./data.out.txt.gz 87.228 ± 0.232 87.053 87.492 30.39 ± 0.65
crabz -p 4 -P 0 -l 9 -f gzip ./data.txt > ./data.out.txt.gz 72.339 ± 0.166 72.187 72.517 25.21 ± 0.54
pigz -c -p 4 -9 ./data.txt > ./data.out.txt.gz 106.579 ± 0.236 106.307 106.731 37.14 ± 0.79
crabz -p 8 -P 0 -l 9 -f bgzf ./data.txt > ./data.out.txt.gz 42.988 ± 0.130 42.905 43.138 14.98 ± 0.32
bgzip -c -@ 8 -l 9 ./data.txt > ./data.out.txt.gz 44.550 ± 0.097 44.449 44.642 15.52 ± 0.33
crabz -p 8 -P 0 -l 9 -f gzip ./data.txt > ./data.out.txt.gz 36.555 ± 0.030 36.521 36.579 12.74 ± 0.27
pigz -c -p 8 -9 ./data.txt > ./data.out.txt.gz 54.047 ± 0.016 54.030 54.062 18.83 ± 0.40
crabz -p 16 -P 0 -l 9 -f bgzf ./data.txt > ./data.out.txt.gz 22.391 ± 0.234 22.154 22.623 7.80 ± 0.18
bgzip -c -@ 16 -l 9 ./data.txt > ./data.out.txt.gz 24.041 ± 0.237 23.813 24.286 8.38 ± 0.20
crabz -p 16 -P 0 -l 9 -f gzip ./data.txt > ./data.out.txt.gz 19.285 ± 0.125 19.141 19.363 6.72 ± 0.15
pigz -c -p 16 -9 ./data.txt > ./data.out.txt.gz 27.645 ± 0.078 27.579 27.731 9.63 ± 0.21
crabz -p 32 -P 0 -l 9 -f bgzf ./data.txt > ./data.out.txt.gz 15.148 ± 0.138 14.992 15.252 5.28 ± 0.12
bgzip -c -@ 32 -l 9 ./data.txt > ./data.out.txt.gz 16.091 ± 0.193 15.874 16.243 5.61 ± 0.14
crabz -p 32 -P 0 -l 9 -f gzip ./data.txt > ./data.out.txt.gz 11.832 ± 0.168 11.637 11.930 4.12 ± 0.11
pigz -c -p 32 -9 ./data.txt > ./data.out.txt.gz 16.912 ± 0.095 16.804 16.982 5.89 ± 0.13

Decompression

Command Mean [s] Min [s] Max [s] Relative
crabz -d -p 4 -f bgzf ./data.txt.gz > ./data.out.txt 5.941 ± 0.172 5.745 6.070 1.11 ± 0.09
bgzip -d -c -@ 4 ./data.txt.gz > ./data.out.txt 5.357 ± 0.407 4.925 5.734 1.00
crabz -d -p 8 -f bgzf ./data.txt.gz > ./data.out.txt 5.569 ± 0.496 5.023 5.990 1.04 ± 0.12
bgzip -d -c -@ 8 ./data.txt.gz > ./data.out.txt 5.867 ± 0.252 5.682 6.154 1.10 ± 0.10
crabz -d -p 16 -f bgzf ./data.txt.gz > ./data.out.txt 5.663 ± 0.240 5.506 5.939 1.06 ± 0.09
bgzip -d -c -@ 16 ./data.txt.gz > ./data.out.txt 5.534 ± 0.124 5.416 5.663 1.03 ± 0.08

TODOs

  • Add some form of auto format detection, even just by file extension
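A first cut at this TODO could map file extensions to formats. The sketch below is hypothetical: the `Format` enum, the function, and the suffix table are assumptions drawn from common conventions, not anything crabz defines. BGZF and Mgzip files frequently carry a plain .gz suffix, so extension matching alone can only be a fallback.

```rust
use std::path::Path;

/// Formats accepted by `-f` that have a common file suffix (illustrative enum, not crabz's).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Format {
    Gzip,
    Bgzf,
    Zlib,
    Deflate,
    Snap,
}

/// Hypothetical suffix table. ".gz" is ambiguous because BGZF and Mgzip files usually
/// carry it too, so a real tool would still want to inspect the header.
fn format_from_extension(path: &Path) -> Option<Format> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "gz" => Some(Format::Gzip),
        "bgz" | "bgzf" => Some(Format::Bgzf),
        "zz" | "zlib" => Some(Format::Zlib), // pigz -z uses .zz for zlib output
        "deflate" => Some(Format::Deflate),
        "sz" | "snappy" => Some(Format::Snap), // assumed suffixes for framed Snappy
        _ => None,
    }
}
```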
