  • Stars: 673
  • Rank: 67,060 (Top 2 %)
  • Language: C
  • License: GNU Lesser Genera...
  • Created over 2 years ago
  • Updated 6 months ago

Repository Details

A better and stronger spiritual successor to BZip2.

BZip3

A better, faster and stronger spiritual successor to BZip2. It features higher compression ratios and better performance thanks to an order-0 context mixing entropy coder, a fast Burrows-Wheeler transform implementation based on suffix arrays, and an RLE with Lempel-Ziv+Prediction (LZP) pass based on LZ77-style string matching and PPM-style context modeling.

Like its ancestor, BZip3 excels at compressing text or code.

Installation

# If using a git clone (not needed for source packages), first...
$ ./bootstrap.sh

# All...
$ ./configure
$ make
$ sudo make install

Alternatively, you might be able to install bzip3 using your system's package manager.

On macOS, you can install it with Homebrew:

$ brew install bzip3

Perl source code benchmark

First, I downloaded every version of Perl5 ever released and decompressed them.

% wget -r -l1 -nH --cut-dirs=2 --no-parent -A.tar.gz --no-directories https://www.cpan.org/src/5.0/
% for g in *.gz; do gunzip $g; done
% ls -la | wc -l
262

Then, I put all the resulting .tar files in a single .tar file and tried to compress it using various compressors:

xz -T16 -9 -k all.tar  10829.91s user 26.91s system 1488% cpu 14658M memory 12:09.24 total
bzip2 -9 -k all.tar  981.78s user 9.77s system 95% cpu 8M memory 17:16.64 total
bzip3 -e -b 256 -j 12 all.tar  2713.81s user 16.28s system 634% cpu 18301M memory 7:10.10 total
bzip3 -e -b 511 -j 4 all.tar  17.65s user 12.19s system 170% cpu 12178M memory 7:08.65 total
zstd -T12 -16 all.tar  4162.94s user 16.40s system 1056% cpu 687M memory 6:35.62 total

The results follow:

  • LZMA (xz) - 2'056'645'240 bytes
  • bzip2 - 3'441'163'911 bytes
  • bzip3 -b 256 - 1'001'957'587 bytes
  • bzip3 -b 511 - 546'456'978 bytes
  • Zstandard - 3'076'143'660 bytes
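
The size of the uncompressed all.tar is not stated here, so the figures above are easiest to compare against each other. As a minimal sketch (the helper name `rel_size` is made up for this illustration and is not part of bzip3), the following expresses one output size as a percentage of another and accepts the apostrophe-grouped numbers as printed above:

```shell
# rel_size A B: print A as a percentage of B, with two decimals.
# Apostrophe digit separators, as used in the list above, are stripped first.
rel_size() {
    a=$(printf '%s' "$1" | tr -d "'")
    b=$(printf '%s' "$2" | tr -d "'")
    awk -v a="$a" -v b="$b" 'BEGIN { printf "%.2f\n", 100 * a / b }'
}
```

For example, `rel_size "546'456'978" "3'441'163'911"` prints 15.88, meaning the bzip3 -b 511 output is roughly 16% of the size of the bzip2 output.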

Wall-clock decompression times (WD Blue HDD) were:

  • LZMA (xz) - 4min 40s
  • bzip2 - 9min 22s
  • bzip3 (parallel) - 4min 6s
  • Zstandard - 3min 51s

Then, I used lrzip to perform long-range deduplication on the original .tar file:

% time lrzip -n -o all_none.tar.lrz all.tar
546.17s user 160.87s system 102% cpu 10970M memory 11:28.00 total

% time lrzip --lzma -o all_lzma.tar.lrz all.tar
702.16s user 161.87s system 122% cpu 10792M memory 11:44.83 total

% time lrzip -b -o all_bzip2.tar.lrz all.tar
563.93s user 147.38s system 112% cpu 10970M memory 10:34.10 total

Finally, I compressed the resulting all_none.tar.lrz file using bzip3:

% time bzip3 -e -b 256 -j 2 all_none.tar.lrz
32.05s user 0.76s system 146% cpu 2751M memory 22.411 total

The results follow:

  • lrzip + bzip3 - 60'672'608 bytes.
  • lrzip + lzma - 64'774'202 bytes.
  • lrzip + bzip2 - 75'685'065 bytes.
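
The two-step approach (deduplicate with lrzip, then compress with bzip3) could be wrapped roughly as follows. This is only a sketch: the function names `_run` and `lrz_then_bz3` and the `DRY_RUN` variable are hypothetical, and the flags are simply the ones used in the commands above.

```shell
# Hypothetical wrapper around the two steps above. Set DRY_RUN=1 to print
# the commands instead of running them (lrzip and bzip3 must be installed
# for a real run).
_run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

lrz_then_bz3() {
    in=$1
    out="${in%.tar}_none.tar.lrz"   # all.tar -> all_none.tar.lrz
    # -n: lrzip only deduplicates, leaving the compression to bzip3
    _run lrzip -n -o "$out" "$in" &&
    _run bzip3 -e -b 256 -j 2 "$out"
}
```

Calling `lrz_then_bz3 all.tar` with `DRY_RUN=1` set prints the two commands without touching any files, which is a cheap way to check the file names before a long run.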

For further benchmarks against Turbo-Range-Coder and BSC, check powturbo's benchmark of bzip3, bzip2, bsc and others.

Disclaimers

I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE USE OF THIS PROGRAM/LIBRARY, HOWSOEVER CAUSED.

Every compression of a file implies an assumption that the compressed file can be decompressed to reproduce the original. Great efforts in design, coding and testing have been made to ensure that this program works correctly.

However, the complexity of the algorithms, and, in particular, the presence of various special cases in the code which occur with very low but non-zero probability make it impossible to rule out the possibility of bugs remaining in the program.

DO NOT COMPRESS ANY DATA WITH THIS PROGRAM UNLESS YOU ARE PREPARED TO ACCEPT THE POSSIBILITY, HOWEVER SMALL, THAT THE DATA WILL NOT BE RECOVERABLE.

That is not to say this program is inherently unreliable. Indeed, I very much hope the opposite is true. Bzip3/libbz3 has been carefully constructed and extensively tested.
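
In the spirit of the disclaimer above, a cautious workflow might verify that a file survives a compress/decompress round trip before the original is deleted. The sketch below is an illustration only: `roundtrip_ok` is a made-up helper, and the default commands assume that bzip3 reads standard input and writes standard output when given `-c`.

```shell
# roundtrip_ok FILE [ENCODE_CMD] [DECODE_CMD]
# Compress FILE to a temporary file, decompress it again, and compare the
# result byte-for-byte with the original. Returns 0 only on an exact match.
roundtrip_ok() {
    file=$1
    enc=${2:-"bzip3 -e -c"}   # assumed stdin-to-stdout encode invocation
    dec=${3:-"bzip3 -d -c"}   # assumed stdin-to-stdout decode invocation
    tmp=$(mktemp) || return 1
    # $enc and $dec are intentionally unquoted so they split into words
    if $enc < "$file" > "$tmp" && $dec < "$tmp" | cmp -s - "$file"; then
        rc=0
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}
```

A possible use is `roundtrip_ok all.tar && echo "round trip verified"`. The encode and decode commands are parameters so that the same check can be pointed at another compressor.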

Bzip3's performance is heavily dependent on the compiler. x64 Linux clang13 builds can usually reach 17MiB/s compression and 23MiB/s decompression per thread. Windows and 32-bit builds might be considerably slower.
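
The per-thread figures can be turned into a rough time estimate. The sketch below assumes throughput scales linearly with the number of threads, which is optimistic and not something measured here; `est_seconds` is a hypothetical helper name.

```shell
# est_seconds SIZE_MIB RATE_MIB_PER_S THREADS
# Rough wall-clock estimate assuming throughput scales linearly with the
# number of threads (an optimistic assumption, not a measurement).
est_seconds() {
    awk -v s="$1" -v r="$2" -v t="$3" 'BEGIN { printf "%.1f\n", s / (r * t) }'
}
```

For instance, `est_seconds 1024 17 4` prints 15.1, an idealised estimate in seconds for compressing 1 GiB on four threads at 17MiB/s each.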

Bzip3 has been tested on the following architectures:

  • x86
  • x86_64
  • armv6
  • armv7
  • aarch64
  • ppc64le
  • mips
  • mips64
  • sparc
  • s390x

Corpus benchmarks

Figure: visualisation of the benchmarks.

Check etc/BENCHMARKS.md for more results.

Licensing

A breakdown of components and their licenses follows:

  • (runtime) The codebase as a whole: Copyright 2022-2023, Kamila Szewczyk; LGPL (LICENSE)
  • (runtime) The Burrows-Wheeler transform (libsais) and LZP code: Copyright 2021-2022, Ilya Grebnov; Apache 2.0 (3rdparty/libsais-LICENSE)
  • (compile-time) build-aux: Copyright 2011, Daniel Richard G, 2019, Marc Stevens, 2008, Steven G. Johnson; GPL-3+ with AutoConf exception
  • (compile-time) build-aux/ax_check_compile_flag.m4: Copyright 2008, Guido U. Draheim, 2011, Maarten Bosmans; FSFAP
  • (compile-time) build-aux/git-version-gen: Copyright 2007-2012, Free Software Foundation, Inc; GPLv3
  • (runtime) bz3grep: Copyright 2003, Thomas Klausner; BSD-2-clause
  • (runtime) include/getopt-shim.h: Copyright 2005-2014, Rich Felker; Expat

bzip3 as a whole is licensed under LGPLv3 only. It is not dual-licensed under LGPLv3 and Apache 2.0.

Thanks

  • Ilya Grebnov for his libsais library, used for BWT construction in BZip3, and for the LZP encoder, which I used as a reference implementation to improve myself.
  • Caleb Maclennan for configuring autotools as a packaging-friendly build system for BZip3.
  • Ilya Muravyov for his public domain BWT post-coder, a derivative of which is used in this project.
