BZip3

A better, faster and stronger spiritual successor to BZip2. It features higher compression ratios and better performance thanks to an order-0 context mixing entropy coder, a fast Burrows-Wheeler transform implementation based on suffix arrays, and an RLE with Lempel-Ziv + Prediction (LZP) pass based on LZ77-style string matching and PPM-style context modeling.
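
To make the Burrows-Wheeler step more concrete, here is a minimal Python sketch of the transform and its inverse. It is a naive illustration that sorts every rotation of the input; it is not the suffix-array construction (libsais) that BZip3 itself relies on, and the names `bwt` and `inverse_bwt` are made up for this example.

```python
from typing import Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive Burrows-Wheeler transform.

    Returns the last column of the sorted rotation matrix together with the
    row index at which the original input appears (the "primary index").
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start offsets by the rotation they denote (quadratic memory!).
    rotations = sorted(range(n), key=lambda i: data[i:] + data[:i])
    # The last byte of the rotation starting at i is the byte just before i.
    last_column = bytes(data[(i - 1) % n] for i in rotations)
    return last_column, rotations.index(0)


def inverse_bwt(last_column: bytes, primary: int) -> bytes:
    """Invert `bwt` using a stable sort of the last column."""
    n = len(last_column)
    if n == 0:
        return b""
    # order[j] is the last-column position holding the same byte occurrence
    # that sits in first-column row j.
    order = sorted(range(n), key=lambda i: last_column[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last_column[row])
    return bytes(out)


if __name__ == "__main__":
    transformed, primary = bwt(b"banana")
    print(transformed, primary)
    print(inverse_bwt(transformed, primary))
```

For the input `banana` the transform yields `nnbaaa`: equal bytes end up next to each other, which is the kind of clustering the later modelling and coding stages can exploit.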

Like its ancestor, BZip3 excels at compressing text or code.

Installation

# If using a git clone (not needed for source packages), first...
$ ./bootstrap.sh

# All...
$ ./configure
$ make
$ sudo make install

Alternatively, you might be able to install bzip3 using your system's package manager:

Packaging status

On macOS, you can use Homebrew to easily install:

$ brew install bzip3

Perl source code benchmark

First, I have downloaded every version of Perl5 ever released and decompressed them.

% wget -r -l1 -nH --cut-dirs=2 --no-parent -A.tar.gz --no-directories https://www.cpan.org/src/5.0/
% for g in *.gz; do gunzip $g; done
% ls -la | wc -l
262

Then, I put all the resulting .tar files in a single .tar file and tried to compress it using various compressors:

xz -T16 -9 -k all.tar  10829.91s user 26.91s system 1488% cpu 14658M memory 12:09.24 total
bzip2 -9 -k all.tar  981.78s user 9.77s system 95% cpu 8M memory 17:16.64 total
bzip3 -e -b 256 -j 12 all.tar  2713.81s user 16.28s system 634% cpu 18301M memory 7:10.10 total
bzip3 -e -b 511 -j 4 all.tar  717.65s user 12.19s system 170% cpu 12178M memory 7:08.65 total
zstd -T12 -16 all.tar  4162.94s user 16.40s system 1056% cpu 687M memory 6:35.62 total
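
As a sanity check on timing lines like these, the reported CPU percentage should be close to (user + system) / wall-clock time. The following Python sketch parses one such line and recomputes that figure. The field layout is assumed from the lines shown here, and `parse_timing` and `wall_seconds` are hypothetical helper names.

```python
import re

# Field layout assumed from the timing lines in this document.
TIMING_RE = re.compile(
    r"(?P<user>[\d.]+)s user (?P<system>[\d.]+)s system "
    r"(?P<cpu>\d+)% cpu (?P<mem>\d+)M memory (?P<total>[\d:.]+) total"
)


def wall_seconds(total):
    """Convert 'H:MM:SS.ss', 'M:SS.ss' or 'SS.ss' into seconds."""
    seconds = 0.0
    for part in total.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_timing(line):
    """Extract the timing fields and recompute the CPU percentage."""
    match = TIMING_RE.search(line)
    if match is None:
        raise ValueError("not a timing line: %r" % line)
    user = float(match.group("user"))
    system = float(match.group("system"))
    wall = wall_seconds(match.group("total"))
    return {
        "user": user,
        "system": system,
        "wall": wall,
        "reported_cpu": int(match.group("cpu")),
        "computed_cpu": 100.0 * (user + system) / wall,
    }


if __name__ == "__main__":
    line = ("xz -T16 -9 -k all.tar  10829.91s user 26.91s system "
            "1488% cpu 14658M memory 12:09.24 total")
    timing = parse_timing(line)
    print("reported %d%%, recomputed %.1f%%"
          % (timing["reported_cpu"], timing["computed_cpu"]))
```

A large gap between the reported and recomputed values would suggest that a field was mistyped or lost when the line was copied.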

The results follow:

  • LZMA (xz) - 2'056'645'240 bytes
  • bzip2 - 3'441'163'911 bytes
  • bzip3 -b 256 - 1'001'957'587 bytes
  • bzip3 -b 511 - 546'456'978 bytes
  • Zstandard - 3'076'143'660 bytes
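
The size of the uncompressed all.tar is not stated, so absolute compression ratios cannot be derived from this list. The outputs can still be compared with each other; the Python sketch below expresses each result as a fraction of the bzip2 output. The dictionary simply restates the byte counts above, and `relative_sizes` is a hypothetical helper.

```python
# Compressed sizes in bytes, copied from the list above.
SIZES = {
    "LZMA (xz)": 2_056_645_240,
    "bzip2": 3_441_163_911,
    "bzip3 -b 256": 1_001_957_587,
    "bzip3 -b 511": 546_456_978,
    "Zstandard": 3_076_143_660,
}


def relative_sizes(sizes, baseline):
    """Return each size as a fraction of the baseline entry's size."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


if __name__ == "__main__":
    ranked = sorted(relative_sizes(SIZES, "bzip2").items(), key=lambda kv: kv[1])
    for name, ratio in ranked:
        print(f"{name:14s} {ratio:6.1%} of the bzip2 output")
```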

Finally, wall-clock decompression times (WD Blue HDD):

  • LZMA (xz) - 4min 40s
  • bzip2 - 9min 22s
  • bzip3 (parallel) - 4min 6s
  • Zstandard - 3min 51s

Then, I used lrzip to perform long-range deduplication on the original .tar file:

% time lrzip -n -o all_none.tar.lrz all.tar
546.17s user 160.87s system 102% cpu 10970M memory 11:28.00 total

% time lrzip --lzma -o all_lzma.tar.lrz all.tar
702.16s user 161.87s system 122% cpu 10792M memory 11:44.83 total

% time lrzip -b -o all_bzip2.tar.lrz all.tar
563.93s user 147.38s system 112% cpu 10970M memory 10:34.10 total

Finally, I compressed the resulting all_none.tar.lrz file using bzip3:

% time bzip3 -e -b 256 -j 2 all_none.tar.lrz
32.05s user 0.76s system 146% cpu 2751M memory 22.411 total

The results follow:

  • lrzip + bzip3 - 60'672'608 bytes.
  • lrzip + lzma - 64'774'202 bytes.
  • lrzip + bzip2 - 75'685'065 bytes.
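
To put these three numbers in proportion, the short Python sketch below computes how much smaller the lrzip + bzip3 result is than the other two. The constants restate the byte counts above; `percent_smaller` is a hypothetical helper.

```python
def percent_smaller(candidate, reference):
    """How much smaller `candidate` is than `reference`, in percent."""
    return 100.0 * (reference - candidate) / reference


# Byte counts copied from the list above.
LRZIP_BZIP3 = 60_672_608
LRZIP_LZMA = 64_774_202
LRZIP_BZIP2 = 75_685_065

if __name__ == "__main__":
    print("vs lrzip + lzma:  %.1f%% smaller"
          % percent_smaller(LRZIP_BZIP3, LRZIP_LZMA))
    print("vs lrzip + bzip2: %.1f%% smaller"
          % percent_smaller(LRZIP_BZIP3, LRZIP_BZIP2))
```

With the figures above this works out to roughly 6.3% and 19.8% respectively.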

For further benchmarks against Turbo-Range-Coder and BSC, check powturbo's benchmark of bzip3, bzip2, bsc and others.

Disclaimers

I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE USE OF THIS PROGRAM/LIBRARY, HOWSOEVER CAUSED.

Every compression of a file implies an assumption that the compressed file can be decompressed to reproduce the original. Great efforts in design, coding and testing have been made to ensure that this program works correctly.

However, the complexity of the algorithms, and, in particular, the presence of various special cases in the code which occur with very low but non-zero probability make it impossible to rule out the possibility of bugs remaining in the program.

DO NOT COMPRESS ANY DATA WITH THIS PROGRAM UNLESS YOU ARE PREPARED TO ACCEPT THE POSSIBILITY, HOWEVER SMALL, THAT THE DATA WILL NOT BE RECOVERABLE.

That is not to say this program is inherently unreliable. Indeed, I very much hope the opposite is true. Bzip3/libbz3 has been carefully constructed and extensively tested.
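
One practical way to act on this advice is to verify a round trip before discarding the original data. The Python sketch below shows the idea with pluggable compress and decompress callables. The bzip3 invocation is an assumption: it presumes that bzip3 reads standard input and writes standard output when no file is named, and that `-d` (decode) and `-c` (write to standard output) behave as in bzip2. Neither flag appears in this document, so confirm them with `bzip3 --help` first. The helper names are hypothetical.

```python
import shutil
import subprocess


def roundtrip_ok(data, compress, decompress):
    """Return True if decompress(compress(data)) reproduces data exactly."""
    return decompress(compress(data)) == data


def run_bzip3(args, payload):
    """Pipe `payload` through the bzip3 command-line tool.

    Assumption: stdin/stdout streaming and the -d/-c flags work as
    described in the lead-in; check `bzip3 --help` before relying on this.
    """
    result = subprocess.run(
        ["bzip3", *args], input=payload, stdout=subprocess.PIPE, check=True
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("bzip3") is None:
        print("bzip3 not found on PATH; skipping the check")
    else:
        sample = b"hello, bzip3\n" * 1000
        ok = roundtrip_ok(
            sample,
            lambda d: run_bzip3(["-e", "-c"], d),
            lambda d: run_bzip3(["-d", "-c"], d),
        )
        print("round trip ok" if ok else "ROUND TRIP FAILED")
```

Keeping the comparison separate from the tool invocation means the same check can be reused with any other compressor.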

Bzip3's performance is heavily dependent on the compiler. x64 Linux builds made with Clang 13 can usually reach up to 17 MiB/s compression and 23 MiB/s decompression per thread. Windows and 32-bit builds might be considerably slower.

Bzip3 has been tested on the following architectures:

  • x86
  • x86_64
  • armv6
  • armv7
  • aarch64
  • ppc64le
  • mips
  • mips64
  • sparc
  • s390x

Corpus benchmarks

visualisation of the benchmarks

Check etc/BENCHMARKS.md for more results.

Licensing

A breakdown of components and their licenses follows:

  • (runtime) The codebase as a whole: Copyright 2022-2023, Kamila Szewczyk; LGPL (LICENSE)
  • (runtime) The Burrows-Wheeler transform (libsais) and LZP code: 2021-2022, Ilya Grebnov; Apache 2.0 (3rdparty/libsais-LICENSE)
  • (compile-time) build-aux: Copyright 2011, Daniel Richard G, 2019, Marc Stevens, 2008, Steven G. Johnson; GPL-3+ with AutoConf exception
  • (compile-time) build-aux/ax_check_compile_flag.m4: Copyright 2008, Guido U. Draheim, 2011, Maarten Bosmans; FSFAP
  • (compile-time) build-aux/git-version-gen: Copyright 2007-2012, Free Software Foundation, Inc; GPLv3
  • (runtime) bz3grep: Copyright 2003, Thomas Klausner; BSD-2-clause
  • (runtime) include/getopt-shim.h: Copyright 2005-2014, Rich Felker; Expat

bzip3 as a whole is licensed under LGPLv3 only. It is not dual-licensed under LGPLv3 and Apache 2.0.

Thanks

  • Ilya Grebnov for his libsais library, used for BWT construction in BZip3, and for the LZP encoder, which I used as a reference implementation to improve my own.
  • Caleb Maclennan for configuring autotools as a packaging-friendly build system for BZip3.
  • Ilya Muravyov for his public domain BWT post-coder, a derivative of which is used in this project.
