• Stars
    star
    341
  • Rank 123,998 (Top 3 %)
  • Language
    JavaScript
  • License
    GNU General Publi...
  • Created almost 12 years ago
  • Updated almost 4 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Pure JavaScript de/compression (bzip2, etc) for node.js, volo, and the browser.

compressjs

NPM

Build Status dependency status dev dependency status

compressjs contains fast pure-JavaScript implementations of various de/compression algorithms, including bzip2, Charles Bloom's LZP3, a modified LZJB, PPM-D, and an implementation of Dynamic Markov Compression. compressjs is written by C. Scott Ananian. The Range Coder used is a JavaScript port of Michael Schindler's C range coder. Bits also also borrowed from Yuta Mori's SAIS implementation; Eli Skeggs, Kevin Kwok, Rob Landley, James Taylor, and Matthew Francis for Bzip2 compression and decompression code. "Bear" wrote the original JavaScript LZJB; the version here is based on the node lzjb module.

Compression benchmarks

Here are some representative speeds and sizes for the various algorithms implemented in this package. Times are with node 0.8.22 on my laptop, but they should be valid for inter-algorithm comparisons.

test/sample5.ref

This is the Taoism article from the Simple English wikipedia, in HTML format as generated by the Wikipedia Parsoid project.

Type Level Size (bytes) Compress time (s) Decompress time (s)
bwtc 9 272997 13.10 1.85
bzip2 9 275087 22.57 1.21
lzp3 - 292978 1.73 1.74
ppm - 297220 42.05 44.04
bzip2 1 341615 22.63 1.40
bwtc 1 345764 12.34 0.80
dmc - 434182 6.97 9.00
lzjbr 9 491476 3.19 1.92
lzjbr 1 523780 2.76 2.02
lzjb 9 706210 1.02 0.30
lzjb 1 758467 0.66 0.29
context1 - 939098 5.20 4.69
fenwick - 1440645 3.06 3.72
mtf - 1441763 1.92 3.86
huffman - 1452055 7.15 6.56
simple - 1479143 0.72 2.42
defsum - 1491107 3.19 1.46
no - 2130648 0.80 0.92
- - 2130640 - -

enwik8

This test data is the first 108 bytes of the English Wikipedia XML dump on March 3, 2006. This is the data set used for the Large Text Compression Benchmark. It can be downloaded from that site.

Type Level Size (bytes) Compress time (s) Decompress time (s)
ppm - 26560169 2615.82 2279.17
bzip2 9 28995650 1068.51 66.95
bwtc 9 29403626 618.63 112.00
bzip2 1 33525893 1035.29 66.98
lzp3 - 34305420 123.69 167.77
bwtc 1 34533422 618.61 43.52
lzjbr 9 43594841 242.60 141.51
lzjbr 1 44879071 207.38 147.14
context1 - 48480225 253.48 223.30
huffman - 62702157 301.50 267.31
fenwick - 62024449 143.49 164.15
mtf - 62090746 83.62 168.03
simple - 63463479 27.79 92.84
defsum - 64197615 75.48 32.05
lzjb 9 64992459 63.75 5.90
lzjb 1 67828511 29.26 5.89
no - 100000008 26.29 31.98
- - 100000000 - -

Algorithm descriptions

  • compressjs.Bzip2 (-t bzip2) is the bzip2 algorithm we all have come to know and love. It has a block size between 100k and 900k.
  • compressjs.BWTC (-t bwtc) is substantially the same, but with a few simplifications/improvements which make it faster, smaller, and not binary-compatible. In particular, the unnecessary initial RLE step of bzip2 is omitted, and we use a range coder with an adaptive context-0 model after the MTF/RLE2 step, instead of the static huffman codes of bzip2.
  • compressjs.PPM (-t ppm) is a naive/simple implementation of the PPMD algorithm with a 256k sliding window.
  • compressjs.Lzp3 (-t lzp3) is an algorithm similar to Charles Bloom's LZP3 algorithm. It uses a 1M sliding window, a context-4 model, and a range coder.
  • compressjs.Dmc (-t dmc) is a partial implementation of Dynamic Markov Compression. Unlike most DMC implementations, our implementation is bytewise (not bitwise). There is currently no provision for shrinking the Markov model (or throwing it out when it grows too large), so be careful with large inputs! I may return to twiddle with this some more; see the source for details.
  • compressjs.Lzjb (-t lzjb) is a straight copy of the fast LZJB algorithm from https://github.com/cscott/lzjb.
  • compressjs.LzjbR (-t lzjbr) is a hacked version of LZJB which uses a range coder and a bit of modeling instead of the fixed 9-bit literal / 17-bit match format of the original.

The remaining algorithms are self-tests for various bits of compression code, not real compressors. Context1Model is a simple adaptive context-1 model using a range coder. Huffman is an adaptive Huffman coder using Vitter's algorithm. MTFModel, FenwickModel, and DefSumModel are simple adaptive context-0 models with escapes, implementing using a move-to-front list, a Fenwick tree, and Charles Bloom's deferred summation algorithm, respectively. Simple is a static context-0 model for the range coder. NoModel encodes the input bits directly; it shows the basic I/O overhead, as well as the few bytes of overhead due to the file magic and a variable-length encoding of the uncompressed size of the file.

How to install

npm install compressjs

or

volo add cscott/compressjs

This package uses Typed Arrays if available, which are present in node.js >= 0.5.5 and many modern browsers. Full browser compatibility table is available at caniuse.com; briefly: IE 10, Firefox 4, Chrome 7, or Safari 5.1.

Testing

npm install
npm test

Usage

There is a binary available in bin:

$ bin/compressjs --help
$ echo "Test me" | bin/compressjs -t lzp3 -z > test.lzp3
$ bin/compressjs -t lzp3 -d test.lzp3
Test me

The -t argument can take a number of different strings to specify the various compression algorithms available. Use --help to see the various options.

From JavaScript:

var compressjs = require('compressjs');
var algorithm = compressjs.Lzp3;
var data = new Buffer('Example data', 'utf8');
var compressed = algorithm.compressFile(data);
var decompressed = algorithm.decompressFile(compressed);
// convert from array back to string
var data2 = new Buffer(decompressed).toString('utf8');
console.log(data2);

There is a streaming interface as well. Use Uint8Array or normal JavaScript arrays when running in a browser.

See the tests in the tests/ directory for further usage examples.

Documentation

require('compressjs') returns a compressjs object. Its fields correspond to the various algorithms implemented, which export one of two different interfaces, depending on whether it is a "compression method" or a "model/coder".

Compression Methods

Compression methods (like compressjs.Lzp3) export two methods. The first is a function accepting one, two or three parameters:

cmp.compressFile = function(input, [output], [Number compressionLevel] or [props])

The input argument can be a "stream" object (which must implement the readByte method), or a Uint8Array, Buffer, or array.

If you omit the second argument, compressFile will return a JavaScript array containing the byte values of the compressed data. If you pass a second argument, it must be a "stream" object (which must implement the writeByte method).

The third argument may be omitted, or a number between 1 and 9 indicating a compression level (1 being largest/fastest compression and 9 being smallest/slowest compression). Some algorithms also permit passing an object for finer-grained control of various compression properties.

The second exported method is a function accepting one or two parameters:

cmp.decompressFile = function(input, [output])

The input parameter is as above.

If you omit the second argument, decompressFile will return a Uint8Array, Buffer or JavaScript array with the decompressed data, depending on what your platform supports. For most modern platforms (modern browsers, recent node.js releases) the returned value will be a Uint8Array.

If you provide the second argument, it must be a "stream", implementing the writeByte method.

Models and coders

The second type of object implemented is a model/coder. Huffman and RangeCoder share the same interface as the simple context-0 probability models MTFModel, FenwickModel, LogDistanceModel, and DeflateDistanceModel.

model.factory = function(parameters)

This method returns a function which can be invoked with a size argument to create a new instance of this model with the given parameters (which usually include the input/output stream or coder).

model.encode = function(symbol, [optional context])

This method encodes the given symbol, possibly with the given additional context, and then updates the model or adaptive coder if necessary. The symbol is usually in the range [0, size), although some models allow adding "extra symbols" to the possible range, which are usually given negative values. For example, you might want to create a LogDistanceModel with one extra state to encode "same distance as the last one encoded".

model.decode = function([optional context])

Decode the next symbol and updates the model or adaptive coder. The values returned are usually in the range [0, size] although negative numbers may be returned if you requested "extra symbols" when you created the model.

Related articles and projects

Other JavaScript compressors

License (GPLv2)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

More Repositories

1

wikipedia-telnet

Telnet server for wikipedia content
JavaScript
110
star
2

prfun

Helper functions for ES6 promises
JavaScript
64
star
3

TurtleScript

Programming Environment for Simplified JavaScript
JavaScript
41
star
4

node-php-embed

Bidirectional interoperability between PHP and JavaScript code within the Node.js process.
C++
35
star
5

node-pn

The Promised Node -- a package which promisifies the node standard library.
JavaScript
26
star
6

node-random-name

Generate random (or seeded) names, based on 1990 Census data.
JavaScript
21
star
7

lzjb

A fast pure JavaScript implementation of LZJB compression/decompression.
JavaScript
20
star
8

rusty-turtle

A TurtleScript interpreter written in Rust.
Rust
16
star
9

xorduino

XOrduino -- an arduino leonardo/scratch sensor board mash up
15
star
10

node-icu-bidi

Node bindings to the ICU (53) Unicode BiDi algorithm
C++
15
star
11

3d-melltorp

Enclosure for 3d printer based on IKEA Melltorp tables
OpenSCAD
11
star
12

watchy-co2-pcb

Simple PCB to mount the Sensirion SCD40 CO2 sensor next to Watchy
10
star
13

nell-colors

A drawing/handwriting-recognition activity for Project Nell.
JavaScript
9
star
14

xostick

Open Hardware design for a simple USB hardware device for the OLPC XO
Eagle
9
star
15

X1CarbonGen10

Keyboard with Trackpoint for Framework 16 laptop: hardware design
Python
9
star
16

JDoctest

JDoctest is an implementation of Python's doctest for Java.
Java
8
star
17

Harpoon

The FLEX Java compiler infrastructure
Java
8
star
18

mediawiki-extensions-togetherjs

A mediawiki extension to incorporate real time collaboration using Mozilla's TowTruck project.
JavaScript
8
star
19

intent-addon

Plugin for Firefox/Android which exposes a web api for sending Android Intents.
JavaScript
7
star
20

node-libzim

Node bindings to libzim (ZIM file read/write library from OpenZIM project)
C++
7
star
21

instaview

A Mediawiki to HTML converter in JavaScript; packaged for node and volo.
JavaScript
7
star
22

TouchpadSpacerFW16

Touchpad Input Module parts for the Framework 16 laptop
7
star
23

nelldemo

Demo of Nell OLPC XO-3 interface ideas
JavaScript
4
star
24

babybird

A slimmed down Promise implementation, built for speed.
JavaScript
4
star
25

nell-wikipedia

Nell's Wikipedia: an offline wikipedia browser for learning literacy
JavaScript
3
star
26

SDR

Square Dance Revolution: square dance caller game (and choreography tool)
Java
3
star
27

jutter

JavaScript/WebGL reimplementation of Clutter
JavaScript
3
star
28

LiveMusic

Music arrangements for square dancing
Rouge
3
star
29

android-launcher2

Clone of https://android.googlesource.com/platform/packages/apps/Launcher2
Java
3
star
30

lua-turtle

TurtleScript interpreter in lua
Lua
3
star
31

nell-hand

Handwriting Recognition for Nell
JavaScript
3
star
32

nell-paper-idc12

LaTeX sources for IDC 2012 conference paper on Project Nell.
3
star
33

npm-travis

Trigger travis tests with `npm run travis`
JavaScript
3
star
34

texvcjs

A LaTeX validator/translator for TeX strings embedded in wikitext
JavaScript
3
star
35

3d-multimaterial

Direct-drive multi-material printing using Printrbot extruders
OpenSCAD
2
star
36

meteor-hubot

Meteor wrapper for the npm `hubot` package.
JavaScript
2
star
37

asterism

Jessica and Scott Wedding Website
JavaScript
2
star
38

wonchi-2012-05-15

Data crunching scripts for 2012-05-15 Wonchi data dump
JavaScript
2
star
39

nell-balloons

Balloons for Nell -- a simple matching game for Nell/the Literacy Project
JavaScript
2
star
40

jsdoc-wmf-theme

Theme for jsdoc following the Wikimedia Style Guide
JavaScript
2
star
41

open-town-meeting

2
star
42

node-mediawiki-express

An installation of mediawiki invoked from node.js/express.
JavaScript
2
star
43

watchy-co2-case

Case for Watchy extended with a Sensirion SCD40 CO2 sensor
Python
2
star
44

ExternalEnclosureFW16

Design files for an external enclosure for Framework 16 Input Modules
2
star
45

caterpillars

C
1
star
46

mstone-essays

Copy of mstone's essays, for comments & review
1
star
47

WebIDL

WebIDL parser for PHP
PHP
1
star
48

wikimania18-forkmerge-poster

Sources for my "Edit Conflicts, Offline Contributions, and Tor" poster for Wikimania 2018 in Cape Town
TeX
1
star
49

maker-day-kit

Schematics and PCB layout for kits for Brookline Public School's 1st Maker Day
C++
1
star
50

sechs

Sechs is a stripped-down replacement for funf.
Java
1
star
51

mw-ocg-bundler.old

Mediawiki article spider tool.
JavaScript
1
star
52

pebble-flipped-bits

Flipped version of "Just a Bit" watchface for Pebble
C
1
star
53

StringMaze

2010 Mystery Hunt code
1
star
54

offline-wiki

Offline Wikipedia browser/reader, as a web app.
JavaScript
1
star
55

pippy-examples

Python programming examples, translated from C64 user's guide
1
star
56

JUtil

JUtil is a fully-parameterized (generic) collections library for Java.
Java
1
star
57

SevenSegmentInputModule

7-segment display input module for Framework 16
Python
1
star
58

OpenEVSE_Keypad_RFID

Capacitive Keypad and RFID for OpenEVSE: Hardware
C++
1
star