Snappy compressor/decompressor for Java

snappy-java

snappy-java is a Java port of Snappy, a fast C++ compressor/decompressor developed by Google.

Features

  • Fast compression/decompression, at around 200-400 MB/sec.
  • Low memory usage. SnappyOutputStream uses only 32 KB+ by default.
  • JNI-based implementation to achieve comparable performance to the native C++ version.
    • Although snappy-java uses JNI, it can be used safely with multiple class loaders (e.g. Tomcat).
  • Compression/decompression of Java primitive arrays (float[], double[], int[], short[], long[], etc.)
    • To improve the compression ratios of these arrays, you can use a fast data-rearrangement implementation (BitShuffle) before compression.
  • Portable across various operating systems. snappy-java contains native libraries built for Windows/Mac/Linux, etc., and loads one of these libraries according to your machine environment (it looks at the system properties os.name and os.arch).
  • Simple usage. Add the snappy-java-(version).jar file to your classpath. Then call compression/decompression methods in org.xerial.snappy.Snappy.
  • Framing-format support (since version 1.1.0)
  • OSGi support
  • Apache License Version 2.0. Free for both commercial and non-commercial use.
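
The platform lookup mentioned in the feature list can be pictured with a small sketch. This is not snappy-java's actual loader: the class name PlatformProbe, the method osFamily, and the family labels are hypothetical, and the code only illustrates how the os.name and os.arch system properties might be read and normalized.

```java
import java.util.Locale;

class PlatformProbe {
    // Hypothetical helper: maps a raw os.name value to a coarse OS family label.
    static String osFamily(String osName) {
        String name = osName.toLowerCase(Locale.ROOT);
        if (name.contains("windows")) return "Windows";
        if (name.contains("mac") || name.contains("darwin")) return "Mac";
        if (name.contains("linux")) return "Linux";
        return "Other";
    }

    public static void main(String[] args) {
        // These two system properties are the ones the README says snappy-java inspects.
        String osName = System.getProperty("os.name");
        String osArch = System.getProperty("os.arch");
        System.out.println(osFamily(osName) + "/" + osArch);
    }
}
```

Running it prints the detected family and architecture for the current JVM, which can help when checking whether a bundled native library is likely to match your environment.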

Download

The current stable version is available from here:

Using with Maven

Snappy-java is available from Maven's central repository. Add the following dependency to your pom.xml:

<dependency>
  <groupId>org.xerial.snappy</groupId>
  <artifactId>snappy-java</artifactId>
  <version>(version)</version>
  <type>jar</type>
  <scope>compile</scope>
</dependency>

Using with sbt

libraryDependencies += "org.xerial.snappy" % "snappy-java" % "(version)"

Usage

First, import org.xerial.snappy.Snappy in your Java code:

import org.xerial.snappy.Snappy;

Then use Snappy.compress(byte[]) and Snappy.uncompress(byte[]):

String input = "Hello snappy-java! Snappy-java is a JNI-based wrapper of "
     + "Snappy, a fast compresser/decompresser.";
byte[] compressed = Snappy.compress(input.getBytes("UTF-8"));
byte[] uncompressed = Snappy.uncompress(compressed);

String result = new String(uncompressed, "UTF-8");
System.out.println(result);

In addition, you can use high-level methods (Snappy.compress(String), Snappy.compress(float[] ..), etc.) and low-level ones (e.g. Snappy.rawCompress(..), Snappy.rawUncompress(..)), which minimize memory copies.
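
As a minimal sketch of these variants, the class below round-trips a String, a float[] and a byte[] written into a caller-supplied buffer. The class and method names (SnappyMethodsSketch, roundTripString, and so on) are made up for illustration. For the buffer-based case it uses the offset-taking overloads of Snappy.compress and Snappy.uncompress rather than rawCompress/rawUncompress, so treat it as one possible way to avoid extra allocations, not the recommended one.

```java
import java.io.IOException;
import java.util.Arrays;
import org.xerial.snappy.Snappy;

class SnappyMethodsSketch {
    // High-level: compress a String and restore it.
    static String roundTripString(String text) throws IOException {
        byte[] compressed = Snappy.compress(text);
        return Snappy.uncompressString(compressed);
    }

    // High-level: compress a primitive array and restore it.
    static float[] roundTripFloats(float[] values) throws IOException {
        byte[] compressed = Snappy.compress(values);
        return Snappy.uncompressFloatArray(compressed);
    }

    // Buffer-based: compress into a preallocated array sized with maxCompressedLength.
    static byte[] roundTripWithBuffers(byte[] input) throws IOException {
        byte[] buffer = new byte[Snappy.maxCompressedLength(input.length)];
        int compressedSize = Snappy.compress(input, 0, input.length, buffer, 0);
        byte[] restored = new byte[Snappy.uncompressedLength(buffer, 0, compressedSize)];
        Snappy.uncompress(buffer, 0, compressedSize, restored, 0);
        return restored;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTripString("Hello snappy-java!"));
        System.out.println(Arrays.toString(roundTripFloats(new float[] {1.0f, 2.5f, 3.25f})));
        System.out.println(roundTripWithBuffers(new byte[] {1, 2, 3, 4}).length);
    }
}
```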

Stream-based API

The stream-based compressor/decompressor classes SnappyOutputStream and SnappyInputStream are also available for reading and writing large data sets. SnappyFramedOutputStream and SnappyFramedInputStream can be used for the framing format.
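
A minimal sketch of the stream classes, assuming in-memory byte arrays stand in for files or sockets. The class name SnappyStreamSketch and its two helper methods are hypothetical; only SnappyOutputStream and SnappyInputStream come from the library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.xerial.snappy.SnappyInputStream;
import org.xerial.snappy.SnappyOutputStream;

class SnappyStreamSketch {
    // Writes data through SnappyOutputStream and returns the stream-format bytes.
    static byte[] compressToStreamFormat(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new SnappyOutputStream(sink)) {
            out.write(data);
        }
        return sink.toByteArray();
    }

    // Reads stream-format bytes back through SnappyInputStream in fixed-size chunks.
    static byte[] decompressFromStreamFormat(byte[] compressed) throws IOException {
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        try (InputStream in = new SnappyInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                restored.write(chunk, 0, n);
            }
        }
        return restored.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "a large data set would be streamed here".getBytes("UTF-8");
        byte[] restored = decompressFromStreamFormat(compressToStreamFormat(original));
        System.out.println(new String(restored, "UTF-8"));
    }
}
```

For real workloads the ByteArray streams would typically be replaced by file or network streams, and the data would be written in pieces instead of as one array.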

Compatibility Notes

The original Snappy format definition did not define a file format. A "framing" format was added later to define one, but by that point major software was already using an industry standard instead, represented in this library by the SnappyOutputStream and SnappyInputStream classes.

For interoperability with other libraries, check that compatible formats are used. Note that not all libraries support all variants.

  • SnappyOutputStream and SnappyInputStream use [magic header:16 bytes]([block size:int32][compressed data:byte array])* format. You can read the result of Snappy.compress with SnappyInputStream, but you cannot read the compressed data generated by SnappyOutputStream with Snappy.uncompress.
  • SnappyHadoopCompatibleOutputStream does not emit a file header but writes out the current block size as a preamble to each block.

Data format compatibility matrix:

Write \ Read                        Snappy.uncompress  SnappyInputStream  SnappyFramedInputStream  org.apache.hadoop.io.compress.SnappyCodec
Snappy.compress                     ok                 ok                 x                        x
SnappyOutputStream                  x                  ok                 x                        x
SnappyFramedOutputStream            x                  x                  ok                       x
SnappyHadoopCompatibleOutputStream  x                  x                  x                        ok
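
One cell of the matrix can be checked with a short sketch: the output of Snappy.compress read back through SnappyInputStream (first row, second column). The class name FormatCompatibilitySketch and the method readRawWithStream are hypothetical. The other combinations are not exercised here, so rely on the matrix itself for those.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.xerial.snappy.Snappy;
import org.xerial.snappy.SnappyInputStream;

class FormatCompatibilitySketch {
    // Compresses with Snappy.compress, then decodes the result via SnappyInputStream.
    static byte[] readRawWithStream(byte[] original) throws IOException {
        byte[] raw = Snappy.compress(original);
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        try (InputStream in = new SnappyInputStream(new ByteArrayInputStream(raw))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                restored.write(chunk, 0, n);
            }
        }
        return restored.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "written by Snappy.compress".getBytes("UTF-8");
        System.out.println(new String(readRawWithStream(original), "UTF-8"));
    }
}
```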

BitShuffle API (Since 1.1.3-M2)

BitShuffle is an algorithm that reorders data bits (shuffle) for efficient compression (e.g., a sequence of integers, float values, etc.). To use BitShuffle routines, import org.xerial.snappy.BitShuffle:

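import java.util.Arrays;
import org.xerial.snappy.BitShuffle;
import org.xerial.snappy.Snappy;

int[] data = new int[] {1, 3, 34, 43, 34};
// Rearrange the bits of the int array so that it compresses better
byte[] shuffledByteArray = BitShuffle.shuffle(data);
byte[] compressed = Snappy.compress(shuffledByteArray);
byte[] uncompressed = Snappy.uncompress(compressed);
// Restore the original int array from the uncompressed, still-shuffled bytes
int[] result = BitShuffle.unshuffleIntArray(uncompressed);

// Print the array contents rather than the array reference
System.out.println(Arrays.toString(result));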

Shuffling and unshuffling of primitive arrays (e.g., short[], long[], float[], double[], etc.) are supported. See Javadoc for the details.
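
The same pattern applies to the other primitive types. As a minimal sketch for double[], with a hypothetical wrapper class BitShuffleDoubleSketch and method roundTrip:

```java
import java.io.IOException;
import java.util.Arrays;
import org.xerial.snappy.BitShuffle;
import org.xerial.snappy.Snappy;

class BitShuffleDoubleSketch {
    // Shuffle, compress, uncompress, then unshuffle a double array.
    static double[] roundTrip(double[] values) throws IOException {
        byte[] shuffled = BitShuffle.shuffle(values);
        byte[] compressed = Snappy.compress(shuffled);
        byte[] uncompressed = Snappy.uncompress(compressed);
        return BitShuffle.unshuffleDoubleArray(uncompressed);
    }

    public static void main(String[] args) throws IOException {
        double[] data = {0.5, 1.5, 2.5, 3.5};
        System.out.println(Arrays.toString(roundTrip(data)));
    }
}
```

Note that the unshuffle method must match the type that was shuffled (unshuffleDoubleArray for a double[] input), because the shuffled bytes do not record the element type.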

Setting classpath

If you have snappy-java-(VERSION).jar in the current directory, use the -classpath option as follows:

$ javac -classpath ".;snappy-java-(VERSION).jar" Sample.java  # in Windows
or
$ javac -classpath ".:snappy-java-(VERSION).jar" Sample.java  # in Mac or Linux

Public discussion group

Post bug reports or feature requests to the Issue Tracker: https://github.com/xerial/snappy-java/issues

The public discussion forum is the Xerial Public Discussion Group.

For developers

snappy-java uses sbt (simple build tool for Scala) as its build tool. Here is a summary of the basic commands:

$ ./sbt            # enter sbt console
> ~test            # run tests upon source code change
> ~testOnly        # run tests that match a given name pattern
> publishM2        # publish jar to $HOME/.m2/repository
> package          # create jar file
> findbugs         # produce findbugs report in target/findbugs
> jacoco:cover     # report the code coverage of tests to target/jacoco folder

If you need to see detailed debug messages, launch sbt with the -Dloglevel=debug option:

$ ./sbt -Dloglevel=debug

For the details of sbt usage, see my blog post: Building Java Projects with sbt

Building from the source code

See the build instructions. Building from the source code is an option when your OS platform and CPU architecture are not supported. To build snappy-java, you need Git, a JDK (1.6 or higher), a g++ compiler (MinGW on Windows), etc.

$ git clone https://github.com/xerial/snappy-java.git
$ cd snappy-java
$ make

When building on Solaris, use gmake:

$ gmake

The resulting file, target/snappy-java-$(version).jar, additionally contains the native library built for your platform.

Creating a new release

The GitHub Actions workflow [https://github.com/xerial/snappy-java/blob/master/.github/workflows/release.yml] publishes a new release to Maven Central (Sonatype) when a new tag vX.Y.Z is pushed.

Miscellaneous Notes

Using snappy-java with Tomcat 6 (or higher) Web Server

Simply put the snappy-java jar in the WEB-INF/lib folder of your web application. The usual JNI-library-specific problem no longer exists, since snappy-java version 1.0.3 or higher can be loaded by multiple class loaders.

Configuring snappy-java using a property file

Prepare an org-xerial-snappy.properties file (under the root path of your library) in Java's property file format. Here is a list of the available properties:

  • org.xerial.snappy.lib.path (directory containing the snappyjava native library)
  • org.xerial.snappy.lib.name (library file name)
  • org.xerial.snappy.tempdir (temporary directory in which to extract the native library bundled in snappy-java)
  • org.xerial.snappy.use.systemlib (if this value is true, use the system-installed libsnappyjava.so found on the path specified by java.library.path)
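
To show what such a file might contain, the sketch below parses example property text with java.util.Properties. The class name SnappyPropertiesSketch, the parse helper, and the values (such as /var/tmp/snappy) are hypothetical; only the property keys come from the list above. It does not show how snappy-java itself reads the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class SnappyPropertiesSketch {
    // Parses text written in Java's property file format.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Example content for org-xerial-snappy.properties; the values are placeholders.
        String fileContent =
              "org.xerial.snappy.tempdir=/var/tmp/snappy\n"
            + "org.xerial.snappy.use.systemlib=false\n";
        Properties props = parse(fileContent);
        System.out.println(props.getProperty("org.xerial.snappy.tempdir"));
        System.out.println(props.getProperty("org.xerial.snappy.use.systemlib"));
    }
}
```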

Snappy-java is developed by Taro L. Saito. Twitter @taroleo
