  • Stars: 957
  • Rank: 45,950 (Top 1.0 %)
  • Language: C
  • License: Other
  • Created almost 13 years ago
  • Updated about 2 months ago

Repository Details

A blocking, shuffling and lossless compression library that can be faster than `memcpy()`.

Blosc: A blocking, shuffling and lossless compression library

Author: Blosc Development Team
Contact: [email protected]
URL: https://www.blosc.org

What is it?

Blosc is a high-performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() call. Blosc is the first compressor (that I'm aware of) that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate memory-bound computations.

It uses the blocking technique to reduce activity on the memory bus as much as possible. In short, this technique works by dividing datasets into blocks that are small enough to fit in the caches of modern processors and performing compression / decompression there. It also leverages, if available, SIMD instructions (SSE2, AVX2) and the multi-threading capabilities of CPUs to accelerate the compression / decompression process as much as possible.
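The blocking idea can be illustrated with a small, self-contained C99 sketch. It is not Blosc's implementation: the names for_each_block, block_fn, block_stats and count_block are hypothetical, and the 32 KiB block size is only an assumption standing in for "small enough to fit in cache". The sketch walks a buffer in fixed-size blocks and hands each block to a callback, which is where a real library would compress.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size: an assumption meant to fit in a typical L1 data
   cache. Blosc chooses its own block sizes. */
#define SKETCH_BLOCKSIZE (32 * 1024)

/* Callback invoked once per block; a real library would compress here. */
typedef void (*block_fn)(const uint8_t *block, size_t len, void *ctx);

/* Walk src in consecutive blocks of at most blocksize bytes.
   Returns the number of blocks visited. */
static size_t for_each_block(const uint8_t *src, size_t nbytes,
                             size_t blocksize, block_fn fn, void *ctx)
{
    assert(blocksize > 0);
    size_t nblocks = 0;
    size_t off = 0;
    while (off < nbytes) {
        size_t len = nbytes - off;      /* bytes still to process */
        if (len > blocksize)
            len = blocksize;            /* the last block may be shorter */
        fn(src + off, len, ctx);
        off += len;
        nblocks++;
    }
    return nblocks;
}

/* Example callback: accumulates total bytes and remembers the last block size. */
struct block_stats { size_t total; size_t last_len; };

static void count_block(const uint8_t *block, size_t len, void *ctx)
{
    (void)block;                        /* contents are not inspected here */
    struct block_stats *s = ctx;
    s->total += len;
    s->last_len = len;
}
```

Calling for_each_block(buf, nbytes, SKETCH_BLOCKSIZE, count_block, &stats) visits every byte exactly once; because each block is handled independently, different blocks could also be given to different threads.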

See some benchmarks about Blosc performance.

Blosc is distributed using the BSD license, see LICENSE.txt for details.

Meta-compression and other differences over existing compressors

C-Blosc is not like other compressors: it is better described as a meta-compressor, because it can use different compressors and filters (programs that generally improve compression ratio). It can still be called a compressor, because it already ships with several compressors and filters and so can work like a regular codec.

Currently C-Blosc comes with support for BloscLZ (a compressor heavily based on FastLZ, https://ariya.github.io/FastLZ/), LZ4 and LZ4HC (http://www.lz4.org/), Snappy (https://google.github.io/snappy/), Zlib (https://zlib.net/) and Zstandard (https://facebook.github.io/zstd/).

C-Blosc also comes with highly optimized shuffle and bitshuffle filters, which can use SSE2 or AVX2 instructions if available (for info on how and why shuffling works see here). Additional compressors or filters may be added in the future.

Blosc is in charge of coordinating the different compressors and filters so that they can leverage the blocking technique as well as multi-threaded execution (if several cores are available) automatically. As a result, every codec and filter works at very high speeds, even if it was not originally designed for blocking or multi-threading.
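To make the meta-compressor idea concrete, here is a minimal C99 sketch of a coordinator that looks up a codec by name and feeds it one block at a time. Everything in it is an assumption for illustration: the two toy codecs ("store" and a naive run-length encoder) and the names codec_fn, find_codec and compress_blocks are hypothetical and are not part of the C-Blosc API or its container format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A codec compresses one block; returns bytes written, or 0 if dest is too small. */
typedef size_t (*codec_fn)(const uint8_t *src, size_t n, uint8_t *dest, size_t cap);

/* Toy codec 1: store the block unchanged. */
static size_t store_codec(const uint8_t *src, size_t n, uint8_t *dest, size_t cap)
{
    if (n > cap)
        return 0;
    memcpy(dest, src, n);
    return n;
}

/* Toy codec 2: run-length encoding as (count, value) byte pairs. */
static size_t rle_codec(const uint8_t *src, size_t n, uint8_t *dest, size_t cap)
{
    size_t out = 0, i = 0;
    while (i < n) {
        uint8_t value = src[i];
        size_t run = 1;
        while (i + run < n && src[i + run] == value && run < 255)
            run++;
        if (cap - out < 2)
            return 0;                   /* no room for another pair */
        dest[out++] = (uint8_t)run;
        dest[out++] = value;
        i += run;
    }
    return out;
}

struct codec { const char *name; codec_fn compress; };

static const struct codec CODECS[] = {
    { "store", store_codec },
    { "rle",   rle_codec   },
};

/* Look a codec up by name, as when a compressor is chosen at run time. */
static const struct codec *find_codec(const char *name)
{
    for (size_t i = 0; i < sizeof CODECS / sizeof CODECS[0]; i++)
        if (strcmp(CODECS[i].name, name) == 0)
            return &CODECS[i];
    return NULL;
}

/* Coordinator: split src into blocks and hand each one to the chosen codec.
   Returns total compressed bytes, or 0 on failure (or for empty input). */
static size_t compress_blocks(const struct codec *c, const uint8_t *src, size_t n,
                              size_t blocksize, uint8_t *dest, size_t cap)
{
    assert(c != NULL && blocksize > 0);
    size_t in = 0, out = 0;
    while (in < n) {
        size_t len = n - in < blocksize ? n - in : blocksize;
        size_t written = c->compress(src + in, len, dest + out, cap - out);
        if (written == 0)
            return 0;
        in += len;
        out += written;
    }
    return out;
}
```

The codecs know nothing about blocking; only the coordinator does. That separation is what lets a codec that was never written with blocking in mind benefit from it.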

Finally, C-Blosc is especially well suited to binary data because it can take advantage of the type size meta-information to improve the compression ratio by using the integrated shuffle and bitshuffle filters.
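A plain, unoptimized C99 sketch of a byte shuffle shows why the type size matters. It is not the SSE2/AVX2 code shipped with C-Blosc, and the function names shuffle_bytes and unshuffle_bytes are hypothetical. The filter gathers byte 0 of every element, then byte 1, and so on; for arrays of similar-magnitude numbers the high-order bytes are often identical, so the shuffled buffer contains long runs that a codec compresses well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte shuffle: dest receives byte 0 of every element, then byte 1, etc.
   src and dest must not overlap and must each hold typesize * nelems bytes. */
static void shuffle_bytes(const uint8_t *src, uint8_t *dest,
                          size_t typesize, size_t nelems)
{
    assert(typesize > 0);
    for (size_t j = 0; j < typesize; j++)
        for (size_t i = 0; i < nelems; i++)
            dest[j * nelems + i] = src[i * typesize + j];
}

/* Inverse transform: restores the original element-by-element layout. */
static void unshuffle_bytes(const uint8_t *src, uint8_t *dest,
                            size_t typesize, size_t nelems)
{
    assert(typesize > 0);
    for (size_t j = 0; j < typesize; j++)
        for (size_t i = 0; i < nelems; i++)
            dest[i * typesize + j] = src[j * nelems + i];
}
```

For example, three 4-byte elements stored as 01 00 00 00, 02 00 00 00, 03 00 00 00 become 01 02 03 followed by nine zero bytes after shuffling, and unshuffle_bytes restores the original layout exactly.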

When taken together, all these features set Blosc apart from other compression libraries.

Compiling the Blosc library

Blosc can be built, tested and installed using CMake. The following procedure describes an "out of source" build.

  $ cd c-blosc
  $ mkdir build
  $ cd build

Now run CMake configuration and optionally specify the installation directory (e.g. '/usr' or '/usr/local'):

  $ cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory ..

CMake allows you to configure Blosc in many different ways, such as preferring internal or external sources for compressors, or enabling/disabling them. Configuration can also be performed with the UI tools provided by CMake (ccmake or cmake-gui):

  $ ccmake ..      # run a curses-based interface
  $ cmake-gui ..   # run a graphical interface

Build, test and install Blosc:

  $ cmake --build .
  $ ctest
  $ cmake --build . --target install

The static and dynamic versions of the Blosc library, together with the header files, will be installed into the specified CMAKE_INSTALL_PREFIX.

Codec support with CMake

C-Blosc comes with full sources for LZ4, LZ4HC, Snappy, Zlib and Zstd. In general, you do not need to worry about these libraries being missing from your system (or about CMake not finding them), because by default the included sources are compiled automatically and built into the C-Blosc library. This means you can count on complete support for all the codecs in every Blosc deployment, unless you explicitly exclude some of them.

If you want to force Blosc to use an external codec library instead of the included sources, you can do so:

  $ cmake -DPREFER_EXTERNAL_ZSTD=ON ..

You can also disable support for some compression libraries:

  $ cmake -DDEACTIVATE_SNAPPY=ON ..  # in case you don't have a C++ compiler

Examples

In the examples/ directory you can find hints on how to use Blosc inside your app.

Supported platforms

Blosc is meant to support all platforms where a C89-compliant C compiler can be found. The most thoroughly tested ones are Intel (Linux, Mac OSX and Windows) and ARM (Linux), but more exotic ones, such as the IBM Blue Gene Q embedded "A2" processor, are reported to work too.

Mac OSX troubleshooting

If you run into compilation trouble on Mac OSX, please make sure that you have installed the command line developer tools. You can install them with:

  $ xcode-select --install

Wrapper for Python

Blosc has an official wrapper for Python. See:

https://github.com/Blosc/python-blosc

Command line interface and serialization format for Blosc

Blosc can be used from the command line by using Bloscpack. See:

https://github.com/Blosc/bloscpack

Filter for HDF5

For those who want to use Blosc as a filter in the HDF5 library, there is a sample implementation in the hdf5-blosc project at:

https://github.com/Blosc/hdf5-blosc

Mailing list

There is an official mailing list for Blosc at:

[email protected] https://groups.google.com/g/blosc

Acknowledgments

See THANKS.rst.


Enjoy data!
