A blocking, shuffling and lossless compression library that can be faster than `memcpy()`.

Blosc: A blocking, shuffling and lossless compression library

Author: Blosc Development Team
Contact: [email protected]
URL: https://www.blosc.org

What is it?

Blosc is a high-performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() call. Blosc is the first compressor (that I'm aware of) that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate memory-bound computations.

It uses the blocking technique to reduce activity on the memory bus as much as possible. In short, this technique works by dividing datasets into blocks that are small enough to fit in the caches of modern processors and performing compression / decompression there. It also leverages, if available, SIMD instructions (SSE2, AVX2) and the multi-threading capabilities of CPUs in order to accelerate the compression / decompression process as much as possible.
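The blocking idea can be sketched in a few lines of C. This is only an illustration under stated assumptions, not Blosc's actual implementation: the names `for_each_block`, `block_fn` and `sum_block` are hypothetical, the byte-summing callback merely stands in for a real codec, and the code assumes a C99 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block callback: handles `len` bytes starting at `block`. */
typedef void (*block_fn)(const uint8_t *block, size_t len, void *ctx);

/* Walk `src` in chunks of at most `blocksize` bytes and pass each chunk to
   `fn`. Returns the number of blocks visited. */
size_t for_each_block(const uint8_t *src, size_t nbytes, size_t blocksize,
                      block_fn fn, void *ctx)
{
    assert(blocksize > 0);
    size_t nblocks = 0;
    for (size_t off = 0; off < nbytes; off += blocksize) {
        /* The last block may be shorter than `blocksize`. */
        size_t len = (nbytes - off < blocksize) ? nbytes - off : blocksize;
        fn(src + off, len, ctx);
        nblocks++;
    }
    return nblocks;
}

/* Stand-in for a codec: adds every byte of the block to a running total. */
void sum_block(const uint8_t *block, size_t len, void *ctx)
{
    uint64_t *total = ctx;
    for (size_t i = 0; i < len; i++)
        *total += block[i];
}
```

With a 10-byte buffer and a block size of 4, the callback runs three times, on blocks of 4, 4 and 2 bytes. In a real setting the block size would be chosen so that one block fits in a CPU cache level.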

See some benchmarks about Blosc performance.

Blosc is distributed under the BSD license; see LICENSE.txt for details.

Meta-compression and other differences over existing compressors

C-Blosc is not like other compressors: it should rather be called a meta-compressor, because it can use different compressors and filters (programs that generally improve compression ratio). At any rate, it can also be called a compressor because it already comes with several compressors and filters, so it can actually work like a regular codec.

Currently C-Blosc comes with support for BloscLZ, a compressor heavily based on FastLZ (https://ariya.github.io/FastLZ/), as well as LZ4 and LZ4HC (http://www.lz4.org/), Snappy (https://google.github.io/snappy/), Zlib (https://zlib.net/) and Zstandard (https://facebook.github.io/zstd/).

C-Blosc also comes with highly optimized (they can use SSE2 or AVX2 instructions, if available) shuffle and bitshuffle filters (for info on how and why shuffling works see here). Additional compressors or filters may be added in the future.

Blosc is in charge of coordinating the different compressors and filters so that they can leverage the blocking technique as well as multi-threaded execution (if several cores are available) automatically. As a result, every codec and filter can work at very high speeds, even if it was not initially designed for blocking or multi-threading.

Finally, C-Blosc is especially suited to dealing with binary data because it can take advantage of the type size meta-information to improve the compression ratio by using the integrated shuffle and bitshuffle filters.
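To make the role of the type size concrete, here is a minimal scalar sketch of a byte shuffle in C. It is not the SIMD-optimized code shipped with C-Blosc: the function names `shuffle_bytes` and `unshuffle_bytes` are hypothetical, the buffer length is assumed to be an exact multiple of the type size, and a C99 compiler is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte shuffle: write byte 0 of every element first, then byte 1 of every
   element, and so on. Assumes `nbytes` is a multiple of `typesize`. */
void shuffle_bytes(size_t typesize, size_t nbytes,
                   const uint8_t *src, uint8_t *dest)
{
    assert(typesize > 0 && nbytes % typesize == 0);
    size_t nelem = nbytes / typesize;
    for (size_t j = 0; j < typesize; j++)      /* byte position in element */
        for (size_t i = 0; i < nelem; i++)     /* element index */
            dest[j * nelem + i] = src[i * typesize + j];
}

/* Inverse transform: restores the original element-by-element layout. */
void unshuffle_bytes(size_t typesize, size_t nbytes,
                     const uint8_t *src, uint8_t *dest)
{
    assert(typesize > 0 && nbytes % typesize == 0);
    size_t nelem = nbytes / typesize;
    for (size_t j = 0; j < typesize; j++)
        for (size_t i = 0; i < nelem; i++)
            dest[i * typesize + j] = src[j * nelem + i];
}
```

For three 4-byte elements whose bytes are `1 0 0 0`, `2 0 0 0` and `3 0 0 0`, the shuffled buffer is `1 2 3` followed by nine zero bytes. Grouping the mostly-identical high-order bytes into long runs like this is what tends to give the downstream codec an easier job.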

When taken together, all these features set Blosc apart from other compression libraries.

Compiling the Blosc library

Blosc can be built, tested and installed using CMake. The following procedure describes the "out of source" build.

  $ cd c-blosc
  $ mkdir build
  $ cd build

Now run CMake configuration and optionally specify the installation directory (e.g. '/usr' or '/usr/local'):

  $ cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory ..

CMake allows you to configure Blosc in many different ways, such as preferring internal or external sources for compressors, or enabling/disabling them. Please note that configuration can also be performed using UI tools provided by CMake (ccmake or cmake-gui):

  $ ccmake ..      # run a curses-based interface
  $ cmake-gui ..   # run a graphical interface

Build, test and install Blosc:

  $ cmake --build .
  $ ctest
  $ cmake --build . --target install

The static and dynamic versions of the Blosc library, together with header files, will be installed into the specified CMAKE_INSTALL_PREFIX.

Codec support with CMake

C-Blosc comes with full sources for LZ4, LZ4HC, Snappy, Zlib and Zstd. In general, you do not need to worry about these libraries being missing from your system (or CMake not finding them), because by default the included sources are compiled automatically and built into the C-Blosc library. This effectively means that you can count on complete support for all the codecs in every Blosc deployment (unless you explicitly exclude some of them).

If you want to force Blosc to use external codec libraries instead of the included sources, you can do so:

  $ cmake -DPREFER_EXTERNAL_ZSTD=ON ..

You can also disable support for some compression libraries:

  $ cmake -DDEACTIVATE_SNAPPY=ON ..  # in case you don't have a C++ compiler

Examples

In the examples/ directory you can find hints on how to use Blosc inside your app.

Supported platforms

Blosc is meant to support all platforms where a C89 compliant C compiler can be found. The most tested ones are Intel (Linux, Mac OSX and Windows) and ARM (Linux), but exotic ones such as the IBM Blue Gene Q embedded "A2" processor are reported to work too.

Mac OSX troubleshooting

If you run into compilation troubles when using Mac OSX, please make sure that you have installed the command line developer tools. You can always install them with:

  $ xcode-select --install

Wrapper for Python

Blosc has an official wrapper for Python. See:

https://github.com/Blosc/python-blosc

Command line interface and serialization format for Blosc

Blosc can be used from the command line by using Bloscpack. See:

https://github.com/Blosc/bloscpack

Filter for HDF5

For those who want to use Blosc as a filter in the HDF5 library, there is a sample implementation in the hdf5-blosc project at:

https://github.com/Blosc/hdf5-blosc

Mailing list

There is an official mailing list for Blosc at:

[email protected] https://groups.google.com/g/blosc

Acknowledgments

See THANKS.rst.


Enjoy data!
