Blosc/hdf5-blosc

Blosc filter for HDF5

This is a filter for HDF5 that uses the Blosc compressor; by installing this filter, you can read and write HDF5 files with Blosc-compressed datasets.

Be careful when using this filter: do not activate the shuffle filter in HDF5 itself, but rather enable shuffling from Blosc. This is because Blosc uses a SIMD shuffle internally, which is much faster.
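As a minimal sketch of what "shuffle from Blosc itself" can look like, the helper below fills the client-data array that would be handed to H5Pset_filter together with FILTER_BLOSC. The slot layout (slots 0 to 3 reserved, 4 = compression level, 5 = shuffle flag, 6 = compressor code) is an assumption based on what src/example.c appears to do, so please check it against that file. The helper name fill_blosc_cd_values and the BLOSC_CD_NELMTS macro are made up for this illustration.

```c
#include <assert.h>

/* Number of client-data slots passed to the filter (assumed layout, see above). */
#define BLOSC_CD_NELMTS 7

/*
 * Hypothetical helper: fill the cd_values array for the Blosc filter.
 *   level      - Blosc compression level, 0 to 9
 *   shuffle    - non-zero to let Blosc do the shuffle itself
 *   compressor - compressor code from blosc.h (e.g. BLOSC_BLOSCLZ)
 */
void fill_blosc_cd_values(unsigned int level, int shuffle,
                          unsigned int compressor,
                          unsigned int cd_values[BLOSC_CD_NELMTS])
{
    int i;

    assert(level <= 9);

    /* Slots 0 to 3 are reserved; zero them here. */
    for (i = 0; i < 4; i++)
        cd_values[i] = 0;

    cd_values[4] = level;              /* compression level */
    cd_values[5] = shuffle ? 1u : 0u;  /* 1: Blosc shuffle active, 0: inactive */
    cd_values[6] = compressor;         /* which compressor Blosc should use */
}
```

With the HDF5 headers available, the array would then be passed along the lines of H5Pset_filter(plist, FILTER_BLOSC, H5Z_FLAG_OPTIONAL, BLOSC_CD_NELMTS, cd_values), without also calling H5Pset_shuffle on the same property list.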

Installing the Blosc filter plugin

Instead of just linking this Blosc filter into your HDF5 application, it is possible to install it as a system-wide HDF5 plugin (with HDF5 1.8.11 or later). This is useful because it allows every HDF5-using program on your system to transparently read Blosc-compressed HDF5 files.

As described in the HDF5 plugin documentation, you just need to compile the Blosc plugin into a shared library and copy it to the plugin directory (which defaults to /usr/local/hdf5/lib/plugin on non-Windows systems).
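If you are unsure where the plugin has to go on a given machine, a small helper like the following may be useful. It is only a sketch: it assumes that the HDF5_PLUGIN_PATH environment variable, when set, replaces the default directory and may hold several directories separated by ':' on non-Windows systems. The function name first_plugin_dir is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Default plugin directory on non-Windows systems, as stated above. */
#define DEFAULT_PLUGIN_DIR "/usr/local/hdf5/lib/plugin"

/*
 * Hypothetical helper: copy into `out` the first directory that HDF5 would
 * be expected to search for plugins. `env_value` is the value of
 * HDF5_PLUGIN_PATH (may be NULL or empty if the variable is not set).
 * Returns 0 on success, -1 if `out` is too small.
 */
int first_plugin_dir(const char *env_value, char *out, size_t out_size)
{
    const char *src;
    size_t len;

    assert(out != NULL && out_size > 0);

    /* Fall back to the default when the variable is unset or empty. */
    src = (env_value != NULL && env_value[0] != '\0') ? env_value
                                                      : DEFAULT_PLUGIN_DIR;

    /* Keep only the first entry of a ':'-separated list. */
    len = strcspn(src, ":");
    if (len + 1 > out_size)
        return -1;

    memcpy(out, src, len);
    out[len] = '\0';
    return 0;
}
```

A typical call would be first_plugin_dir(getenv("HDF5_PLUGIN_PATH"), dir, sizeof dir), after which dir names the place to copy the shared library to.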

Following the cmake instructions below produces a libH5Zblosc.so shared library file (or .dylib/.dll on Mac/Windows) that you can copy to the HDF5 plugin directory.

To write Blosc-compressed HDF5 files, on the other hand, an HDF5-using program must be specially modified to enable the Blosc filter when writing HDF5 datasets, as described below.

Linking the Blosc filter directly into your program

Instead of (or in addition to) installing the Blosc plugin system-wide as described above, you can also link the Blosc filter directly into your application. Although this only makes the Blosc filter available in your application (as opposed to other HDF5-using applications), it is useful in cases where installing the plugin is inconvenient. Compile the Blosc filter as described in the Compiling section below, but link libblosc_filter.a (generated by make) directly into your program.

In order to register Blosc in your HDF5 application, you then need to call a function in blosc_filter.h, with the following signature:

int register_blosc(char **version, char **date)

Calling this registers the filter with the HDF5 library and returns information about the Blosc release through the version and date arguments (both of type char **).

A non-negative return value indicates success. If the registration fails, an error is pushed onto the current error stack and a negative value is returned.
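The calling pattern can be sketched as follows. To keep the sketch compilable without the HDF5 and Blosc headers, the registration function is passed in as a pointer with the same shape as register_blosc; in a real program you would pass register_blosc itself. The names init_blosc_filter, fake_register_ok and fake_register_fail are hypothetical, and the sketch assumes that the returned strings are heap-allocated and owned by the caller (please check blosc_filter.c before relying on this).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as register_blosc() in blosc_filter.h. */
typedef int (*register_fn)(char **version, char **date);

/* Hypothetical wrapper: register the filter and report the Blosc release.
 * Returns 0 on success, -1 if registration failed. */
int init_blosc_filter(register_fn reg)
{
    char *version = NULL;
    char *date = NULL;
    int status;

    assert(reg != NULL);

    /* A negative return value signals failure, as described above. */
    status = (reg(&version, &date) < 0) ? -1 : 0;
    if (status < 0)
        fprintf(stderr, "Blosc filter registration failed\n");
    else
        printf("Blosc version info: %s (%s)\n",
               version ? version : "?", date ? date : "?");

    /* Assumption: the strings are heap-allocated and ours to release.
     * free(NULL) is a no-op, so this is safe on the failure path too. */
    free(version);
    free(date);
    return status;
}

/* Stand-ins for register_blosc, used only to exercise the wrapper. */
static char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

int fake_register_ok(char **version, char **date)
{
    *version = copy_string("x.y.z");    /* placeholder, not a real release */
    *date = copy_string("unknown");
    return 0;
}

int fake_register_fail(char **version, char **date)
{
    (void)version;
    (void)date;
    return -1;
}
```

In an application linked against libblosc_filter.a, the call would simply be init_blosc_filter(register_blosc), made once before any Blosc-compressed dataset is read or written.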

An example C program ('src/example.c') is included which demonstrates the proper use of the filter.

This filter has been tested against HDF5 versions 1.6.5 through 1.8.10. It is released under the MIT license (see LICENSE.txt for details).

Using the Blosc filter in your application

Assuming the filter is available (either installed as a system-wide plugin or registered directly in your program as described above), your application can transparently read HDF5 files with Blosc-compressed datasets. (The HDF5 library will detect that the dataset is Blosc-compressed and invoke the filter automatically.)

To write an HDF5 file with a Blosc-compressed dataset, call the H5Pset_filter function on the property list of the dataset you are creating, and pass FILTER_BLOSC (defined in blosc_filter.h) for the filter_id parameter. In addition, HDF5 only supports compression for "chunked" datasets; this just means that you need to call H5Pset_chunk to specify a chunk size (e.g. 1 MB chunks), and the subsequent chunking of the dataset I/O is performed transparently by HDF5.
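H5Pset_chunk takes chunk dimensions rather than a size in bytes, so a target such as "about 1 MB" has to be translated into dimensions first. The sketch below shows one simple way to do that: start from the full dataset shape and keep halving the largest dimension until the chunk fits. This is just an illustrative heuristic, not something the filter requires, and the function names are hypothetical. Plain unsigned long long is used in place of HDF5's hsize_t so the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of a chunk with the given dimensions. */
static unsigned long long chunk_size_bytes(const unsigned long long *dims,
                                           int rank, size_t elem_size)
{
    unsigned long long n = elem_size;
    int i;

    for (i = 0; i < rank; i++)
        n *= dims[i];
    return n;
}

/*
 * Hypothetical helper: choose chunk dimensions of at most `target_bytes`
 * by repeatedly halving (rounding up) the largest dimension.
 */
void pick_chunk_dims(const unsigned long long *data_dims, int rank,
                     size_t elem_size, unsigned long long target_bytes,
                     unsigned long long *chunk_dims)
{
    int i;

    assert(rank > 0 && elem_size > 0 && target_bytes >= elem_size);

    /* Start with a single chunk covering the whole dataset. */
    for (i = 0; i < rank; i++) {
        assert(data_dims[i] > 0);
        chunk_dims[i] = data_dims[i];
    }

    while (chunk_size_bytes(chunk_dims, rank, elem_size) > target_bytes) {
        int largest = 0;

        /* Find the largest dimension (first one wins on ties). */
        for (i = 1; i < rank; i++)
            if (chunk_dims[i] > chunk_dims[largest])
                largest = i;

        chunk_dims[largest] = (chunk_dims[largest] + 1) / 2;
    }
}
```

For a 1000 x 1000 dataset of 8-byte doubles and a 1 MiB target, this yields 250 x 500 chunks (1,000,000 bytes each). The resulting values would be copied into an hsize_t array and passed to H5Pset_chunk on the same property list that receives the H5Pset_filter call.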

Compiling

The filter consists of a single 'src/blosc_filter.c' source file and 'src/blosc_filter.h' header, which need the Blosc library installed to work. It is simplest to just use the provided cmake build scripts, which compile both the filter and the Blosc library into a library for you.

Assuming you have cmake and other standard Unix build tools installed, do:

mkdir build
cd build
cmake ..
make

This generates the library/plugin files required above in the build directory.

Acknowledgments

See THANKS.rst.


Enjoy data!