HDF5 Tutorial

HDF5 is for Lovers

Bio

Anthony Scopatz is a computational nuclear engineer and physicist, and a post-doctoral scholar at the FLASH Center at the University of Chicago. His initial workshop teaching experience came from instructing bootcamps for The Hacker Within, a peer-led teaching organization at the University of Wisconsin. Out of this grew a collaboration teaching Software Carpentry bootcamps in partnership with Greg Wilson. During his tenure at Enthought, Inc., Anthony taught many week-long courses (approximately one per month) on scientific computing in Python.

Track

This tutorial was conceived as an advanced track tutorial. However, it could be recast as an introductory one, if the program committee desires.

Description

HDF5 is a hierarchical, binary database format that has become a de facto standard for scientific computing. While the specification may be used in a relatively simple way (persistence of static arrays), it also supports several high-level features that prove invaluable. These include chunking, ragged data, extensible data, parallel I/O, compression, complex selection, and in-core calculations. Moreover, HDF5 bindings exist for almost every language, including two Python libraries (PyTables and h5py).
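To make a few of these features concrete, here is a minimal sketch of an extensible, chunked, compressed array written with PyTables. It assumes the PyTables 3.x function names (`open_file`, `create_earray`); the 2.x releases named in the requirements spell these in camelCase (`openFile`, `createEArray`). The group name `experiment`, the array name `readings`, and the chunk shape are hypothetical choices for illustration, not recommendations from the tutorial.

```python
import numpy as np
import tables  # PyTables; 3.x naming is assumed throughout


def write_extensible_array(path, blocks):
    """Append 2-D blocks to a compressed, chunked, extensible array.

    Returns the final array shape and the chunk shape that was used.
    """
    # zlib at a moderate level plus the shuffle filter; an illustrative choice.
    filters = tables.Filters(complevel=5, complib="zlib", shuffle=True)
    ncols = blocks[0].shape[1]
    with tables.open_file(path, mode="w") as h5:
        # Hypothetical hierarchy: /experiment/readings
        group = h5.create_group("/", "experiment")
        # A 0 in the shape marks the dimension that is allowed to grow.
        arr = h5.create_earray(
            group,
            "readings",
            atom=tables.Float64Atom(),
            shape=(0, ncols),
            filters=filters,
            chunkshape=(64, ncols),  # hypothetical chunk shape
        )
        for block in blocks:
            arr.append(block)
        return tuple(int(n) for n in arr.shape), tuple(int(n) for n in arr.chunkshape)


def read_rows(path, start, stop):
    """Read a row slice; only the chunks covering the slice are decompressed."""
    with tables.open_file(path, mode="r") as h5:
        return h5.root.experiment.readings[start:stop]
```

Calling `write_extensible_array("data.h5", [np.ones((100, 4)), np.zeros((100, 4))])` and then `read_rows("data.h5", 98, 102)` returns four rows that straddle the boundary between the two appended blocks.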

This tutorial will discuss tools, strategies, and hacks for squeezing every ounce of performance out of HDF5 in new or existing projects. It will also go over fundamental limitations in the specification and provide creative and subtle strategies for getting around them. Overall, this tutorial will show how HDF5 plays nicely with all parts of an application, making both the code and the data faster and smaller. With such powerful features at the developer's disposal, what is not to love?!

This tutorial is targeted at a more advanced audience that has prior knowledge of Python and NumPy. Knowledge of C or C++ and basic HDF5 is recommended but not required.

Outline

  • Meaning in layout (20 min)

    • Tips for choosing your hierarchy
  • Advanced datatypes (20 min)

    • Tables
    • Nested types
    • Tricks with malloc() and byte-counting
  • Exercise on above topics (20 min)

  • Chunking (20 min)

    • How it works
    • How to properly select your chunksize
  • Queries and Selections (20 min)

    • In-core vs Out-of-core calculations
    • PyTables Table.where()
    • Datasets vs Dataspaces
  • Exercise on above topics (20 min)

  • The Starving CPU Problem (1 hr)

    • Why you should always use compression
    • Compression algorithms available
    • Choosing the correct one
    • Exercise
  • Integration with other databases (1 hr)

    • Migrating to/from SQL
    • HDF5 in other databases (JSON example)
    • Other Databases in HDF5 (JSON example)
    • Exercise
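As a rough illustration of the tables and queries items in the outline above, the following sketch stores records in a compressed PyTables table and selects rows with `Table.where()`. It again assumes the PyTables 3.x names (`open_file`, `create_table`); the record layout `PARTICLE_DTYPE`, the table name `particles`, and both helper functions are hypothetical and are not taken from the tutorial material.

```python
import numpy as np
import tables  # PyTables; 3.x naming is assumed

# Hypothetical record layout, used only for this example.
PARTICLE_DTYPE = np.dtype([("id", "i8"), ("energy", "f8"), ("charge", "i4")])


def write_particles(path, records):
    """Store a NumPy structured array as a compressed table; return the row count."""
    filters = tables.Filters(complevel=1, complib="zlib")
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table(
            "/",
            "particles",
            description=PARTICLE_DTYPE,
            filters=filters,
            expectedrows=max(len(records), 1),  # hint PyTables uses to size chunks
        )
        table.append(records)
        return int(table.nrows)


def select_ids(path, min_energy):
    """Return ids of positively charged particles above an energy threshold.

    The condition string is evaluated by PyTables as it iterates over the
    table, so the whole table is not loaded into memory first.
    """
    condition = "(energy > min_energy) & (charge == 1)"
    with tables.open_file(path, mode="r") as h5:
        table = h5.root.particles
        matches = table.where(condition, condvars={"min_energy": min_energy})
        return [int(row["id"]) for row in matches]
```

The column values are pulled out of each row while the file is still open, because the row objects yielded by `where()` are only valid during iteration.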

Packages Required

This tutorial will require Python 2.7, IPython 0.12+, NumPy 1.5+, and PyTables 2.3+. ViTables and Matplotlib are also recommended. All of these may be found in Linux package managers and are also available through EPD or easy_install. ViTables may need to be installed independently.
