  • Language
    Jupyter Notebook
  • License
    BSD 3-Clause "New...
  • Created about 6 years ago
  • Updated 2 months ago

Python bindings for the C++14 Boost::Histogram library

boost-histogram for Python

Python bindings for Boost::Histogram (source), a C++14 library. This is one of the fastest libraries for histogramming, while still providing the power of a full histogram object. See what's new.

Other members of the boost-histogram family include:

  • Hist: The first-party analyst-friendly histogram library that extends boost-histogram with named axes, many new shortcuts including UHI+, plotting shortcuts, and more.
  • UHI: Specification for Histogram library interop, especially for plotting.
  • mplhep: Plotting extension for matplotlib with support for UHI histograms.
  • histoprint: Histogram display library for the command line with support for UHI.
  • dask-histogram: Dask support for boost-histogram.

Usage

import boost_histogram as bh

# Compose axes however you like; this is a 2D histogram
hist = bh.Histogram(
    bh.axis.Regular(2, 0, 1),
    bh.axis.Regular(4, 0.0, 1.0),
)

# Filling can be done with arrays, one per dimension
hist.fill(
    [0.3, 0.5, 0.2],
    [0.1, 0.4, 0.9],
)

# NumPy array view into histogram counts, no overflow bins
values = hist.values()

# Make a new histogram with just the second axis, summing over the first, and
# rebinning the second into larger bins:
h2 = hist[::sum, ::bh.rebin(2)]

We support the uhi PlottableHistogram protocol, so boost-histogram/Hist histograms can be plotted via any compatible library, such as mplhep.

Cheatsheet

Simplified list of features:
  • Many axis types (all support metadata=...)
    • bh.axis.Regular(n, start, stop, ...): Make a regular axis. Options listed below.
      • overflow=False: Turn off overflow bin
      • underflow=False: Turn off underflow bin
      • growth=True: Turn on growing axis, bins added when out-of-range items added
      • circular=True: Turn on wrapping, so that out-of-range values wrap around into the axis
      • transform=bh.axis.transform.Log: Log spacing
      • transform=bh.axis.transform.Sqrt: Square root spacing
      • transform=bh.axis.transform.Pow(v): Power spacing
      • See also the flexible Function transform
    • bh.axis.Integer(start, stop, *, underflow=True, overflow=True, growth=False, circular=False): Special high-speed version of regular for evenly spaced bins of width 1
    • bh.axis.Variable([start, edge1, edge2, ..., stop], *, underflow=True, overflow=True, circular=False): Uneven bin spacing
    • bh.axis.IntCategory([...], *, growth=False): Integer categories
    • bh.axis.StrCategory([...], *, growth=False): String categories
    • bh.axis.Boolean(): A True/False axis
  • Axis features:
    • .index(value): The index at a point (or points) on the axis
    • .value(index): The value for a fractional bin (or bins) in the axis
    • .bin(i): The bin edges (continuous axis) or a bin value (discrete axis)
    • .centers: The N bin centers (if continuous)
    • .edges: The N+1 bin edges (if continuous)
    • .extent: The number of bins (including under/overflow)
    • .metadata: Anything a user wants to store
    • .traits: The options set on the axis
    • .size: The number of bins (not including under/overflow)
    • .widths: The N bin widths
  • Many storage types
    • bh.storage.Double(): Doubles for weighted values (default)
    • bh.storage.Int64(): 64-bit unsigned integers
    • bh.storage.Unlimited(): Starts small, but can go up to unlimited precision ints or doubles.
    • bh.storage.AtomicInt64(): Threadsafe filling, experimental. Does not support growing axis in threads.
    • bh.storage.Weight(): Stores a weight and sum of weights squared.
    • bh.storage.Mean(): Accepts a sample and computes the mean of the samples (profile).
    • bh.storage.WeightedMean(): Accepts a sample and a weight. It computes the weighted mean of the samples.
  • Accumulators
    • bh.accumulator.Sum: High accuracy sum (Neumaier) - used by the sum method when summing a numerical histogram
    • bh.accumulator.WeightedSum: Tracks a weighted sum and variance
    • bh.accumulator.Mean: Running count, mean, and variance (Welford's incremental algorithm)
    • bh.accumulator.WeightedMean: Tracks a weighted sum, mean, and variance (West's incremental algorithm)
  • Histogram operations
    • h.ndim: The number of dimensions
    • h.size or len(h): The number of bins
    • +: Add two histograms (storages must match types currently)
    • *=: Multiply by a scalar (not all storages) (hist * scalar and scalar * hist supported too)
    • /=: Divide by a scalar (not all storages) (hist / scalar supported too)
    • .kind: Either bh.Kind.COUNT or bh.Kind.MEAN, depending on storage
    • .storage_type: Fetch the histogram storage type
    • .sum(flow=False): The total count of all bins
    • .project(ax1, ax2, ...): Project down to listed axis (numbers). Can also reorder axes.
    • .to_numpy(flow=False, view=False): Convert to a NumPy style tuple (with or without under/overflow bins)
    • .view(flow=False): Get a view on the bin contents (with or without under/overflow bins)
    • .values(flow=False): Get a view on the values (counts or means, depending on storage)
    • .variances(flow=False): Get the variances if available
    • .counts(flow=False): Get the effective counts for all storage types
    • .reset(): Set counters to 0 (growing axes remain the same size)
    • .empty(flow=False): Check to see if the histogram is empty (can check flow bins too if asked)
    • .copy(deep=False): Make a copy of a histogram
    • .axes: Get the axes as a tuple-like (all properties of axes are available too)
      • .axes[0]: Get the 0th axis
      • .axes.edges: The lower values as a broadcasting-ready array
      • .axes.centers: The centers of the bins broadcasting-ready array
      • .axes.widths: The bin widths as a broadcasting-ready array
      • .axes.metadata: A tuple of the axes metadata
      • .axes.size: A tuple of the axes sizes (size without flow)
      • .axes.extent: A tuple of the axes extents (size with flow)
      • .axes.bin(*args): Returns the bin edges as a tuple of pairs (continuous axis) or values (discrete axis)
      • .axes.index(*args): Returns the bin index at a value for each axis
      • .axes.value(*args): Returns the bin value at an index for each axis
  • Indexing - Supports UHI Indexing
    • Bin content access / setting
      • v = h[b]: Access bin content by index number
      • v = h[{0:b}]: All actions can be represented by axis:item dictionary instead of by position (mostly useful for slicing)
    • Slicing to get histogram or set array of values
      • h2 = h[a:b]: Access a slice of a histogram, cut portions go to flow bins if present
      • h2 = h[:, ...]: Using : and ... supported just like NumPy
      • h2 = h[::sum]: Third item in slice is the "action"
      • h[...] = array: Set the bin contents, either include or omit flow bins
    • Special accessors
      • bh.loc(v): Supply value in axis coordinates instead of bin number
      • bh.underflow: The underflow bin (use empty beginning on slice for slicing instead)
      • bh.overflow: The overflow bin (use empty end on slice for slicing instead)
    • Special actions (third item in slice)
      • sum: Remove axes via projection; if limits are given, use those
      • bh.rebin(n): Rebin an axis
  • NumPy compatibility
    • bh.numpy provides faster drop in replacements for NumPy histogram functions
    • Histograms follow the buffer interface, and provide .view()
    • Histograms can be converted to NumPy style output tuple with .to_numpy()
  • Details
    • All objects support copy/deepcopy/pickle
    • Fully statically typed, tested with MyPy.

Installation

You can install this library from PyPI with pip:

python3 -m pip install boost-histogram

All the usual Python best practices apply: use a reasonably recent pip (pip 9 is very old) and install into a virtual environment. Python 3.7+ is required. On older versions of Python (3.5 and 2.7), pip installs 0.13 instead, which is API-equivalent to 1.0 but will not gain new features. 1.3.x was the last series to support Python 3.6.

Binaries available:

The easiest way to get boost-histogram is to use a binary wheel, which happens when you run the above command on a supported platform. Wheels are produced using cibuildwheel; all common platforms have wheels provided in boost-histogram:

System            Arch         Python versions            PyPy versions
ManyLinux2014     64-bit       3.7, 3.8, 3.9, 3.10, 3.11  3.7, 3.8, 3.9
ManyLinux2014     ARM64        3.7, 3.8, 3.9, 3.10, 3.11  3.7, 3.8, 3.9
MuslLinux_1_1     64-bit       3.7, 3.8, 3.9, 3.10, 3.11  -
macOS 10.9+       64-bit       3.7, 3.8, 3.9, 3.10, 3.11  3.7, 3.8, 3.9
macOS Universal2  Arm64        3.8, 3.9, 3.10, 3.11       -
Windows           32 & 64-bit  3.7, 3.8, 3.9, 3.10, 3.11  -
Windows           64-bit       -                          3.7, 3.8, 3.9
  • manylinux2014: Requires pip 19.3.
  • ARM on Linux is supported. PowerPC or IBM-Z available on request.
  • macOS Universal2 wheels for Apple Silicon and Intel provided for Python 3.8+ (requires Pip 21.0.1 or newer).

If you are on a Linux system that is not part of the "many" in manylinux or musl in musllinux, such as ClearLinux, building from source is usually fine, since the compilers on those systems are often quite new. It will just take longer to install when it is using the sdist instead of a wheel. All dependencies are header-only and included.

Conda-Forge

The boost-histogram package is available on conda-forge, as well. All supported variants are available.

conda install -c conda-forge boost-histogram

Source builds

For a source build, for example from an "SDist" package, the only requirement is a C++14-compatible compiler. The compiler requirements are dictated by Boost.Histogram's C++ requirements: gcc >= 5.5, clang >= 3.8, or msvc >= 14.1. You should also have a version of pip that is less than 2-3 years old (10+).

Boost is not required or needed (this only depends on included header-only dependencies). You can install directly from GitHub if you would like.

python -m pip install git+https://github.com/scikit-hep/boost-histogram.git@develop

Developing

See CONTRIBUTING.md for details on how to set up a development environment.

Contributors

We would like to acknowledge the contributors that made this project possible (emoji key):


  • Henry Schreiner: 🚧 💻 📖
  • Hans Dembinski: 🚧 💻
  • N!no: ⚠️ 📖
  • Jim Pivarski: 🤔
  • Nicholas Smith: 🐛
  • physicscitizen: 🐛
  • Chanchal Kumar Maji: 📖
  • Doug Davis: 🐛
  • Pierre Grimaud: 📖
  • Beojan Stanislaus: 🐛
  • Popinaodude: 🐛
  • Congqiao Li: 🐛
  • alexander-held: 🐛
  • Chris Burr: 📖
  • Konstantin Gizdov: 📦 🐛
  • Kyle Cranmer: 📖
  • Aman Goel: 📖 💻
  • Jay Gohil: 📖

This project follows the all-contributors specification.

Talks and other documentation/tutorial sources

The official documentation is here, and includes a quickstart.


Acknowledgements

This library was primarily developed by Henry Schreiner and Hans Dembinski.

Support for this work was provided by the National Science Foundation cooperative agreement OAC-1836650 (IRIS-HEP) and OAC-1450377 (DIANA/HEP). Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
