• This repository has been archived on 09/Jan/2023
  • Language
    Python
  • License
    MIT License
  • Created almost 10 years ago
  • Updated over 3 years ago

Repository Details

A Python module for conveniently loading/saving ROOT files as pandas DataFrames

⚠️root_pandas is deprecated and unmaintained⚠️

root_pandas is built upon root_numpy which has not been actively maintained in several years. This is mostly due to the emergence of new alternatives which are both faster and more flexible.

root_pandas: conveniently loading/saving ROOT files as pandas DataFrames

root_pandas is a convenience package built around the root_numpy library. It allows you to easily load and store pandas DataFrames using the columnar ROOT data format used in high energy physics.

It's modeled closely after the existing pandas API for reading and writing HDF5 files. This means that in many cases, it is possible to substitute the use of HDF5 with ROOT and vice versa.

On top of that, root_pandas offers several features that go beyond what pandas offers with read_hdf and to_hdf.

These include

  • Specifying multiple input filenames, in which case they are read as if they were one continuous file.
  • Selecting several columns at once using * globbing and {A,B} shell patterns.
  • Flattening source files containing arrays by storing one array element per DataFrame row, duplicating any scalar variables.

Reading ROOT files

This is how you can read the contents of a ROOT file into a DataFrame:

from root_pandas import read_root

df = read_root('myfile.root')

If there are several ROOT trees in the input file, you have to specify the tree key:

df = read_root('myfile.root', 'mykey')

You can also directly read multiple ROOT files at once by passing a list of file names:

df = read_root(['file1.root', 'file2.root'], 'mykey')

In this case, each file must have the same set of columns under the given key.
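As a rough illustration of what "one continuous file" means for the result, the sketch below builds the same shape with plain pandas. The two small frames are hypothetical stand-ins for the contents of file1.root and file2.root; this is not how root_pandas reads the files internally.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from file1.root and file2.root.
part1 = pd.DataFrame({"variable1": [1.0, 2.0], "variable2": [0, 1]})
part2 = pd.DataFrame({"variable1": [3.0], "variable2": [1]})

# Each part must expose the same set of columns.
assert list(part1.columns) == list(part2.columns)

# Stack the parts and renumber the rows as if they came from one source.
combined = pd.concat([part1, part2], ignore_index=True)
print(combined)
```

The column check mirrors the requirement stated above: a file with a missing or extra branch under the key cannot be stacked cleanly with the others.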

Specific columns can be selected like this:

df = read_root('myfile.root', columns=['variable1', 'variable2'])

You can also use * in the column names to read in any matching branch:

df = read_root('myfile.root', columns=['variable*'])

In addition, you can use shell brace patterns, as in:

df = read_root('myfile.root', columns=['variable{1,2}'])

You can also use * and {a,b} simultaneously, and several times per string.
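To make the pattern rules concrete, here is a minimal sketch of how * globbing and {a,b} brace patterns can be expanded against a list of branch names using only the standard library. The helper names expand_braces and select_columns are made up for this illustration, the sketch handles non-nested braces only, and it is not taken from the root_pandas source.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups (non-nested) into a list of plain patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that several brace groups in one string are all expanded.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def select_columns(branches, patterns):
    """Return the branches matching any pattern, without duplicates."""
    selected = []
    for pattern in patterns:
        for candidate in expand_braces(pattern):
            for name in fnmatch.filter(branches, candidate):
                if name not in selected:
                    selected.append(name)
    return selected


# Hypothetical branch names, chosen to match the examples above.
branches = ["variable1", "variable2", "variable3", "other_x", "other_y"]
print(select_columns(branches, ["variable{1,2}"]))
print(select_columns(branches, ["variable*", "other_{x,y}"]))
```

Expanding braces first and globbing second keeps the two features independent, which is why they can be combined freely within one string.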

If you want to transform your variables using a ROOT selection string, you have to put a noexpand: prefix in front of the column name that you want to use the selection string in:

df = read_root('myfile.root', columns=['noexpand:sqrt(variable1)'])

Working with stored arrays can be a bit inconvenient in pandas. root_pandas makes it easy to flatten your input data, providing you with a DataFrame containing only scalars:

df = read_root('myfile.root', columns=['arrayvariable', 'othervariable'], flatten=['arrayvariable'])

Assuming the first entry of the arrayvariable column in the ROOT file holds the array [1, 2, 3], flattening expands it into three entries, each containing one of the array elements. The scalar values in all other columns are duplicated across those entries. The automatically created __array_index column records the index that each array element had in its array before flattening.
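The effect of flattening can be reproduced on an in-memory DataFrame with plain pandas, which may help when reasoning about the resulting row count. This is only a sketch of the resulting shape: the data are invented, and the zero-based numbering of __array_index is an assumption of this example.

```python
import pandas as pd

# Hypothetical stand-in for a tree with one array branch and one scalar branch.
df = pd.DataFrame({
    "arrayvariable": [[1, 2, 3], [4, 5]],
    "othervariable": [10.0, 20.0],
})

# One row per array element; scalar columns are repeated for each element.
flat = df.explode("arrayvariable")

# Position of each element within its original array (zero-based here).
flat["__array_index"] = flat.groupby(level=0).cumcount().to_numpy()

flat = flat.reset_index(drop=True)
print(flat)
```

Two input entries holding three and two elements become five rows, so aggregate quantities computed on scalar columns after flattening are weighted by array length.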

There is also support for working with files that don't fit into memory: If the chunksize parameter is specified, read_root returns an iterator that yields DataFrames, each containing up to chunksize rows.

for df in read_root('bigfile.root', chunksize=100000):
    pass  # process df here

If bigfile.root doesn't contain an index, the default indices of the individual DataFrame chunks will still increase continuously, as if they were parts of a single large DataFrame.
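The index behaviour described above can be pictured with an in-memory stand-in. The iter_chunks helper below is hypothetical and simply slices an existing DataFrame; it is meant to show what continuously increasing chunk indices look like, not how read_root produces them.

```python
import pandas as pd


def iter_chunks(df, chunksize):
    """Yield consecutive slices of df, each with at most chunksize rows."""
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]


# Stand-in for the full contents of a large file (pandas default index).
full = pd.DataFrame({"x": range(7)})

chunks = list(iter_chunks(full, chunksize=3))
for chunk in chunks:
    # Each chunk keeps the row labels it had in the full frame.
    print(chunk.index.tolist())
```

Because the labels never restart, results computed per chunk can be concatenated afterwards without index collisions.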

You can also combine any of the above options at the same time.

Reading in chunks also supports progress bars:

from progressbar import ProgressBar
pbar = ProgressBar()
for df in pbar(read_root('bigfile.root', chunksize=100000)):
    pass  # process df here

# or
from tqdm import tqdm
for df in tqdm(read_root('bigfile.root', chunksize=100000), unit='chunks'):
    pass  # process df here

Writing ROOT files

root_pandas patches the pandas DataFrame to have a to_root method that allows you to save it into a ROOT file:

df.to_root('out.root', key='mytree')

You can also call the to_root function and specify the DataFrame as the first argument:

from root_pandas import to_root

to_root(df, 'out.root', key='mytree')

By default, to_root erases the existing contents of the file. Use mode='a' to append:

for df in read_root('bigfile.root', chunksize=100000):
    df.to_root('out.root', mode='a')

Warning: When using this feature to stream data from one ROOT file into another, remember to delete the output file first (for example with os.remove). Otherwise, each run of your program appends more data to it.
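One way to guard against this is to delete the output file at the start of every run. The sketch below shows the pattern with the standard library only; append_chunk is a hypothetical stand-in for df.to_root(path, mode='a') that writes text lines, so that the example runs without ROOT installed.

```python
import os
import tempfile


def remove_if_exists(path):
    """Delete path so that a later append starts from an empty file."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # nothing to clean up on the first run


def append_chunk(path, rows):
    # Hypothetical stand-in for df.to_root(path, mode='a').
    with open(path, "a") as handle:
        handle.writelines(f"{row}\n" for row in rows)


def run_once(path):
    remove_if_exists(path)
    for chunk in ([1, 2], [3, 4]):
        append_chunk(path, chunk)


out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
run_once(out_path)
run_once(out_path)  # the second run does not double the output

with open(out_path) as handle:
    line_count = len(handle.readlines())
print(line_count)
```

Catching FileNotFoundError rather than checking for existence first avoids a race between the check and the removal.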

The DataFrame index

When reading a ROOT file, root_pandas will automatically add a pandas index to the DataFrame, which starts at 1 and counts up for each entry. When writing the DataFrame to a ROOT file, it stores the DataFrame index in a __index__ branch. Currently, only single-dimensional indices are supported.
