  • Stars: 123
  • Rank: 288,304 (Top 6 %)
  • Language: Python
  • License: Apache License 2.0
  • Created over 3 years ago
  • Updated 8 days ago

Repository Details

Fast & furious GroupBy operations for dask.array

flox

This project explores strategies for fast GroupBy reductions with dask.array. It used to be called dask_groupby. It was motivated by:

  1. Dask Dataframe GroupBy blogpost
  2. numpy_groupies in Xarray issue

(See a presentation about this package, from the Pangeo Showcase).

Acknowledgements

This work was funded in part by

  1. NASA-ACCESS 80NSSC18M0156 "Community tools for analysis of NASA Earth Observing System Data in the Cloud" (PI J. Hamman, NCAR),
  2. NASA-OSTFL 80NSSC22K0345 "Enhancing analysis of NASA data with the open-source Python Xarray Library" (PIs Scott Henderson, University of Washington; Deepak Cherian, NCAR; Jessica Scheick, University of New Hampshire), and
  3. NCAR's Earth System Data Science Initiative.

It was motivated by a great many discussions in the Pangeo community.

API

There are two main functions:

  1. flox.groupby_reduce(dask_array, by_dask_array, "mean"): a "pure" dask array interface.
  2. flox.xarray.xarray_reduce(xarray_object, by_dataarray, "mean"): a "pure" xarray interface, though work is ongoing to integrate this package into xarray.
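To make the semantics of a grouped "mean" concrete, here is a minimal NumPy-only sketch of the quantity such a call computes. It is not flox's implementation and does not call flox; the helper name groupby_mean and the assumption that labels are integers in [0, num_groups) are hypothetical choices made for illustration.

```python
import numpy as np


def groupby_mean(values, labels, num_groups):
    """Mean of `values` for each integer label in [0, num_groups).

    Hypothetical helper for illustration only; not part of flox.
    """
    # Per-group sum of the values
    sums = np.bincount(labels, weights=values, minlength=num_groups)
    # Per-group number of members
    counts = np.bincount(labels, minlength=num_groups)
    # Empty groups produce 0 / 0, which we let evaluate to NaN silently
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts


values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels = np.array([0, 0, 1, 1, 1, 2])

# Group 3 has no members, so its mean is NaN
result = groupby_mean(values, labels, num_groups=4)
print(result)
```

The array-in, array-of-labels-in, one-value-per-group-out shape is the same as in the two functions listed above; how the work is actually split across dask chunks is covered in the documentation.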

Implementation

See the documentation for details on the implementation.

Custom reductions

flox implements all common reductions provided by numpy_groupies in aggregations.py. It also allows you to specify a custom Aggregation (again inspired by dask.dataframe), though this might not be fully functional at the moment. See aggregations.py for examples.

import numpy as np
from flox.aggregations import Aggregation

mean = Aggregation(
    # name used for dask tasks
    name="mean",
    # operation to use for pure-numpy inputs
    numpy="mean",
    # blockwise reduction
    chunk=("sum", "count"),
    # combine intermediate results: sum the sums, sum the counts
    combine=("sum", "sum"),
    # generate final result as sum / count
    finalize=lambda sum_, count: sum_ / count,
    # Used when "reindexing" at combine-time
    fill_value=0,
    # Used when any member of `expected_groups` is not found
    final_fill_value=np.nan,
)
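The chunk, combine, and finalize fields above describe a three-stage reduction. The sketch below walks through those stages by hand with plain NumPy so the data flow is visible. It is an illustration under stated assumptions, not flox's code: the function names chunk_stage, combine_stage, and finalize_stage are hypothetical, and labels are assumed to be integers in [0, num_groups).

```python
import numpy as np


def chunk_stage(values, labels, num_groups):
    """Blockwise reduction: per-group (sum, count) for a single chunk."""
    # Groups absent from this chunk get 0 here, which plays the role
    # that fill_value=0 plays in the Aggregation above (assumption).
    sums = np.bincount(labels, weights=values, minlength=num_groups)
    counts = np.bincount(labels, minlength=num_groups)
    return sums, counts


def combine_stage(intermediates):
    """Combine intermediate results: sum the sums, sum the counts."""
    sums = np.sum([s for s, _ in intermediates], axis=0)
    counts = np.sum([c for _, c in intermediates], axis=0)
    return sums, counts


def finalize_stage(sums, counts, final_fill_value=np.nan):
    """Final result is sum / count; groups never seen get final_fill_value."""
    out = np.full(sums.shape, final_fill_value, dtype=float)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out


values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels = np.array([0, 1, 0, 1, 1, 0])

# Pretend the data is split into two chunks of three elements each
chunks = [(values[:3], labels[:3]), (values[3:], labels[3:])]
intermediates = [chunk_stage(v, l, num_groups=3) for v, l in chunks]
chunked_mean = finalize_stage(*combine_stage(intermediates))
print(chunked_mean)
```

Carrying (sum, count) pairs through the combine step, rather than per-chunk means, is what makes the result independent of how the data is chunked.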

More Repositories

1

xskillscore

Metrics for verifying forecasts
Python
222
star
2

xarray-tutorial

Xarray Tutorials
Jupyter Notebook
172
star
3

datatree

WIP implementation of a tree-like hierarchical data structure for xarray.
Python
169
star
4

xbatcher

Batch generation from xarray datasets
Python
163
star
5

xarray_leaflet

An xarray extension for tiled map plotting.
Python
161
star
6

cf-xarray

an accessor for xarray objects that interprets CF attributes
Python
155
star
7

xpublish

Publish Xarray Datasets via a REST API.
Python
116
star
8

pint-xarray

Interface for using pint with xarray, providing convenience accessors
Python
101
star
9

xeofs

Comprehensive EOF analysis in Python with xarray: A versatile, multidimensional, and scalable tool for advanced climate data analysis
Python
98
star
10

xvec

Vector data cubes for Xarray
Python
93
star
11

xarray-simlab

Xarray extension and framework for computer model simulations
Python
73
star
12

cupy-xarray

Interface for using cupy in xarray, providing convenience accessors.
Python
65
star
13

xwrf

A lightweight interface for working with the Weather Research and Forecasting (WRF) model output in Xarray.
Python
58
star
14

xarray-regrid

Regridding utility for xarray
Python
58
star
15

xoak

xarray extension that provides tree-based indexes used for selecting irregular, n-dimensional data.
Python
57
star
16

xdggs

Xarray extension for DGGS
Python
54
star
17

xarray-schema

Schema validation for Xarray objects
Python
39
star
18

sphinx-autosummary-accessors

sphinx extension to document pandas and xarray accessors
Python
13
star
19

xarray.dev

The Xarray landing page
JavaScript
12
star
20

cubed-xarray

Interface for using cubed with xarray
Python
11
star
21

issue-from-pytest-log

create issues from pytest-reportlog files
Python
10
star
22

xarray-contrib

Central repository for xarray-contrib organization
9
star
23

xncml

Tools for manipulating NcML (NetCDF Markup Language) files with/for xarray
Python
7
star
24

ci-trigger

A github action to detect trigger keywords in the summary line of commit messages
Shell
3
star
25

xwrf-data

Data repository for xwrf documentation, tutorials, testing
Python
2
star
26

xarray-array-testing

testing framework for testing duck array compatibility with xarray
Python
1
star