• Stars
    star
    261
  • Rank 156,630 (Top 4 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created almost 6 years ago
  • Updated 8 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

GPU accelerated cross filtering with cuDF.

  cuxfilter

cuxfilter ( ku-cross-filter ) is a RAPIDS framework to connect web visualizations to GPU accelerated crossfiltering. Inspired by the javascript version of the original, it enables interactive and super fast multi-dimensional filtering of 100 million+ row tabular datasets via cuDF.

RAPIDS Viz

cuxfilter is one of the core projects of the “RAPIDS viz” team. Taking the axiom that “a slider is worth a thousand queries” from @lmeyerov to heart, we want to enable fast exploratory data analytics through an easier-to-use pythonic notebook interface.

As there are many fantastic visualization libraries available for the web, our general principle is not to create our own viz library, but to enhance others with faster acceleration, larger datasets, and better dev UX. Basically, we want to take the headache out of interconnecting multiple charts to a GPU backend, so you can get to visually exploring data faster.

By the way, cuxfilter is best used to interact with large (1 million+) tabular datasets. GPU’s are fast, but accessing that speedup requires some architecture overhead that isn’t worthwhile for small datasets.

For more detailed requirements, see below.

cuxfilter Architecture

The current version of cuxfilter leverages jupyter notebook and bokeh server to reduce architecture and installation complexity.

layout architecture

What is cuDataTiles?

cuxfilter implements cuDataTiles, a GPU accelerated version of data tiles based on the work of Falcon. When starting to interact with specific charts in a cuxfilter dashboard, values for the other charts are precomputed to allow for fast slider scrubbing without having to recalculate values.

Open Source Projects

cuxfilter wouldn’t be possible without using these great open source projects:

Where is the original cuxfilter and Mortgage Viz Demo?

The original version (0.2) of cuxfilter, most known for the backend powering the Mortgage Viz Demo, has been moved into the GTC-2018-mortgage-visualization branch branch. As it has a much more complicated backend and javascript API, we’ve decided to focus more on the streamlined notebook focused version here.

Usage

Example 1

Open In Studio Lab

Open In Colab

import cuxfilter

#update data_dir if you have downloaded datasets elsewhere
DATA_DIR = './data'
from cuxfilter.sampledata import datasets_check
datasets_check('auto_accidents', base_dir=DATA_DIR)

cux_df = cuxfilter.DataFrame.from_arrow(DATA_DIR+'/auto_accidents.arrow')
cux_df.data['ST_CASE'] = cux_df.data['ST_CASE'].astype('float64')

label_map = {1: 'Sunday',    2: 'Monday',    3: 'Tuesday',    4: 'Wednesday',   5: 'Thursday',    6: 'Friday',    7: 'Saturday',    9: 'Unknown'}
gtc_demo_red_blue_palette = [ "#3182bd", "#6baed6", "#7b8ed8", "#e26798", "#ff0068" , "#323232" ]

#declare charts
chart1 = cuxfilter.charts.scatter(x='dropoff_x', y='dropoff_y', aggregate_col='DAY_WEEK', aggregate_fn='mean',
                                color_palette=gtc_demo_red_blue_palette, tile_provider='CartoLight', unselected_alpha=0.2,
                                pixel_shade_type='linear')
chart2 = cuxfilter.charts.multi_select('YEAR')
chart3 = cuxfilter.charts.bar('DAY_WEEK', x_label_map=label_map)
chart4 = cuxfilter.charts.bar('MONTH')

#declare dashboard
d = cux_df.dashboard([chart1, chart3, chart4], sidebar=[chart2], layout=cuxfilter.layouts.feature_and_double_base, title='Auto Accident Dataset')

# run the dashboard as a webapp:
# Bokeh and Datashader based charts also have a `save` tool on the side toolbar, which can download and save the individual chart when interacting with the dashboard.
# d.show('jupyter-notebook/lab-url')

#run the dashboard within the notebook cell
d.app()

output dashboard

Example 2

Open In Studio Lab

Open In Colab

import cuxfilter

#update data_dir if you have downloaded datasets elsewhere
DATA_DIR = './data'
from cuxfilter.sampledata import datasets_check
datasets_check('mortgage', base_dir=DATA_DIR)

cux_df = cuxfilter.DataFrame.from_arrow(DATA_DIR + '/146M_predictions_v2.arrow')

MAPBOX_API_KEY= "<mapbox-api-key>"
geoJSONSource='https://raw.githubusercontent.com/rapidsai/cuxfilter/GTC-2018-mortgage-visualization/javascript/demos/GTC%20demo/src/data/zip3-ms-rhs-lessprops.json'

chart0 = cuxfilter.charts.choropleth( x='zip', color_column='delinquency_12_prediction', color_aggregate_fn='mean',
            elevation_column='current_actual_upb', elevation_factor=0.00001, elevation_aggregate_fn='sum',
            geoJSONSource=geoJSONSource, mapbox_api_key=MAPBOX_API_KEY, data_points=1000
)
chart2 = cuxfilter.charts.bar('delinquency_12_prediction',data_points=50)
chart3 = cuxfilter.charts.range_slider('borrower_credit_score',data_points=50)
chart1 = cuxfilter.charts.drop_down('dti')

#declare dashboard
d = cux_df.dashboard([chart0, chart2],sidebar=[chart3, chart1], layout=cuxfilter.layouts.feature_and_double_base,theme = cuxfilter.themes.light, title='Mortgage Dashboard')

# run the dashboard within the notebook cell
# Bokeh and Datashader based charts also have a `save` tool on the side toolbar, which can download and save the individual chart when interacting with the dashboard.
# d.app()

#run the dashboard as a webapp:
d.show('jupyter-notebook/lab-url')

output dashboard

Documentation

Full documentation can be found on the RAPIDS docs page.

Troubleshooting help can be found on our troubleshooting page.

General Dependencies

  • python
  • cudf
  • datashader
  • cupy
  • panel
  • bokeh
  • pyproj
  • geopandas
  • pyppeteer
  • jupyter-server-proxy

Quick Start

Please see the Demo Docker Repository, choosing a tag based on the NVIDIA CUDA version you’re running. This provides a ready to run Docker container with example notebooks and data, showcasing how you can utilize cuxfilter, cuDF and other RAPIDS libraries.

Installation

CUDA/GPU requirements

  • CUDA 11.2+
  • NVIDIA driver 450.80.02+
  • Pascal architecture or better (Compute Capability >=6.0)

Conda

cuxfilter can be installed with conda (miniconda, or the full Anaconda distribution) from the rapidsai channel:

For cuxfilter version == 23.08 :

# for CUDA 11.8
conda install -c rapidsai -c numba -c conda-forge -c nvidia \
    cuxfilter=23.08 python=3.10 cudatoolkit=11.8

For the nightly version of cuxfilter :

# for CUDA 11.8
conda install -c rapidsai-nightly -c numba -c conda-forge -c nvidia \
    cuxfilter python=3.10 cudatoolkit=11.8

Note: cuxfilter is supported only on Linux, and with Python versions 3.8 and later.

See the Get RAPIDS version picker for more OS and version info.

Build/Install from Source

See build instructions.

Troubleshooting

libxcomposite.so.1 not found error

If the await d.preview() throws a libxcomposite.so.1 not found error, execute the following commands:

apt-get update
apt-get install libxcomposite1 libxcursor1 libxdamage1 libxfixes3 libxi6 libxrandr2 libxtst6 libcups2 libxss1 libasound2 libpangocairo-1.0-0 libpango-1.0-0 libatk1.0-0 libgtk-3-0 libgdk-pixbuf2.0-0

bokeh server in jupyter lab

To run the bokeh server in a jupyter lab, install jupyterlab dependencies

conda install -c conda-forge jupyterlab
jupyter labextension install @pyviz/jupyterlab_pyviz
jupyter labextension install jupyterlab_bokeh

Download Datasets

  1. Auto download datasets

The notebooks inside python/notebooks already have a check function which verifies whether the example dataset is downloaded, and downloads it if it's not.

  1. Download manually

While in the directory you want the datasets to be saved, execute the following

Note: Auto Accidents dataset has corrupted coordinate data from the years 2012-2014

#go the the environment where cuxfilter is installed. Skip if in a docker container
source activate test_env

#download and extract the datasets
curl https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2015-01.csv --create-dirs -o ./nyc_taxi.csv
curl https://data.rapids.ai/viz-data/146M_predictions_v2.arrow.gz --create-dirs -o ./146M_predictions_v2.arrow.gz
curl https://data.rapids.ai/viz-data/auto_accidents.arrow.gz --create-dirs -o ./auto_accidents.arrow.gz

python -c "from cuxfilter.sampledata import datasets_check; datasets_check(base_dir='./')"

Guides and Layout Templates

Currently supported layout templates and example code can be found on the layouts page.

Currently Supported Charts

Library Chart type
bokeh bar, line
datashader scatter, scatter_geo, line, stacked_lines, heatmap, graph
panel_widgets range_slider, date_range_slider, float_slider, int_slider, drop_down, multi_select, card, number
custom view_dataframe
pydeck choropleth(3d and 2d)

Contributing Developers Guide

cuxfilter acts like a connector library and it is easy to add support for new libraries. The python/cuxfilter/charts/core directory has all the core chart classes which can be inherited and used to implement a few (viz related) functions and support dashboarding in cuxfilter directly.

You can see the examples to implement viz libraries in the bokeh and cudatashader directories. Let us know if you would like to add a chart by opening a feature request issue or submitting a PR.

For more details, check out the contributing guide.

Future Work

cuxfilter development is in early stages and on going. See what we are planning next on the projects page.

More Repositories

1

cudf

cuDF - GPU DataFrame Library
C++
8,319
star
2

cuml

cuML - RAPIDS Machine Learning Library
C++
3,864
star
3

cugraph

cuGraph - RAPIDS Graph Analytics Library
Cuda
1,668
star
4

cusignal

cuSignal - RAPIDS Signal Processing Library
Python
703
star
5

raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
Cuda
586
star
6

jupyterlab-nvdashboard

A JupyterLab extension for displaying dashboards of GPU usage.
TypeScript
582
star
7

notebooks

RAPIDS Sample Notebooks
Shell
577
star
8

cuspatial

CUDA-accelerated GIS and spatiotemporal algorithms
Jupyter Notebook
543
star
9

rmm

RAPIDS Memory Manager
C++
420
star
10

deeplearning

Jupyter Notebook
336
star
11

cucim

cuCIM - RAPIDS GPU-accelerated image processing library
Jupyter Notebook
333
star
12

dask-cuda

Utilities for Dask and CUDA interactions
Python
266
star
13

node

GPU-accelerated data science and visualization in node
TypeScript
170
star
14

clx

A collection of RAPIDS examples for security analysts, data scientists, and engineers to quickly get started applying RAPIDS and GPU acceleration to real-world cybersecurity use cases.
Jupyter Notebook
167
star
15

libgdf

[ARCHIVED] C GPU DataFrame Library
Cuda
138
star
16

dask-cudf

[ARCHIVED] Dask support for distributed GDF object --> Moved to cudf
Python
135
star
17

cloud-ml-examples

A collection of Machine Learning examples to get started with deploying RAPIDS in the Cloud
Jupyter Notebook
134
star
18

ucx-py

Python bindings for UCX
Python
118
star
19

gpu-bdb

RAPIDS GPU-BDB
Python
103
star
20

kvikio

KvikIO - High Performance File IO
Python
100
star
21

plotly-dash-rapids-census-demo

Jupyter Notebook
92
star
22

gputreeshap

C++
83
star
23

frigate

Frigate is a tool for automatically generating documentation for your Helm charts
Python
76
star
24

wholegraph

WholeGraph - large scale Graph Neural Networks
Cuda
75
star
25

spark-examples

[ARCHIVED] Moved to github.com/NVIDIA/spark-xgboost-examples
Jupyter Notebook
70
star
26

docker

Dockerfile templates for creating RAPIDS Docker Images
Shell
69
star
27

cuvs

cuVS - a library for vector search and clustering on the GPU
Jupyter Notebook
57
star
28

custrings

[ARCHIVED] GPU String Manipulation --> Moved to cudf
Cuda
46
star
29

docs

RAPIDS Documentation Site
HTML
34
star
30

cudf-alpha

[ARCHIVED] cuDF [alpha] - RAPIDS Merge of GoAi into cuDF
34
star
31

rapids-examples

Jupyter Notebook
31
star
32

nvgraph

C++
26
star
33

rapids-cmake

CMake
24
star
34

cuhornet

Cuda
24
star
35

cuDataShader

Jupyter Notebook
22
star
36

gpuci-build-environment

Common build environment used by gpuCI for building RAPIDS
Dockerfile
19
star
37

distributed-join

C++
19
star
38

devcontainers

Shell
18
star
39

dask-cuml

[ARCHIVED] Dask support for multi-GPU machine learning algorithms --> Moved to cuml
Python
16
star
40

integration

RAPIDS - combined conda package & integration tests for all of RAPIDS libraries
Shell
15
star
41

xgboost-conda

Conda recipes for xgboost
Jupyter Notebook
12
star
42

benchmark

Python
11
star
43

ucxx

C++
11
star
44

dependency-file-generator

Python
10
star
45

asvdb

Python
9
star
46

helm-chart

Shell
9
star
47

deployment

RAPIDS Deployment Documentation
Jupyter Notebook
9
star
48

miniforge-cuda

Dockerfile
9
star
49

ci-imgs

Dockerfile
7
star
50

dask-cugraph

Python
7
star
51

rapids.ai

rapids.ai web site
HTML
7
star
52

ptxcompiler

Python
6
star
53

GaaS

Python
5
star
54

rvc

Go
4
star
55

scikit-learn-nv

Python
4
star
56

ops-bot

A Probot application used by the Ops team for automation.
TypeScript
4
star
57

workflows

Shell
4
star
58

rapids-triton

C++
4
star
59

dask-build-environment

Build environments for various dask related projects on gpuCI
Dockerfile
3
star
60

roc

GitHub utilities for the RAPIDS Ops team
Go
3
star
61

multi-gpu-tools

Shell
3
star
62

detect-weak-linking

Python
3
star
63

dask-cuda-benchmarks

Python
2
star
64

shared-workflows

Reusable GitHub Actions workflows for RAPIDS CI
Shell
2
star
65

rapids_triton_pca_example

C++
2
star
66

cugunrock

Cuda
2
star
67

dgl-cugraph-build-environment

Dockerfile
2
star
68

projects

Jupyter Notebook
2
star
69

crossfit

Metric calculation library
Python
2
star
70

gpuci-mgmt

Mangement scripts for gpuCI
Shell
1
star
71

ansible-roles

1
star
72

code-share

C++
1
star
73

build-metrics-reporter

Python
1
star
74

cibuildwheel-imgs

Dockerfile
1
star
75

gpuci-tools

User tools for use within the gpuCI environment
Shell
1
star
76

pynvjitlink

Python
1
star
77

rapids-dask-dependency

Shell
1
star
78

sphinx-theme

This repository contains a Sphinx theme used for RAPIDS documentation
CSS
1
star