Serverside scaling for Vega and Altair visualizations


VegaFusion provides serverside acceleration for the Vega visualization grammar. While not limited to Python, an initial application of VegaFusion is the acceleration of the Altair Python interface to Vega-Lite.

The core VegaFusion algorithms are implemented in Rust. Python integration is provided using PyO3 and JavaScript integration is provided using wasm-bindgen.

Documentation

See the documentation at https://vegafusion.io

Project Status

VegaFusion is a young project, but it is already fairly well tested and used in production at Hex. The integration test suite includes image comparisons with over 600 specifications from the Vega, Vega-Lite, and Altair galleries.

Quickstart 1: Overcome MaxRowsError with VegaFusion

The VegaFusion mime renderer can be used to overcome the Altair MaxRowsError by performing data-intensive aggregations on the server and pruning unused columns from the source dataset. First, install the vegafusion Python package with the embed extras enabled:

pip install "vegafusion[embed]"

Then open a Jupyter notebook (either the classic notebook or a notebook inside JupyterLab), and create an Altair histogram of a 1 million row flights dataset:

import pandas as pd
import altair as alt

flights = pd.read_parquet(
    "https://vegafusion-datasets.s3.amazonaws.com/vega/flights_1m.parquet"
)

delay_hist = alt.Chart(flights).mark_bar().encode(
    alt.X("delay", bin=alt.Bin(maxbins=30)),
    alt.Y("count()")
)
delay_hist
---------------------------------------------------------------------------
MaxRowsError                              Traceback (most recent call last)
...
MaxRowsError: The number of rows in your dataset is greater than the maximum allowed (5000). For information on how to plot larger datasets in Altair, see the documentation

This results in an Altair MaxRowsError because, by default, Altair is configured to allow no more than 5,000 rows of data to be sent to the browser. This is a safety measure to avoid crashing the user's browser. The VegaFusion mime renderer can be used to overcome this limitation by performing data-intensive transforms (e.g. filtering, binning, aggregation, etc.) in the Python kernel before the resulting data is sent to the web browser.
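
To make the idea concrete, here is a minimal pure-Python sketch of the kind of reduction described above. It is not VegaFusion's implementation: the synthetic rows, the column names, the fixed bin width of 20, and the helper bin_and_count are all assumptions made for illustration. The sketch reads only the one column the chart needs, bins it, and counts rows per bin, so that a handful of rows rather than the full dataset would have to reach the browser.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical stand-in for the flights dataset: 100,000 rows, three columns.
rows = [
    {
        "delay": random.gauss(10, 30),
        "distance": random.uniform(50, 2500),
        "carrier": "XX",
    }
    for _ in range(100_000)
]

BIN_WIDTH = 20  # assumed fixed bin width for this sketch


def bin_and_count(records, field, width):
    """Return one (bin_start, bin_end, count) tuple per non-empty bin.

    Only `field` is read, so every other column is effectively pruned.
    """
    counts = Counter(int(r[field] // width) * width for r in records)
    return sorted((start, start + width, n) for start, n in counts.items())


histogram = bin_and_count(rows, "delay", BIN_WIDTH)

print(f"source rows: {len(rows)}")
print(f"rows that would be sent to the browser: {len(histogram)}")
```

The aggregated result has one row per histogram bar, which is far below the 5,000 row limit regardless of how many source rows there are.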

Run these two lines to import and enable the VegaFusion mime renderer:

import vegafusion as vf
vf.enable()

Now the chart displays quickly without errors:

delay_hist

[Figure: Flight Delay Histogram]

Quickstart 2: Extract transformed data

By default, data transforms in an Altair chart (e.g. filtering, binning, aggregation, etc.) are performed by the Vega JavaScript library running in the browser. This has the advantage of making the charts produced by Altair fully standalone, not requiring access to a running Python kernel to render properly. But it has the disadvantage of making it difficult to access the transformed data (e.g. the histogram bin edges and count values) from Python. Since VegaFusion evaluates these transforms in the Python kernel, it's possible to access them from Python using the vegafusion.transformed_data() function.

For example, the following code demonstrates how to access the histogram bin edges and counts for the example above:

import pandas as pd
import altair as alt
import vegafusion as vf

flights = pd.read_parquet(
    "https://vegafusion-datasets.s3.amazonaws.com/vega/flights_1m.parquet"
)

delay_hist = alt.Chart(flights).mark_bar().encode(
    alt.X("delay", bin=alt.Bin(maxbins=30)),
    alt.Y("count()")
)
vf.transformed_data(delay_hist)
    bin_maxbins_30_delay  bin_maxbins_30_delay_end  __count
 0                   -20                         0   419400
 1                    80                       100    11000
 2                     0                        20   392700
 3                    40                        60    38400
 4                    60                        80    21800
 5                    20                        40    92700
 6                   100                       120     5300
 7                   -40                       -20     9900
 8                   120                       140     3300
 9                   140                       160     2000
10                   160                       180     1800
11                   320                       340      100
12                   180                       200      900
13                   240                       260      100
14                   -60                       -40      100
15                   260                       280      100
16                   200                       220      300
17                   360                       380      100
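
The returned table can be post-processed like any other data. As a hedged illustration, the sketch below uses plain Python with the rows copied by hand from the output above (on the returned pandas DataFrame, sort_values would do the same sorting). It orders the bins, checks that the counts add up to the 1 million source rows, and lists the bin starts that have no rows, assuming every bin is 20 wide as the output suggests.

```python
# (bin start, bin end, count) rows copied from the transformed_data output.
bins = [
    (-20, 0, 419400), (80, 100, 11000), (0, 20, 392700), (40, 60, 38400),
    (60, 80, 21800), (20, 40, 92700), (100, 120, 5300), (-40, -20, 9900),
    (120, 140, 3300), (140, 160, 2000), (160, 180, 1800), (320, 340, 100),
    (180, 200, 900), (240, 260, 100), (-60, -40, 100), (260, 280, 100),
    (200, 220, 300), (360, 380, 100),
]

# The rows are not returned in bin order, so sort by bin start first.
ordered = sorted(bins)

# Every source row falls into exactly one bin.
total = sum(count for _, _, count in ordered)
print(total)  # → 1000000

# Only non-empty bins are returned; find the gaps (assumes a bin width of 20).
present = {start for start, _, _ in ordered}
first, last = ordered[0][0], ordered[-1][0]
missing = [s for s in range(first, last + 20, 20) if s not in present]
print(missing)  # → [220, 280, 300, 340]
```

The gaps matter when the data is re-plotted or exported, because an empty bin has no row rather than a row with a count of zero.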

Quickstart 3: Accelerate interactive charts

While the VegaFusion mime renderer works great for non-interactive Altair charts, it's not as well suited for interactive charts visualizing large datasets. This is because the mime renderer does not maintain a live connection between the browser and the Python kernel, so all the data that participates in an interaction must be sent to the browser.

To address this situation, VegaFusion provides a Jupyter Widget based renderer that does maintain a live connection between the chart in the browser and the Python kernel. In this configuration, selection operations (e.g. filtering to the extents of a brush selection) can be evaluated interactively in the Python kernel, which eliminates the need to transfer the full dataset to the client in order to maintain interactivity.

The VegaFusion widget renderer is provided by the vegafusion-jupyter package.

pip install "vegafusion-jupyter[embed]"

Instead of enabling the mime renderer with vf.enable(), the widget renderer is enabled with vf.enable_widget(). Here is a full example that uses the widget renderer to display an interactive Altair chart that implements linked histogram brushing for a 1 million row flights dataset.

import pandas as pd
import altair as alt
import vegafusion as vf

vf.enable_widget()

flights = pd.read_parquet(
    "https://vegafusion-datasets.s3.amazonaws.com/vega/flights_1m.parquet"
)

brush = alt.selection(type='interval', encodings=['x'])

# Define the base chart, with the common parts of the
# background and highlights
base = alt.Chart().mark_bar().encode(
    x=alt.X(alt.repeat('column'), type='quantitative', bin=alt.Bin(maxbins=20)),
    y='count()'
).properties(
    width=160,
    height=130
)

# gray background with selection
background = base.encode(
    color=alt.value('#ddd')
).add_selection(brush)

# blue highlights on the selected data
highlight = base.transform_filter(brush)

# layer the two charts & repeat
chart = alt.layer(
    background,
    highlight,
    data=flights
).transform_calculate(
    "time",
    "hours(datum.date)"
).repeat(column=["distance", "delay", "time"])
chart

[Video: flights_brush_histogram.mov]

Histogram binning, aggregation, and selection filtering are now evaluated in the Python kernel process with efficient parallelization, and only the aggregated data (one row per histogram bar) is sent to the browser.

You can see that the VegaFusion widget renderer maintains a live connection to the Python kernel by noticing that the Python kernel is running as the selection region is created or moved. You can also notice the VegaFusion logo in the dropdown menu button.
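
The round trip behind a brush interaction can be sketched in a few lines of pure Python. This is an illustrative assumption about the general pattern, not VegaFusion's actual code: the synthetic rows and the helpers histogram and on_brush are hypothetical names. Each time the brush moves, the kernel filters the full dataset to the brush extents, re-aggregates, and returns only the per-bar counts.

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical stand-in for the flights data held in the Python kernel.
flights = [
    {"delay": random.gauss(10, 30), "distance": random.uniform(50, 2500)}
    for _ in range(50_000)
]


def histogram(records, field, width):
    """Return sorted (bin_start, count) pairs for the non-empty bins."""
    counts = Counter(int(r[field] // width) * width for r in records)
    return sorted(counts.items())


def on_brush(records, brush_field, lo, hi, target_field, width):
    """Re-aggregate target_field for rows whose brush_field is in [lo, hi]."""
    selected = [r for r in records if lo <= r[brush_field] <= hi]
    return histogram(selected, target_field, width)


# Gray background bars: computed once over all rows.
background = histogram(flights, "delay", 20)

# Blue highlight bars: recomputed whenever the brush on "distance" changes.
highlight = on_brush(flights, "distance", 500, 1000, "delay", 20)

print(f"rows in kernel: {len(flights)}")
print(f"background bars sent: {len(background)}")
print(f"highlight bars sent: {len(highlight)}")
```

Only the two small lists of bars cross the connection, while the 50,000 source rows stay in the kernel.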

Motivation for VegaFusion

Vega makes it possible to create declarative JSON specifications for rich interactive visualizations that are fully self-contained. They can run entirely in a web browser without requiring access to an external database or a Python kernel.

For datasets of a few thousand rows or fewer, this architecture results in extremely smooth and responsive interactivity. However, this architecture does not scale very well to datasets of hundreds of thousands of rows or more. This is the problem that VegaFusion aims to solve.

DataFusion integration

Apache Arrow DataFusion is an SQL-compatible query engine that integrates with the Rust implementation of Apache Arrow. VegaFusion uses DataFusion to implement many of the Vega transforms, and it compiles the Vega expression language directly into the DataFusion expression language. In addition to being quite fast, a particularly powerful characteristic of DataFusion is that it provides many interfaces that can be extended with custom Rust logic. For example, VegaFusion defines many custom UDFs (user-defined functions) that are designed to implement the precise semantics of the Vega expression language and the Vega expression functions.

License

As of version 1.0, VegaFusion is licensed under the BSD-3 license. This is the same license used by Vega, Vega-Lite, and Altair.

Prior versions were released under the AGPLv3 license.

About the Name

There are two meanings behind the name "VegaFusion":

  • It's a reference to the Apache Arrow DataFusion library which is used to implement many of the supported Vega transforms
  • Vega and Altair are named after stars, and stars are powered by nuclear fusion

Building VegaFusion

If you're interested in building VegaFusion from source, see BUILD.md

Roadmap

Supporting serverside acceleration for Altair in Jupyter was chosen as the first application of VegaFusion, but there are a lot of exciting ways that VegaFusion can be extended in the future. For more information, see the Roadmap.
