  • Stars
    378
  • Rank 113,272 (Top 3 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created over 4 years ago
  • Updated 3 months ago


Repository Details

Interactive 2D scatter plot widget for Jupyter Lab and Notebook. Scales to millions of points!

jupyter-scatter


An interactive scatter plot widget for Jupyter Notebook, Lab, and Google Colab
that can handle millions of points and supports view linking.



Why? Imagine trying to explore an embedding space of millions of data points. Besides plotting the space as a 2D scatter, the exploration typically involves three things: First, we want to interactively adjust the view (e.g., via panning & zooming) and the visual point encoding (e.g., the point color, opacity, or size). Second, we want to be able to select/highlight points. And third, we want to compare multiple embeddings (e.g., via animation, color, or point connections). The goal of jupyter-scatter is to support all three requirements and scale to millions of points.

How? Internally, jupyter-scatter uses regl-scatterplot for rendering and ipywidgets for linking the scatter plot to the IPython kernel.

Index

  1. Install
  2. Get Started
  3. API docs
  4. Examples
  5. Development

Install

pip install jupyter-scatter

If you are using JupyterLab <=2:

jupyter labextension install @jupyter-widgets/jupyterlab-manager jupyter-scatter

For a minimal working example, take a look at test-environments.

Get Started

To play with the following examples yourself, open notebooks/get-started.ipynb.

Note

Also check out our full-blown tutorial that we first presented at the SciPy '23 conference.

Simplest Example

In the simplest case, you can pass the x/y coordinates to the plot function as follows:

import jscatter
import numpy as np

x = np.random.rand(500)
y = np.random.rand(500)

jscatter.plot(x, y)


Pandas Example

Say your data is stored in a Pandas dataframe like the following:

import pandas as pd

# Just some random float and int values
data = np.random.rand(500, 4)
df = pd.DataFrame(data, columns=['mass', 'speed', 'pval', 'group'])
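# We'll convert the `group` column to letters and cast it to the `category`
# dtype to ensure it's recognized as categorical data. Scaling by 7 yields up
# to eight letters (A to H). This will come in handy in the advanced example.
df['group'] = (
    df['group']
    .map(lambda c: chr(65 + int(round(c * 7))), na_action=None)
    .astype('category')
)

The first rows of the dataframe then look something like this (the values are random):

   mass  speed  pval  group
0  0.13   0.27  0.51  G
1  0.87   0.93  0.80  B
2  0.10   0.25  0.25  F
3  0.03   0.90  0.01  G
4  0.19   0.78  0.65  D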

You can then visualize this data by referencing column names:

jscatter.plot(data=df, x='mass', y='speed')

Advanced Example

Often you want to customize the visual encoding, such as the point color, size, and opacity.

jscatter.plot(
  data=df,
  x='mass',
  y='speed',
  size=8, # static encoding
  color_by='group', # data-driven encoding
  opacity_by='density', # view-driven encoding
)


In the above example, we chose a static point size of 8. In contrast, the point color is data-driven and assigned based on the categorical group value. The point opacity is view-driven and defined dynamically by the number of points currently visible in the view.

Also notice how jscatter picks an appropriate default color map based on the data type used for color encoding. In this example, jscatter uses the colorblind-safe color map from Okabe and Ito because the data type is categorical and the number of categories is less than 9.

Important: in order for jscatter to recognize categorical data, the dtype of the corresponding column needs to be category!
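
As a minimal sketch of that conversion, the snippet below uses only pandas. The tiny dataframe is made up for illustration; only the `astype('category')` call matters.

```python
import pandas as pd

# Hypothetical toy data; `group` starts out as plain strings.
df = pd.DataFrame({
    'mass': [0.13, 0.87, 0.10, 0.03],
    'speed': [0.27, 0.93, 0.25, 0.90],
    'group': ['G', 'B', 'F', 'G'],
})

# Cast the column to pandas' categorical dtype.
df['group'] = df['group'].astype('category')

print(df['group'].dtype)                  # category
print(list(df['group'].cat.categories))   # ['B', 'F', 'G']
```

After the cast, the column can be passed to color_by as in the advanced example above.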

You can, of course, customize the color map and many other parameters of the visual encoding as shown next.

Functional API Example

The flat API can get overwhelming when you want to customize a lot of properties. Therefore, jscatter provides a functional API that groups properties by type and exposes them via meaningfully named methods.

scatter = jscatter.Scatter(data=df, x='mass', y='speed')
scatter.selection(df.query('mass < 0.5').index)
scatter.color(by='mass', map='plasma', order='reverse')
scatter.opacity(by='density')
scatter.size(by='pval', map=[2, 4, 6, 8, 10])
scatter.height(480)
scatter.background('black')
scatter.show()


When you update properties dynamically, i.e., after having called scatter.show(), the plot will update automatically. For instance, try calling scatter.xy('speed', 'mass') and you will see how the points are mirrored along the diagonal.

Moreover, all arguments are optional. If you specify arguments, the methods will act as setters and change the properties. If you call a method without any arguments it will act as a getter and return the property (or properties). For example, scatter.selection() will return the currently selected points.

Finally, the scatter plot is interactive and supports two-way communication. Hence, if you select some points with the lasso tool and then call scatter.selection(), you will get the current selection.
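
The getter/setter convention described above can be illustrated with a small stand-alone sketch. The `MiniScatter` class and the `_UNSET` sentinel are hypothetical and are not jscatter's implementation; they only mimic the "no arguments means getter, arguments mean setter" behavior. Returning `self` from the setters is a choice made for this sketch.

```python
# Hypothetical stand-in class, NOT jscatter's implementation.
_UNSET = object()  # sentinel, so that `None` stays a valid value to set


class MiniScatter:
    def __init__(self):
        self._height = 240
        self._selection = []

    def height(self, height=_UNSET):
        if height is _UNSET:
            return self._height  # getter: no argument given
        self._height = height    # setter: store the new value
        return self              # returning self allows chaining in this sketch

    def selection(self, selection=_UNSET):
        if selection is _UNSET:
            return list(self._selection)
        self._selection = list(selection)
        return self


scatter = MiniScatter()
scatter.height(480).selection([1, 5, 8])
print(scatter.height())     # 480
print(scatter.selection())  # [1, 5, 8]
```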

Linking Scatter Plots

To explore multiple scatter plots with linked view, selection, and hover interactions, use jscatter.link().

jscatter.link([
  jscatter.Scatter(data=embeddings, x='pcaX', y='pcaY', **config),
  jscatter.Scatter(data=embeddings, x='tsneX', y='tsneY', **config),
  jscatter.Scatter(data=embeddings, x='umapX', y='umapY', **config),
  jscatter.Scatter(data=embeddings, x='caeX', y='caeY', **config)
], rows=2)

See notebooks/linking.ipynb for more details.
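
The linking example above relies on an `embeddings` dataframe and a `config` dictionary that are not defined in this README. The following is a rough sketch of what they might look like, under stated assumptions: the data is random, the PCA is computed with plain NumPy, `randX`/`randY` are made-up column names standing in for other projections such as t-SNE or UMAP, and `height` is assumed to be accepted by jscatter.Scatter in the same way jscatter.plot accepts it elsewhere in this README.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical high-dimensional data: 1,000 items with 16 features each.
features = rng.normal(size=(1000, 16))

# PCA via SVD of the mean-centered data; keep the first two components.
centered = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pca = centered @ vt[:2].T

# A second, purely illustrative 2D view: a random linear projection.
rand = centered @ rng.normal(size=(16, 2))

embeddings = pd.DataFrame({
    'pcaX': pca[:, 0], 'pcaY': pca[:, 1],
    'randX': rand[:, 0], 'randY': rand[:, 1],
})

# Options shared by all linked plots (assumed keyword, see the note above).
config = dict(height=320)

# With jscatter installed, the two views could then be linked like so:
# jscatter.link([
#     jscatter.Scatter(data=embeddings, x='pcaX', y='pcaY', **config),
#     jscatter.Scatter(data=embeddings, x='randX', y='randY', **config),
# ])
```

Because every plot reads from the same dataframe, each row refers to the same item in all of the views.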

Visualize Millions of Data Points

With jupyter-scatter you can easily visualize and interactively explore datasets with millions of points.

In the following we're visualizing 5 million points generated with the Rössler attractor.

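# `roesslerAttractor` is a helper that must be defined beforehand;
# it is not part of jscatter or NumPy.
points = np.asarray(roesslerAttractor(5000000))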
jscatter.plot(points[:,0], points[:,1], height=640)

See notebooks/examples.ipynb for more details.
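
The helper that generates the attractor is not shown in this README. As a stand-in, here is a minimal sketch that integrates the Rössler system with simple Euler steps. The function name `roessler_attractor`, the parameters a=0.2, b=0.2, c=5.7, the step size, and the start point are assumptions made for illustration; the notebook's actual helper may differ.

```python
import numpy as np


def roessler_attractor(num, a=0.2, b=0.2, c=5.7, dt=0.01, start=(1.0, 1.0, 1.0)):
    """Return `num` points (x, y, z) of a Rössler trajectory via Euler steps."""
    points = np.empty((num, 3))
    x, y, z = start
    for i in range(num):
        points[i] = (x, y, z)
        # Rössler system: dx/dt = -y - z, dy/dt = x + a*y, dz/dt = b + z*(x - c)
        dx = -y - z
        dy = x + a * y
        dz = b + z * (x - c)
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
    return points


points = roessler_attractor(10_000)
print(points.shape)  # (10000, 3)
```

The first two columns, points[:, 0] and points[:, 1], can be passed to jscatter.plot as in the example above. A pure Python loop like this one is slow for millions of points, so a vectorized or compiled integrator would be preferable at that scale.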

Google Colab

While jscatter is primarily developed for Jupyter Lab and Notebook, it also runs just fine in Google Colab. See jupyter-scatter-colab-test.ipynb for an example.


Development

Setting up a development environment

Requirements:

Installation:

git clone https://github.com/flekschas/jupyter-scatter/ jscatter && cd jscatter
conda env create -f environment.yml && conda activate jscatter
pip install -e ".[test]"

After changing Python code: simply restart the kernel.

After changing JavaScript code: run cd js && npm run build. Alternatively, you can run npm run watch to rebundle the code on the fly.

Setting up a test environment

Go to test-environments and follow the detailed instructions.
