• Stars
    star
    164
  • Rank 230,032 (Top 5 %)
  • Language
    JavaScript
  • License
    Apache License 2.0
  • Created about 5 years ago
  • Updated about 2 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Understanding the theory behind UMAP

Understanding UMAP

Dimensionality reduction is a powerful tool for machine learning practitioners to visualize and understand large, high dimensional datasets. One of the most widely used techniques for visualization is t-SNE, but its performance suffers with large datasets and using it correctly can be challenging.

UMAP is a new technique by McInnes et al. that offers a number of advantages over t-SNE, most notably increased speed and better preservation of the data's global structure. In this article, we'll take a look at the theory behind UMAP in order to better understand how the algorithm works, how to use it effectively, and how its performance compares with t-SNE.

yarn
yarn dev

Publishing to github pages

yarn pub

To develop figures individually

yarn dev:cech
yarn dev:hyperparameters
yarn dev:mammoth-umap
yarn dev:mammoth-tsne
yarn dev:supplement
yarn dev:toy
yarn dev:toy_comparison

Data preprocessing

For the mammoth figures, the raw 3D data was downsampled to 50,000 points before being projected with UMAP / t-SNE. These 50,000 points were then randomly subsampled to 10,000 points in order to minimize the payload size.

Understanding UMAP uses a few tricks to make the data payloads for some of the interactive figures small enough to download in a reasonable time. The mammoth figures use a 10-bit encoding scheme to compress the 10,000 data points into a significantly smaller payload. The hyperparameters and toy_comparison figures precompute UMAP embeddings for all of their different combinations, then use the same 10-bit encoding scheme to compress the data.

yarn preprocess:hyperparameters
yarn preprocess:mammoth
yarn preprocess:toy_comparison

More Repositories

1

facets

Visualizations for machine learning datasets
Jupyter Notebook
7,345
star
2

lit

The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface.
TypeScript
3,482
star
3

saliency

Framework-agnostic implementation for state-of-the-art saliency methods (XRAI, BlurIG, SmoothGrad, and more).
Jupyter Notebook
951
star
4

what-if-tool

Source code/webpage/demos for the What-If Tool
HTML
909
star
5

umap-js

JavaScript implementation of UMAP
JavaScript
375
star
6

llm-comparator

LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.
JavaScript
286
star
7

knowyourdata

A tool to help researchers and product teams understand datasets with the goal of improving data quality, and mitigating fairness and bias issues.
CSS
281
star
8

wordcraft

✨✍️ Wordcraft is an AI-powered text editor with an emphasis on short story writing
TypeScript
239
star
9

datacardsplaybook

The Data Cards Playbook helps dataset producers and publishers adopt a people-centered approach to transparency in dataset documentation.
TypeScript
170
star
10

scatter-gl

Interactive 3D / 2D webgl-accelerated scatter plot point renderer
TypeScript
168
star
11

federated-learning

Federated learning experiment using TensorFlow.js
TypeScript
160
star
12

interpretability

PAIR.withgoogle.com and friend's work on interpretability methods
JavaScript
147
star
13

ai-explorables

https://pair.withgoogle.com/explorables/
Jupyter Notebook
59
star
14

cococo

𝄡 Collaborative Convolutional Counterpoint
TypeScript
46
star
15

cam-scroller

Cam Scroller is an open-source Chrome extension that uses your webcam and deeplearn.js to enable scrolling through webpages using custom gestures that you define.
JavaScript
33
star
16

font-explorer

Font latent space explorer using tensorflow.js
Vue
32
star
17

clinical-vis

A javascript medical record visualization (https://arxiv.org/abs/1810.05798)
HTML
26
star
18

megaplot

TypeScript
19
star
19

depth-maps-art-and-illusions

TypeScript
18
star
20

pair-code.github.io

HTML
18
star
21

farsight

In situ interactive widgets for responsible AI 🌱
TypeScript
17
star
22

tiny-transformers

Jupyter Notebook
16
star
23

recommendation-rudders

TypeScript
13
star
24

covid19_symptom_dataset

JavaScript
12
star
25

thehardway

Supplementary code repository to accompany Tic-Tac-Toe the Hard Way podcast
JavaScript
11
star
26

jax-recommenders

Python
9
star
27

autonotes

AutoNotes is an experimental prototype for AI-powered notetaking, with features including hierarchical tagging, "chat with your notes," and highlights.
TypeScript
8
star
28

book-viz

Visualizing multilevel structure in books with sentence embeddings.
Jupyter Notebook
6
star
29

model-alignment

Model Alignment is a python library from the PAIR team that enable users to create model prompts through user feedback instead of manual prompt writing and editing. The technique makes use of constitutional principles to align prompts to users' desired values.
Python
6
star
30

waterfall-of-meaning

TypeScript
5
star
31

deliberate-lab

Platform for running online research experiments on human + LLM group dynamics.
TypeScript
4
star
32

deeplearnjs-legacy-loader

Deprecated: Legacy TensorFlow model loader for deeplearn.js
Python
3
star
33

colormap

JavaScript
3
star
34

adversarial-nibbler-vis

An interactive visualization interface for exploring and analyzing the Adversarial Nibbler dataset
TypeScript
3
star
35

auto-histograms

Python
2
star
36

ml-vis-experiments

Jupyter Notebook
1
star
37

deeplearnjs-docs

TypeScript
1
star