• Stars
    star
    150
  • Rank 238,598 (Top 5 %)
  • Language
    JavaScript
  • License
    Apache License 2.0
  • Created over 4 years ago
  • Updated 9 days ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Understanding the theory behind UMAP

Understanding UMAP

Dimensionality reduction is a powerful tool for machine learning practitioners to visualize and understand large, high dimensional datasets. One of the most widely used techniques for visualization is t-SNE, but its performance suffers with large datasets and using it correctly can be challenging.

UMAP is a new technique by McInnes et al. that offers a number of advantages over t-SNE, most notably increased speed and better preservation of the data's global structure. In this article, we'll take a look at the theory behind UMAP in order to better understand how the algorithm works, how to use it effectively, and how its performance compares with t-SNE.

yarn
yarn dev

Publishing to github pages

yarn pub

To develop figures individually

yarn dev:cech
yarn dev:hyperparameters
yarn dev:mammoth-umap
yarn dev:mammoth-tsne
yarn dev:supplement
yarn dev:toy
yarn dev:toy_comparison

Data preprocessing

For the mammoth figures, the raw 3D data was downsampled to 50,000 points before being projected with UMAP / t-SNE. These 50,000 points were then randomly subsampled to 10,000 points in order to minimize the payload size.

Understanding UMAP uses a few tricks to make the data payloads for some of the interactive figures small enough to download in a reasonable time. The mammoth figures use a 10-bit encoding scheme to compress the 10,000 data points into a significantly smaller payload. The hyperparameters and toy_comparison figures precompute UMAP embeddings for all of their different combinations, then use the same 10-bit encoding scheme to compress the data.

yarn preprocess:hyperparameters
yarn preprocess:mammoth
yarn preprocess:toy_comparison

More Repositories

1

facets

Visualizations for machine learning datasets
Jupyter Notebook
7,308
star
2

lit

The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface.
TypeScript
3,386
star
3

saliency

Framework-agnostic implementation for state-of-the-art saliency methods (XRAI, BlurIG, SmoothGrad, and more).
Jupyter Notebook
927
star
4

what-if-tool

Source code/webpage/demos for the What-If Tool
HTML
881
star
5

umap-js

JavaScript implementation of UMAP
JavaScript
344
star
6

knowyourdata

A tool to help researchers and product teams understand datasets with the goal of improving data quality, and mitigating fairness and bias issues.
CSS
273
star
7

wordcraft

✨✍️ Wordcraft is an AI-powered text editor with an emphasis on short story writing
TypeScript
208
star
8

federated-learning

Federated learning experiment using TensorFlow.js
TypeScript
159
star
9

datacardsplaybook

The Data Cards Playbook helps dataset producers and publishers adopt a people-centered approach to transparency in dataset documentation.
TypeScript
157
star
10

scatter-gl

Interactive 3D / 2D webgl-accelerated scatter plot point renderer
TypeScript
156
star
11

interpretability

PAIR.withgoogle.com and friend's work on interpretability methods
JavaScript
109
star
12

ai-explorables

https://pair.withgoogle.com/explorables/
Jupyter Notebook
51
star
13

cococo

𝄑 Collaborative Convolutional Counterpoint
TypeScript
45
star
14

cam-scroller

Cam Scroller is an open-source Chrome extension that uses your webcam and deeplearn.js to enable scrolling through webpages using custom gestures that you define.
JavaScript
33
star
15

font-explorer

Font latent space explorer using tensorflow.js
Vue
31
star
16

clinical-vis

A javascript medical record visualization (https://arxiv.org/abs/1810.05798)
HTML
25
star
17

megaplot

TypeScript
19
star
18

pair-code.github.io

HTML
17
star
19

depth-maps-art-and-illusions

TypeScript
16
star
20

thehardway

Supplementary code repository to accompany Tic-Tac-Toe the Hard Way podcast
JavaScript
11
star
21

covid19_symptom_dataset

JavaScript
11
star
22

recommendation-rudders

TypeScript
10
star
23

jax-recommenders

Python
8
star
24

farsight

In situ interactive widgets for responsible AI 🌱
TypeScript
7
star
25

book-viz

Visualizing multilevel structure in books with sentence embeddings.
Jupyter Notebook
6
star
26

waterfall-of-meaning

TypeScript
4
star
27

tiny-transformers

Jupyter Notebook
4
star
28

deeplearnjs-legacy-loader

Deprecated: Legacy TensorFlow model loader for deeplearn.js
Python
3
star
29

colormap

JavaScript
3
star
30

ml-vis-experiments

Jupyter Notebook
1
star
31

deeplearnjs-docs

TypeScript
1
star
32

auto-histograms

Python
1
star