PAIR-code/understanding-umap

Stars: 164
Rank: 230,032 (Top 5%)
Language: JavaScript
License: Apache License 2.0
Created about 5 years ago
Updated about 2 months ago


Understanding the theory behind UMAP

Understanding UMAP

Dimensionality reduction is a powerful tool for machine learning practitioners to visualize and understand large, high dimensional datasets. One of the most widely used techniques for visualization is t-SNE, but its performance suffers with large datasets and using it correctly can be challenging.

UMAP is a new technique by McInnes et al. that offers a number of advantages over t-SNE, most notably increased speed and better preservation of the data's global structure. In this article, we'll take a look at the theory behind UMAP in order to better understand how the algorithm works, how to use it effectively, and how its performance compares with t-SNE.

Development

yarn
yarn dev

Publishing to GitHub Pages

yarn pub

To develop figures individually

yarn dev:cech
yarn dev:hyperparameters
yarn dev:mammoth-umap
yarn dev:mammoth-tsne
yarn dev:supplement
yarn dev:toy
yarn dev:toy_comparison

Data preprocessing

For the mammoth figures, the raw 3D data was downsampled to 50,000 points before being projected with UMAP / t-SNE. These 50,000 points were then randomly subsampled to 10,000 points in order to minimize the payload size.
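
The random subsampling step described above can be sketched as follows. This is a minimal illustration, not the repository's actual preprocessing script: the names `mulberry32` and `subsample`, the seeded generator, and the partial Fisher-Yates shuffle are all assumptions chosen to make the example reproducible.

```javascript
// Hypothetical helpers; the real preprocessing code may work differently.

// Small seeded PRNG (mulberry32) so the same seed yields the same subsample.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Pick `count` distinct points uniformly at random without mutating the input.
function subsample(points, count, seed = 42) {
  if (count > points.length) {
    throw new RangeError("count exceeds the number of available points");
  }
  const random = mulberry32(seed);
  const pool = points.slice();
  // Partial Fisher-Yates shuffle: only the first `count` slots are needed.
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, count);
}

// Usage with synthetic stand-in data (50,000 points reduced to 10,000).
const demoPoints = Array.from({ length: 50000 }, () => [
  Math.random(),
  Math.random(),
  Math.random(),
]);
const demoSample = subsample(demoPoints, 10000);
console.log(demoSample.length); // 10000
```

Using a fixed seed keeps the published payload stable between runs, so a re-run of the preprocessing does not change the figure.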

Understanding UMAP uses a few tricks to make the data payloads for some of the interactive figures small enough to download in a reasonable time. The mammoth figures use a 10-bit encoding scheme to compress the 10,000 data points into a significantly smaller payload. The hyperparameters and toy_comparison figures precompute UMAP embeddings for all of their different combinations, then use the same 10-bit encoding scheme to compress the data.

yarn preprocess:hyperparameters
yarn preprocess:mammoth
yarn preprocess:toy_comparison
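
The README does not spell out the details of the 10-bit encoding scheme, so the following is only one plausible sketch of the idea. It assumes min-max quantization of each coordinate to an integer in 0..1023, followed by packing those integers back to back into a byte array. The function names (`quantize`, `pack`, `unpack`, `dequantize`) are hypothetical and do not come from the repository.

```javascript
// Hypothetical sketch of a 10-bit encoding; the repository's scheme may differ.
const BITS = 10;
const MAX_CODE = (1 << BITS) - 1; // 1023

// Map each value to an integer code in [0, 1023] using min-max scaling.
function quantize(values) {
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const range = max - min || 1; // avoid dividing by zero for constant input
  const codes = Array.from(values, (v) =>
    Math.round(((v - min) / range) * MAX_CODE)
  );
  return { min, max, codes };
}

// Pack 10-bit codes back to back, most significant bit first.
function pack(codes) {
  const bytes = new Uint8Array(Math.ceil((codes.length * BITS) / 8));
  let bitPos = 0;
  for (const code of codes) {
    for (let b = BITS - 1; b >= 0; b--) {
      if ((code >> b) & 1) bytes[bitPos >> 3] |= 0x80 >> (bitPos & 7);
      bitPos++;
    }
  }
  return bytes;
}

// Read `count` 10-bit codes back out of the packed bytes.
function unpack(bytes, count) {
  const codes = new Array(count);
  let bitPos = 0;
  for (let i = 0; i < count; i++) {
    let code = 0;
    for (let b = 0; b < BITS; b++) {
      const bit = (bytes[bitPos >> 3] >> (7 - (bitPos & 7))) & 1;
      code = (code << 1) | bit;
      bitPos++;
    }
    codes[i] = code;
  }
  return codes;
}

// Recover approximate values; the error is at most half a quantization step.
function dequantize(codes, min, max) {
  const range = max - min;
  return codes.map((c) => min + (c / MAX_CODE) * range);
}
```

Under this sketch, n values occupy about 10n/8 bytes instead of the 4n bytes needed for 32-bit floats, at the cost of limiting each coordinate to 1,024 distinct levels.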

facets

Visualizations for machine learning datasets

Jupyter Notebook

lit

The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface.

saliency

Framework-agnostic implementation for state-of-the-art saliency methods (XRAI, BlurIG, SmoothGrad, and more).

Jupyter Notebook

what-if-tool

Source code/webpage/demos for the What-If Tool

umap-js

JavaScript implementation of UMAP

llm-comparator

LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.

knowyourdata

A tool to help researchers and product teams understand datasets with the goal of improving data quality, and mitigating fairness and bias issues.

wordcraft

✨✍️ Wordcraft is an AI-powered text editor with an emphasis on short story writing

datacardsplaybook

The Data Cards Playbook helps dataset producers and publishers adopt a people-centered approach to transparency in dataset documentation.

scatter-gl

Interactive 3D / 2D WebGL-accelerated scatter plot point renderer

federated-learning

Federated learning experiment using TensorFlow.js

interpretability

PAIR.withgoogle.com and friends' work on interpretability methods

ai-explorables

https://pair.withgoogle.com/explorables/

Jupyter Notebook

cococo

𝄡 Collaborative Convolutional Counterpoint

cam-scroller

Cam Scroller is an open-source Chrome extension that uses your webcam and deeplearn.js to enable scrolling through webpages using custom gestures that you define.

font-explorer

Font latent space explorer using TensorFlow.js

clinical-vis

A JavaScript medical record visualization (https://arxiv.org/abs/1810.05798)

megaplot

depth-maps-art-and-illusions

pair-code.github.io

farsight

In situ interactive widgets for responsible AI 🌱

tiny-transformers

Jupyter Notebook

recommendation-rudders

covid19_symptom_dataset

thehardway

Supplementary code repository to accompany Tic-Tac-Toe the Hard Way podcast

jax-recommenders

autonotes

AutoNotes is an experimental prototype for AI-powered notetaking, with features including hierarchical tagging, "chat with your notes," and highlights.

book-viz

Visualizing multilevel structure in books with sentence embeddings.

Jupyter Notebook

model-alignment

Model Alignment is a Python library from the PAIR team that enables users to create model prompts through user feedback instead of manual prompt writing and editing. The technique makes use of constitutional principles to align prompts to users' desired values.

waterfall-of-meaning

deliberate-lab

Platform for running online research experiments on human + LLM group dynamics.

deeplearnjs-legacy-loader

Deprecated: Legacy TensorFlow model loader for deeplearn.js

colormap

adversarial-nibbler-vis

An interactive visualization interface for exploring and analyzing the Adversarial Nibbler dataset

auto-histograms

ml-vis-experiments

Jupyter Notebook

deeplearnjs-docs