• This repository has been archived on 27/Jul/2024
  • Stars
    star
    7,345
  • Rank 5,270 (Top 0.2 %)
  • Language
    Jupyter Notebook
  • License
    Apache License 2.0
  • Created over 7 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Visualizations for machine learning datasets

Introduction

The facets project contains two visualizations for understanding and analyzing machine learning datasets: Facets Overview and Facets Dive.

The visualizations are implemented as Polymer web components, backed by Typescript code and can be easily embedded into Jupyter notebooks or webpages.

Live demos of the visualizations can be found on the Facets project description page.

Facets Overview

Overview visualization of UCI census data

Overview gives a high-level view of one or more data sets. It produces a visual feature-by-feature statistical analysis, and can also be used to compare statistics across two or more data sets. The tool can process both numeric and string features, including multiple instances of a number or string per feature.

Overview can help uncover issues with datasets, including the following:

  • Unexpected feature values
  • Missing feature values for a large number of examples
  • Training/serving skew
  • Training/test/validation set skew

Key aspects of the visualization are outlier detection and distribution comparison across multiple datasets. Interesting values (such as a high proportion of missing data, or very different distributions of a feature across multiple datasets) are highlighted in red. Features can be sorted by values of interest such as the number of missing values or the skew between the different datasets.

The python code to generate the statistics for visualization can be installed through pip install facets-overview. As of version 1.1.0, the facets-overview package requires a version of protobuf at version 3.20.0 or later.

Details about Overview usage can be found in its README.

Facets Dive

Dive visualization of UCI census data

Dive is a tool for interactively exploring up to tens of thousands of multidimensional data points, allowing users to seamlessly switch between a high-level overview and low-level details. Each example is a represented as single item in the visualization and the points can be positioned by faceting/bucketing in multiple dimensions by their feature values. Combining smooth animation and zooming with faceting and filtering, Dive makes it easy to spot patterns and outliers in complex data sets.

Details about Dive usage can be found in its README.

Setup

Usage in Google Colabratory/Jupyter Notebooks

Using Facets in Google Colabratory and Jupyter notebooks can be seen in this notebook. These notebooks work without the need to first download/install this repository.

Both Facets visualizations make use of HTML imports. So in order to use them, you must first load the appropriate polyfill, through <script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>, as shown in the demo notebooks in this repo.

Note that for using Facets Overview in a Jupyter notebook, there are two considerations:

  1. In the notebook, you will need to change the path that the Facets Overview python code is loaded from to the correct path given where your notebook kernel is run from.
  2. You must also have the Protocol Buffers python runtime library installed: https://github.com/google/protobuf/tree/master/python. If you used pip or anaconda to install Jupyter, you can use the same tool to install the runtime library.

When visualizing a large amount of data in Dive in a Juypter notebook, as is done in the Dive demo Jupyter notebook, you will need to start the notebook server with an increased IOPub data rate. This can be done with the command jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000.

Code Installation

git clone https://github.com/PAIR-code/facets
cd facets

Building the Visualizations

If you make code changes to the visualization and would like to rebuild them, follow these directions:

  1. Install bazel: https://bazel.build/
  2. Build the visualizations: bazel build facets:facets_jupyter (run from the facets top-level directory)

Using the rebuilt Visualizations in a Jupyter notebook

If you want to use the visualizations you built locally in a Jupyter notebook, follow these directions:

  1. Move the resulting vulcanized html file from the build step into the facets-dist directory: cp -f bazel-bin/facets/facets-jupyter.html facets-dist/
  2. Install the visualizations into Jupyter as an nbextension.
  • If jupyter was installed with pip, you can use jupyter nbextension install facets-dist/ if jupyter was installed system-wide or jupyter nbextension install facets-dist/ --user if installed per-user (run from the facets top-level directory). You do not need to run any follow-up jupyter nbextension enable command for this extension.
  • Alternatively, you can manually install the nbextension by finding your jupyter installation's share/jupyter/nbextensions folder and copying the facets-dist directory into it.
  1. In the notebook cell's HTML link tag that loads the built facets html, load from /nbextensions/facets-dist/facets-jupyter.html, which is the locally installed facets distribution. from the previous step.

Known Issues

  • The Facets visualizations currently work only in Chrome - Issue 9.

Disclaimer: This is not an official Google product

More Repositories

1

lit

The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface.
TypeScript
3,482
star
2

saliency

Framework-agnostic implementation for state-of-the-art saliency methods (XRAI, BlurIG, SmoothGrad, and more).
Jupyter Notebook
951
star
3

what-if-tool

Source code/webpage/demos for the What-If Tool
HTML
909
star
4

umap-js

JavaScript implementation of UMAP
JavaScript
375
star
5

llm-comparator

LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.
JavaScript
286
star
6

knowyourdata

A tool to help researchers and product teams understand datasets with the goal of improving data quality, and mitigating fairness and bias issues.
CSS
281
star
7

wordcraft

✨✍️ Wordcraft is an AI-powered text editor with an emphasis on short story writing
TypeScript
239
star
8

datacardsplaybook

The Data Cards Playbook helps dataset producers and publishers adopt a people-centered approach to transparency in dataset documentation.
TypeScript
170
star
9

scatter-gl

Interactive 3D / 2D webgl-accelerated scatter plot point renderer
TypeScript
168
star
10

understanding-umap

Understanding the theory behind UMAP
JavaScript
164
star
11

federated-learning

Federated learning experiment using TensorFlow.js
TypeScript
160
star
12

interpretability

PAIR.withgoogle.com and friend's work on interpretability methods
JavaScript
147
star
13

ai-explorables

https://pair.withgoogle.com/explorables/
Jupyter Notebook
59
star
14

cococo

𝄑 Collaborative Convolutional Counterpoint
TypeScript
46
star
15

cam-scroller

Cam Scroller is an open-source Chrome extension that uses your webcam and deeplearn.js to enable scrolling through webpages using custom gestures that you define.
JavaScript
33
star
16

font-explorer

Font latent space explorer using tensorflow.js
Vue
32
star
17

clinical-vis

A javascript medical record visualization (https://arxiv.org/abs/1810.05798)
HTML
26
star
18

megaplot

TypeScript
19
star
19

depth-maps-art-and-illusions

TypeScript
18
star
20

pair-code.github.io

HTML
18
star
21

farsight

In situ interactive widgets for responsible AI 🌱
TypeScript
17
star
22

tiny-transformers

Jupyter Notebook
16
star
23

recommendation-rudders

TypeScript
13
star
24

covid19_symptom_dataset

JavaScript
12
star
25

thehardway

Supplementary code repository to accompany Tic-Tac-Toe the Hard Way podcast
JavaScript
11
star
26

jax-recommenders

Python
9
star
27

autonotes

AutoNotes is an experimental prototype for AI-powered notetaking, with features including hierarchical tagging, "chat with your notes," and highlights.
TypeScript
8
star
28

book-viz

Visualizing multilevel structure in books with sentence embeddings.
Jupyter Notebook
6
star
29

model-alignment

Model Alignment is a python library from the PAIR team that enable users to create model prompts through user feedback instead of manual prompt writing and editing. The technique makes use of constitutional principles to align prompts to users' desired values.
Python
6
star
30

waterfall-of-meaning

TypeScript
5
star
31

deliberate-lab

Platform for running online research experiments on human + LLM group dynamics.
TypeScript
4
star
32

deeplearnjs-legacy-loader

Deprecated: Legacy TensorFlow model loader for deeplearn.js
Python
3
star
33

colormap

JavaScript
3
star
34

adversarial-nibbler-vis

An interactive visualization interface for exploring and analyzing the Adversarial Nibbler dataset
TypeScript
3
star
35

auto-histograms

Python
2
star
36

ml-vis-experiments

Jupyter Notebook
1
star
37

deeplearnjs-docs

TypeScript
1
star