Repository Details

An Interactive Tool for Scalable and Reproducible Error Analysis.

Errudite

Errudite is an interactive tool for scalable, reproducible, and counterfactual error analysis. Errudite provides an expressive domain-specific language for extracting relevant features of linguistic data, which allows users to visualize data attributes, group relevant instances, and perform counterfactual analysis across all available validation data.

Getting Started

  1. Read our blog post which explains the core idea of Errudite.
  2. Watch this video demo that contains the highlights of Errudite's functions & use cases
  3. Get set up quickly
  4. Try Errudite's user interface on machine comprehension
  5. Try the tutorials on JupyterLab notebooks
  6. Read the documentation

Citation

If you are interested in this work, please see our ACL 2019 research paper and consider citing our work:

@inproceedings{2019-errudite,
    title = {Errudite: Scalable, Reproducible, and Testable Error Analysis},
    author = {Wu, Tongshuang and Ribeiro, Marco Tulio and Heer, Jeffrey and Weld, Daniel S.},
    booktitle = {the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)},
    year = {2019},
    url = {https://www.aclweb.org/anthology/P19-1073.pdf},
}

Quick Start

Installation

PIP

Errudite requires Python 3.6.x. The package is available through pip: install it in your Python environment and you're good to go!

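# create the virtual environment
# (virtualenv 20 and later removed --no-site-packages because isolation is already the default; omit the flag there)
virtualenv --no-site-packages -p python3.6 venv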
# activate venv
source venv/bin/activate
# install errudite
pip install errudite
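Because the requirement is specific to Python 3.6.x, it can help to confirm which interpreter the virtual environment actually picked up. The following is a minimal sketch of such a check; the helper name `is_supported` is hypothetical and is not part of Errudite.

```python
import sys

# Errudite's stated requirement: Python 3.6.x.
REQUIRED = (3, 6)


def is_supported(version_info=None):
    """Return True when the (major, minor) pair matches Python 3.6."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported():
        print("Python " + found + " matches the 3.6.x requirement.")
    else:
        print("Python " + found + " found; Errudite expects 3.6.x.")
```

Run it with the interpreter inside the activated venv, so that it reports on the environment you will install into.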

Install from source

You can also install Errudite by cloning our git repository:

git clone https://github.com/uwdata/errudite

From the root of the clone, create a Python 3.6 virtual environment and install Errudite in editable mode by running:

pip install --editable .

This makes errudite available in your environment while using the sources from your local clone of the repository.

Known installation issues:

  1. mysql_config not found for Pattern: See similar solutions here.

GUI Server

Errudite has a UI wrapped for Machine Comprehension and Visual Question Answering tasks. The interface integrates all the key analysis functions (e.g., inspecting instance attributes, grouping similar instances, rewriting instances). It also provides exploration support such as visualizing data distributions, suggesting potential queries, and presenting the grouping and rewriting results. While the interface is not strictly necessary, it makes applying these functions much more straightforward.

Note that the GUI is released as-is; we do not expect it to be extended to other tasks. As such, the frontend code is not as well documented as the backend code. If you are interested in using Errudite for your own task, please consider using the Errudite package in JupyterLab. It wraps almost all the Errudite functions (except for query auto-complete and programming-by-demonstration) and allows you to customize for your own task.

To get a taste of the GUI for the machine comprehension task, first download a cache folder of preprocessed SQuAD instances, which lets you skip running your own preprocessing. For example, to download the preprocessed SQuAD data folder to ~/caches/:

python -m errudite.download --cache_folder_name squad-10570 --cache_path ~/caches/

Commands:
    cache_folder_name
                A folder name. Currently, we allow downloading the following:
                squad-100, squad-10570.
    cache_path  A local path where you want to save the cache folder to.
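If the download is scripted rather than typed by hand, the two arguments above can be checked before the command runs. This is a minimal sketch that only assembles the command line shown above; `build_download_command` and `AVAILABLE_CACHES` are hypothetical names, not part of Errudite.

```python
import os
import sys

# Cache folders listed above as currently downloadable.
AVAILABLE_CACHES = ("squad-100", "squad-10570")


def build_download_command(cache_folder_name, cache_path):
    """Return the argument list for `python -m errudite.download`."""
    if cache_folder_name not in AVAILABLE_CACHES:
        raise ValueError(
            "unknown cache folder %r; expected one of %s"
            % (cache_folder_name, ", ".join(AVAILABLE_CACHES))
        )
    return [
        sys.executable, "-m", "errudite.download",
        "--cache_folder_name", cache_folder_name,
        # Expand "~" ourselves, since no shell is involved when the
        # list is handed to subprocess.
        "--cache_path", os.path.expanduser(cache_path),
    ]


if __name__ == "__main__":
    print(" ".join(build_download_command("squad-100", "~/caches/")))
```

The returned list can be passed to `subprocess.run(command, check=True)` from inside the activated environment.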

Then, we need to start the server:

# activate the virtual environment first, so that AllenNLP is installed into it
source venv/bin/activate
# the model relies on AllenNLP, so make sure you install that first.
# If you run into issues installing it, please refer to AllenNLP's official page: https://github.com/allenai/allennlp
pip install allennlp==0.9.0
python -m errudite.server --config_file config.yml

Commands:
    config_file
                A yaml config file path.

The config file looks like the following (see also config.yml):

task: qa # the task, should be "qa" or "vqa".
cache_path:  ~/caches/squad-10570 # the cached folder: {cache_path}/{cache_folder_name}/
model_metas: # the models to load.
- name: bidaf
  model_class: bidaf # an implemented model class
  model_path: # a local model file path
  # an online path to an Allennlp model
  model_online_path: https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz
  description: Pretrained model from Allennlp, for the BiDAF model (QA)
attr_file_name: null # If set, loads previously saved analysis.
group_file_name: null
rewrite_file_name: null
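A config with a wrong task name or a missing cache folder is easier to catch before the server starts loading models. The sketch below checks a config that has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`, a third-party package). The helper `check_config` is hypothetical. Two of its rules are assumptions rather than documented Errudite behavior: that at least one model must be listed, and that each model needs either `model_path` or `model_online_path`.

```python
import os

# The two task names mentioned in the config comment above.
VALID_TASKS = ("qa", "vqa")


def check_config(config):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if config.get("task") not in VALID_TASKS:
        problems.append('task should be "qa" or "vqa"')

    cache_path = config.get("cache_path")
    if not cache_path:
        problems.append("cache_path is missing")
    elif not os.path.isdir(os.path.expanduser(cache_path)):
        problems.append("cache_path is not an existing folder: " + cache_path)

    # Assumption: the server needs at least one model entry.
    model_metas = config.get("model_metas") or []
    if not model_metas:
        problems.append("model_metas should list at least one model")
    for meta in model_metas:
        label = meta.get("name") or "<unnamed>"
        if not meta.get("name"):
            problems.append("a model entry has no name")
        if not meta.get("model_class"):
            problems.append("model " + label + " has no model_class")
        # Assumption: one of the two path fields must be filled in.
        if not (meta.get("model_path") or meta.get("model_online_path")):
            problems.append("model " + label + " has neither model_path nor model_online_path")
    return problems
```

Printing the returned list before launching `errudite.server` gives a quick sanity check; it does not replace whatever validation the server performs itself.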

Then visit http://localhost:5000/ in your web browser.
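The server may need some time to load the cache and models before the page responds. As a convenience, the following sketch polls the address above until something answers. It uses only the Python standard library; `wait_for_server` is a hypothetical helper, and the timeout values are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:5000/", timeout=60.0, interval=1.0):
    """Poll `url` until it returns any HTTP response or `timeout` seconds pass."""
    # An empty ProxyHandler disables proxy auto-detection, which is what
    # we want for a server running on the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status.
            return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    if wait_for_server(timeout=5.0):
        print("Server is answering.")
    else:
        print("No answer yet.")
```

A `True` result only means that something is listening on the port, not that the interface has finished loading its data.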

JupyterLab Tutorial (and task extension)

Besides powering the GUI, Errudite also works as a general Python package. The tutorial covers:

  1. Preprocessing the data, and extending Errudite to different tasks & predictors.
  2. Creating data attributes and data groups with a domain-specific language (or your customized functions).
  3. Creating rewrite rules with the domain-specific language (or your customized functions).
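To make the three concepts in this list concrete, here is a plain-Python analogue of an attribute, a group, and a rewrite rule. It deliberately does not use Errudite's domain-specific language or API (the tutorials cover those). The instance fields and all function names below are hypothetical illustrations.

```python
# Toy QA instances; the field names are made up for this illustration
# and are not Errudite's data model.
instances = [
    {"question": "What year did the war end?", "prediction": "1945", "answer": "1945"},
    {"question": "Which river flows through Paris?", "prediction": "Loire", "answer": "Seine"},
    {"question": "What is the capital of France?", "prediction": "Paris", "answer": "Paris"},
]


def question_length(instance):
    """Attribute: number of whitespace-separated tokens in the question."""
    return len(instance["question"].split())


def starts_with(word):
    """Build a group filter: questions whose first token equals `word`."""
    def keep(instance):
        return instance["question"].split()[0].lower() == word.lower()
    return keep


def rewrite_first_word(old, new):
    """Build a rewrite rule that swaps the first token when it equals `old`."""
    def rewrite(instance):
        tokens = instance["question"].split()
        if tokens[0].lower() != old.lower():
            return instance
        changed = dict(instance)  # copy, so the original stays untouched
        changed["question"] = " ".join([new] + tokens[1:])
        return changed
    return rewrite


def accuracy(group):
    """Share of instances in the group whose prediction equals the answer."""
    if not group:
        return 0.0
    correct = sum(1 for inst in group if inst["prediction"] == inst["answer"])
    return correct / len(group)


if __name__ == "__main__":
    what_group = [inst for inst in instances if starts_with("what")(inst)]
    print([question_length(inst) for inst in instances])
    print(len(what_group), accuracy(what_group))
    rewritten = [rewrite_first_word("What", "Which")(inst) for inst in what_group]
    print([inst["question"] for inst in rewritten])
```

In a real counterfactual analysis the rewritten instances would be sent back through the model, and the new predictions compared with the original ones. This sketch stops at producing the rewritten questions.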

To go through the tutorial, do the following steps:

# clone the repo
git clone https://github.com/uwdata/errudite
# move into the cloned folder: errudite/
cd errudite
# create the virtual environment
virtualenv --no-site-packages -p python3.6 venv
# activate venv
source venv/bin/activate

# run the default setup script
pip install --editable .

# get to the tutorial folder, and start!
cd tutorials
pip install -r requirements_tutorial.txt
jupyter lab
