• Stars
    star
    4,286
  • Rank 10,060 (Top 0.2 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created over 8 years ago
  • Updated about 1 month ago

Repository Details

Visual analysis and diagnostic tools to facilitate machine learning model selection.

Yellowbrick

What is Yellowbrick?

Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the scikit-learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines scikit-learn with matplotlib in the best tradition of the scikit-learn documentation, but to produce visualizations for your machine learning workflow!

For complete documentation on the Yellowbrick API, a gallery of available visualizers, the contributor's guide, tutorials and teaching resources, frequently asked questions, and more, please visit our documentation at www.scikit-yb.org.

Installing Yellowbrick

Yellowbrick is compatible with Python 3.4 or later and also depends on scikit-learn and matplotlib. The simplest way to install Yellowbrick and its dependencies is from PyPI with pip, Python's preferred package installer.

$ pip install yellowbrick

Note that Yellowbrick is an active project and routinely publishes new releases with more visualizers and updates. In order to upgrade Yellowbrick to the latest version, use pip as follows.

$ pip install -U yellowbrick

You can also use the -U flag to update scikit-learn, matplotlib, or any other third party utilities that work well with Yellowbrick to their latest versions.

If you're using Anaconda (recommended for Windows users), you can take advantage of the conda utility to install Yellowbrick:

$ conda install -c districtdatalabs yellowbrick

Using Yellowbrick

The Yellowbrick API is specifically designed to play nicely with scikit-learn. Here is an example of a typical workflow sequence with scikit-learn and Yellowbrick:

Feature Visualization

In this example, we see how Rank2D performs pairwise comparisons of each feature in the data set with a specific metric or algorithm and then returns them ranked as a lower left triangle diagram.

from yellowbrick.features import Rank2D

# X, y, and features (a list of column names) are assumed to be defined already
visualizer = Rank2D(
    features=features, algorithm='covariance'
)
visualizer.fit(X, y)                # Fit the data to the visualizer
visualizer.transform(X)             # Transform the data
visualizer.show()                   # Finalize and render the figure
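
To make the idea of a pairwise ranking concrete, here is a minimal pure-Python sketch that scores every pair of features by sample covariance and sorts the pairs by magnitude. It is an illustration only, not Yellowbrick's implementation; the helper names (covariance, rank_pairs) and the column names are made up for this example.

```python
from itertools import combinations


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def rank_pairs(columns):
    """Score every unordered pair of named feature columns.

    `columns` maps a feature name to its list of values. Returns
    (name_a, name_b, score) tuples sorted by absolute score, largest first.
    """
    scores = [
        (a, b, covariance(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(scores, key=lambda item: abs(item[2]), reverse=True)


# Illustrative data only; these column names are hypothetical.
columns = {
    "age": [1.0, 2.0, 3.0, 4.0],
    "income": [2.0, 4.0, 6.0, 8.0],
    "debt": [1.0, 3.0, 2.0, 2.0],
}
ranked = rank_pairs(columns)
for name_a, name_b, score in ranked:
    print(f"{name_a:>6} x {name_b:<6} {score:+.3f}")
```

Each unordered pair appears exactly once, which is why a pairwise ranking can be drawn as a single triangle of a feature-by-feature grid rather than the full square.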

Model Visualization

In this example, we instantiate a scikit-learn classifier and then use Yellowbrick's ROCAUC class to visualize the tradeoff between the classifier's sensitivity and specificity.

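from sklearn.svm import LinearSVC
from yellowbrick.classifier import ROCAUC

# X and y are assumed to be defined already
model = LinearSVC()
visualizer = ROCAUC(model)
visualizer.fit(X, y)                # Fit the data to the visualizer
visualizer.score(X, y)              # Evaluate the model on the data
visualizer.show()                   # Finalize and render the figure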
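
For readers who want to see what a ROC curve measures, the following pure-Python sketch sweeps a decision threshold over a handful of scores, records the true positive rate (sensitivity) against the false positive rate (1 - specificity), and integrates the curve with the trapezoidal rule. It is a teaching sketch under simple assumptions (binary 0/1 labels, at least one sample of each class) and does not reflect how Yellowbrick or scikit-learn compute the curve internally; the function names are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) points obtained by sweeping a threshold over scores.

    y_true holds 0/1 labels; scores holds one real-valued score per sample,
    where a higher score means "more likely positive".
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Visit each distinct score from high to low; samples scoring at or above
    # the threshold are predicted positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy labels and scores, chosen only for illustration.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(y_true, scores)
area = auc(curve)
print(curve)
print(area)
```

An area of 1.0 corresponds to a classifier that ranks every positive above every negative, while 0.5 corresponds to random ordering.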

For additional information on getting started with Yellowbrick, view the Quick Start Guide in the documentation and check out our examples notebook.

Contributing to Yellowbrick

Yellowbrick is an open source project that is supported by a community who will gratefully and humbly accept any contributions you might make to the project. Large or small, any contribution makes a big difference; and if you've never contributed to an open source project before, we hope you will start with Yellowbrick!

If you are interested in contributing, check out our contributor's guide. Beyond creating visualizers, there are many ways to contribute:

  • Submit a bug report or feature request on GitHub Issues.
  • Contribute a Jupyter notebook to our examples gallery.
  • Assist us with user testing.
  • Add to the documentation or help with our website, scikit-yb.org.
  • Write unit or integration tests for our project.
  • Answer questions on our issues, mailing list, Stack Overflow, and elsewhere.
  • Translate our documentation into another language.
  • Write a blog post, tweet, or share our project with others.
  • Teach someone how to use Yellowbrick.

As you can see, there are lots of ways to get involved and we would be very happy for you to join us! The only thing we ask is that you abide by the principles of openness, respect, and consideration of others as described in the Python Software Foundation Code of Conduct.

For more information, check out the CONTRIBUTING.md file in the root of the repository or the detailed documentation at Contributing to Yellowbrick.

Yellowbrick Datasets

Yellowbrick gives easy access to several datasets that are used for the examples in the documentation and testing. These datasets are hosted on our CDN and must be downloaded for use. Typically, when a user calls one of the data loader functions, e.g. load_bikeshare(), the data is automatically downloaded if it's not already on the user's computer. However, for development and testing, or if you know you will be working without internet access, it might be easier to simply download all the data at once.
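
The "download only if it's not already on the user's computer" behavior is a common caching pattern. The sketch below illustrates that pattern in isolation; ensure_dataset and fake_fetch are hypothetical names invented for this example, the download is replaced by a stand-in so the code runs offline, and none of it is taken from Yellowbrick's source.

```python
import os
import tempfile


def ensure_dataset(name, data_home, fetch):
    """Return the local path for `name`, calling `fetch` only on a cache miss.

    `fetch(name, dest)` is a caller-supplied (hypothetical) function that
    writes the dataset into the directory `dest`.
    """
    dest = os.path.join(data_home, name)
    if not os.path.isdir(dest):
        os.makedirs(dest)
        fetch(name, dest)
    return dest


# Stand-in for a real download so the sketch runs without network access.
calls = []


def fake_fetch(name, dest):
    calls.append(name)
    with open(os.path.join(dest, name + ".csv"), "w") as handle:
        handle.write("placeholder\n")


with tempfile.TemporaryDirectory() as data_home:
    first = ensure_dataset("bikeshare", data_home, fake_fetch)
    second = ensure_dataset("bikeshare", data_home, fake_fetch)
    cached_file_exists = os.path.isfile(os.path.join(first, "bikeshare.csv"))

print(calls)
```

The second call finds the directory already in place and skips the fetch, which is the reason pre-downloading everything once is enough for later offline work.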

The data downloader script can be run as follows:

$ python -m yellowbrick.download

This will download the data to the fixtures directory inside of the Yellowbrick site packages. You can specify the location of the download either as an argument to the downloader script (use --help for more details) or by setting the $YELLOWBRICK_DATA environment variable. Setting the environment variable is the preferred mechanism because it also influences how data is loaded in Yellowbrick.
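
One way to picture how an explicit location and the $YELLOWBRICK_DATA variable might interact is a lookup with a fixed precedence: explicit argument first, then the environment variable, then a default. The sketch below assumes that precedence; resolve_data_home and DEFAULT_DATA_HOME are hypothetical names with a placeholder default path, not Yellowbrick's actual code.

```python
import os

# Placeholder default used only in this sketch; the real default is the
# fixtures directory inside the installed Yellowbrick package.
DEFAULT_DATA_HOME = os.path.join("site-packages", "yellowbrick", "fixtures")


def resolve_data_home(path=None, environ=None):
    """Pick a data directory: explicit argument, then env var, then default."""
    environ = os.environ if environ is None else environ
    if path is None:
        path = environ.get("YELLOWBRICK_DATA", DEFAULT_DATA_HOME)
    return os.path.abspath(os.path.expanduser(path))


# The environment is passed in explicitly here so the example is repeatable.
from_argument = resolve_data_home("/tmp/yb", environ={"YELLOWBRICK_DATA": "/data/yb"})
from_env = resolve_data_home(environ={"YELLOWBRICK_DATA": "/data/yb"})
from_default = resolve_data_home(environ={})
print(from_argument, from_env, from_default, sep="\n")
```

Because both the downloader and the loaders would consult the same variable under this assumption, setting it once keeps the two in agreement about where the data lives.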

Note: Developers who have downloaded data from Yellowbrick versions earlier than v1.0 may experience some problems with the older data format. If this occurs, you can clear out your data cache as follows:

$ python -m yellowbrick.download --cleanup

This will remove old datasets and download the new ones. You can also use the --no-download flag to simply clear the cache without re-downloading data. Users who are having difficulty with datasets can also run the cleanup command, or they can uninstall and reinstall Yellowbrick using pip.

Citing Yellowbrick

We would be glad if you used Yellowbrick in your scientific publications! If you do, please cite us using the citation guidelines.

Affiliations

District Data Labs NumFOCUS Affiliated Project

More Repositories

1

baleen

An automated ingestion service for blogs to construct a corpus for NLP research.
Python
86
star
2

machine-learning

Code & Data for Introduction to Machine Learning with Scikit-Learn
Jupyter Notebook
81
star
3

intro-to-nltk

Code and Notebooks for the Natural Language Processing with Python course.
Jupyter Notebook
66
star
4

blog-files

Public code files for the DDL blog
Python
56
star
5

cultivar

Multidimensional data explorer and visualization tool.
HTML
52
star
6

entity-resolution

Tutorial code and data for the entity resolution workshops.
Python
45
star
7

science-bookclub

Generating the next read for our book club- with Data Science!
Python
40
star
8

PyCon2016

Code bases, tutorials, posters, and other content for PyCon2016.
JavaScript
38
star
9

partisan-discourse

A web application that identifies party in political discourse and an example of operationalized machine learning.
Python
27
star
10

spark-workshop

Data and code for "Fast Data Applications with Spark and Python"
Python
25
star
11

yellowbrick-docs-zh

Chinese translation of Yellowbrick documentation
Python
19
star
12

brookings-nlp

Teaching materials for the text analytics course
Jupyter Notebook
18
star
13

minke

Graph extraction and NLP analysis for Baleen Corpora
Python
18
star
14

minimum-entropy

Minimum Entropy is a DDL hosted question/answer site for beginners who need answers to Data Science questions.
Python
16
star
15

PyCon2017

Resources and materials related to PyCon 2017.
HTML
11
star
16

django-data-product

An example data product using Django
Python
8
star
17

ceb-training

Notebooks and materials for DDL/CEB training.
Jupyter Notebook
7
star
18

topicmaps

Fast topic survey with associated word cloud visualization on completion.
HTML
7
star
19

Brookings_Python_DS

Jupyter Notebook
6
star
20

diconf

Notebooks and code for "Visual Pipelines for Text Analysis" at the Data Intelligence Conference: June 23, 2017.
Jupyter Notebook
5
star
21

city-dash

City Intelligence Dashboard Project
Jupyter Notebook
5
star
22

yellowbrick-docs-tr

Turkish translation of Yellowbrick documentation
Python
5
star
23

brookings

Teaching materials for web scraping class
Jupyter Notebook
5
star
24

bigtooth

Finding how common the strangers in your life are (reword)
Python
5
star
25

dod-ds-overview

Data Science and Big Data Overview Training
Jupyter Notebook
5
star
26

navyfcu-ml

Notebooks and data for Machine Learning course.
HTML
4
star
27

dos-managers-executives

Business Data Analysis for Managers and Executives Training
Jupyter Notebook
4
star
28

yellowbrick-datasets

Yellowbrick datasets management and deployment scripts.
Python
4
star
29

logbook

A simple web application for activity tracking and event aggregation.
Python
4
star
30

03-data-bandits

DATA BANDITS
JavaScript
3
star
31

brookings-sql

Teaching materials for the SQL course
R
3
star
32

dos-advanced-excel

Advanced Excel and Power BI Training
Jupyter Notebook
3
star
33

semnet-similarity

NLE implementation of similarity computation using semantic networks.
Python
3
star
34

content-optimization

Jupyter Notebook
2
star
35

03-mineralytics

HTML
2
star
36

mapreduce

A multiprocess implementation of MapReduce in Python
Python
2
star
37

supervised_ml_R

Code and slides for supervised machine learning in R
HTML
2
star
38

pycon2018

resources for pycon 2018
1
star
39

04-team4

Repository for Incubator 4 Team 4
CoffeeScript
1
star
40

transportation-project-1

Transportation Project for District Data Labs Incubator
Python
1
star
41

02-ppm-data

Private repo for PPM Data team.
Jupyter Notebook
1
star
42

03-EMU

Python
1
star
43

yellowbrick-docs-es

Spanish translation of the Yellowbrick documentation
Python
1
star
44

political_history

A machine learning approach to recording and analyzing the 2016 election.
Jupyter Notebook
1
star
45

sports-project-1

Retail Project for District Data Labs Incubator
Python
1
star
46

03-censusables

Private repo for Team 7.
JavaScript
1
star
47

04-team5

Repository for Incubator 4 Team 5
Jupyter Notebook
1
star
48

02-synthesizers

DDL Incubator 2.0 repository for the Synthesizers team.
Python
1
star
49

02-labormatch

Private repo for team Labor Match
Python
1
star
50

company-clustering

Intuitive Hierarchical Text-Based Clustering Research Project
Jupyter Notebook
1
star