• Stars: 2,921
• Rank: 15,527 (Top 0.4%)
• Language: Jupyter Notebook
• License: MIT License
• Created over 6 years ago
• Updated 3 months ago

Repository Details

A python library for decision tree visualization and model interpretation.

dtreeviz : Decision Tree Visualization

Description

A Python library for decision tree visualization and model interpretation. Decision trees are the fundamental building block of gradient boosting machines and Random Forests(tm), probably the two most popular machine learning models for structured data. Visualizing decision trees is a tremendous aid when learning how these models work and when interpreting models. The visualizations are inspired by an educational animation by R2D3, A visual introduction to machine learning. Please see How to visualize decision trees for a deeper discussion of our decision tree visualization library and the visual design decisions we made.

Currently dtreeviz supports scikit-learn, XGBoost, Spark MLlib, LightGBM, and TensorFlow. See Installation instructions.

Authors

With major code and visualization clean up contributions done by Matthew Epland (@mepland).

Sample Visualizations

Tree visualizations

Prediction path explanations

Leaf information

Feature space exploration

Regression

Classification

Classification boundaries

As a utility function, dtreeviz provides dtreeviz.decision_boundaries(), which illustrates one- and two-dimensional feature space for classifiers, including colors that represent probabilities, decision boundaries, and misclassified entities. This function is not limited to tree models and should work with any model that implements predict_proba(). That means any scikit-learn classifier that provides predict_proba() should work (we also made it work with Keras models that define predict()). Because it is not specific to trees, the function does not use adaptors obtained from dtreeviz.model(). See classifier-decision-boundaries.ipynb.
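
The duck-typing requirement described here can be sketched in plain Python. The helper name can_plot_boundaries and the two stand-in classes are hypothetical, written only for illustration; they are not part of dtreeviz, and the library's own check may differ.

```python
def can_plot_boundaries(model):
    """Return the name of the probability-like method a model exposes, or None.

    Hypothetical helper for illustration; not part of dtreeviz.
    """
    # scikit-learn style classifiers expose predict_proba()
    if callable(getattr(model, "predict_proba", None)):
        return "predict_proba"
    # Keras style models expose only predict()
    if callable(getattr(model, "predict", None)):
        return "predict"
    return None


class SklearnLike:
    """Stand-in for a classifier with a scikit-learn style interface."""

    def predict_proba(self, X):
        # Toy uniform probabilities over two classes
        return [[0.5, 0.5] for _ in X]


class KerasLike:
    """Stand-in for a model that only defines predict()."""

    def predict(self, X):
        return [[0.5, 0.5] for _ in X]


for candidate in (SklearnLike(), KerasLike(), object()):
    print(type(candidate).__name__, "->", can_plot_boundaries(candidate))
```

Checking for the method rather than the class is what lets non-tree models qualify: nothing in the check refers to a tree structure.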


Sometimes it's helpful to see animations that vary some of the hyperparameters. If you look in notebook classifier-boundary-animations.ipynb, you will see code that generates animations such as the following (animated png files):

Quick start

See Installation instructions then take a look at the specific notebooks for the supported ML library you're using:

To interoperate with these different libraries, dtreeviz uses an adaptor object, obtained from function dtreeviz.model(), to extract the model information necessary for visualization. Given such an adaptor object, all of the dtreeviz functionality is available to you through the same programmer interface. The basic dtreeviz usage recipe is:

  1. Import dtreeviz and your decision tree library
  2. Acquire and load data into memory
  3. Train a classifier or regressor model using your decision tree library
  4. Obtain a dtreeviz adaptor model using
    viz_model = dtreeviz.model(your_trained_model,...)
  5. Call dtreeviz functions, such as
    viz_model.view() or viz_model.explain_prediction_path(sample_x)
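
The adaptor idea behind step 4 can be illustrated with a small, library-free sketch. Everything here (TreeAdaptor, DictTreeAdaptor, ListTreeAdaptor, the toy model() dispatcher, and describe()) is hypothetical and only mirrors the pattern; dtreeviz's real adaptor classes and their methods are different.

```python
from abc import ABC, abstractmethod


class TreeAdaptor(ABC):
    """Hypothetical common interface; dtreeviz's real adaptors differ."""

    @abstractmethod
    def node_count(self) -> int:
        ...


class DictTreeAdaptor(TreeAdaptor):
    """Wraps a toy tree stored as nested dicts with 'left'/'right' keys."""

    def __init__(self, tree):
        self.tree = tree

    def node_count(self):
        def count(node):
            if node is None:
                return 0
            return 1 + count(node.get("left")) + count(node.get("right"))

        return count(self.tree)


class ListTreeAdaptor(TreeAdaptor):
    """Wraps a toy tree stored as a flat list of node labels."""

    def __init__(self, nodes):
        self.nodes = nodes

    def node_count(self):
        return len(self.nodes)


def model(raw):
    """Pick an adaptor based on the raw model's type (toy dispatch)."""
    if isinstance(raw, dict):
        return DictTreeAdaptor(raw)
    if isinstance(raw, list):
        return ListTreeAdaptor(raw)
    raise TypeError(f"unsupported model type: {type(raw).__name__}")


def describe(adaptor):
    # Downstream code sees only the common interface, never the raw model
    return f"tree with {adaptor.node_count()} nodes"


print(describe(model({"left": {}, "right": {"left": {}}})))
print(describe(model(["root", "l", "r"])))
```

The point of the pattern is that describe() never needs to know which underlying representation it was handed, which is the same reason the dtreeviz calls in step 5 look identical across ML libraries.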

Example

Here's a complete example Python file that displays the following tree in a popup window:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

import dtreeviz

iris = load_iris()
X = iris.data
y = iris.target

clf = DecisionTreeClassifier(max_depth=4)
clf.fit(X, y)

viz_model = dtreeviz.model(clf,
                           X_train=X, y_train=y,
                           feature_names=iris.feature_names,
                           target_name='iris',
                           class_names=iris.target_names)

v = viz_model.view()     # render as SVG into internal object 
v.show()                 # pop up window
v.save("/tmp/iris.svg")  # optionally save as svg

In a notebook, you can render inline without calling show(). Just call view():

viz_model.view()       # in notebook, displays inline

Installation

Install anaconda3 on your system, if not already done.

Check that you do not have conda-installed graphviz-related packages, because dtreeviz needs the pip versions; you can remove them from conda space by doing:

conda uninstall python-graphviz
conda uninstall graphviz

To install (Python >=3.6 only), do this (from Anaconda Prompt on Windows!):

pip install dtreeviz             # install dtreeviz for sklearn
pip install dtreeviz[xgboost]    # install XGBoost related dependency
pip install dtreeviz[pyspark]    # install pyspark related dependency
pip install dtreeviz[lightgbm]   # install LightGBM related dependency
pip install dtreeviz[tensorflow_decision_forests]   # install tensorflow_decision_forests related dependency
pip install dtreeviz[all]        # install all related dependencies

This should also pull in the graphviz Python library (>=0.9), which we use for platform-specific stuff.
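
One way to confirm what pip actually installed is to ask Python directly. The following is a minimal sketch using only the standard library; the MODULES tuple of import names is an assumption made for illustration, and the helper names installed_version and importable are hypothetical.

```python
from importlib import metadata, util

# Top-level import names for the libraries mentioned above.
# This tuple is an assumption for illustration, not something dtreeviz exports.
MODULES = ("dtreeviz", "graphviz", "sklearn", "xgboost", "pyspark",
           "lightgbm", "tensorflow_decision_forests")


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def importable(modules=MODULES):
    """Map each module name to True if Python can locate it."""
    return {name: util.find_spec(name) is not None for name in modules}


print("dtreeviz version:", installed_version("dtreeviz"))
print("graphviz version:", installed_version("graphviz"))
for name, ok in importable().items():
    print(f"{name:30s} {'found' if ok else 'missing'}")
```

A missing optional library only matters if you plan to visualize models from that library.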

Limitations. Only svg files can be generated at this time, which reduces dependencies and dramatically simplifies the install process.

Please email Terence with any helpful notes on making dtreeviz work (better) on other platforms. Thanks!

For your specific platform, please see the following subsections.

Mac

Make sure you have the latest Xcode and its command-line tools installed. You can run xcode-select --install from the command line to install the tools if Xcode is already installed. You also have to accept the Xcode license agreement, which you can do with sudo xcodebuild -license from the command line. The brew install shown next needs to build graphviz, so you need Xcode set up properly.

You need the graphviz binary for dot. Make sure you have the latest version (verified on macOS 10.13 and 10.14):

brew reinstall graphviz

Just to be sure, remove dot from any anaconda installation, for example:

rm ~/anaconda3/bin/dot

From command line, this command

dot -Tsvg

should work, in the sense that it just stares at you without giving an error. You can hit control-C to escape back to the shell. Make sure that you are using the right dot as installed by brew:

$ which dot
/usr/local/bin/dot
$ ls -l $(which dot)
lrwxr-xr-x  1 parrt  wheel  33 May 26 11:04 /usr/local/bin/dot@ -> ../Cellar/graphviz/2.40.1/bin/dot
$
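
The same check can be made from Python, which is useful because it shows the dot that your Python process will actually find. This is a small sketch using only the standard library; locate_dot is a hypothetical helper name.

```python
import os
import shutil


def locate_dot():
    """Return (path_on_PATH, resolved_target) for dot, or (None, None)."""
    path = shutil.which("dot")
    if path is None:
        return None, None
    # realpath follows symlinks, like `ls -l $(which dot)` shows by hand
    return path, os.path.realpath(path)


found, target = locate_dot()
if found is None:
    print("dot is not on PATH")
else:
    print(f"dot found at {found}")
    if target != found:
        print(f"  resolves to {target}")
```

If the resolved path points into an anaconda directory rather than the brew Cellar, the conda copy of dot is shadowing the brew one.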

Limitations. Jupyter notebook has a bug where it does not show .svg files correctly, but Jupyter Lab has no problem.

Linux (Ubuntu 18.04)

To get the dot binary do:

sudo apt install graphviz

Limitations. The view() method works to pop up a new window, and images appear inline in Jupyter notebook but not in Jupyter Lab (it gets an error parsing the SVG XML). The notebook images also have a font substitution for the Arial we use, so some text overlaps. Only .svg files can be generated on this platform.

Windows 10

(Make sure to pip install graphviz, which is common to all platforms, and make sure to do this from Anaconda Prompt on Windows!)

Download graphviz-2.38.msi and update your Path environment variable. Add C:\Program Files (x86)\Graphviz2.38\bin to the User Path and C:\Program Files (x86)\Graphviz2.38\bin\dot.exe to the System Path. It's Windows, so you might need a reboot after updating that environment variable. You should see this from the Anaconda Prompt:

(base) C:\Users\Terence Parr>where dot
C:\Program Files (x86)\Graphviz2.38\bin\dot.exe

(Do not use conda install -c conda-forge python-graphviz, as you get an old version of the graphviz Python library.)

Verify from the Anaconda Prompt that this works (capital -V not lowercase -v):

dot -V

If it doesn't work, you have a Path problem. I found the following test programs useful. The first one sees if Python can find dot:

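import os
import subprocess

# Run dot and wait for it; FileNotFoundError means Python cannot find dot on the Path
proc = subprocess.run(['dot', '-V'])
# Show the Path that Python sees, to compare with the graphviz install directory
print(os.getenv('Path'))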

The following version does the same thing, except that it uses the graphviz Python library's backend support utilities, which is what we use in dtreeviz:

import graphviz.backend as be
cmd = ["dot", "-V"]
stdout, stderr = be.run(cmd, capture_output=True, check=True, quiet=False)
print( stderr )

If you are having issues with the run command, you can try copying the following files from https://github.com/xflr6/graphviz/tree/master/graphviz.

Place them in the AppData\Local\Continuum\anaconda3\Lib\site-packages\graphviz folder.

Clean out the __pycache__ directory too.

For the graphviz 8.0.5 Windows install and graphviz Python interface v0.18+, use this instead:

import graphviz.backend as be
cmd = ["dot", "-V"]
stdout = be.execute.run_check(cmd, capture_output=True, check=True, quiet=False)
print( stdout )

Jupyter Lab and Jupyter notebook both show the inline .svg images well.

Verify graphviz installation

Try making text file t.dot with content digraph T { A -> B } (paste that into a text editor, for example) and then running this from the command line:

dot -Tsvg -o t.svg t.dot

That should give a simple t.svg file that opens properly. If you get errors from dot, it will not work from the dtreeviz python code. If it can't find dot then you didn't update your PATH environment variable or there is some other install issue with graphviz.
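
The same verification can be scripted. This is a minimal sketch that uses only the standard library and runs dot only when it is found on PATH; render_test_graph is a hypothetical helper name, and the files are written to a temporary directory rather than the current one.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def render_test_graph(workdir):
    """Write t.dot into workdir and render t.svg; return None if dot is missing."""
    workdir = Path(workdir)
    dot_file = workdir / "t.dot"
    svg_file = workdir / "t.svg"
    dot_file.write_text("digraph T { A -> B }\n")
    if shutil.which("dot") is None:
        return None
    # Same command as above; check=True raises CalledProcessError if dot fails
    subprocess.run(["dot", "-Tsvg", "-o", str(svg_file), str(dot_file)],
                   check=True)
    return svg_file


with tempfile.TemporaryDirectory() as tmp:
    svg = render_test_graph(tmp)
    if svg is None:
        print("dot not found on PATH; fix the graphviz install first")
    else:
        print(f"wrote {svg.stat().st_size} bytes of SVG")
```

Running this with the same interpreter you use for dtreeviz is the useful part: it confirms that this particular Python environment can find and run dot.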

Limitations

Finally, don't use IE to view .svg files. Use Edge, where they look much better. I suspect that IE is displaying them as rasterized rather than vector images. Only .svg files can be generated on this platform.

Install dtreeviz locally

Make sure to follow the install guidelines above.

To push the dtreeviz library to your local egg cache (force updates) during development, do this (from anaconda prompt on Windows):

python setup.py install -f

E.g., on Terence's box, it adds /Users/parrt/anaconda3/lib/python3.6/site-packages/dtreeviz-2.2.2-py3.6.egg.

Feedback

We welcome info from users on how they use dtreeviz, what features they'd like, etc., via email (to parrt) or via an issue.

Useful Resources

License

This project is licensed under the terms of the MIT license, see LICENSE.
