  • Language
    R
  • License
    GNU General Publi...
  • Created almost 5 years ago
  • Updated 8 months ago

Interactive Studio for Explanatory Model Analysis


Overview

The modelStudio package automates the explanatory analysis of machine learning predictive models. It generates advanced interactive model explanations in the form of a serverless HTML site with only one line of code. This tool is model-agnostic and therefore compatible with most black-box predictive models and frameworks (e.g. mlr/mlr3, xgboost, caret, h2o, parsnip, tidymodels, scikit-learn, lightgbm, keras/tensorflow).

The main modelStudio() function computes various (instance- and model-level) explanations and produces a customisable dashboard, which consists of multiple panels for plots with their short descriptions. The dashboard can easily be saved and shared with others. Tools for Explanatory Model Analysis unite with tools for Exploratory Data Analysis to give a broad overview of the model behavior.
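Among the model-level panels is, as far as we know, a permutation-based Feature Importance plot: each variable is shuffled in turn and the resulting increase in loss is recorded. The following plain-Python sketch illustrates that idea on a toy model; it is not modelStudio's implementation, and the names `permutation_importance`, `rmse` and `black_box`, the toy data, and the default of 10 rounds are assumptions made for illustration.

```python
import random


def rmse(y, y_hat):
    """Root mean squared error between targets and predictions."""
    return (sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)) ** 0.5


def permutation_importance(predict, X, y, loss, B=10, seed=0):
    """Average increase in loss when one column is shuffled, over B rounds."""
    rng = random.Random(seed)
    baseline = loss(y, [predict(row) for row in X])
    importance = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(B):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(loss(y, [predict(row) for row in X_perm]) - baseline)
        importance.append(sum(increases) / B)
    return baseline, importance


# Toy data: y depends strongly on x0, weakly on x1 and not at all on x2.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 1) for _ in range(3)] for _ in range(200)]
y = [3 * x0 + 0.5 * x1 for x0, x1, _ in X]


def black_box(row):
    """Stand-in for any fitted model: only its predictions are used."""
    return 3 * row[0] + 0.5 * row[1]


baseline, importance = permutation_importance(black_box, X, y, rmse)
print(baseline, [round(v, 3) for v in importance])
```

Because the toy model ignores the third variable, shuffling it leaves the loss unchanged and its importance is exactly zero.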


The modelStudio package is a part of the DrWhy.AI universe.

Installation

# Install from CRAN:
install.packages("modelStudio")

# Install the development version from GitHub:
devtools::install_github("ModelOriented/modelStudio")

Simple demo

library("DALEX")
library("ranger")
library("modelStudio")

# fit a model
model <- ranger(score ~., data = happiness_train)

# create an explainer for the model    
explainer <- explain(model,
                     data = happiness_test,
                     y = happiness_test$score,
                     label = "Random Forest")

# make a studio for the model
modelStudio(explainer)

The output can be saved as an HTML file; see the Demo Dashboard.

R & Python examples


The modelStudio() function uses DALEX explainers created with DALEX::explain() or DALEXtra::explain_*().

# packages for the explainer objects
install.packages("DALEX")
install.packages("DALEXtra")
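An explainer wraps a model together with the data, the target, a prediction function and a label, which is what lets modelStudio treat models from different frameworks the same way. The sketch below is a hypothetical Python analogue, loosely mirroring the `model`, `data`, `y`, `predict_function` and `label` arguments of `DALEX::explain()`; the `Explainer` class, `mean_prediction()` and `LinearModel` are illustrative names, not part of DALEX.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence

Rows = List[List[float]]


@dataclass
class Explainer:
    """Hypothetical minimal explainer: a model plus what is needed to query it."""
    model: Any
    data: Rows
    predict_function: Callable[[Any, Rows], List[float]]
    y: Optional[Sequence[float]] = None
    label: str = "model"

    def predict(self, rows: Rows) -> List[float]:
        # Explanation code only ever calls this method, so it never needs to
        # know which framework produced the model.
        return self.predict_function(self.model, rows)


def mean_prediction(explainer: Explainer) -> float:
    """A model-agnostic summary: the average prediction on the explainer's data."""
    preds = explainer.predict(explainer.data)
    return sum(preds) / len(preds)


class LinearModel:
    """A 'framework' whose prediction method has its own name."""

    def __init__(self, weights):
        self.weights = weights

    def score_rows(self, rows):
        return [sum(w * x for w, x in zip(self.weights, row)) for row in rows]


data = [[1.0, 2.0], [3.0, 4.0]]
e1 = Explainer(model=LinearModel([1.0, 1.0]), data=data,
               predict_function=lambda m, rows: m.score_rows(rows), label="linear")
e2 = Explainer(model=lambda row: 10 * row[0], data=data,
               predict_function=lambda m, rows: [m(r) for r in rows], label="function")
print(mean_prediction(e1), mean_prediction(e2))
```

Both explainers are summarised by the same function even though one model exposes `score_rows()` and the other is a bare function; only `predict_function` differs.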

mlr dashboard

Make a studio for the regression ranger model on the apartments data.

# load packages and data
library(mlr)
library(DALEXtra)
library(modelStudio)

data <- DALEX::apartments

# split the data
index <- sample(1:nrow(data), 0.7*nrow(data))
train <- data[index,]
test <- data[-index,]

# fit a model
task <- makeRegrTask(id = "apartments", data = train, target = "m2.price")
learner <- makeLearner("regr.ranger", predict.type = "response")
model <- train(learner, task)

# create an explainer for the model
explainer <- explain_mlr(model,
                         data = test,
                         y = test$m2.price,
                         label = "mlr")

# pick observations
new_observation <- test[1:2,]
rownames(new_observation) <- c("id1", "id2")

# make a studio for the model
modelStudio(explainer, new_observation)
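The `new_observation` rows feed the instance-level panels. One of these is, to our understanding, a Ceteris Paribus (what-if) profile, which varies a single variable of one observation while holding the others fixed and records the prediction. A minimal sketch in plain Python, where `ceteris_paribus()` and the linear `toy_price()` model are hypothetical stand-ins for the mlr model above:

```python
def ceteris_paribus(predict, observation, j, grid):
    """Predictions for copies of `observation` with variable j set to each grid value."""
    profile = []
    for value in grid:
        changed = list(observation)  # copy, so the observation itself is untouched
        changed[j] = value
        profile.append((value, predict(changed)))
    return profile


def toy_price(row):
    """Hypothetical price-per-m2 model: rises with surface, falls with floor."""
    surface, floor = row
    return 5000 + 10 * surface - 50 * floor


new_observation = [80, 3]  # (surface, floor)
profile = ceteris_paribus(toy_price, new_observation, j=0, grid=[40, 80, 120])
print(profile)  # [(40, 5250), (80, 5650), (120, 6050)]
```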

xgboost dashboard

Make a studio for the classification xgboost model on the titanic data.

# load packages and data
library(xgboost)
library(DALEX)
library(modelStudio)

data <- DALEX::titanic_imputed

# split the data
index <- sample(1:nrow(data), 0.7*nrow(data))
train <- data[index,]
test <- data[-index,]

train_matrix <- model.matrix(survived ~.-1, train)
test_matrix <- model.matrix(survived ~.-1, test)

# fit a model
xgb_matrix <- xgb.DMatrix(train_matrix, label = train$survived)
params <- list(max_depth = 3, objective = "binary:logistic", eval_metric = "auc")
model <- xgb.train(params, xgb_matrix, nrounds = 500)

# create an explainer for the model
explainer <- explain(model,
                     data = test_matrix,
                     y = test$survived,
                     type = "classification",
                     label = "xgboost")

# pick observations
new_observation <- test_matrix[1:2, , drop=FALSE]
rownames(new_observation) <- c("id1", "id2")

# make a studio for the model
modelStudio(explainer, new_observation)
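For a classification model the instance-level explanations decompose a predicted probability. A Break Down attribution, as we understand it, starts from the average prediction over the data, fixes the observation's variables one at a time, and credits each variable with the resulting change in the average. The sketch below assumes a fixed variable order supplied by the caller (the DALEX implementation also chooses the order, which this sketch does not do); `break_down()` and the logistic `toy_survival()` model are hypothetical.

```python
import math


def break_down(predict, data, observation, order):
    """Sequential attributions: fix the observation's variables one at a time."""
    def mean_prediction(rows):
        return sum(predict(row) for row in rows) / len(rows)

    current = [list(row) for row in data]  # work on a copy of the data
    intercept = mean_prediction(current)
    previous = intercept
    contributions = {}
    for j in order:
        for row in current:
            row[j] = observation[j]        # every row now shares this value
        now = mean_prediction(current)
        contributions[j] = now - previous  # change in the average prediction
        previous = now
    return intercept, contributions


def toy_survival(row):
    """Hypothetical logistic model: survival probability from (age, fare)."""
    age, fare = row
    return 1 / (1 + math.exp(-(0.5 - 0.03 * age + 0.02 * fare)))


data = [[22, 7.25], [38, 71.28], [26, 7.92], [35, 53.1]]
observation = [30, 20.0]
intercept, contributions = break_down(toy_survival, data, observation, order=[0, 1])
print(round(intercept, 4), {j: round(c, 4) for j, c in contributions.items()})
```

By construction the intercept plus all contributions equals the prediction for the observation, which is what a Break Down plot visualises.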

The modelStudio() function uses dalex explainers created with dalex.Explainer().

:: package for the Explainer object
pip install dalex -U

Use the pickle Python module and the reticulate R package to make a studio for a Python model.

# package for pickle load
install.packages("reticulate")
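The hand-off between the two languages is ordinary pickling: `explainer.dump()` writes the object with Python's pickle protocol and reticulate's `py_load_object()` reads it back. The stdlib sketch below shows the same round trip with a plain dictionary standing in for a dalex Explainer (the payload and file name are hypothetical). Unpickling a real Explainer additionally requires dalex and the model's libraries to be importable in the Python environment that reticulate uses.

```python
import os
import pickle
import tempfile

# Hypothetical payload standing in for a dalex Explainer object.
explainer = {"label": "scikit-learn", "weights": [0.5, 2.0]}
path = os.path.join(tempfile.mkdtemp(), "explainer_toy.pickle")

# Python side: the with-block flushes and closes the file after writing.
with open(path, "wb") as fd:
    pickle.dump(explainer, fd)

# R side, roughly: py_load_object() unpickles the file in the same way.
with open(path, "rb") as fd:
    restored = pickle.load(fd)

print(restored == explainer, restored["label"])
```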

scikit-learn dashboard

Make a studio for the regression Pipeline SVR model on the fifa data.


First, use dalex in Python:

# load packages and data
import dalex as dx
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from numpy import log

data = dx.datasets.load_fifa()
X = data.drop(columns=['overall', 'potential', 'value_eur', 'wage_eur', 'nationality'])
y = log(data.value_eur)

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y)

# fit a pipeline model
model = Pipeline([('scale', StandardScaler()), ('svm', SVR())])
model.fit(X_train, y_train)

# create an explainer for the model
explainer = dx.Explainer(model, data=X_test, y=y_test, label='scikit-learn')

# pack the explainer into a pickle file (the with-block closes the file)
with open('explainer_scikitlearn.pickle', 'wb') as fd:
    explainer.dump(fd)

Then, use modelStudio in R:

# load the explainer from the pickle file
library(reticulate)
explainer <- py_load_object("explainer_scikitlearn.pickle", pickle = "pickle")

# make a studio for the model
library(modelStudio)
modelStudio(explainer, B = 5)
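As far as we know, `B` sets the number of random permutation rounds used for the Shapley-value and feature-importance panels (the default is 10), so `B = 5` trades some precision for less computation. The sketch below shows why more rounds help: Shapley-style attributions can be approximated by averaging Break Down contributions over `B` random variable orderings. It is plain Python with hypothetical names (`break_down_path()`, `shapley_by_sampling()`, `toy_model()`), not the code modelStudio runs.

```python
import random


def break_down_path(predict, data, observation, order):
    """Contributions obtained by fixing the observation's variables in `order`."""
    current = [list(row) for row in data]
    previous = sum(predict(row) for row in current) / len(current)
    contributions = [0.0] * len(observation)
    for j in order:
        for row in current:
            row[j] = observation[j]
        now = sum(predict(row) for row in current) / len(current)
        contributions[j] = now - previous
        previous = now
    return contributions


def shapley_by_sampling(predict, data, observation, B=10, seed=0):
    """Average Break Down contributions over B random variable orderings."""
    rng = random.Random(seed)
    totals = [0.0] * len(observation)
    for _ in range(B):
        order = list(range(len(observation)))
        rng.shuffle(order)
        for j, c in enumerate(break_down_path(predict, data, observation, order)):
            totals[j] += c
    return [t / B for t in totals]


def toy_model(row):
    """x0 and x1 interact, x2 enters additively, so the ordering matters."""
    return row[0] * row[1] + row[2]


data = [[0, 0, 0], [1, 1, 1], [2, 0, 1], [0, 2, 0]]
observation = [2, 2, 3]
for B in (1, 5, 50):
    print(B, [round(v, 3) for v in shapley_by_sampling(toy_model, data, observation, B=B)])
```

Every ordering yields contributions that sum to the same total (the prediction minus the mean prediction), so only the split between the interacting variables 0 and 1 changes with `B`; the additive variable 2 always receives 2.5.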

lightgbm dashboard

Make a studio for the classification Pipeline LGBMClassifier model on the titanic data.


First, use dalex in Python:

# load packages and data
import dalex as dx
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from lightgbm import LGBMClassifier

data = dx.datasets.load_titanic()
X = data.drop(columns='survived')
y = data.survived

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y)

# fit a pipeline model
numerical_features = ['age', 'fare', 'sibsp', 'parch']
numerical_transformer = Pipeline(
  steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
  ]
)
categorical_features = ['gender', 'class', 'embarked']
categorical_transformer = Pipeline(
  steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
  ]
)

preprocessor = ColumnTransformer(
  transformers=[
    ('num', numerical_transformer, numerical_features),
    ('cat', categorical_transformer, categorical_features)
  ]
)

classifier = LGBMClassifier(n_estimators=300)

model = Pipeline(
  steps=[
    ('preprocessor', preprocessor),
    ('classifier', classifier)
  ]
)
model.fit(X_train, y_train)

# create an explainer for the model
explainer = dx.Explainer(model, data=X_test, y=y_test, label='lightgbm')

# pack the explainer into a pickle file (the with-block closes the file)
with open('explainer_lightgbm.pickle', 'wb') as fd:
    explainer.dump(fd)

Then, use modelStudio in R:

# load the explainer from the pickle file
library(reticulate)
explainer <- py_load_object("explainer_lightgbm.pickle", pickle = "pickle")

# make a studio for the model
library(modelStudio)
modelStudio(explainer)

Save & share

Save modelStudio as an HTML file using the buttons at the top of the RStudio Viewer, or with r2d3::save_d3_html().

Citations

If you use modelStudio, please cite our JOSS article:

@article{baniecki2019modelstudio,
  title   = {{modelStudio: Interactive Studio with Explanations for ML Predictive Models}},
  author  = {Hubert Baniecki and Przemyslaw Biecek},
  journal = {Journal of Open Source Software},
  year    = {2019},
  volume  = {4},
  number  = {43},
  pages   = {1798},
  url     = {https://doi.org/10.21105/joss.01798}
}

For a description and evaluation of the Interactive EMA process, refer to our DAMI article:

@article{baniecki2023grammar,
  title   = {The grammar of interactive explanatory model analysis},
  author  = {Hubert Baniecki and Dariusz Parzych and Przemyslaw Biecek},
  journal = {Data Mining and Knowledge Discovery},
  year    = {2023},
  pages   = {1--37},
  url     = {https://doi.org/10.1007/s10618-023-00924-w}
}

Acknowledgments

Work on this package was financially supported by the National Science Centre (Poland) grant 2016/21/B/ST6/02176 and National Centre for Research and Development grant POIR.01.01.01-00-0328/17.
