• Stars
    star
    101
  • Rank 323,913 (Top 7 %)
  • Language
    R
  • License
    Other
  • Created over 6 years ago
  • Updated 6 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Bayesian comparisons of models using resampled statistics

tidyposterior

R-CMD-check Codecov test coverage CRAN_Status_Badge Downloads

This package can be used to conduct post hoc analyses of resampling results generated by models.

For example, if two models are evaluated with the root mean squared error (RMSE) using 10-fold cross-validation, there are 10 paired statistics. These can be used to make comparisons between models without involving a test set.

There is a rich literature on the analysis of model resampling results such as McLachlan’s Discriminant Analysis and Statistical Pattern Recognition and the references therein. This package follows the spirit of Benavoli et al (2017).

tidyposterior uses Bayesian generalized linear models for this purpose and can be considered an upgraded version of the caret::resamples() function. The package works with rsample objects natively but any results in a data frame can be used.

See Chapter 11 of Tidy Models with R for examples and more details.

Installation

You can install the released version of tidyposterior from CRAN with:

install.packages("tidyposterior")

And the development version from GitHub with:

# install.packages("pak")
pak::pak("tidymodels/tidyposterior")

Example

To illustrate, here are some example objects using 10-fold cross-validation for a simple two-class problem:

library(tidymodels)
#> ── Attaching packages ────────────────────────────────────── tidymodels 1.1.0 ──
#> βœ” broom        1.0.5          βœ” recipes      1.0.8     
#> βœ” dials        1.2.0          βœ” rsample      1.2.0     
#> βœ” dplyr        1.1.2          βœ” tibble       3.2.1     
#> βœ” ggplot2      3.4.3          βœ” tidyr        1.3.0     
#> βœ” infer        1.0.4          βœ” tune         1.1.2     
#> βœ” modeldata    1.2.0          βœ” workflows    1.1.3     
#> βœ” parsnip      1.1.1.9000     βœ” workflowsets 1.0.1     
#> βœ” purrr        1.0.2          βœ” yardstick    1.2.0
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> βœ– purrr::discard() masks scales::discard()
#> βœ– dplyr::filter()  masks stats::filter()
#> βœ– dplyr::lag()     masks stats::lag()
#> βœ– recipes::step()  masks stats::step()
#> β€’ Search for functions across packages at https://www.tidymodels.org/find/
library(tidyposterior)

data(two_class_dat, package = "modeldata")

set.seed(100)
folds <- vfold_cv(two_class_dat)

We can define two different models (for simplicity, with no tuning parameters).

logistic_reg_glm_spec <-
  logistic_reg() %>%
  set_engine('glm')

mars_earth_spec <-
  mars(prod_degree = 1) %>%
  set_engine('earth') %>%
  set_mode('classification')

For tidymodels, the [tune::fit_resamples()] function can be used to estimate performance for each model/resample:

rs_ctrl <- control_resamples(save_workflow = TRUE)

logistic_reg_glm_res <- 
  logistic_reg_glm_spec %>% 
  fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl)

mars_earth_res <- 
  mars_earth_spec %>% 
  fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl)

From these, there are several ways to pass the results to the perf_mod() function. The most general approach is to have a data frame with the resampling labels (i.e., one or more id columns) as well as columns for each model that you would like to compare.

For the model results above, [tune::collect_metrics()] can be used along with some basic data manipulation steps:

logistic_roc <- 
  collect_metrics(logistic_reg_glm_res, summarize = FALSE) %>% 
  dplyr::filter(.metric == "roc_auc") %>% 
  dplyr::select(id, logistic = .estimate)

mars_roc <- 
  collect_metrics(mars_earth_res, summarize = FALSE) %>% 
  dplyr::filter(.metric == "roc_auc") %>% 
  dplyr::select(id, mars = .estimate)

resamples_df <- full_join(logistic_roc, mars_roc, by = "id")
resamples_df
#> # A tibble: 10 Γ— 3
#>    id     logistic  mars
#>    <chr>     <dbl> <dbl>
#>  1 Fold01    0.856 0.845
#>  2 Fold02    0.933 0.951
#>  3 Fold03    0.934 0.937
#>  4 Fold04    0.864 0.858
#>  5 Fold05    0.847 0.854
#>  6 Fold06    0.911 0.840
#>  7 Fold07    0.867 0.858
#>  8 Fold08    0.886 0.876
#>  9 Fold09    0.898 0.898
#> 10 Fold10    0.906 0.894

We can then give this directly to perf_mod():

set.seed(101)
roc_model_via_df <- perf_mod(resamples_df, iter = 2000)
#> 
#> SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 0.000216 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.16 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.334 seconds (Warm-up)
#> Chain 1:                0.099 seconds (Sampling)
#> Chain 1:                0.433 seconds (Total)
#> Chain 1: 
#> 
#> SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
#> Chain 2: 
#> Chain 2: Gradient evaluation took 5e-06 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.05 seconds.
#> Chain 2: Adjust your expectations accordingly!
#> Chain 2: 
#> Chain 2: 
#> Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 2: 
#> Chain 2:  Elapsed Time: 0.333 seconds (Warm-up)
#> Chain 2:                0.138 seconds (Sampling)
#> Chain 2:                0.471 seconds (Total)
#> Chain 2: 
#> 
#> SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
#> Chain 3: 
#> Chain 3: Gradient evaluation took 6e-06 seconds
#> Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
#> Chain 3: Adjust your expectations accordingly!
#> Chain 3: 
#> Chain 3: 
#> Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 3: 
#> Chain 3:  Elapsed Time: 0.323 seconds (Warm-up)
#> Chain 3:                0.098 seconds (Sampling)
#> Chain 3:                0.421 seconds (Total)
#> Chain 3: 
#> 
#> SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
#> Chain 4: 
#> Chain 4: Gradient evaluation took 4e-06 seconds
#> Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
#> Chain 4: Adjust your expectations accordingly!
#> Chain 4: 
#> Chain 4: 
#> Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 4: 
#> Chain 4:  Elapsed Time: 0.324 seconds (Warm-up)
#> Chain 4:                0.098 seconds (Sampling)
#> Chain 4:                0.422 seconds (Total)
#> Chain 4:

From this, the posterior distributions for each model can be obtained from the tidy() method:

roc_model_via_df %>% 
  tidy() %>% 
  ggplot(aes(x = posterior)) + 
  geom_histogram(bins = 40, col = "blue", fill = "blue", alpha = .4) + 
  facet_wrap(~ model, ncol = 1) + 
  xlab("Area Under the ROC Curve")

Faceted histogram chart. Area Under the ROC Curve along the x-axis, count along the y-axis. The two facets are logistic and mars. Both histogram looks fairly normally distributed, with a mean of 0.89 for logistic and 0.88 for mars. The full range is 0.84 to 0.93.

See contrast_models() for how to analyze these distributions

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

More Repositories

1

broom

Convert statistical analysis objects from R into tidy format
R
1,402
star
2

tidymodels

Easily install and load the tidymodels packages
R
727
star
3

infer

An R package for tidyverse-friendly statistical inference
R
702
star
4

corrr

Explore correlations in R
R
580
star
5

TMwR

Code and content for "Tidy Modeling with R"
RMarkdown
552
star
6

parsnip

A tidy unified interface to models
R
550
star
7

recipes

Pipeable steps for feature engineering and data preprocessing to prepare for modeling
R
534
star
8

yardstick

Tidy methods for measuring model performance
R
354
star
9

rsample

Classes and functions to create and summarize resampling objects
R
318
star
10

stacks

An R package for tidy stacked ensemble modeling
R
282
star
11

tidypredict

Run predictions inside the database
R
256
star
12

tune

Tools for tidy parameter tuning
R
248
star
13

workflows

Modeling Workflows
R
193
star
14

textrecipes

Extra recipes for Text Processing
R
154
star
15

embed

Extra recipes for predictor embeddings
R
140
star
16

themis

Extra recipes steps for dealing with unbalanced data
R
138
star
17

butcher

Reduce the size of model objects saved to disk
R
130
star
18

censored

Parsnip wrappers for survival models
R
122
star
19

dials

Tools for creating tuning parameter values
R
110
star
20

probably

Tools for post-processing class probability estimates
R
108
star
21

tidyclust

A tidy unified interface to clustering models
R
103
star
22

tidymodels.org-legacy

Legacy Source of tidymodels.org
HTML
100
star
23

aml-training

The most recent version of the Applied Machine Learning notes
HTML
100
star
24

hardhat

Construct Modeling Packages
R
99
star
25

workflowsets

Create a collection of modeling workflows
R
88
star
26

usemodels

Boilerplate Code for tidymodels
R
85
star
27

modeldb

Run models inside a database using R
R
79
star
28

workshops

Website and materials for tidymodels workshops
JavaScript
73
star
29

multilevelmod

Parsnip wrappers for mixed-level and hierarchical models
R
72
star
30

spatialsample

Create and summarize spatial resampling objects πŸ—Ί
R
69
star
31

learntidymodels

Learn tidymodels with interactive learnr primers
R
64
star
32

brulee

High-Level Modeling Functions with 'torch'
R
61
star
33

finetune

Additional functions for model tuning
R
61
star
34

shinymodels

R
44
star
35

applicable

Quantify extrapolation of new samples given a training set
R
43
star
36

model-implementation-principles

recommendations for creating R modeling packages
HTML
40
star
37

bonsai

parsnip wrappers for tree-based models
R
40
star
38

rules

parsnip extension for rule-based models
R
39
star
39

planning

Documents to plan and discuss future development
36
star
40

discrim

Wrappers for discriminant analysis and naive Bayes models for use with the parsnip package
R
28
star
41

baguette

parsnip Model Functions for Bagging
R
23
star
42

modeldata

Data Sets Used by tidymodels Packages
R
22
star
43

poissonreg

parsnip wrappers for Poisson regression
R
22
star
44

extratests

Integration and other testing for tidymodels
R
20
star
45

agua

Create and evaluate models using 'tidymodels' and 'h2o'
R
20
star
46

plsmod

Model Wrappers for Projection Methods
R
14
star
47

tidymodels.org

Source of tidymodels.org
JavaScript
13
star
48

cloudstart

RStudio Cloud ☁️ resources to accompany tidymodels.org
12
star
49

desirability2

Desirability Functions for Multiparameter Optimization
R
7
star
50

modeldatatoo

More Data Sets Useful for Modeling Examples
R
5
star
51

.github

GitHub contributing guidelines for tidymodels packages
4
star
52

modelenv

Provide Tools to Register Models for use in Tidymodels
R
3
star
53

survivalauc

What the Package Does (One Line, Title Case)
C
2
star