A tidy unified interface to models

parsnip

Introduction

The goal of parsnip is to provide a tidy, unified interface to models that can be used to try a range of models without getting bogged down in the syntactical minutiae of the underlying packages.

Installation

# The easiest way to get parsnip is to install all of tidymodels:
install.packages("tidymodels")

# Alternatively, install just parsnip:
install.packages("parsnip")

# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidymodels/parsnip")

Getting started

One challenge with the modeling functions available in R is that functions that do the same thing can have different interfaces and arguments. For example, to fit a random forest regression model, we might have:

# From randomForest
rf_1 <- randomForest(
  y ~ ., 
  data = dat, 
  mtry = 10, 
  ntree = 2000, 
  importance = TRUE
)

# From ranger
rf_2 <- ranger(
  y ~ ., 
  data = dat, 
  mtry = 10, 
  num.trees = 2000, 
  importance = "impurity"
)

# From sparklyr
rf_3 <- ml_random_forest(
  dat, 
  intercept = FALSE, 
  response = "y", 
  features = names(dat)[names(dat) != "y"], 
  col.sample.rate = 10,
  num.trees = 2000
)

Note that the model syntax can be very different and that the argument names (and formats) are also different. This is a pain if you switch between implementations.

In this example:

  • the type of model is “random forest”,
  • the mode of the model is “regression” (as opposed to classification, etc.), and
  • the computational engine is the name of the R package.
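To see which engines are available for a given model type, and which modes each one supports, parsnip's `show_engines()` can be queried. This is only a small sketch; the rows returned depend on the installed parsnip version and on which extension packages are loaded.

```r
library(parsnip)

# Engines (and the modes each one supports) currently registered for
# random forest models. Extension packages may register more when loaded.
rf_engines <- show_engines("rand_forest")
rf_engines
```

The result is a tibble with one row per engine/mode combination, so it can be filtered like any other data frame.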

The goals of parsnip are to:

  • Separate the definition of a model from its evaluation.
  • Decouple the model specification from the implementation (whether the implementation is in R, Spark, or something else). For example, the user would call rand_forest instead of ranger::ranger or another package-specific function.
  • Harmonize argument names (e.g., n.trees, ntrees, trees) so that users only need to remember a single name. This helps across model types too: trees is the same argument for random forests as well as for boosting or bagging.
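As a rough illustration of these goals, the sketch below defines one engine-agnostic specification and then uses parsnip's `translate()` to inspect the call that would be built for two different engines. The object names (`rf_spec`, `ranger_call`, and so on) are chosen here for illustration only, and the exact printed template may vary between parsnip versions.

```r
library(parsnip)

# One engine-agnostic definition: model type, mode, and harmonized arguments.
rf_spec <- rand_forest(mtry = 10, trees = 2000) %>%
  set_mode("regression")

# Attach an engine only when needed; the definition above is not modified.
ranger_spec <- rf_spec %>% set_engine("ranger")
rf_pkg_spec <- rf_spec %>% set_engine("randomForest")

# translate() shows the engine-level call that parsnip would construct,
# including how the harmonized `trees` argument is renamed for each package.
ranger_call <- translate(ranger_spec)
rf_pkg_call <- translate(rf_pkg_spec)

ranger_call
rf_pkg_call
```

Nothing is fit at this point: the specification only describes the model, which is what keeps the definition separate from its evaluation.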

Using the example above, the parsnip approach would be:

library(parsnip)

rand_forest(mtry = 10, trees = 2000) %>%
  set_engine("ranger", importance = "impurity") %>%
  set_mode("regression")
#> Random Forest Model Specification (regression)
#> 
#> Main Arguments:
#>   mtry = 10
#>   trees = 2000
#> 
#> Engine-Specific Arguments:
#>   importance = impurity
#> 
#> Computational engine: ranger

The engine can be easily changed. To use Spark instead, only the set_engine() call changes:

rand_forest(mtry = 10, trees = 2000) %>%
  set_engine("spark") %>%
  set_mode("regression")
#> Random Forest Model Specification (regression)
#> 
#> Main Arguments:
#>   mtry = 10
#>   trees = 2000
#> 
#> Computational engine: spark

Either one of these model specifications can be fit in the same way:

set.seed(192)
rand_forest(mtry = 10, trees = 2000) %>%
  set_engine("ranger", importance = "impurity") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)
#> parsnip model object
#> 
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, mtry = min_cols(~10,      x), num.trees = ~2000, importance = ~"impurity", num.threads = 1,      verbose = FALSE, seed = sample.int(10^5, 1)) 
#> 
#> Type:                             Regression 
#> Number of trees:                  2000 
#> Sample size:                      32 
#> Number of independent variables:  10 
#> Mtry:                             10 
#> Target node size:                 5 
#> Variable importance mode:         impurity 
#> Splitrule:                        variance 
#> OOB prediction error (MSE):       5.976917 
#> R squared (OOB):                  0.8354559

A list of all parsnip models across different CRAN packages can be found at https://www.tidymodels.org/find/parsnip.

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
