Run predictions inside the database

tidypredict


The main goal of tidypredict is to enable running predictions inside databases. It reads the model, extracts the components needed to calculate the prediction, and then creates an R formula that can be translated into SQL. In other words, it is able to parse a model such as this one:

model <- lm(mpg ~ wt + cyl, data = mtcars)

tidypredict can return a SQL statement that is ready to run inside the database. Because it uses dplyr’s database interface, it works with several database back-ends, such as MS SQL:

tidypredict_sql(model, dbplyr::simulate_mssql())
## <SQL> (39.6862614802529 + (`wt` * -3.19097213898375)) + (`cyl` * -1.5077949682598)
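
The idea behind this translation can be sketched in base R, without tidypredict. The snippet below is an illustration only, not tidypredict's actual implementation: it assumes a linear model with numeric predictors and no interactions, and the variable names (`coefs`, `expr`, `manual`, `max_diff`) are made up for the example. It rebuilds the prediction as a plain arithmetic expression from the model's coefficients.

```r
model <- lm(mpg ~ wt + cyl, data = mtcars)
coefs <- coef(model)

# Start from the intercept, then add one `column * coefficient` term
# per predictor (numeric predictors only in this sketch).
expr <- coefs[["(Intercept)"]]
for (term in setdiff(names(coefs), "(Intercept)")) {
  expr <- call("+", expr, call("*", as.name(term), coefs[[term]]))
}

# The result is an unevaluated call that only uses +, * and column names,
# which is the kind of expression a SQL translator can handle.
print(expr)

# Evaluating it against the training data should reproduce predict().
manual <- eval(expr, envir = mtcars)
max_diff <- max(abs(manual - predict(model)))
```

Because the expression contains nothing but arithmetic on column names, it no longer depends on the fitted model object.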

Installation

Install tidypredict from CRAN using:

install.packages("tidypredict")

Or install the development version from GitHub using remotes as follows:

install.packages("remotes")
remotes::install_github("tidymodels/tidypredict")

Functions

tidypredict has only a few functions, and that number is not expected to grow much. The main focus at this time is to add support for more models.

Function                     Description
tidypredict_fit()            Returns an R formula that calculates the prediction
tidypredict_sql()            Returns a SQL query based on the formula from tidypredict_fit()
tidypredict_to_column()      Adds a new column using the formula from tidypredict_fit()
tidypredict_test()           Tests tidyverse predictions against the model’s native predict() function
tidypredict_interval()       Same as tidypredict_fit() but for intervals (only works with lm and glm)
tidypredict_sql_interval()   Same as tidypredict_sql() but for intervals (only works with lm and glm)
parse_model()                Creates a list spec based on the R model
as_parsed_model()            Prepares an object to be recognized as a parsed model
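
A short sketch of how three of these functions might be used together on a local data frame is shown here. It assumes tidypredict is installed and reuses the `lm()` model from the introduction; the object names (`fit_expr`, `with_fit`, `check`) are chosen for the example.

```r
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

# tidypredict_fit(): the prediction as an unevaluated R expression.
fit_expr <- tidypredict_fit(model)
print(fit_expr)

# tidypredict_to_column(): append the prediction to a data frame.
# With the default arguments the new column is named `fit`.
with_fit <- tidypredict_to_column(mtcars, model)
head(with_fit[, c("mpg", "wt", "cyl", "fit")])

# tidypredict_test(): compare the formula-based predictions with predict().
check <- tidypredict_test(model, mtcars)
print(check)
```

Running `tidypredict_test()` before sending the formula to a database is a cheap way to confirm that the translated prediction agrees with the model's own `predict()` method.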

How it works

Instead of translating directly to a SQL statement, tidypredict creates an R formula. That formula can then be used inside dplyr. The overall workflow is illustrated in the image above and described here:

  1. Fit the model using a base R model, or one from the packages listed in Supported models.
  2. tidypredict reads the model and creates a list object with the components needed to run predictions.
  3. tidypredict builds an R formula based on the list object.
  4. dplyr evaluates the formula created by tidypredict.
  5. dplyr translates the formula into a SQL statement, or into the syntax of any other interface it supports.
  6. The database executes the SQL statement(s) created by dplyr.
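
The six steps above can be sketched end to end as follows. This is a minimal example under stated assumptions: it uses an in-memory SQLite database as a stand-in for a real database, and it assumes the dplyr, dbplyr, DBI and RSQLite packages are installed. The names `con`, `db_mtcars`, `scored` and `result` are made up for the example.

```r
library(dplyr)
library(tidypredict)

# Step 1: fit the model locally.
model <- lm(mpg ~ wt + cyl, data = mtcars)

# Stand-in database for illustration: an in-memory SQLite table.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
db_mtcars <- copy_to(con, mtcars, "mtcars")

# Steps 2-3: tidypredict parses the model and builds the R expression.
fit_expr <- tidypredict_fit(model)

# Steps 4-5: dplyr receives the expression lazily and translates it to SQL.
scored <- db_mtcars %>%
  mutate(fit = !!fit_expr) %>%
  select(wt, cyl, mpg, fit)
show_query(scored)

# Step 6: the database executes the SQL; collect() returns the rows to R.
result <- collect(scored)

DBI::dbDisconnect(con)
```

Nothing is computed in R until `collect()` is called, so with a real back-end the predictions are calculated where the data lives.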

Parsed model spec

tidypredict writes and reads a spec based on a model. Instead of simply writing the R formula directly, splitting the spec from the formula adds the following capabilities:

  1. No more saving models as .rds - This is especially useful when the model needs to be used for predictions in a Shiny app.
  2. Beyond R models - Technically, anything that can write a proper spec can be read into tidypredict. This also means that the parsed model spec can become a good alternative to using PMML.
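
One way to use the spec instead of an .rds file is to write it out as YAML and read it back. The sketch below assumes the yaml package is installed and uses a temporary file; the names `spec_file`, `reloaded` and `fit_expr` are made up for the example, and the details of the spec's contents may differ between tidypredict versions.

```r
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

# Parse the model into a plain list spec and save it as YAML.
parsed <- parse_model(model)
spec_file <- tempfile(fileext = ".yml")
yaml::write_yaml(parsed, spec_file)

# Later (for example inside a Shiny app): read the YAML back and mark the
# list as a parsed model so tidypredict recognizes it.
reloaded <- as_parsed_model(yaml::read_yaml(spec_file))

# The reloaded spec yields a prediction expression without the lm object.
fit_expr <- tidypredict_fit(reloaded)
print(fit_expr)
```

Because the spec is a text file, it can be inspected, versioned, or produced by a tool other than R.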

Supported models

The following models are supported by tidypredict:

  • Linear Regression - lm()
  • Generalized Linear model - glm()
  • Random Forest models - randomForest::randomForest()
  • Random Forest models, via ranger - ranger::ranger()
  • MARS models - earth::earth()
  • XGBoost models - xgboost::xgb.Booster.complete()
  • Cubist models - Cubist::cubist()
  • Tree models, via partykit - partykit::ctree()

parsnip

tidypredict supports models fitted via the parsnip interface. The ones currently confirmed to work with tidypredict are:

  • lm() - parsnip: linear_reg() with “lm” as the engine.
  • randomForest::randomForest() - parsnip: rand_forest() with “randomForest” as the engine.
  • ranger::ranger() - parsnip: rand_forest() with “ranger” as the engine.
  • earth::earth() - parsnip: mars() with “earth” as the engine.

broom

The tidy() function from broom works with linear models parsed via tidypredict:

pm <- parse_model(lm(wt ~ ., mtcars))
tidy(pm)
## # A tibble: 11 × 2
##    term        estimate
##    <chr>          <dbl>
##  1 (Intercept) -0.231  
##  2 mpg         -0.0417 
##  3 cyl         -0.0573 
##  4 disp         0.00669
##  5 hp          -0.00323
##  6 drat        -0.0901 
##  7 qsec         0.200  
##  8 vs          -0.0664 
##  9 am           0.0184 
## 10 gear        -0.0935 
## 11 carb         0.249

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
