• Stars
    star
    1,445
  • Rank 32,555 (Top 0.7 %)
  • Language
    R
  • License
    Other
  • Created about 10 years ago
  • Updated 3 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Convert statistical analysis objects from R into tidy format

broom

R build status CRAN status Downloads

Overview

broom summarizes key information about models in tidy tibble()s. broom provides three verbs to make it convenient to interact with model objects:

  • tidy() summarizes information about model components
  • glance() reports information about the entire model
  • augment() adds informations about observations to a dataset

For a detailed introduction, please see vignette("broom").

broom tidies 100+ models from popular modelling packages and almost all of the model objects in the stats package that comes with base R. vignette("available-methods") lists method availability.

If you arenโ€™t familiar with tidy data structures and want to know how they can make your life easier, we highly recommend reading Hadley Wickhamโ€™s Tidy Data.

Installation

# we recommend installing the entire tidyverse 
# modeling set, which includes broom:
install.packages("tidymodels")

# alternatively, to install just broom:
install.packages("broom")

# to get the development version from GitHub:
install.packages("pak")
pak::pak("tidymodels/broom")

If you find a bug, please file a minimal reproducible example in the issues.

Usage

tidy() produces a tibble() where each row contains information about an important component of the model. For regression models, this often corresponds to regression coefficients. This is can be useful if you want to inspect a model or create custom visualizations.

library(broom)

fit <- lm(Volume ~ Girth + Height, trees)
tidy(fit)
#> # A tibble: 3 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)  -58.0       8.64      -6.71 2.75e- 7
#> 2 Girth          4.71      0.264     17.8  8.22e-17
#> 3 Height         0.339     0.130      2.61 1.45e- 2

glance() returns a tibble with exactly one row of goodness of fitness measures and related statistics. This is useful to check for model misspecification and to compare many models.

glance(fit)
#> # A tibble: 1 x 12
#>   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
#>       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
#> 1     0.948         0.944  3.88      255. 1.07e-18     2  -84.5  177.  183.
#> # โ€ฆ with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

augment adds columns to a dataset, containing information such as fitted values, residuals or cluster assignments. All columns added to a dataset have . prefix to prevent existing columns from being overwritten.

augment(fit, data = trees)
#> # A tibble: 31 x 9
#>    Girth Height Volume .fitted .resid .std.resid   .hat .sigma   .cooksd
#>    <dbl>  <dbl>  <dbl>   <dbl>  <dbl>      <dbl>  <dbl>  <dbl>     <dbl>
#>  1   8.3     70   10.3    4.84  5.46      1.50   0.116    3.79 0.0978   
#>  2   8.6     65   10.3    4.55  5.75      1.60   0.147    3.77 0.148    
#>  3   8.8     63   10.2    4.82  5.38      1.53   0.177    3.78 0.167    
#>  4  10.5     72   16.4   15.9   0.526     0.140  0.0592   3.95 0.000409 
#>  5  10.7     81   18.8   19.9  -1.07     -0.294  0.121    3.95 0.00394  
#>  6  10.8     83   19.7   21.0  -1.32     -0.370  0.156    3.94 0.00840  
#>  7  11       66   15.6   16.2  -0.593    -0.162  0.115    3.95 0.00114  
#>  8  11       75   18.2   19.2  -1.05     -0.277  0.0515   3.95 0.00138  
#>  9  11.1     80   22.6   21.4   1.19      0.321  0.0920   3.95 0.00348  
#> 10  11.2     75   19.9   20.2  -0.288    -0.0759 0.0480   3.95 0.0000968
#> # โ€ฆ with 21 more rows

Contributing

We welcome contributions of all types!

For questions and discussions about tidymodels packages, modeling, and machine learning, please post on Posit Community. If you think you have encountered a bug, please submit an issue. Either way, learn how to create and share a reprex (a minimal, reproducible example), to clearly communicate about your code. Check out further details on contributing guidelines for tidymodels packages and how to get help.

If you have never directly contributed to an R package before, broom is an excellent place to start. Find an issue with the Beginner Friendly tag and comment that youโ€™d like to take it on and weโ€™ll help you get started.

Generally, too, we encourage typo corrections, bug reports, bug fixes and feature requests. Feedback on the clarity of the documentation is especially valuable!

If you are interested in adding tidier methods for new model objects, please read this article on the tidymodels website.

We have a Contributor Code of Conduct. By participating in broom you agree to abide by its terms.

More Repositories

1

tidymodels

Easily install and load the tidymodels packages
R
747
star
2

infer

An R package for tidyverse-friendly statistical inference
R
723
star
3

parsnip

A tidy unified interface to models
R
595
star
4

corrr

Explore correlations in R
R
588
star
5

TMwR

Code and content for "Tidy Modeling with R"
RMarkdown
585
star
6

recipes

Pipeable steps for feature engineering and data preprocessing to prepare for modeling
R
558
star
7

yardstick

Tidy methods for measuring model performance
R
367
star
8

rsample

Classes and functions to create and summarize resampling objects
R
335
star
9

stacks

An R package for tidy stacked ensemble modeling
R
294
star
10

tune

Tools for tidy parameter tuning
R
268
star
11

tidypredict

Run predictions inside the database
R
259
star
12

workflows

Modeling Workflows
R
201
star
13

textrecipes

Extra recipes for Text Processing
R
159
star
14

themis

Extra recipes steps for dealing with unbalanced data
R
141
star
15

embed

Extra recipes for predictor embeddings
R
141
star
16

butcher

Reduce the size of model objects saved to disk
R
130
star
17

censored

Parsnip wrappers for survival models
R
123
star
18

probably

Tools for post-processing class probability estimates
R
113
star
19

dials

Tools for creating tuning parameter values
R
111
star
20

tidyclust

A tidy unified interface to clustering models
R
107
star
21

tidyposterior

Bayesian comparisons of models using resampled statistics
R
102
star
22

hardhat

Construct Modeling Packages
R
100
star
23

aml-training

The most recent version of the Applied Machine Learning notes
HTML
100
star
24

tidymodels.org-legacy

Legacy Source of tidymodels.org
HTML
99
star
25

workshops

Website and materials for tidymodels workshops
JavaScript
92
star
26

workflowsets

Create a collection of modeling workflows
R
92
star
27

usemodels

Boilerplate Code for tidymodels
R
85
star
28

modeldb

Run models inside a database using R
R
80
star
29

multilevelmod

Parsnip wrappers for mixed-level and hierarchical models
R
74
star
30

spatialsample

Create and summarize spatial resampling objects ๐Ÿ—บ
R
70
star
31

learntidymodels

Learn tidymodels with interactive learnr primers
R
68
star
32

brulee

High-Level Modeling Functions with 'torch'
R
67
star
33

finetune

Additional functions for model tuning
R
62
star
34

bonsai

parsnip wrappers for tree-based models
R
51
star
35

shinymodels

R
46
star
36

applicable

Quantify extrapolation of new samples given a training set
R
46
star
37

model-implementation-principles

recommendations for creating R modeling packages
HTML
41
star
38

rules

parsnip extension for rule-based models
R
40
star
39

planning

Documents to plan and discuss future development
37
star
40

discrim

Wrappers for discriminant analysis and naive Bayes models for use with the parsnip package
R
28
star
41

baguette

parsnip Model Functions for Bagging
R
24
star
42

modeldata

Data Sets Used by tidymodels Packages
R
22
star
43

poissonreg

parsnip wrappers for Poisson regression
R
22
star
44

agua

Create and evaluate models using 'tidymodels' and 'h2o'
R
21
star
45

extratests

Integration and other testing for tidymodels
R
20
star
46

tidymodels.org

Source of tidymodels.org
JavaScript
19
star
47

plsmod

Model Wrappers for Projection Methods
R
14
star
48

cloudstart

RStudio Cloud โ˜๏ธ resources to accompany tidymodels.org
12
star
49

orbital

Turn Tidymodels Workflows Into Series of Equations
R
12
star
50

desirability2

Desirability Functions for Multiparameter Optimization
R
10
star
51

modeldatatoo

More Data Sets Useful for Modeling Examples
R
7
star
52

.github

GitHub contributing guidelines for tidymodels packages
4
star
53

modelenv

Provide Tools to Register Models for use in Tidymodels
R
4
star
54

tailor

Sandbox for a postprocessor object.
R
2
star
55

survivalauc

What the Package Does (One Line, Title Case)
C
2
star