Bootstrapping made easy and tidy with slipper
You've heard of broom for tidying up your R functions. slipper is an R package for tidy/easy bootstrapping. There are already a bunch of good bootstrapping packages out there including bootstrap and boot. You can also bootstrap with dplyr and broom or with purrr and modelr.
But I'm too dumb for any of those. So slipper includes some simple, pipeable bootstrapping functions for me.
install
With devtools:
devtools::install_github('jtleek/slipper')
use
There are only two functions in this package.
Call slipper to bootstrap any function that returns a single value.
library(slipper)
slipper(mtcars, mean(mpg), B = 100)
slipper is built to work with pipes and the tidyverse too.
library(dplyr)  # provides %>%, filter() and summarize() used below
mtcars %>% slipper(mean(mpg), B = 100)
The output is a data frame with the value of the function on the original data set and on each bootstrap replicate. You can calculate confidence intervals by filtering to the bootstrap rows and using summarize:
mtcars %>% slipper(mean(mpg), B = 100) %>%
  filter(type == "bootstrap") %>%
  summarize(ci_low = quantile(value, 0.025),
            ci_high = quantile(value, 0.975))
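If you want to see what the percentile interval above boils down to, here is a minimal base R sketch of the same idea. It is only an illustration of the general technique, not slipper's source code; the names observed, boot_values and ci are made up for this example.

```r
# Base R sketch of a percentile bootstrap for mean(mpg).
# Illustration only; this is not how slipper is implemented internally.
set.seed(1)                      # fixed seed so the sketch is reproducible
B <- 100                         # number of bootstrap replicates
n <- nrow(mtcars)

observed <- mean(mtcars$mpg)     # statistic on the original data

# Resample row indices with replacement and recompute the statistic each time
boot_values <- replicate(B, {
  idx <- sample.int(n, size = n, replace = TRUE)
  mean(mtcars$mpg[idx])
})

# Percentile interval taken from the bootstrap replicates
ci <- quantile(boot_values, probs = c(0.025, 0.975))
ci
```

The slipper pipeline does the resampling for you and hands back the replicates as rows, so the only step left is the quantile call.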
You can also bootstrap linear models using slipper_lm. Just pass the data frame and the formula you want to fit on the original data and on the bootstrap samples.
slipper_lm(mtcars,mpg ~ cyl,B=100)
This is also pipeable
mtcars %>% slipper_lm(mpg ~ cyl,B=100)
The default behavior is to bootstrap complete cases, but if you want to bootstrap residuals, set boot_resid=TRUE:
mtcars %>% slipper_lm(mpg ~ cyl,B=100,boot_resid=TRUE)
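For anyone unsure about the difference between the two schemes, here is a rough base R sketch of case resampling versus residual resampling as they are usually defined. Whether slipper_lm does exactly this internally is an assumption on my part; case_boot and resid_boot are hypothetical helper names for this example.

```r
# Base R sketch of two common ways to bootstrap a linear model.
# Illustration only; slipper_lm's internals may differ.
set.seed(1)
fit <- lm(mpg ~ cyl, data = mtcars)
n <- nrow(mtcars)

# Case bootstrap: resample whole rows, then refit the model
case_boot <- function() {
  idx <- sample.int(n, size = n, replace = TRUE)
  coef(lm(mpg ~ cyl, data = mtcars[idx, ]))[["cyl"]]
}

# Residual bootstrap: keep the predictors fixed, add resampled
# residuals to the fitted values, then refit the model
resid_boot <- function() {
  boot_data <- mtcars
  boot_data$mpg <- fitted(fit) + sample(resid(fit), size = n, replace = TRUE)
  coef(lm(mpg ~ cyl, data = boot_data))[["cyl"]]
}

case_values  <- replicate(100, case_boot())
resid_values <- replicate(100, resid_boot())

# Compare the spread of the cyl coefficient under each scheme
c(case = sd(case_values), resid = sd(resid_values))
```

Case resampling treats the predictors as random, while residual resampling holds the design fixed and leans on the model being correctly specified.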
You can calculate bootstrap confidence intervals in the same way as you do for slipper.
mtcars %>% slipper_lm(mpg ~ cyl, B = 100) %>%
  filter(type == "bootstrap", term == "cyl") %>%
  summarize(ci_low = quantile(value, 0.025),
            ci_high = quantile(value, 0.975))
Finally, if you want to do a bootstrap hypothesis test, you can pass a formula and a nested null formula. formula must include every term in null_formula and one additional one you want to test.
# Bootstrap hypothesis test.
# The observed statistic (the first row, value[1]) is counted in both
# num and den, which adds one to the numerator and denominator,
# because bootstrap p-values should never be zero.
mtcars %>%
  slipper_lm(mpg ~ cyl, null_formula = mpg ~ 1, B = 1000) %>%
  filter(term == "cyl") %>%
  summarize(num = sum(abs(value) >= abs(value[1])),
            den = n(),
            pval = num / den)
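Here is a base R sketch of one common way to run this kind of test: resample residuals from the null model, refit the full model, and compare the observed coefficient to the null replicates. I'm assuming this scheme for illustration; it may not match what slipper_lm does under the hood, and the names full_fit, null_fit and null_values are made up for this example.

```r
# Base R sketch of a bootstrap hypothesis test for the cyl coefficient.
# Assumed scheme: residual resampling under the null model.
set.seed(1)
B <- 1000
n <- nrow(mtcars)

full_fit <- lm(mpg ~ cyl, data = mtcars)   # model with the term under test
null_fit <- lm(mpg ~ 1, data = mtcars)     # nested null model
observed <- coef(full_fit)[["cyl"]]

# Generate responses under the null, refit the full model,
# and record the cyl coefficient each time
null_values <- replicate(B, {
  boot_data <- mtcars
  boot_data$mpg <- fitted(null_fit) +
    sample(resid(null_fit), size = n, replace = TRUE)
  coef(lm(mpg ~ cyl, data = boot_data))[["cyl"]]
})

# Add one to numerator and denominator so the p-value is never zero
pval <- (1 + sum(abs(null_values) >= abs(observed))) / (B + 1)
pval
```

The explicit "+ 1" here plays the same role as keeping the observed row in the slipper_lm output when you compute num and den.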
That's basically it for now. Would love some help/pull requests/fixes as this is my first attempt at getting into the tidyverse :).