• Stars: 116
• Rank 303,894 (Top 6 %)
• Language: R
• Created almost 8 years ago
• Updated about 7 years ago

Repository Details

Tidy and easy bootstrapping

Bootstrapping made easy and tidy with slipper

You've heard of broom for tidying up your R functions. slipper is an R package for tidy/easy bootstrapping. There are already a bunch of good bootstrapping packages out there including bootstrap and boot. You can also bootstrap with dplyr and broom or with purrr and modelr.

But I'm too dumb for any of those. So slipper includes some simple, pipeable bootstrapping functions for me.

install

with devtools:

devtools::install_github('jtleek/slipper')

use

There are only two functions in this package.

Call slipper to bootstrap any function that returns a single value.

slipper(mtcars,mean(mpg),B=100)

slipper is built to work with pipes and the tidyverse too.

mtcars %>% slipper(mean(mpg),B=100)

The output is a data frame with the values of the function on the original data set and on the bootstrapped replicates. You can calculate confidence intervals using summarize:

mtcars %>% slipper(mean(mpg),B=100) %>%
  filter(type=="bootstrap") %>% 
  summarize(ci_low = quantile(value,0.025),
            ci_high = quantile(value,0.975))
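For comparison, here is a rough base R sketch of what a percentile bootstrap like the one above does by hand. It is not slipper's actual implementation, just the textbook procedure; the seed and the names boot_means and ci are made up for illustration.

```r
# Base R sketch of a percentile bootstrap for mean(mpg); slipper is not needed.
set.seed(1)          # arbitrary seed, only so the run is reproducible
B <- 100             # number of bootstrap replicates
n <- nrow(mtcars)

# Resample row indices with replacement and recompute the statistic each time
boot_means <- replicate(B, {
  idx <- sample.int(n, size = n, replace = TRUE)
  mean(mtcars$mpg[idx])
})

# Percentile interval: 2.5% and 97.5% quantiles of the bootstrap replicates
ci <- quantile(boot_means, c(0.025, 0.975))
ci
```

The quantile step is the same one the summarize call performs on the value column.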

You can also bootstrap linear models using slipper_lm. Just pass the data frame and the formula you want to fit on the original data and on the bootstrap samples.

slipper_lm(mtcars,mpg ~ cyl,B=100)

This is also pipeable:

mtcars %>% slipper_lm(mpg ~ cyl,B=100)

The default behavior is to bootstrap complete cases, but if you want to bootstrap residuals instead, set boot_resid=TRUE:

mtcars %>% slipper_lm(mpg ~ cyl,B=100,boot_resid=TRUE)
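If the difference between the two modes is unclear, the base R sketch below contrasts the textbook versions of each. It is an illustration under the assumption that slipper_lm follows the usual definitions, not a copy of its internals; case_slopes and resid_slopes are hypothetical names.

```r
# Base R sketch: case bootstrap vs. residual bootstrap for lm(mpg ~ cyl).
set.seed(1)          # arbitrary seed, only so the run is reproducible
B <- 100
n <- nrow(mtcars)
fit <- lm(mpg ~ cyl, data = mtcars)

# Case bootstrap: resample whole rows, then refit the model
case_slopes <- replicate(B, {
  idx <- sample.int(n, size = n, replace = TRUE)
  coef(lm(mpg ~ cyl, data = mtcars[idx, ]))[["cyl"]]
})

# Residual bootstrap: keep the predictors fixed, add resampled residuals
# to the fitted values, then refit the model on the synthetic response
res <- resid(fit)
fitted_vals <- fitted(fit)
resid_slopes <- replicate(B, {
  boot_data <- mtcars
  boot_data$mpg <- fitted_vals + sample(res, size = n, replace = TRUE)
  coef(lm(mpg ~ cyl, data = boot_data))[["cyl"]]
})

# Compare the spread of the slope estimate under each scheme
c(case = sd(case_slopes), residual = sd(resid_slopes))
```

The case bootstrap treats the predictors as random, while the residual bootstrap holds them fixed and leans on the model's error assumptions.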

You can calculate bootstrap confidence intervals in the same way as you do for slipper.

mtcars %>% slipper_lm(mpg ~ cyl,B=100) %>%
  filter(type=="bootstrap",term=="cyl") %>%
  summarize(ci_low = quantile(value,0.025),
            ci_high = quantile(value,0.975))

Finally, if you want to do a bootstrap hypothesis test you can pass a formula and a nested null formula. formula must contain every term in null_formula plus the one additional term you want to test.

# Bootstrap hypothesis test -
# the observed statistic (value[1]) is counted in both
# the numerator and the denominator, which has the same effect
# as adding one to each, because bootstrap p-values should
# never be zero.

mtcars %>%
  slipper_lm(mpg ~ cyl, null_formula = mpg ~ 1,B=1000) %>%
  filter(term=="cyl") %>%
  summarize(num = sum(abs(value) >= abs(value[1])),
            den = n(),
            pval = num/den)
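To make the p-value arithmetic concrete, here is a base R sketch of one common way to run this kind of test: resample residuals from the null model, refit the full model, and compare. Whether slipper_lm generates its null replicates exactly this way is an assumption; null_stats, observed and pval are illustrative names.

```r
# Base R sketch of a bootstrap test for the cyl term in lm(mpg ~ cyl),
# generating data under the null model mpg ~ 1 (assumed scheme).
set.seed(1)          # arbitrary seed, only so the run is reproducible
B <- 1000
n <- nrow(mtcars)

# Statistic on the original data
observed <- coef(lm(mpg ~ cyl, data = mtcars))[["cyl"]]

# Fit the null model and build responses that satisfy the null
null_fit <- lm(mpg ~ 1, data = mtcars)
null_stats <- replicate(B, {
  boot_data <- mtcars
  boot_data$mpg <- fitted(null_fit) +
    sample(resid(null_fit), size = n, replace = TRUE)
  coef(lm(mpg ~ cyl, data = boot_data))[["cyl"]]
})

# Add one to numerator and denominator so the p-value is never zero
pval <- (sum(abs(null_stats) >= abs(observed)) + 1) / (B + 1)
pval
```

With B=1000 the smallest p-value this can report is 1/1001.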

That's basically it for now. Would love some help/pull requests/fixes as this is my first attempt at getting into the tidyverse :).
