This is the accompanying GitHub repository to a work-in-progress paper by Aaron Peikert and Andreas M. Brandmaier.
Abstract
In this tutorial, we describe a workflow to ensure long-term reproducibility of R-based data analyses. The workflow leverages established tools and practices from software engineering. It combines the benefits of various open-source software tools including R Markdown, Git, Make, and Docker, whose interplay ensures seamless integration of version management, dynamic report generation conforming to various journal styles, and full cross-platform and long-term computational reproducibility. The workflow ensures meeting the primary goals that 1) the reporting of statistical results is consistent with the actual statistical results (dynamic report generation), 2) the analysis exactly reproduces at a later point in time even if the computing platform or software is changed (computational reproducibility), and 3) changes at any time (during development and post-publication) are tracked, tagged, and documented while earlier versions of both data and code remain accessible. While the research community increasingly recognizes dynamic document generation and version management as tools to ensure reproducibility, we demonstrate with practical examples that these alone are not sufficient to ensure long-term computational reproducibility. Combining containerization, dependence management, version management, and dynamic document generation, the proposed workflow increases scientific productivity by facilitating later reproducibility and reuse of code and data.
Resources
Tool | How to install? | How to learn? |
---|---|---|
Windows only: Chocolatey | Visit chocolatey.org. | Chocolatey installs software for you. It is installed and called from the terminal/command prompt. To open the command prompt, press Windows+X and then click on “Command Prompt” or “Command Prompt (Admin).” |
OS X only: Homebrew | Visit brew.sh. | Homebrew installs software for you. It is installed and called from the terminal/command prompt. To open the terminal, press Command + Space to open Spotlight, then type “Terminal” and double-click the top search result. |
R | Windows: Use Chocolatey (from the terminal): `choco install -y r.project`; OS X: Use Homebrew (from the terminal): `brew install r` | Read: R for Data Science |
RStudio | Windows: Use Chocolatey (from the terminal): `choco install -y r.studio`; OS X: Use Homebrew (from the terminal): `brew install --cask rstudio` | Skim the cheatsheet |
rmarkdown | Within RStudio, type into the R console: `install.packages("rmarkdown")` | Read the cheatsheet. Skim R Markdown: The Definitive Guide |
Git | Windows: Use Chocolatey (from the terminal): `choco install -y git`; OS X: Git gets installed with Homebrew. Nothing to do. | Read Part IV, Git fundamentals, and skim the rest of Happy Git and GitHub for the useR. |
GitHub | Create an account on github.com and apply for Student/Researcher Benefits. | Read Part II, Connect Git, GitHub, RStudio, and Part III, Early GitHub Wins. |
Make | Windows: Use Chocolatey (from the terminal): `choco install -y make`; OS X: Make is preinstalled on OS X. Nothing to do. | Read Minimal Make |
Docker | Windows: Use Chocolatey (from the terminal): `choco install -y docker-desktop`; OS X: Use Homebrew (from the terminal): `brew install --cask docker`; Linux: Follow the steps described in: Post-installation steps for Linux | Read An Introduction to Rocker: Docker Containers for R. |
Compile
The following paragraphs describe how you can obtain a copy of the source files of our manuscript describing reproducible workflows and create the PDF. You can either go the “standard” way, downloading a local copy of the repository and knitting the manuscript file in R, or use the reproducible workflow as suggested, using Make to create a container and build the final PDF file in exactly the same virtual computational environment that we used to render the PDF.
Standard Way
Requires: `Git`, `RStudio`, `pandoc`, `pandoc-citeproc` & `rmarkdown`.
Open RStudio -> File -> New Project -> Version Control -> Git
Insert: https://github.com/aaronpeikert/reproducible-research.git
Open `manuscript.Rmd` and click on `Knit`.
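The knitting step can probably also be done from a terminal instead of the RStudio GUI. The following is a minimal sketch, not the authors' method: the helper name `knit` is hypothetical, and it assumes that `Rscript`, the rmarkdown package, and pandoc are already installed (none of this is checked). It passes the file name to `rmarkdown::render()`.

```shell
#!/bin/sh
# Hypothetical helper: render one R Markdown file without opening RStudio.
# Assumes Rscript, the rmarkdown package, and pandoc are installed.
knit() {
  # Refuse early if the file does not exist, before starting R at all.
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  # The file name is handed to R as a trailing command-line argument.
  Rscript -e 'rmarkdown::render(commandArgs(trailingOnly = TRUE)[1])' "$1"
}

# Usage, from inside the cloned repository:
# knit manuscript.Rmd
```

Passing the file name as an argument rather than pasting it into the R expression avoids quoting problems with unusual file names.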
Using a Reproducible Workflow
Does not require R or RStudio, but requires `make` & `docker`.
Execute in Terminal:
git clone https://github.com/aaronpeikert/reproducible-research.git
cd reproducible-research
make build
make all DOCKER=TRUE
Note: Windows users need to manually edit the `Makefile`, set `current_path` to the current directory, and use `make all DOCKER=TRUE WINDOWS=TRUE`. We hope that future releases of Docker for Windows will not require this workaround.
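Before running the commands in this section, it may help to confirm that the required tools are actually on your `PATH`. The sketch below is one way to do that; the function name `missing_tools` is hypothetical and not part of this repository. It only checks that the commands exist, not that the Docker daemon is running.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed command
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The reproducible workflow needs Git, Make, and Docker.
missing=$(missing_tools git make docker)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

If a tool is reported as missing, the Resources table lists how to install it.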
Rebuild Everything
In case you experience some unexpected behavior with this workflow, you should check that you have the most recent version (`git pull`), rebuild the Docker image (`make rebuild`), and force the rebuild of all targets (`make -B DOCKER=TRUE`).
git pull && make rebuild && make -B DOCKER=TRUE
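The `&&` chain above stops at the first failing command but does not say which one failed. As a small, optional sketch (the function name `run_steps` is hypothetical and not part of this repository), the same sequence can be run with each step echoed and the failing step named:

```shell
#!/bin/sh
# Hypothetical helper: run each command line in order,
# stop at the first failure, and report which step failed.
run_steps() {
  for step in "$@"; do
    echo ">> $step"
    sh -c "$step" || { echo "failed: $step" >&2; return 1; }
  done
}

# Usage, from inside the cloned repository:
# run_steps "git pull" "make rebuild" "make -B DOCKER=TRUE"
```

Each step runs in its own `sh -c`, so a `cd` in one step would not carry over to the next; for the three commands above that makes no difference.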
Session Info
sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 3.6.1 (2019-07-05)
## os Debian GNU/Linux 9 (stretch)
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Etc/UTC
## date 2021-05-01
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib source
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.1)
## backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
## cli 2.0.0 2019-12-09 [1] CRAN (R 3.6.1)
## crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.1)
## digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.1)
## evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.1)
## fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.1)
## glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.1)
## here * 0.1 2017-05-28 [1] CRAN (R 3.6.1)
## hms 0.5.2 2019-10-30 [1] CRAN (R 3.6.1)
## htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.1)
## knitr 1.26 2019-11-12 [1] CRAN (R 3.6.1)
## magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.1)
## pander * 0.6.3 2018-11-06 [1] CRAN (R 3.6.1)
## pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.1)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
## R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.1)
## Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.1)
## readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.1)
## rlang 0.4.2 2019-11-23 [1] CRAN (R 3.6.1)
## rmarkdown 2.0 2019-12-12 [1] CRAN (R 3.6.1)
## rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.1)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.1)
## stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.1)
## stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.1)
## tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.1)
## vctrs 0.2.1 2019-12-17 [1] CRAN (R 3.6.1)
## withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.1)
## xfun 0.11 2019-11-12 [1] CRAN (R 3.6.1)
## yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.1)
## zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.1)
##
## [1] /usr/local/lib/R/site-library
## [2] /usr/local/lib/R/library