rrtools: Tools for Writing Reproducible Research in R

R-CMD-check Launch Rstudio Binder

Motivation

The goal of rrtools is to provide instructions, templates, and functions for making a basic compendium suitable for writing a reproducible journal article or report with R. This package documents the key steps and provides convenient functions for quickly creating a new research compendium. The approach is based on Marwick (2017), Marwick et al. (2018), and Wickham’s (2017) work using the R package structure as the basis for a research compendium.

rrtools provides a template for doing scholarly writing in a literate programming environment using Quarto, an open-source scientific and technical publishing system. It also allows for isolation of your computational environment using Docker, package versioning using renv, and continuous integration using GitHub Actions. It makes a convenient starting point for writing a journal article or report.

The functions in rrtools allow you to use R to easily follow the best practices outlined in several major scholarly publications on reproducible research. In addition to those cited above, Wilson et al. (2017), Piccolo & Frampton (2016), Stodden & Miguez (2014) and rOpenSci (2017) are important sources that have influenced our approach to this package.

Installation

To explore and test rrtools without installing anything, click the Binder badge above to start RStudio in a browser tab that includes the contents of this GitHub repository. In that environment you can browse the files, install rrtools, and make a test compendium without altering anything on your computer.

You can install rrtools from GitHub with these lines of R code (Windows users are recommended to install a separate program, Rtools, before proceeding with this step):

# Install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("benmarwick/rrtools")

How to use

To create a reproducible research compendium step-by-step using the rrtools approach, follow these detailed instructions. We use RStudio, and recommend it, but it is not required for these steps to work. We recommend copy-pasting these directly into your console, and editing the options before running. We don’t recommend saving these lines in a script in your project: they are meant to be once-off setup functions.

0. Create a Git-managed directory linked to an online repository

  • It is possible to use rrtools without Git, but usually we want our research compendium to be managed by the version control software Git. The free online book Happy Git With R has details on how to do this. In brief, there are two methods to get started:
    • New project on GitHub first, then download to RStudio: Start on GitHub, GitLab, or a similar web service, and create an empty repository called pkgname (you should use a different name, please follow the rules below) on that service. Then clone that repository to have a local empty directory on your computer, called pkgname, that is linked to this remote repository. Please see our wiki for a step-by-step walk-through of this method, illustrated with screenshots.
    • New project in RStudio first, then connect to GitHub/GitLab: An alternative approach is to create a local, empty, directory called pkgname on your computer (e.g. in your Desktop or Downloads folder), and initialize it with Git (git init), then create a GitHub/GitLab repository and connect your local project to the remote repository.
  • Whichever of those two methods you choose, continue by staging, committing, and pushing every future change in the repository with Git.
  • Your pkgname must follow some rules for everything to work. It must:
    • … contain only ASCII letters, numbers, and ‘.’
    • … have at least two characters
    • … start with a letter (not a number)
    • … not end with ‘.’
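
The naming rules above can be checked before any directory or repository is created. The following is a minimal sketch in base R; the helper name is_valid_pkgname is hypothetical and not part of rrtools, and it encodes only the four rules listed above.

```r
# Hypothetical helper (not part of rrtools): checks candidate compendium
# names against the four rules listed above.
is_valid_pkgname <- function(name) {
  # ^[A-Za-z]       starts with an ASCII letter
  # [A-Za-z0-9.]*   followed by any number of ASCII letters, digits, or '.'
  # [A-Za-z0-9]$    ends with a letter or digit (so not '.'), which also
  #                 forces a minimum length of two characters
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name, perl = TRUE)
}

candidates <- c("pkgname", "my.compendium", "my_compendium", "2fast", "a", "ends.")
is_valid_pkgname(candidates)
# TRUE TRUE FALSE FALSE FALSE FALSE
```

Underscores and hyphens are rejected because the rules allow only letters, numbers, and '.'.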

1. rrtools::use_compendium("pkgname")

  • if you started with a new project on GitHub first, run rrtools::use_compendium(); if you started with a new project in RStudio first, run rrtools::use_compendium("pkgname")
  • this uses usethis::create_package() to create a basic R package in the pkgname directory, and then, if you’re using RStudio, opens the project. If you’re not using RStudio, it sets the working directory to the pkgname directory.
  • we need to:
    • edit the DESCRIPTION file (located in your pkgname directory) to include accurate metadata, e.g. your ORCID and email address
    • periodically update the Imports: section of the DESCRIPTION file with the names of packages used in the code we write in the qmd document(s) by running rrtools::add_dependencies_to_description()
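
To see what the Imports: field currently holds, for example before and after running rrtools::add_dependencies_to_description(), the DESCRIPTION file can be read with base R. This is a minimal sketch: list_imports is a hypothetical helper, not an rrtools function, and the demonstration uses a throwaway file rather than a real compendium.

```r
# Hypothetical helper (not part of rrtools): return the package names listed
# in the Imports: field of a DESCRIPTION file.
list_imports <- function(description_path = "DESCRIPTION") {
  imports <- read.dcf(description_path, fields = "Imports")[1, "Imports"]
  if (is.na(imports)) return(character(0))
  pkgs <- strsplit(imports, ",")[[1]]
  # Drop version constraints such as "(>= 1.0.0)", then surrounding whitespace
  pkgs <- trimws(gsub("\\([^)]*\\)", "", pkgs))
  pkgs[nzchar(pkgs)]
}

# Demonstration on a temporary DESCRIPTION file with made-up contents
demo_description <- tempfile()
writeLines(c(
  "Package: pkgname",
  "Version: 0.0.0.9000",
  "Imports:",
  "    dplyr (>= 1.0.0),",
  "    ggplot2"
), demo_description)

list_imports(demo_description)
# returns c("dplyr", "ggplot2")
```

In a real compendium, calling list_imports() from the project root reads the DESCRIPTION file there.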

2. usethis::use_mit_license(copyright_holder = "My Name")

  • this adds a reference to the MIT license in the DESCRIPTION file and generates a LICENSE file listing the name provided as the copyright holder
  • to use a different license, replace this line with any of the licenses mentioned here: ?usethis::use_mit_license()

3. rrtools::use_readme_rmd()

  • this generates README.Rmd and renders it to README.md, ready to display on GitHub. It contains:
    • a template citation to show others how to cite your project. Edit this to include the correct title and DOI.
    • license information for the text, figures, code and data in your compendium
  • this also adds two other markdown files: a code of conduct for users CONDUCT.md, and basic instructions for people who want to contribute to your project CONTRIBUTING.md, including for first-timers to git and GitHub.
  • this adds a .binder/Dockerfile that makes Binder work, if your compendium is hosted online. Currently configured for GitHub, but easily adapted for elsewhere (e.g. Zenodo, Figshare, Dataverse, etc.)
  • render this document after each change to refresh README.md, which is the file that GitHub displays on the repository home page
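
Because README.md is only refreshed when README.Rmd is rendered, it can help to check whether the two have drifted apart. The sketch below compares file modification times in base R; readme_is_stale is a hypothetical helper, not part of rrtools, and modification times are only a rough proxy for whether the content has changed.

```r
# Hypothetical helper (not part of rrtools): TRUE when README.md is missing
# or was last modified before README.Rmd.
readme_is_stale <- function(dir = ".") {
  rmd <- file.path(dir, "README.Rmd")
  md <- file.path(dir, "README.md")
  if (!file.exists(rmd)) stop("README.Rmd not found in ", dir)
  !file.exists(md) || file.mtime(md) < file.mtime(rmd)
}

# Usage from the compendium root; guarded so it does nothing elsewhere
if (file.exists("README.Rmd") && readme_is_stale()) {
  message("README.md is out of date; render README.Rmd, e.g. with rmarkdown::render(\"README.Rmd\")")
}
```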

4. rrtools::use_analysis()

  • this function has three location = options: top_level to create a top-level analysis/ directory, inst to create an inst/ directory (so that all the sub-directories are available after the package is installed), and vignettes to create a vignettes/ directory (and automatically update the DESCRIPTION). The default is a top-level analysis/.
  • for each option, the contents of the sub-directories are the same, with the following (using the default analysis/ for example):
analysis/
│
├── paper/
│   ├── paper.qmd       # this is the main document to edit
│   └── references.bib  # this contains the reference list information
│
├── figures/            # location of the figures produced by the qmd
│
├── data/
│   ├── raw_data/       # data obtained from elsewhere
│   └── derived_data/   # data generated during the analysis
│
└── templates/
    ├── journal-of-archaeological-science.csl
    │                   # this sets the style of citations & reference list
    ├── template.docx   # used to style the output of the paper.qmd
    └── template.Rmd
  • the paper.qmd is ready to write in and render with Quarto. It includes:
    • a YAML header that identifies the references.bib file and the supplied csl file (to style the reference list)
    • a colophon that adds some git commit details to the end of the document. This means that the output file (HTML/PDF/Word) is always traceable to a specific state of the code.
  • the references.bib file has just one item to demonstrate the format. It is ready to insert more reference details.
  • you can replace the supplied csl file with a different citation style from https://github.com/citation-style-language/
  • we recommend using RStudio 2022.07 or higher to efficiently insert citations from your Zotero library while writing in a qmd file (see here for detailed setup and use information to connect your RStudio to your Zotero)
  • remember that the Imports: field in the DESCRIPTION file must include the names of all packages used in analysis documents (e.g. paper.qmd). We have a helper function rrtools::add_dependencies_to_description() that will scan the qmd file, identify libraries used in there, and add them to the DESCRIPTION file.
  • this function has a data_in_git = argument, which is TRUE by default. If set to FALSE you will exclude files in the data/ directory from being tracked by git and prevent them from appearing on GitHub. You should set data_in_git = FALSE if your data files are large (>100 MB is the limit for GitHub) or you do not want to make the data files publicly accessible on GitHub.
  • To load your custom code in the paper.qmd, you have a few options. You can write all your R code in chunks in the qmd, which is the simplest method. Or you can write R code in script files in /R, and include devtools::load_all(".") at the top of your paper.qmd. Or you can write functions in /R and use library(pkgname) at the top of your paper.qmd, or omit library and preface each function call with pkgname::. Choose whichever seems most natural to you.
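
When deciding whether to set data_in_git = FALSE, it can help to check file sizes against the 100 MB limit mentioned above. The following is a minimal sketch in base R; find_large_files is a hypothetical helper, not part of rrtools, and it assumes a megabyte is 1024^2 bytes.

```r
# Hypothetical helper (not part of rrtools): list files under `dir` that are
# larger than `limit_mb` megabytes (assuming 1 MB = 1024^2 bytes).
find_large_files <- function(dir, limit_mb = 100) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  size_mb <- file.size(files) / 1024^2
  files[size_mb > limit_mb]
}

# Usage from the compendium root, assuming the default top-level analysis/
data_dir <- file.path("analysis", "data")
if (dir.exists(data_dir)) {
  too_big <- find_large_files(data_dir)
  if (length(too_big) > 0) {
    message("Files over the limit:\n", paste(too_big, collapse = "\n"))
  }
}
```

The helper accepts any directory, so it can also be pointed at wherever the raw data live before they are copied into the compendium.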

5. rrtools::use_dockerfile()

  • this creates a basic Dockerfile using rocker/verse as the base image
  • this also creates a minimal .yml configuration file to activate continuous integration using GitHub Actions. This will attempt to render your qmd document, in a Docker container specified by your Dockerfile, each time you push to GitHub. You can view the results of each attempt at the 'actions' page for your compendium on github.com, e.g. https://github.com/benmarwick/rrtools/actions
  • the version of R in your rocker container will match the version used when you run this function (e.g., rocker/verse:3.5.0)
  • rocker/verse includes R, the tidyverse, RStudio, pandoc and LaTeX, so compendium build times are very fast
  • we need to:
    • edit the Dockerfile to add linux dependencies (for R packages that require additional libraries outside of R). You can find out what these are by browsing the DESCRIPTION files of the other packages you’re using, and looking in the SystemRequirements field for each package. If you are getting build errors on GitHub Actions, check the logs. Often, the error messages will include the names of missing libraries.
    • modify which qmd files are rendered when the container is made
    • have a public GitHub repo to use the Dockerfile that this function generates. It is possible to keep the repository private and run a local Docker container with minor modifications to the Dockerfile that this function generates.
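
For packages that are already installed locally, the SystemRequirements field can be read from R instead of browsing each DESCRIPTION file by hand. This is a minimal sketch using utils::packageDescription(); system_requirements is a hypothetical helper, not part of rrtools. The field is free text, so the Linux package names to add to the Dockerfile still have to be worked out manually.

```r
# Hypothetical helper (not part of rrtools): collect the SystemRequirements
# field of each installed package in `pkgs`, dropping packages that declare
# none or are not installed.
system_requirements <- function(pkgs) {
  reqs <- vapply(pkgs, function(pkg) {
    field <- suppressWarnings(
      utils::packageDescription(pkg, fields = "SystemRequirements")
    )
    if (is.character(field) && length(field) == 1 && !is.na(field)) {
      field
    } else {
      NA_character_
    }
  }, character(1))
  reqs[!is.na(reqs)]
}

# Base packages declare no system requirements, so this returns an empty
# vector; substitute the packages your compendium actually uses.
system_requirements(c("stats", "utils"))
```

The result is a named character vector, with one element per package that declares system requirements.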

6. renv::init()

  • this initiates tracking of the packages you use in your project using renv. renv will discover the R packages used in your project, and install those packages into a private project library
  • We can use renv::snapshot() to save the state of our project library from time to time, or at the end when we are ready to share. The project state will be saved into a file called renv.lock.
  • Our collaborators can run renv::restore() to install exactly those packages into their own library.
  • Don't skip this step because our Binder and Dockerfile use the renv.lock file to install the packages they need to run your code. So renv is an important component of making a compendium reproducible.

You should be able to follow these steps to get a new research compendium repository ready to write in just a few minutes.

References and related reading

Kitzes, J., Turek, D., & Deniz, F. (Eds.). (2017). The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. https://www.practicereproducibleresearch.org

Marwick, B. (2017). Computational reproducibility in archaeological research: Basic principles and a case study of their implementation. Journal of Archaeological Method and Theory, 24(2), 424-450. https://doi.org/10.1007/s10816-015-9272-9

Marwick, B., Boettiger, C., & Mullen, L. (2018). Packaging data analytical work reproducibly using R (and friends). The American Statistician 72(1), 80-88. https://doi.org/10.1080/00031305.2017.1375986

Piccolo, S. R. and M. B. Frampton (2016). “Tools and techniques for computational reproducibility.” GigaScience 5(1): 30. https://gigascience.biomedcentral.com/articles/10.1186/s13742-016-0135-4

rOpenSci community (2017b). rrrpkg: Use of an R package to facilitate reproducible research. Online at https://github.com/ropensci/rrrpkg

Schmidt, S.C. and Marwick, B., 2020. Tool-Driven Revolutions in Archaeological Science. Journal of Computer Applications in Archaeology, 3(1), pp.18–32. DOI: http://doi.org/10.5334/jcaa.29

Stodden, V. & Miguez, S., (2014). Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research. Journal of Open Research Software. 2(1), p.e21. DOI: http://doi.org/10.5334/jors.ay

Wickham, H. (2017) Research compendia. Note prepared for the 2017 rOpenSci Unconf. https://docs.google.com/document/d/1LzZKS44y4OEJa4Azg5reGToNAZL0e0HSUwxamNY7E-Y/edit#

Wilson G, Bryan J, Cranston K, Kitzes J, Nederbragt L, et al. (2017). Good enough practices in scientific computing. PLOS Computational Biology 13(6): e1005510. https://doi.org/10.1371/journal.pcbi.1005510

Contributing

If you would like to contribute to this project, please start by reading our Guide to Contributing. Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Acknowledgements

This project was developed during the 2017 Summer School on Reproducible Research in Landscape Archaeology at the Freie Universität Berlin (17-21 July), funded and jointly organized by Exc264 Topoi, CRC1266, and ISAAKiel. Special thanks to Sophie C. Schmidt for help. The convenience functions in this package are inspired by similar functions in the usethis package.
