R tips
This repository contains R programming tips covering topics across data cleaning, data visualisation, machine learning, statistical theory and data productionisation.
Many kudos to Dr Chuanxin Liu, my former PhD student and code editor, for teaching me how to code in R in my past life as an immunologist.
Content summary
Legend | Category |
---|---|
📚 | Data cleaning |
🎨 | Data visualisation |
🔮 | Machine learning |
🔨 | Productionisation |
🔢 | Statistical theory |
Tutorials
🎨 Data visualisation
- An introduction to `ggplot2` using volcano plots (Updated)
- Using `DiagrammeR` to draw flow charts (Updated)
📚 Data cleaning
- Data cleaning using `data.table` or `tidyverse` (or Python `Pandas`) (Updated)
- Manipulating character strings using regular expressions
🔨 Productionisation
- Creating SQL <> R workflows - Part 1 (Updated)
- Creating SQL <> R workflows - Part 2 (Updated)
- Automating R Markdown report generation - Part 1 (Updated)
- Automating R Markdown report generation - Part 2 (Updated)
🔮 Machine learning
🔢 Statistical theory
- Introduction to expectation and variance
- Beyond expectations: centrality measures in statistics
- Introduction to the normal distribution
- Introduction to the Chi-squared and F distribution
- Introduction to binomial distributions
- Introduction to hypergeometric, geometric, negative binomial and multinomial distributions
Other resources
The resources below also cover a comprehensive range of practical R tutorials.
- Statistical Computing by Alex Reinhart and Christopher Genovese
- Data Science Toolkit by David Benkeser
- What They Forgot to Teach You About R by Jennifer Bryan and Jim Hester
Tutorial style guide
A painful form of technical debt is inconsistent code style. This repository now contains the following file naming and code style rules.
- Folders are no longer ordered with a numerical prefix and names are no longer case sensitive e.g. `r_tips\tutorials\...` and `r_tips\figures\...`
- Tutorial subtopics share the same prefix e.g. `r_tips\tutorials\dv-...` and `r_tips\tutorials\st-...`
- File names contain `-` to separate file name prefixes and `_` instead of other white space e.g. `r_tips\figures\dv-using_diagrammer-simple_flowchart.svg`
- Comments are styled according to the tidyverse style guide:
  - The first comment explains the purpose of the code chunk and is styled differently for enhanced readability e.g. `# Code as header --------`
  - Comments are written in sentence case and only end with a full stop if they contain at least two sentences
  - Short comments explaining a function argument do not have to be written on a new line
  - Comments should not be followed by a blank line, unless the comment is a stand-alone paragraph containing in-depth rationale or an alternative solution
- R code chunks are styled as follows:
  - Each R chunk should be named with a short unique description written in the active voice e.g. `create basic plot` and `modify plot labels`
  - Arguments inside code chunks should not contain white space and boolean argument options should be written in capitals e.g. `{r load libraries, message=FALSE, warning=FALSE}`
  - To render the GitHub document, results are generally suppressed using `results='hide'` and manually entered in a new line beneath the code.
  - To render the GitHub document, figures are generally output using `fig.show='hold'` and figure outputs can then be suppressed at the local chunk level using `fig.show='hide'`
- Set an 80-character margin in RStudio through `Tools\Global options --> Code --> Display --> Show margin` and use this margin as the cut-off for code and comment length
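The comment rules above can be sketched in a short R snippet. This is only an illustration: the built-in `mtcars` dataset and the object name `mean_disp` are assumptions chosen for the example and do not come from any tutorial in this repository.

```r
# Summarise engine displacement by cylinder count --------
# The built-in mtcars dataset is used for illustration only
mean_disp <- aggregate(
  disp ~ cyl,
  data = mtcars,
  FUN = mean # Short argument comments can stay on the same line
)

# An alternative is tapply(mtcars$disp, mtcars$cyl, mean), which returns a
# named array instead of a data frame. The data frame is kept here because
# it is easier to join to other tables later.

mean_disp
```

The header comment opens the chunk, single-sentence comments have no full stop, and only the stand-alone paragraph with the alternative solution is followed by a blank line. Every line stays within the 80-character margin.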
Citations
Citing packages is good practice when publishing research papers. To do this, use `citation("package")` to print the relevant package publication. A non-exhaustive list of R packages used in this repository is found below.
- R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
- Wickham et al., (2019). Welcome to the `tidyverse`. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686
- H. Wickham. `ggplot2`: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
- Matt Dowle and Arun Srinivasan (2021). `data.table`: Extension of `data.frame`. R package version 1.14.2. https://CRAN.R-project.org/package=data.table
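As a minimal sketch of how such entries can be retrieved, the snippet below calls `citation()` from the `utils` package. The `stats` package is used because it ships with base R. The `ggplot2` call is guarded because it assumes that package is installed. The object names are illustrative only.

```r
# Retrieve citations for R and its packages --------
# Calling citation() with no arguments returns the citation for R itself
r_citation <- citation()
print(r_citation)

# The stats package ships with base R, so this call needs no installation
stats_citation <- citation("stats")

# The toBibtex() function converts a citation into a BibTeX entry
stats_bibtex <- toBibtex(stats_citation)
print(stats_bibtex)

# Packages from CRAN must be installed before they can be cited
if (requireNamespace("ggplot2", quietly = TRUE)) {
  print(citation("ggplot2"))
}
```

The printed text depends on the installed R and package versions, so entries may differ from the versions listed above.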