Row-oriented workflows in R with the tidyverse

Materials for the RStudio webinar (a recording is available):

Thinking inside the box: you can do that inside a data frame?!
Jenny Bryan
Wednesday, April 11 at 1:00pm ET / 10:00am PT
rstd.io/row-work <-- shortlink to this repo
Slides available on SpeakerDeck

Abstract

The data frame is a crucial data structure in R and, especially, in the tidyverse. Working on a column or a variable is a very natural operation, which is great. But what about row-oriented work? That also comes up frequently and is more awkward. In this webinar I’ll work through concrete code examples, exploring patterns that arise in data analysis. We’ll discuss the general notion of "split-apply-combine", row-wise work in a data frame, splitting vs. nesting, and list-columns.

Code examples

Beginner --> intermediate --> advanced
Not all are used in webinar

  • Leave your data in that big, beautiful data frame. ex01_leave-it-in-the-data-frame Show the evil of creating copies of certain rows of certain variables, using Magic Numbers and cryptic names, just to save some typing.
  • Adding or modifying variables. ex02_create-or-mutate-in-place df$var <- ... versus dplyr::mutate(). Recycling/safety, df's as data mask, aesthetics.
  • Are you SURE you need to iterate over rows? ex03_row-wise-iteration-are-you-sure Don't fixate on the most obvious generalization of your pilot example and risk overlooking a vectorized solution. Features a paste() example, then goes out with some glue glory.
  • Working with non-vectorized functions. ex04_map-example Small example using purrr::map() to apply nrow() to a list of data frames.
  • Row-wise thinking vs. column-wise thinking. ex05_attack-via-rows-or-columns Data rectangling example. Both are possible, but I find building a tibble column-by-column is less aggravating than building rows, then row binding.
  • Iterate over rows of a data frame. iterate-over-rows Empirical study of reshaping a data frame into this form: a list with one component per row. Revisits a study originally done by Winston Chang and compares run times for different numbers of rows or columns.
  • Generate data from different distributions via purrr::pmap(). ex06_runif-via-pmap Use purrr::pmap() to generate U[min, max] data for various combinations of (n, min, max), stored as rows of a data frame.
  • Are you SURE you need to iterate over groups? ex07_group-by-summarise Use dplyr::group_by() and dplyr::summarise() to compute group-wise summaries, without explicitly splitting up the data frame and re-combining the results. Use list() to package multivariate summaries into something summarise() can handle, creating a list-column.
  • Group-and-nest. ex08_nesting-is-good How to explicitly work on groups of rows via nesting (our recommendation) vs splitting.
  • Row-wise mean or sum. ex09_row-summaries How to do rowSums()-y and rowMeans()-y work inside a data frame.
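
To make the purrr::pmap() item above concrete, here is a minimal sketch. It is not the code from ex06_runif-via-pmap; the parameter table `params` and the output column name `data` are made up for illustration. The idea is that each row of a data frame holds one (n, min, max) combination, and because the column names match the argument names of runif(), pmap() can pass each row as a set of named arguments.

```r
library(tibble)
library(purrr)
library(dplyr)

# Hypothetical parameter table: one row per (n, min, max) combination.
params <- tribble(
  ~n, ~min, ~max,
   3,    0,    1,
   2,   10,  100,
   4,  100, 1000
)

set.seed(1234)

# pmap() walks over the rows of `params` and calls
# runif(n = ..., min = ..., max = ...) once per row.
# The draws are stored next to their parameters as a list-column.
sims <- mutate(params, data = pmap(params, runif))

# One vector per row, with the requested number of draws in each.
lengths(sims$data)
```

Keeping the simulated data in a list-column beside the parameters that generated it means nothing has to be re-matched later.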

More tips and links

Big thanks to everyone who weighed in on the related twitter thread. This was very helpful for planning content.

45 minutes is not enough! A few notes about more special functions and patterns for row-driven work. Maybe we need to do a follow up ...

tibble::enframe() and deframe() are handy for getting into and out of the data frame state.
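
A small sketch of that round trip, assuming a made-up named vector `scores`; the column names `id` and `score` are likewise illustrative choices, not anything from the webinar materials.

```r
library(tibble)

# Hypothetical named vector, used only for illustration.
scores <- c(a = 1, b = 2, c = 3)

# enframe(): named vector -> two-column tibble (names in one column,
# values in the other).
df <- enframe(scores, name = "id", value = "score")

# deframe(): two-column tibble -> named vector
# (the first column supplies the names, the second the values).
back <- deframe(df)
```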

map() and map2() are useful for working with list-columns inside mutate().
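
For instance, the following sketch assumes a made-up tibble with a list-column `x` and an ordinary numeric column `k`. It uses a typed variant, map_int(), to get one number per row, and map2() to combine the list-column with the ordinary column row by row.

```r
library(tibble)
library(purrr)
library(dplyr)

# Hypothetical tibble: `x` is a list-column, `k` is an atomic column.
df <- tibble(
  id = c("a", "b", "c"),
  x  = list(1:3, 4:5, 6:10),
  k  = c(2, 10, 1)
)

res <- df %>%
  mutate(
    n      = map_int(x, length),     # one integer per row
    scaled = map2(x, k, ~ .x * .y)   # list-column in, list-column out
  )
```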

tibble::add_row() is handy for adding a single row at an arbitrary position in a data frame.
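
A minimal sketch, using a made-up tibble. The `.before` and `.after` arguments control where the new row lands; without them, the row is appended at the end.

```r
library(tibble)

# Hypothetical data with a gap between rows 2 and 3.
df <- tibble(x = c(1, 2, 4), y = c("a", "b", "d"))

# Insert the new row before what is currently row 3.
df2 <- add_row(df, x = 3, y = "c", .before = 3)
```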

imap() is handy for iterating over something and its names or integer indices at the same time.
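
A short sketch with made-up inputs, using the typed variant imap_chr(). Inside the formula, `.x` is the element and `.y` is its name, or its integer position when the input has no names.

```r
library(purrr)

# Hypothetical named list: .y is the element's name.
temps <- list(mon = c(1, 2), tue = c(3, 4, 5))
by_name <- imap_chr(temps, ~ paste0(.y, ": ", length(.x), " values"))

# Unnamed input: .y falls back to the integer index.
by_index <- imap_chr(c("a", "b"), ~ paste(.y, .x))
```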

dplyr::case_when() helps you get rid of hairy, nested if () {...} else {...} statements.
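
As an illustration, assuming a made-up `score` column and arbitrary cut-offs: conditions are checked in order and the first match wins, so the chain reads top to bottom like a flattened if/else ladder.

```r
library(tibble)
library(dplyr)

# Hypothetical scores, including a missing value.
df <- tibble(score = c(95, 72, 40, NA))

graded <- df %>%
  mutate(grade = case_when(
    is.na(score) ~ "missing",  # handle NA first, before any comparison
    score >= 90  ~ "high",
    score >= 60  ~ "medium",
    TRUE         ~ "low"       # fallback for everything else
  ))
```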

Great resource on the "why?" of functional programming approaches (such as map()): https://github.com/getify/Functional-Light-JS/blob/master/manuscript/ch1.md/
