  • Stars: 1,369
  • Rank: 34,342 (Top 0.7%)
  • Language: R
  • License: Other
  • Created over 10 years ago
  • Updated 4 months ago

Repository Details

Tidy Messy Data

tidyr

Overview

The goal of tidyr is to help you create tidy data. Tidy data is data where:

  1. Each variable is a column; each column is a variable.
  2. Each observation is a row; each row is an observation.
  3. Each value is a cell; each cell is a single value.

Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. If you ensure that your data is tidy, you’ll spend less time fighting with the tools and more time working on your analysis. Learn more about tidy data in vignette("tidy-data").
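As a small illustration of the three rules, the sketch below starts from a hypothetical wide table (the `scores` data and its column names are invented for this example). Its column headers `math` and `art` are really values of a variable, so the table is reshaped until each row holds one observation.

```r
library(tidyr)

# Hypothetical data: one row per student, one column per subject.
# The subject names live in the column headers, so this is not tidy.
scores <- tibble::tibble(
  student = c("ana", "ben"),
  math    = c(90, 72),
  art     = c(85, 95)
)

# Move the subject names into a `subject` column and the numbers into `score`,
# so that each variable is a column and each observation is a row.
tidy_scores <- pivot_longer(
  scores,
  cols      = c(math, art),
  names_to  = "subject",
  values_to = "score"
)

tidy_scores
```

After the reshape there is one row per student and subject, and every cell holds a single value.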

Installation

# The easiest way to get tidyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just tidyr:
install.packages("tidyr")

# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/tidyr")

Getting started

library(tidyr)

tidyr functions fall into five main categories:

  • “Pivoting”, which converts between long and wide forms. tidyr 1.0.0 introduces pivot_longer() and pivot_wider(), replacing the older spread() and gather() functions. See vignette("pivot") for more details.

  • “Rectangling”, which turns deeply nested lists (as from JSON) into tidy tibbles. See unnest_longer(), unnest_wider(), hoist(), and vignette("rectangle") for more details.

  • Nesting converts grouped data to a form where each group becomes a single row containing a nested data frame, and unnesting does the opposite. See nest(), unnest(), and vignette("nest") for more details.

  • Splitting and combining character columns. Use separate_wider_delim(), separate_wider_position(), and separate_wider_regex() to pull a single character column into multiple columns; use unite() to combine multiple columns into a single character column.

  • Missing values. Make implicit missing values explicit with complete(); make explicit missing values implicit with drop_na(); replace missing values with the next/previous value using fill(), or with a known value using replace_na().
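The categories above can be sketched on one small table. This is a minimal example rather than a complete tour: the `visits` data and all of its column names are hypothetical, it assumes a tidyr version that provides separate_wider_delim(), and it leaves out rectangling and fill().

```r
library(tidyr)

# Hypothetical data, invented for this example.
visits <- tibble::tibble(
  site_year = c("north_2021", "north_2023", "south_2021"),
  count     = c(10, NA, 7)
)

# Splitting: one character column becomes two.
parts <- separate_wider_delim(
  visits, site_year,
  delim = "_",
  names = c("site", "year")
)

# Missing values: add the site/year combinations absent from the data.
full <- complete(parts, site, year)

# Replace missing counts with a known value ...
filled <- replace_na(full, list(count = 0))

# ... or drop the incomplete rows instead.
observed <- drop_na(full)

# Pivoting: one column per year.
wide <- pivot_wider(filled, names_from = year, values_from = count)

# Nesting: one row per site, with the other columns in a list-column.
nested <- nest(filled, data = c(year, count))

# Unnesting reverses the operation.
unnested <- unnest(nested, data)

# Combining: paste site and year back into a single character column.
united <- unite(filled, "site_year", site, year, sep = "_")
```

Each step takes a data frame and returns a data frame, so the calls can also be chained with a pipe.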

Related work

tidyr supersedes reshape2 (2010-2014) and reshape (2005-2010). Somewhat counterintuitively, each iteration of the package has done less. tidyr is designed specifically for tidying data, not for general reshaping (reshape2) or general aggregation (reshape).

data.table provides high-performance implementations of melt() and dcast().

If you’d like to read more about data reshaping from a CS perspective, I’d recommend the following three papers:

To guide your reading, here’s a translation between the terminology used in different places:

tidyr 1.0.0      pivot longer    pivot wider
tidyr < 1.0.0    gather          spread
reshape(2)       melt            cast
spreadsheets     unpivot         pivot
databases        fold            unfold
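The first two rows of the translation can be checked directly, since gather() and spread() remain available in tidyr. The sketch below uses a hypothetical `wide` table invented for this example and runs the same reshaping in both vocabularies.

```r
library(tidyr)

# Hypothetical wide data, invented for this example.
wide <- tibble::tibble(
  id = 1:2,
  x  = c(1.5, 2.5),
  y  = c(3.5, 4.5)
)

# tidyr >= 1.0.0 vocabulary.
long_new <- pivot_longer(
  wide,
  cols      = c(x, y),
  names_to  = "key",
  values_to = "value"
)

# tidyr < 1.0.0 vocabulary (still available, but superseded).
long_old <- gather(wide, key = "key", value = "value", x, y)

# Same rows, different order: pivot_longer() works row by row,
# while gather() stacks one column after another.
long_new <- long_new[order(long_new$key, long_new$id), ]

# Going back to the wide form, again in both vocabularies.
wide_new <- pivot_wider(long_new, names_from = key, values_from = value)
wide_old <- spread(long_old, key = key, value = value)
```

Once the rows are sorted the two long tables hold the same keys and values, and both wide results match the starting table.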

Getting help

If you encounter a clear bug, please file a minimal reproducible example on GitHub. For questions and other discussion, please use community.rstudio.com.


Please note that the tidyr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

More Repositories

  1. ggplot2 (R, 6,496 stars): An implementation of the Grammar of Graphics in R
  2. dplyr (R, 4,725 stars): dplyr: A grammar of data manipulation
  3. tidyverse (R, 1,633 stars): Easily install and load packages from the tidyverse
  4. rvest (R, 1,488 stars): Simple web scraping for R
  5. purrr (R, 1,254 stars): A functional programming toolkit for R
  6. readr (R, 1,001 stars): Read flat files (csv, tsv, fwf) into R
  7. magrittr (R, 957 stars): Improve the readability of R code with the pipe
  8. datascience-box (JavaScript, 937 stars): Data Science Course in a Box
  9. reprex (R, 735 stars): Render bits of R code for sharing, e.g., on GitHub or StackOverflow.
  10. lubridate (R, 727 stars): Make working with dates in R just that little bit easier
  11. readxl (C++, 726 stars): Read excel files (.xls and .xlsx) into R 🖇
  12. glue (R, 705 stars): Glue strings to data in R. Small, fast, dependency free interpreted string literals.
  13. dtplyr (R, 661 stars): Data table backend for dplyr
  14. tibble (R, 659 stars): A modern re-imagining of the data frame
  15. multidplyr (R, 640 stars): A dplyr backend that partitions a data frame over multiple processes
  16. vroom (C++, 618 stars): Fast reading of delimited files
  17. stringr (R, 594 stars): A fresh approach to string manipulation in R
  18. forcats (R, 551 stars): 🐈🐈🐈🐈: tools for working with categorical variables (factors)
  19. dbplyr (R, 473 stars): Database (DBI) backend for dplyr
  20. haven (C, 423 stars): Read SPSS, Stata and SAS files from R
  21. modelr (R, 401 stars): Helper functions for modelling
  22. googlesheets4 (R, 354 stars): Google Spreadsheets R API (reboot of the googlesheets package)
  23. googledrive (R, 321 stars): Google Drive R API
  24. style (HTML, 291 stars): The tidyverse style guide for R code
  25. duckplyr (R, 236 stars): A drop-in replacement for dplyr, powered by DuckDB for performance.
  26. design (R, 217 stars): Tidyverse design principles
  27. tidyverse.org (HTML, 191 stars): Source of tidyverse.org
  28. hms (R, 137 stars): A simple class for storing time-of-day values
  29. nycflights13 (R, 127 stars): An R data package containing all out-bound flights from NYC in 2013 + useful metadata
  30. tidyversedashboard (R, 71 stars): Tidyverse activity dashboard
  31. tidy-dev-day (R, 69 stars): Tidyverse developer day
  32. tidyeval (CSS, 55 stars): A guide to tidy evaluation
  33. dsbox (R, 49 stars): Companion R package to Data Science Course in a Box
  34. tidytemplate (SCSS, 45 stars): A pkgdown template for core tidyverse packages
  35. blob (R, 44 stars): A simple S3 class for representing BLOBs
  36. funs (R, 34 stars): Collection of low-level functions for working with vctrs
  37. code-review (33 stars)
  38. website-analytics (R, 28 stars): Web analytics for tidyverse + r-lib sites
  39. tidyups (21 stars)
  40. ggplot2-docs (HTML, 10 stars): ggplot2 documentation. Auto-generated from ggplot2 sources by pkgdown