🐈🐈🐈🐈: tools for working with categorical variables (factors)

forcats

Overview

R uses factors to handle categorical variables, variables that have a fixed and known set of possible values. Factors are also helpful for reordering character vectors to improve display. The goal of the forcats package is to provide a suite of tools that solve common problems with factors, including changing the order of levels or the values. Some examples include:

  • fct_reorder(): Reordering a factor by another variable.
  • fct_infreq(): Reordering a factor by the frequency of values.
  • fct_relevel(): Changing the order of a factor by hand.
  • fct_lump(): Collapsing the least/most frequent values of a factor into “other”.
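
As a rough illustration of the four functions listed above, here is a minimal sketch on a small made-up factor. The `size` and `price` vectors are hypothetical toy data, not taken from the forcats documentation; only the function calls themselves come from the package.

```r
library(forcats)

# Hypothetical toy data (not from the forcats documentation).
size  <- factor(c("small", "large", "medium", "small", "small", "large"))
price <- c(1, 9, 5, 2, 1, 8)

# factor() orders levels alphabetically by default:
# large, medium, small
levels(size)

# fct_infreq(): most frequent level first.
# Counts are small = 3, large = 2, medium = 1,
# so the levels become small, large, medium.
by_freq <- fct_infreq(size)
levels(by_freq)

# fct_relevel(): spell out the order by hand.
by_hand <- fct_relevel(size, "small", "medium", "large")
levels(by_hand)

# fct_reorder(): order levels by a summary of another variable
# (the median by default). Medians are small = 1, medium = 5,
# large = 8.5, so the levels become small, medium, large.
by_price <- fct_reorder(size, price)
levels(by_price)

# fct_lump(): keep the single most frequent level and collapse
# the remaining levels into a catch-all level.
lumped <- fct_lump(size, n = 1)
table(lumped)
```

None of these calls change the values stored in `size`, except `fct_lump()`, which merges levels. The others only change the order of the levels, which is what controls the order of axes, legends, and table rows.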

You can learn more about each of these in vignette("forcats"). If you’re new to factors, the best place to start is the chapter on factors in R for Data Science.

Installation

# The easiest way to get forcats is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just forcats:
install.packages("forcats")

# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/forcats")

Getting started

forcats is part of the core tidyverse, so you can load it with library(tidyverse) or library(forcats).

library(forcats)
library(dplyr)
library(ggplot2)
starwars %>% 
  filter(!is.na(species)) %>%
  count(species, sort = TRUE)
#> # A tibble: 37 × 2
#>    species      n
#>    <chr>    <int>
#>  1 Human       35
#>  2 Droid        6
#>  3 Gungan       3
#>  4 Kaminoan     2
#>  5 Mirialan     2
#>  6 Twi'lek      2
#>  7 Wookiee      2
#>  8 Zabrak       2
#>  9 Aleena       1
#> 10 Besalisk     1
#> # ℹ 27 more rows
starwars %>%
  filter(!is.na(species)) %>%
  mutate(species = fct_lump(species, n = 3)) %>%
  count(species)
#> # A tibble: 4 × 2
#>   species     n
#>   <fct>   <int>
#> 1 Droid       6
#> 2 Gungan      3
#> 3 Human      35
#> 4 Other      39
ggplot(starwars, aes(x = eye_color)) + 
  geom_bar() + 
  coord_flip()

starwars %>%
  mutate(eye_color = fct_infreq(eye_color)) %>%
  ggplot(aes(x = eye_color)) + 
  geom_bar() + 
  coord_flip()
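
To see what fct_infreq() changed without drawing the plot, you can inspect the levels directly. The sketch below is only an illustration: the variable names `eye` and `eye_top` are made up here, and the use of fct_rev() is a suggestion, not something the examples above do. Because coord_flip() draws the first level at the bottom of the axis, reversing the levels is one way to put the most common value at the top.

```r
library(forcats)
library(dplyr)  # provides the starwars data set

# Reorder eye colours so the most frequent value is the first level.
eye <- fct_infreq(starwars$eye_color)

# The first few levels are the most common eye colours.
head(levels(eye), 3)

# fct_rev() reverses the level order; with coord_flip() this would put
# the most frequent value at the top of the chart instead of the bottom.
eye_top <- fct_rev(eye)
tail(levels(eye_top), 3)
```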

More resources

For a history of factors, I recommend stringsAsFactors: An unauthorized biography by Roger Peng and stringsAsFactors = <sigh> by Thomas Lumley. If you want to learn more about other approaches to working with factors and categorical data, I recommend Wrangling categorical data in R, by Amelia McNamara and Nicholas Horton.

Getting help

If you encounter a clear bug, please file a minimal reproducible example on GitHub. For questions and other discussion, please use community.rstudio.com.
