
A fresh approach to string manipulation in R

stringr


Overview

Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. The stringr package provides a cohesive set of functions designed to make working with strings as easy as possible. If you're not familiar with strings, the best place to start is the chapter on strings in R for Data Science.

stringr is built on top of stringi, which uses the ICU C library to provide fast, correct implementations of common string manipulations. stringr focuses on the most important and commonly used string manipulation functions, whereas stringi provides a comprehensive set covering almost anything you can imagine. If you find that stringr is missing a function that you need, try looking in stringi. Both packages share similar conventions, so once you've mastered stringr, you should find stringi similarly easy to use.
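As a rough illustration of those shared conventions, the sketch below runs the same vowel count through stringr's str_count() and stringi's stri_count_regex(). It assumes only that both packages are installed (stringr imports stringi, so installing stringr brings it along); it is not meant to describe how stringr dispatches internally.

```r
library(stringr)
library(stringi)

x <- c("why", "video", "cross", "extra", "deal", "authority")

# stringr: the string vector comes first, then the pattern
stringr_counts <- str_count(x, "[aeiou]")

# stringi: same argument order, but the function name spells out the engine
stringi_counts <- stri_count_regex(x, "[aeiou]")

stringr_counts
#> [1] 0 3 1 2 2 4

# The two results should agree element by element
all(stringr_counts == stringi_counts)
#> [1] TRUE
```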

Installation

# The easiest way to get stringr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just stringr:
install.packages("stringr")

Usage

All functions in stringr start with str_ and take a vector of strings as the first argument:

library(stringr)

x <- c("why", "video", "cross", "extra", "deal", "authority")
str_length(x)
#> [1] 3 5 5 5 4 9
str_c(x, collapse = ", ")
#> [1] "why, video, cross, extra, deal, authority"
str_sub(x, 1, 2)
#> [1] "wh" "vi" "cr" "ex" "de" "au"
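A couple of further details of these functions, shown as a small sketch on the same vector: negative positions in str_sub() count back from the end of each string, and str_c() without collapse works element-wise, recycling a length-one input against a longer one.

```r
library(stringr)

x <- c("why", "video", "cross", "extra", "deal", "authority")

# Negative positions count from the end: -3 to -1 is the last three characters
last3 <- str_sub(x, -3, -1)
last3
#> [1] "why" "deo" "oss" "tra" "eal" "ity"

# Without `collapse`, str_c() returns one string per element;
# the length-one "x" is recycled against 1:3 (numbers are coerced to text)
str_c("x", 1:3)
#> [1] "x1" "x2" "x3"
```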

Most string functions work with regular expressions, a concise language for describing patterns of text. For example, the regular expression "[aeiou]" matches any single character that is a vowel:

str_subset(x, "[aeiou]")
#> [1] "video"     "cross"     "extra"     "deal"      "authority"
str_count(x, "[aeiou]")
#> [1] 0 3 1 2 2 4
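A character class can be combined with anchors to say where the match must occur. The sketch below, on the same x, restricts the vowel class to the start (^) or the end ($) of each string; these are standard regular-expression anchors rather than anything specific to stringr.

```r
library(stringr)

x <- c("why", "video", "cross", "extra", "deal", "authority")

# "^" requires the vowel to be the first character
starts_with_vowel <- str_detect(x, "^[aeiou]")
starts_with_vowel
#> [1] FALSE FALSE FALSE  TRUE FALSE  TRUE

# "$" requires the vowel to be the last character
ends_with_vowel <- str_subset(x, "[aeiou]$")
ends_with_vowel
#> [1] "video" "extra"
```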

There are eight main verbs that work with patterns:

  • str_detect(x, pattern) tells you if there's any match to the pattern:

    str_detect(x, "[aeiou]")
    #> [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
  • str_count(x, pattern) counts the number of matches:

    str_count(x, "[aeiou]")
    #> [1] 0 3 1 2 2 4
  • str_subset(x, pattern) extracts the matching components:

    str_subset(x, "[aeiou]")
    #> [1] "video"     "cross"     "extra"     "deal"      "authority"
  • str_locate(x, pattern) gives the position of the first match:

    str_locate(x, "[aeiou]")
    #>      start end
    #> [1,]    NA  NA
    #> [2,]     2   2
    #> [3,]     3   3
    #> [4,]     1   1
    #> [5,]     2   2
    #> [6,]     1   1
  • str_extract(x, pattern) extracts the text of the first match:

    str_extract(x, "[aeiou]")
    #> [1] NA  "i" "o" "e" "e" "a"
  • str_match(x, pattern) extracts parts of the match defined by parentheses:

    # extract the characters on either side of the vowel
    str_match(x, "(.)[aeiou](.)")
    #>      [,1]  [,2] [,3]
    #> [1,] NA    NA   NA  
    #> [2,] "vid" "v"  "d" 
    #> [3,] "ros" "r"  "s" 
    #> [4,] NA    NA   NA  
    #> [5,] "dea" "d"  "a" 
    #> [6,] "aut" "a"  "t"
  • str_replace(x, pattern, replacement) replaces the first match with new text:

    str_replace(x, "[aeiou]", "?")
    #> [1] "why"       "v?deo"     "cr?ss"     "?xtra"     "d?al"      "?uthority"
  • str_split(x, pattern) splits up a string into multiple pieces:

    str_split(c("a,b", "c,d,e"), ",")
    #> [[1]]
    #> [1] "a" "b"
    #> 
    #> [[2]]
    #> [1] "c" "d" "e"
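Several of these verbs act only on the first match in each string. stringr pairs them with _all variants, such as str_replace_all() and str_extract_all(), that act on every match. A minimal comparison on the same x:

```r
library(stringr)

x <- c("why", "video", "cross", "extra", "deal", "authority")

# str_replace() changes only the first vowel; str_replace_all() changes every vowel
all_replaced <- str_replace_all(x, "[aeiou]", "?")
all_replaced
#> [1] "why"       "v?d??"     "cr?ss"     "?xtr?"     "d??l"      "??th?r?ty"

# str_extract_all() returns a list, because each string can have any number of matches
vowels <- str_extract_all(x, "[aeiou]")
vowels[[6]]
#> [1] "a" "u" "o" "i"
```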

As well as regular expressions (the default), there are three other pattern matching engines:

  • fixed(): match exact bytes
  • coll(): match human letters
  • boundary(): match boundaries
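A short sketch of how the choice of engine changes what a pattern means. The input strings are made up for illustration; ignore_case is one of coll()'s options, and boundary("word") splits on word breaks instead of a pattern you write.

```r
library(stringr)

# As a regular expression, "." matches any character ...
str_count("a.b.c", ".")
#> [1] 5

# ... but fixed() treats the pattern as literal text
str_count("a.b.c", fixed("."))
#> [1] 2

# coll() compares human letters using locale-aware rules, here ignoring case
str_detect("STRINGR", coll("stringr", ignore_case = TRUE))
#> [1] TRUE

# boundary("word") drops the spaces and punctuation between words
words <- str_split("The quick brown fox.", boundary("word"))[[1]]
words
#> [1] "The"   "quick" "brown" "fox"
```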

RStudio Addin

The RegExplain RStudio addin provides a friendly interface for working with regular expressions and functions from stringr. This addin allows you to interactively build your regexp, check the output of common string matching functions, consult the interactive help pages, or use the included resources to learn regular expressions.

This addin can easily be installed with devtools:

# install.packages("devtools")
devtools::install_github("gadenbuie/regexplain")

Compared to base R

R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so some things that are easy to do in languages like Ruby or Python are rather hard to do in R. stringr addresses these issues. It:

  • Uses consistent function and argument names. The first argument is always the vector of strings to modify, which makes stringr work particularly well in conjunction with the pipe:

    letters %>%
      .[1:10] %>% 
      str_pad(3, "right") %>%
      str_c(letters[2:11])
    #>  [1] "a  b" "b  c" "c  d" "d  e" "e  f" "f  g" "g  h" "h  i" "i  j" "j  k"
  • Simplifies string operations by eliminating options that you don't need 95% of the time.

  • Produces outputs that can easily be used as inputs. This includes ensuring that missing inputs result in missing outputs, and zero-length inputs result in zero-length outputs.

Learn more in vignette("from-base").
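To make the argument-order and missing-value points concrete, here is a small comparison against base R using the same x as earlier. It only shows behaviour for these inputs; the vignette gives the fuller mapping between base functions and their stringr equivalents.

```r
library(stringr)

x <- c("why", "video", "cross", "extra", "deal", "authority")

# Base R puts the pattern first; stringr always puts the string vector first
base_result <- gsub("[aeiou]", "?", x)
stringr_result <- str_replace_all(x, "[aeiou]", "?")
identical(base_result, stringr_result)
#> [1] TRUE

# A missing input: str_detect() propagates NA,
# while base grepl() reports FALSE for the missing element
str_detect(NA_character_, "a")
#> [1] NA
grepl("a", NA_character_)
#> [1] FALSE
```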
