  • Stars: 281
  • Rank: 141,366 (Top 3 %)
  • Language: C++
  • License: Other
  • Created: over 11 years ago
  • Updated: 7 months ago

Repository Details

Fast and portable character string processing in R (with the Unicode ICU)

A comprehensive tutorial and reference manual is available at https://stringi.gagolewski.com/.

Check out stringx for a set of wrappers around stringi with a base R-compatible API.

To learn more about R, check out Marek's open-access (free!) textbook Deep R Programming.

stringi (pronounced “stringy”, IPA [strinɡi]) is THE R package for string/text/natural language processing. It is very fast, consistent, convenient, and, thanks to ICU (the International Components for Unicode library), portable across all locales and platforms.

Available features include:

  • string concatenation, padding, wrapping,
  • substring extraction,
  • pattern searching (e.g., with Java-like regular expressions),
  • collation and sorting,
  • random string generation,
  • case mapping and folding,
  • string transliteration,
  • Unicode normalisation,
  • date-time formatting and parsing,

and many more.
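
As a minimal sketch of a few of the features listed above, assuming stringi is installed from CRAN (the input strings and variable names are made up for illustration):

```r
library(stringi)

# Padding and concatenation
padded <- stri_pad_left("7", 3, pad = "0")      # "007"
joined <- stri_join("string", "i", sep = "")    # "stringi"

# Substring extraction (1-based, inclusive positions)
piece <- stri_sub("stringi", 1, 6)              # "string"

# Pattern searching with ICU regular expressions
digits <- stri_extract_all_regex("a1b22c333", "[0-9]+")[[1]]

# Locale-aware sorting; the locale is given explicitly so that
# the result does not depend on the user's environment
sorted <- stri_sort(c("banana", "cherry", "apple"), locale = "en_US")

# Case mapping and transliteration to ASCII
upper <- stri_trans_toupper("stringi")          # "STRINGI"
ascii <- stri_trans_general("G\u0105golewski", "Latin-ASCII")

print(list(padded, joined, piece, digits, sorted, upper, ascii))
```

Most functions are vectorised over their arguments, so the same calls work on whole character vectors.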

Package Maintainer: Marek Gagolewski

Authors and Contributors: Marek Gagolewski, with contributions from Bartłomiej Tartanus and many others.

The package's API was inspired by that of the early (pre-tidyverse; v0.6.2) version of Hadley Wickham's stringr package. Since v1.0.0 (2015), stringr itself has been powered by stringi.

Homepage: https://stringi.gagolewski.com/

Citation: Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1–59, https://dx.doi.org/10.18637/jss.v103.i02.

CRAN Entry: https://CRAN.R-project.org/package=stringi

System Requirements: R >= 3.4, ICU4C >= 61 (refer to the INSTALL file for more details)
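
One possible way to check an existing installation against these requirements is sketched below. It assumes stringi is already installed and relies on the `ICU.version` field of `stri_info()`; the variable names are illustrative only.

```r
library(stringi)

# Version of the ICU library that stringi was built against
icu_version <- stri_info()$ICU.version

# Compare against the minimum versions stated above
r_ok   <- getRversion() >= "3.4"
icu_ok <- utils::compareVersion(icu_version, "61") >= 0

cat("ICU version:", icu_version, "\n")
cat("R >= 3.4:", r_ok, " ICU4C >= 61:", icu_ok, "\n")
```

This only inspects a working installation; for build-time options, the INSTALL file remains the reference.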

License: stringi's source code is distributed under the open source BSD-3-clause license. For more details, see LICENSE.

This git repository also contains a custom subset of ICU4C source code which is copyrighted by Unicode, Inc. and others. A binary version of the Unicode Character Database is included. For more details on copyright holders, see LICENSE. The ICU project is covered by the Unicode license — a simple, permissive non-copyleft free software license, compatible with the GNU GPL. The ICU license is intended to allow ICU to be included in free software projects as well as in proprietary or commercial products.

Changes: see the NEWS file.

For how to access the stringi C++ API from within an Rcpp-based R package, see the ExampleRcppStringi repository.

More Repositories

1. deepr: Deep R Programming (Open-Access Textbook) (COMPLETE!) (71 stars)
2. datawranglingpy: Minimalist Data Wrangling with Python (Open-Access Textbook) (60 stars)
3. genieclust: Genie: Fast and Robust Hierarchical Clustering with Noise Point Detection - for Python and R (C++; 48 stars)
4. clustering-benchmarks: A framework for benchmarking clustering algorithms (Python; 31 stars)
5. TurtleGraphics: Turtle Graphics in R (HTML; 22 stars)
6. stringx: Drop-in replacements for base R string functions powered by stringi (HTML; 21 stars)
7. genie: Genie: A Fast and Robust Hierarchical Clustering Algorithm (this R package has now been superseded by genieclust) (C++; 21 stars)
8. teaching-data: Dr Marek's Data for Teaching/Training (14 stars)
9. lmlcr: Lightweight Machine Learning Classics with R (Book Draft) (TeX; 14 stars)
10. ExampleRcppStringi: Access to the stringi API from within an Rcpp-based Project (C++; 11 stars)
11. Programowanie_w_jezyku_R: M. Gągolewski, Programowanie w języku R, PWN, 2016 (R; 10 stars)
12. realtest: When Expectations Meet Reality: Realistic Unit Testing in R (R; 10 stars)
13. FuzzyNumbers: Tools to Deal with Fuzzy Numbers in R (R; 10 stars)
14. Analiza_danych_w_jezyku_Python: M. Gągolewski, M. Bartoszuk, A. Cena, Przetwarzanie i analiza danych w języku Python, PWN, 2016 (Python; 10 stars)
15. CITAN: [DEPRECATED] CITation ANalysis toolpack (R; 6 stars)
16. icudt: [DEPRECATED] ICU data files packaged for use in R (R; 5 stars)
17. agop: Aggregation Operators Package for R (R; 5 stars)
18. DataStructures: General data structures and algorithms for use with R (C++; 5 stars)
19. three_dimensions_of_scientific_impact: Data Repository - PNAS Paper "The Three Dimensions of Scientific Impact" by Siudem, Żogała-Siudem, Cena, and Gagolewski (Jupyter Notebook; 4 stars)
20. aipp: Algorytmy i podstawy programowania w języku C++ (Algorithms and Programming Basics in C++; in Polish) (C++; 3 stars)
21. bibliography: Marek's Publications in BibTeX + some preprints (TeX; 3 stars)
22. gagolews: Marek Gagolewski (3 stars)
23. clustering-data-v1: A framework for benchmarking clustering algorithms – Benchmark suite, version 1 (Jupyter Notebook; 2 stars)
24. ordinal-regression-data: Ordinal Regression Benchmark Data (2 stars)
25. www-public: Marek's Homepage (1 star)
26. Playground.jl: [DEPRECATED] My Julia functions (for testing, etc.) (Julia; 1 star)
27. clustering-results-v1: A framework for benchmarking clustering algorithms – Benchmark results (for version 1 of the Suite) (Python; 1 star)