
A Toolbox for Non-Tabular Data Manipulation

rlist

rlist is a set of tools for working with list objects. Its goal is to make it easier to work with lists by providing a wide range of functions that operate on non-tabular data stored in them.

This package supports list mapping, filtering, grouping, sorting, updating, searching, file input/output, and many other functions. Most functions in the package are designed to be pipeline friendly so that data processing with lists can be chained.

rlist Tutorial is a highly recommended complete guide to rlist.

This document is also translated into 日本語 (by @teramonagi).

Installation

Install the latest version from GitHub:

devtools::install_github("renkun-ken/rlist")

Install from CRAN:

install.packages("rlist")

Motivation

In R, there are numerous powerful tools for dealing with structured data stored in tabular form, such as data frames. However, a variety of data is non-tabular: different records may have different fields, and each field may hold a different number of values.

Such data is hard, or at least no longer straightforward, to store in a data frame, but the list object in R is flexible enough to represent such diverse records. rlist is a toolbox for dealing with non-structured data stored in list objects, providing a collection of high-level functions that are pipeline friendly.

Getting started

Suppose we have a list of developers, each of whom has a name, an age, a few interests, and a list of the programming languages they use together with the number of years they have been using each.

library(rlist)
devs <- 
  list(
    p1=list(name="Ken",age=24,
      interest=c("reading","music","movies"),
      lang=list(r=2,csharp=4)),
    p2=list(name="James",age=25,
      interest=c("sports","music"),
      lang=list(r=3,java=2,cpp=5)),
    p3=list(name="Penny",age=24,
      interest=c("movies","reading"),
      lang=list(r=1,cpp=4,python=2)))

This type of data is non-relational since it does not fit well into the shape of a data frame, yet it can be easily stored in JSON or YAML format. In R, list objects are flexible enough to represent a wide range of non-relational datasets like this, and this package provides functions to query and manipulate them.
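
As a rough illustration of the JSON storage mentioned above, the sketch below writes a small list to a temporary file and reads it back. It assumes that list.save() and list.load() infer the format from the file extension and that JSON support (via jsonlite) is available. The two-record list and the temporary file are made up for this example.

```r
library(rlist)

# A two-record subset of the developer list (illustrative data)
devs <- list(
  p1 = list(name = "Ken", age = 24,
    interest = c("reading", "music", "movies"),
    lang = list(r = 2, csharp = 4)),
  p2 = list(name = "James", age = 25,
    interest = c("sports", "music"),
    lang = list(r = 3, java = 2, cpp = 5)))

# Assumption: the ".json" extension selects the JSON writer and reader
json_file <- tempfile(fileext = ".json")
list.save(devs, json_file)
restored <- list.load(json_file)

# Inspect one record after the round trip
str(restored$p1)
```

Numeric types may differ after the round trip (for example, whole numbers may come back as integers), so comparing values is safer than comparing with identical().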

The following examples use str() to show the structure of the output.

Filtering

Filter those who like music and have been using R for at least 3 years.

str( list.filter(devs, "music" %in% interest & lang$r >= 3) )
List of 1
 $ p2:List of 4
  ..$ name    : chr "James"
  ..$ age     : num 25
  ..$ interest: chr [1:2] "sports" "music"
  ..$ lang    :List of 3
  .. ..$ r   : num 3
  .. ..$ java: num 2
  .. ..$ cpp : num 5
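
To make clear what the filter expression does, here is a small sketch comparing it with a base R equivalent. In list.filter() the fields of each record are referenced directly, whereas base R's Filter() needs an explicit function argument. The names with_rlist and with_base are only illustrative.

```r
library(rlist)

devs <- list(
  p1 = list(name = "Ken", age = 24,
    interest = c("reading", "music", "movies"),
    lang = list(r = 2, csharp = 4)),
  p2 = list(name = "James", age = 25,
    interest = c("sports", "music"),
    lang = list(r = 3, java = 2, cpp = 5)),
  p3 = list(name = "Penny", age = 24,
    interest = c("movies", "reading"),
    lang = list(r = 1, cpp = 4, python = 2)))

# rlist: the condition is evaluated inside each list member
with_rlist <- list.filter(devs, "music" %in% interest & lang$r >= 3)

# base R: the same condition, with each member passed in as `d`
with_base <- Filter(
  function(d) "music" %in% d$interest && d$lang$r >= 3,
  devs)

# Both approaches keep the same record
names(with_rlist)
names(with_base)
```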

Selecting

Select their names and ages.

str( list.select(devs, name, age) )
List of 3
 $ p1:List of 2
  ..$ name: chr "Ken"
  ..$ age : num 24
 $ p2:List of 2
  ..$ name: chr "James"
  ..$ age : num 25
 $ p3:List of 2
  ..$ name: chr "Penny"
  ..$ age : num 24

Mapping

Map each of them to the number of interests.

str( list.map(devs, length(interest)) )
List of 3
 $ p1: int 3
 $ p2: int 2
 $ p3: int 2
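
When every mapped value is a single atomic value, a vector is often more convenient than a list. A minimal sketch using list.mapv(), the vector-returning counterpart of list.map():

```r
library(rlist)

devs <- list(
  p1 = list(name = "Ken", age = 24,
    interest = c("reading", "music", "movies"),
    lang = list(r = 2, csharp = 4)),
  p2 = list(name = "James", age = 25,
    interest = c("sports", "music"),
    lang = list(r = 3, java = 2, cpp = 5)),
  p3 = list(name = "Penny", age = 24,
    interest = c("movies", "reading"),
    lang = list(r = 1, cpp = 4, python = 2)))

# Same mapping as above, but collected into a vector
n_interests <- list.mapv(devs, length(interest))
n_interests

# A character vector of names
dev_names <- list.mapv(devs, name)
dev_names
```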

More functions

In addition to these basic functions, rlist also supports various types of grouping, joining, searching, sorting, updating, etc. For an introduction to more of this functionality, please go through the rlist Tutorial.
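
As a brief sketch of three of these operations on the developer list, the example below groups by age, sorts by age in descending order, and adds a derived field. It assumes that wrapping a sort key in parentheses requests descending order in list.sort(). The n_lang field name is hypothetical.

```r
library(rlist)

devs <- list(
  p1 = list(name = "Ken", age = 24,
    interest = c("reading", "music", "movies"),
    lang = list(r = 2, csharp = 4)),
  p2 = list(name = "James", age = 25,
    interest = c("sports", "music"),
    lang = list(r = 3, java = 2, cpp = 5)),
  p3 = list(name = "Penny", age = 24,
    interest = c("movies", "reading"),
    lang = list(r = 1, cpp = 4, python = 2)))

# Grouping: one sub-list per distinct age
by_age <- list.group(devs, age)
names(by_age)

# Sorting: (age) is assumed to mean descending order by age
oldest_first <- list.sort(devs, (age))
names(oldest_first)

# Updating: add a derived field to every record
updated <- list.update(devs, n_lang = length(lang))
str(updated$p2)
```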

Lambda expression

In this package, almost all functions that work with expressions accept the following forms of lambda expressions:

  • Implicit lambda expression: expression
  • Univariate lambda expressions:
    • x ~ expression
    • f(x) ~ expression
  • Multivariate lambda expressions:
    • f(x,i) ~ expression
    • f(x,i,name) ~ expression

where x refers to the list member itself, i denotes its index, and name denotes its name. If the symbols are not explicitly declared, ., .i and .name are used by default to represent them, respectively.

nums <- list(a=c(1,2,3),b=c(2,3,4),c=c(3,4,5))
list.map(nums, c(min=min(.),max=max(.)))
list.filter(nums, x ~ mean(x)>=3)
list.map(nums, f(x,i) ~ sum(x,i))
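
The sketch below repeats these calls with their results stored in illustrative variables, and adds one more call that uses the default .name symbol, so the effect of each lambda form can be inspected.

```r
library(rlist)

nums <- list(a = c(1, 2, 3), b = c(2, 3, 4), c = c(3, 4, 5))

# Implicit lambda: `.` is the current member
ranges <- list.map(nums, c(min = min(.), max = max(.)))

# Univariate lambda: `x` is the current member
large <- list.filter(nums, x ~ mean(x) >= 3)

# Multivariate lambda: `x` is the member, `i` its index
sums <- list.map(nums, f(x, i) ~ sum(x, i))

# Default symbols: `.name` is the member's name
labels <- list.mapv(nums, paste0(.name, ":", sum(.)))

str(ranges)
names(large)
str(sums)
labels
```

Here the members with a mean of at least 3 are b and c, and each element of sums is the member's total plus its position in the list.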

Using pipeline

Working with pipe syntax

Query the name and age of each developer who likes music and uses R, and put the results in a data frame.

devs |> 
  list.filter("music" %in% interest & "r" %in% names(lang)) |>
  list.select(name,age) |>
  list.stack()
   name age
1   Ken  24
2 James  25

The example above uses the pipe syntax |> introduced in R 4.1 that chains commands in a fluent style.
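
On R versions older than 4.1, where |> is not available, the same query can be written as nested calls. This sketch only rearranges the calls from the example above, and the variable name result is illustrative.

```r
library(rlist)

devs <- list(
  p1 = list(name = "Ken", age = 24,
    interest = c("reading", "music", "movies"),
    lang = list(r = 2, csharp = 4)),
  p2 = list(name = "James", age = 25,
    interest = c("sports", "music"),
    lang = list(r = 3, java = 2, cpp = 5)),
  p3 = list(name = "Penny", age = 24,
    interest = c("movies", "reading"),
    lang = list(r = 1, cpp = 4, python = 2)))

# Read from the inside out: filter, then select, then stack
result <- list.stack(
  list.select(
    list.filter(devs, "music" %in% interest & "r" %in% names(lang)),
    name, age))
result
```

The nested form reads inside out, which is why the pipeline version is usually easier to follow once more steps are chained.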

List environment

The List() function wraps a list within an environment where almost all list functions are defined. Here is the List-environment version of the previous example.

ldevs <- List(devs)
ldevs$filter("music" %in% interest & "r" %in% names(lang))$
  select(name,age)$
  stack()$
  data
   name age
1   Ken  24
2 James  25
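
A second sketch of the same chaining style, using only the filter, select and stack methods shown above. It assumes that select accepts named expressions in the List environment just as list.select() does. The n_interest column name is hypothetical.

```r
library(rlist)

devs <- list(
  p1 = list(name = "Ken", age = 24,
    interest = c("reading", "music", "movies"),
    lang = list(r = 2, csharp = 4)),
  p2 = list(name = "James", age = 25,
    interest = c("sports", "music"),
    lang = list(r = 3, java = 2, cpp = 5)),
  p3 = list(name = "Penny", age = 24,
    interest = c("movies", "reading"),
    lang = list(r = 1, cpp = 4, python = 2)))

# Each method returns a new List environment; $data extracts the result
young <- List(devs)$filter(age == 24)$
  select(name, n_interest = length(interest))$
  stack()$
  data
young
```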

License

This package is under the MIT License.
