Multi-Paradigm Pipeline Implementation

pipeR

pipeR provides various styles of function chaining methods:

  • Pipe operator
  • Pipe object
  • pipeline function

Each of them represents a distinct pipeline model, but they share nearly the same set of features. A value can be piped to the next expression:

  • As the first unnamed argument of the function
  • As the dot symbol (.) in the expression
  • As a named variable defined by a formula
  • For a side effect, carrying the input over to the next step
  • For an assignment that saves an intermediate value

The syntax is designed to make the pipeline more readable and friendly to a wide variety of operations.

pipeR Tutorial is a highly recommended complete guide to pipeR.

This document is also translated into 日本語 (by @hoxo_m).

Installation

Install the latest development version from GitHub:

devtools::install_github("renkun-ken/pipeR")

Install from CRAN:

install.packages("pipeR")

Getting started

The following code is an example written in the traditional, nested style. It bootstraps the mpg values in the built-in dataset mtcars and plots their density function, estimated with a Gaussian kernel:

plot(density(sample(mtcars$mpg, size = 10000, replace = TRUE), 
  kernel = "gaussian"), col = "red", main="density of mpg (bootstrap)")

The code is deeply nested and can be hard to read and maintain. In the following examples, the traditional code is rewritten with the pipe operator, the Pipe() function and the pipeline() function, respectively.

  • Operator-based pipeline
mtcars$mpg %>>%
  sample(size = 10000, replace = TRUE) %>>%
  density(kernel = "gaussian") %>>%
  plot(col = "red", main = "density of mpg (bootstrap)")
  • Object-based pipeline (Pipe())
Pipe(mtcars$mpg)$
  sample(size = 10000, replace = TRUE)$
  density(kernel = "gaussian")$
  plot(col = "red", main = "density of mpg (bootstrap)")
  • Argument-based pipeline
pipeline(mtcars$mpg,
  sample(size = 10000, replace = TRUE),
  density(kernel = "gaussian"),
  plot(col = "red", main = "density of mpg (bootstrap)"))
  • Expression-based pipeline
pipeline({
  mtcars$mpg
  sample(size = 10000, replace = TRUE)
  density(kernel = "gaussian")
  plot(col = "red", main = "density of mpg (bootstrap)")  
})

Usage

%>>%

The pipe operator %>>% pipes the left-hand side value forward to the right-hand side expression, which is evaluated according to its syntax.

Pipe to first-argument of function

Many R functions are pipe-friendly: they take some data as the first argument and transform it in a certain way. This arrangement allows operations to be streamlined by pipes: a data source is passed as the first argument of a function, gets transformed, and is passed as the first argument of the next function. A chain of commands connected in this way is called a pipeline.

On the right-hand side of %>>%, whenever a function name or call is supplied, the left-hand side value is always passed as the first unnamed argument of that function.

rnorm(100) %>>%
  plot
rnorm(100) %>>%
  plot(col="red")
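
As a small check of the rule above, the following sketch compares piped calls with their nested base R equivalents. It assumes pipeR is installed; the vector x and the result names are made up for illustration.

```r
library(pipeR)

x <- c(4, 8, 15, 16, 23, 42)

# Bare function name: x becomes the only argument, as in mean(x).
avg <- x %>>% mean

# Function call: x is placed before the arguments already written,
# as in round(mean(x), digits = 1).
avg_rounded <- x %>>% mean %>>% round(digits = 1)

# Both forms should agree with the nested calls.
identical(avg, mean(x))
identical(avg_rounded, round(mean(x), digits = 1))
```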

Sometimes the value on the left is needed at multiple places. One can use . to represent it anywhere in the function call.

rnorm(100) %>>%
  plot(col="red", main=length(.))
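
The sketch below illustrates that . is available in addition to, not instead of, the first-argument insertion. It assumes pipeR is installed, and the values are illustrative only.

```r
library(pipeR)

x <- c(3, 1, 2)

# x still goes to the first argument; `.` reuses it for `times`.
# Equivalent to rep(x, times = length(x)).
repeated <- x %>>% rep(times = length(.))

# Because the piped value is always inserted first, writing `.` as an
# explicit positional argument passes it twice: paste("a", "a", sep = "-").
doubled <- "a" %>>% paste(., sep = "-")
```

To pass the value only where . appears, enclose the expression in {} or () instead of writing a plain function call.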

There are situations where one calls a function in a namespace with ::. In this case, the call must end with ().

rnorm(100) %>>%
  stats::median()
  
rnorm(100) %>>%
  graphics::plot(col = "red")
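
A quick way to convince yourself that the namespaced form behaves like the direct call, assuming pipeR is installed (the input vector is arbitrary):

```r
library(pipeR)

x <- c(5, 1, 9)

# The trailing () marks stats::median as a call, so x is inserted
# as its first argument: stats::median(x).
mid <- x %>>% stats::median()

identical(mid, stats::median(x))
```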

Pipe to . in an expression

Not all functions are pipe-friendly in every case: some functions do not take the data produced by a pipeline as the first argument. In this case, you can enclose your expression in {} or () so that %>>% will use . to represent the value on the left.

mtcars %>>%
  { lm(mpg ~ cyl + wt, data = .) }
mtcars %>>%
  ( lm(mpg ~ cyl + wt, data = .) )
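
The following sketch, which assumes pipeR is installed, shows one practical difference between the two enclosures: {} can hold several statements, while () holds a single expression. The names avg and fit are illustrative.

```r
library(pipeR)

# {} may hold several statements; `.` is the piped value throughout.
avg <- c(2, 4, 6) %>>% {
  n <- length(.)
  sum(.) / n
}

# () holds a single expression; here the data goes to `data`,
# not to the first argument of lm().
fit <- mtcars %>>% (lm(mpg ~ cyl + wt, data = .))

# The piped fit should match the one built directly.
all.equal(coef(fit), coef(lm(mpg ~ cyl + wt, data = mtcars)))
```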

Pipe by formula as lambda expression

Sometimes, it may look confusing to use . to represent the value being piped. For example,

mtcars %>>%
  (lm(mpg ~ ., data = .))

Although it works perfectly, it may look ambiguous if . has several meanings in one line of code.

%>>% accepts a lambda expression to direct its piping behavior. A lambda expression is a formula enclosed within (), for example, (x ~ f(x)). It contains a user-defined symbol to represent the value being piped and the expression to be evaluated.

mtcars %>>%
  (df ~ lm(mpg ~ ., data = df))
mtcars %>>%
  subset(select = c(mpg, wt, cyl)) %>>%
  (x ~ plot(mpg ~ ., data = x))
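
Lambda steps can also be chained, each with its own symbol. A minimal sketch, assuming pipeR is installed; the symbols v and odd are arbitrary names chosen for this example.

```r
library(pipeR)

# Keep the odd numbers, then sum their squares.
# Each step names the piped value itself instead of relying on `.`.
sum_sq_odd <- 1:5 %>>%
  (v ~ v[v %% 2 == 1]) %>>%
  (odd ~ sum(odd^2))

# Same computation written as nested base R code.
sum_sq_odd == sum(c(1, 3, 5)^2)
```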

Pipe for side effect

In a pipeline, one may be interested not only in the final outcome but sometimes also in intermediate results. Printing, plotting or saving an intermediate result must be done as a side effect so that it does not break the main pipeline. For example, calling plot() to draw a scatter plot returns NULL, so calling plot() directly in the middle of a pipeline would break it by changing the subsequent input to NULL.

A one-sided formula that starts with ~ indicates that the right-hand side expression is evaluated only for its side effect: its value is ignored and the input value is returned instead.

mtcars %>>%
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
  (~ cat("rows:",nrow(.),"\n")) %>>%   # cat() returns NULL
  summary
mtcars %>>%
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
  (~ plot(mpg ~ wt, data = .)) %>>%    # plot() returns NULL
  (lm(mpg ~ wt, data = .)) %>>%
  summary()

With ~, side-effect operations can be easily distinguished from mainstream pipeline.
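
To make the difference concrete, this sketch runs the same cat() step with and without ~. It assumes pipeR is installed; the names lost and total are illustrative.

```r
library(pipeR)

x <- c(1, 2, 3)

# Without ~ the step's own value is piped on, and cat() returns NULL.
lost <- x %>>% (cat("n =", length(.), "\n"))

# With ~ the step runs only for its side effect and x is carried on.
total <- x %>>%
  (~ cat("n =", length(.), "\n")) %>>%
  sum

is.null(lost)
total == sum(x)
```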

An easier way to print an intermediate value is to use the (? expr) syntax, like asking a question.

mtcars %>>% 
  (? ncol(.)) %>>%
  summary
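
A question step leaves the piped value untouched, so the pipeline continues as if the step were not there. A short sketch, assuming pipeR is installed:

```r
library(pipeR)

# (? dim(.)) prints the dimensions of mtcars; the data frame itself
# is what reaches nrow.
n_rows <- mtcars %>>%
  (? dim(.)) %>>%
  nrow

n_rows == nrow(mtcars)
```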

Pipe with assignment

In addition to printing and plotting, one may need to save an intermediate value to the environment by assigning the value to a variable (symbol).

If one needs to assign the value to a symbol, just insert a step like (~ symbol), then the input value of that step will be assigned to symbol in the current environment.

mtcars %>>%
  (lm(formula = mpg ~ wt + cyl, data = .)) %>>%
  (~ lm_mtcars) %>>%
  summary
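
The effect of a (~ symbol) step can be checked on a small input. This sketch assumes pipeR is installed; raw_values is a hypothetical variable name created by the pipeline in the calling environment.

```r
library(pipeR)

total <- c(10, 20, 30) %>>%
  (~ raw_values) %>>%   # saves the input as raw_values, then passes it on
  sum

# raw_values now holds the value that entered the step.
identical(raw_values, c(10, 20, 30))
total == 60
```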

If the value to be saved is not the input itself but some transformation of it, one can use =, <-, or the more natural -> to specify a lambda expression that says what to save (thanks @yanlinlin82 for the suggestion).

mtcars %>>%
  (~ summ = summary(.)) %>>%  # side-effect assignment
  (lm(formula = mpg ~ wt + cyl, data = .)) %>>%
  (~ lm_mtcars) %>>%
  summary
# the same side-effect assignment written with -> and <-
mtcars %>>%
  (~ summary(.) -> summ)

mtcars %>>%
  (~ summ <- summary(.))

An easier way to save an intermediate value that is to be piped further is to use the (symbol = expression) syntax:

mtcars %>>%
  (~ summ = summary(.)) %>>%  # side-effect assignment
  (lm_mtcars = lm(formula = mpg ~ wt + cyl, data = .)) %>>%  # continue piping
  summary

or (expression -> symbol) syntax:

mtcars %>>%
  (~ summary(.) -> summ) %>>%  # side-effect assignment
  (lm(formula = mpg ~ wt + cyl, data = .) -> lm_mtcars) %>>%  # continue piping
  summary
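
The two assignment styles differ in what is piped forward. The sketch below contrasts them on a small vector. It assumes pipeR is installed; n and roots are hypothetical variable names created by the pipeline.

```r
library(pipeR)

res <- c(1, 4, 9) %>>%
  (~ n = length(.)) %>>%   # side effect: n is saved, the input is carried on
  (roots = sqrt(.)) %>>%   # roots is saved and is also what gets piped on
  sum

n                    # length of the original input
roots                # square roots of the original input
res == sum(roots)    # sum() received roots, not the original vector
```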

Extract element from an object

x %>>% (y) extracts the element named y from object x, where y must be a valid symbol name and x can be a vector, list, environment, anything else for which [[]] is defined, or an S4 object.

mtcars %>>%
  (lm(mpg ~ wt + cyl, data = .)) %>>%
  (~ lm_mtcars) %>>%
  summary %>>%
  (r.squared)
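
Element extraction can be compared against the usual $ access. A sketch, assuming pipeR is installed; the list opts and its fields are made up for this example.

```r
library(pipeR)

fit <- lm(mpg ~ wt + cyl, data = mtcars)

# (r.squared) pulls the element of that name out of the summary object.
r2 <- fit %>>% summary %>>% (r.squared)
identical(r2, summary(fit)$r.squared)

# The same syntax on a plain named list.
opts <- list(alpha = 0.05, label = "demo")
label_value <- opts %>>% (label)
```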

Compatibility

  • Working with dplyr:
library(dplyr)
mtcars %>>%
  filter(mpg <= mean(mpg)) %>>%
  select(mpg, wt, cyl) %>>%
  (~ plot(.)) %>>%
  (model = lm(mpg ~ wt + cyl, data = .)) %>>%
  (summ = summary(.)) %>>%
  (coefficients)
  • Working with ggvis:
library(ggvis)
mtcars %>>%
  ggvis(~mpg, ~wt) %>>%
  layer_points()
  • Working with rlist:
library(rlist)
1:100 %>>%
  list.group(. %% 3) %>>%
  list.mapv(g ~ mean(g))

Pipe()

Pipe() creates a Pipe object that supports light-weight chaining without any external operator. Typically, start with Pipe() and end with $value or [] to extract the final value of the Pipe.

The Pipe object provides an internal function .(...) that works in exactly the same way as x %>>% (...), and it has more features than %>>%.

NOTE: .() does not support assignment with = but supports ~, <- and ->.

Piping

Pipe(rnorm(1000))$
  density(kernel = "cosine")$
  plot(col = "blue")
Pipe(mtcars)$
  .(mpg)$
  summary()
Pipe(mtcars)$
  .(~ summary(.) -> summ)$
  lm(formula = mpg ~ wt + cyl)$
  summary()$
  .(coefficients)
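
Each call above returns another Pipe object, so the underlying result has to be extracted at the end. The following sketch, assuming pipeR is installed, shows both extraction forms mentioned earlier; the variable names are illustrative.

```r
library(pipeR)

# Chain two functions on a Pipe object; p is still a Pipe.
p <- Pipe(c(1, 4, 9))$
  sqrt()$
  sum()

total_value <- p$value    # extract with $value
total_brackets <- p[]     # or with []

# .() behaves like x %>>% (...); here it extracts the mpg column.
mpg_mean <- Pipe(mtcars)$
  .(mpg)$
  mean()$
  value
```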

Subsetting and extracting

pmtcars <- Pipe(mtcars)
pmtcars[c("mpg","wt")]$
  lm(formula = mpg ~ wt)$
  summary()
pmtcars[["mpg"]]$mean()

Assigning values

plist <- Pipe(list(a=1,b=2))
plist$a <- 0
plist$b <- NULL

Side effect

Pipe(mtcars)$
  .(? ncol(.))$
  .(~ plot(mpg ~ ., data = .))$    # side effect: plot
  lm(formula = mpg ~ .)$
  .(~ lm_mtcars)$                  # side effect: assign
  summary()

Compatibility

  • Working with dplyr:
Pipe(mtcars)$
  filter(mpg >= mean(mpg))$
  select(mpg, wt, cyl)$
  lm(formula = mpg ~ wt + cyl)$
  summary()$
  .(coefficients)$
  value
  • Working with ggvis:
Pipe(mtcars)$
  ggvis(~ mpg, ~ wt)$
  layer_points()
  • Working with rlist:
Pipe(1:100)$
  list.group(. %% 3)$
  list.mapv(g ~ mean(g))$
  value

pipeline()

pipeline() provides argument-based and expression-based pipeline evaluation mechanisms. Its behavior depends on how its arguments are supplied. If only the first argument is supplied, it expects an expression enclosed in {} in which each line represents a pipeline step. If, instead, multiple arguments are supplied, it regards each argument as a pipeline step. For all pipeline steps, the expressions will be transformed to be connected by %>>% so that they behave exactly the same.

One notable difference is that in pipeline()'s arguments or expression, the special symbols that perform specially defined pipeline tasks (e.g. side effects) do not need to be enclosed within (), because no operator-precedence issues arise as they do when using %>>%.

pipeline({
  mtcars
  lm(formula = mpg ~ cyl + wt)
  ~ lmodel
  summary
  ? .$r.squared
  coef
})
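
Since every step is connected by %>>% internally, the three styles should give the same result. This sketch checks that on a small input. It assumes pipeR is installed; saved_input is a hypothetical variable name created by the pipeline.

```r
library(pipeR)

x <- c(1, 4, 9)

by_operator <- x %>>% sqrt %>>% sum

# Argument-based: each argument is one step.
by_arguments <- pipeline(x, sqrt, sum)

# Expression-based: each line inside {} is one step.
# The side-effect assignment needs no enclosing ().
by_expression <- pipeline({
  x
  ~ saved_input
  sqrt
  sum
})

c(by_operator, by_arguments, by_expression)
```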

Thanks @hoxo_m for the idea presented in this post.

License

This package is under MIT License.
