Sliding Window Functions

slider


slider provides a family of general purpose “sliding window” functions. The API is purposefully very similar to purrr. The goal of these functions is usually to compute rolling averages, cumulative sums, rolling regressions, or other “window” based computations.

There are 3 core functions in slider:

  • slide() iterates over your data like purrr::map(), but uses a sliding window to do so. It is type-stable, and always returns a result with the same size as its input.

  • slide_index() computes a rolling calculation relative to an index. If you have ever wanted to compute something like a “3 month rolling average” where the number of days in each month is irregular, you might like this function.

  • slide_period() is similar to slide_index() in that it slides relative to an index, but it first breaks the index up into “time blocks”, like 2 month blocks of time, and then it slides over .x using indices defined by those blocks.

Each of these core functions has the same variants as purrr::map(). For example, slide() has slide_dbl(), slide2(), and pslide(), along with the other combinations of these variants that you might expect from having previously used purrr.
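As a rough illustration of how those variants combine, here is a small sketch. The input vectors are made up for the example; the function names are the ones listed above, combined the way purrr combines them.

```r
library(slider)

# Typed variant: each window must produce a single integer
window_sizes <- slide_int(1:4, ~length(.x), .before = 1)
window_sizes
#> [1] 1 2 2 2

# Two inputs slid in parallel: .x and .y share the same window
pair_sums <- slide2_dbl(1:4, 5:8, ~sum(.x) + sum(.y), .before = 1)
pair_sums
#> [1]  6 14 18 22

# Any number of inputs, passed as a list and matched to .f by position
triple_sums <- pslide_dbl(
  list(1:4, 5:8, 9:12),
  function(a, b, c) sum(a, b, c),
  .before = 1
)
triple_sums
#> [1] 15 33 39 45
```

The typed suffixes behave as they do in purrr: if .f returns something that cannot be cast to the requested type, you get an error rather than a silent coercion.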

To learn more about these three functions, read the introduction vignette.

There is also a set of extremely fast specialized variants of slide_dbl() for the most common use cases. These include slide_sum() for rolling sums and slide_mean() for rolling averages. There are index variants of each of these as well, like slide_index_sum().
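A minimal sketch of how the specialized variants line up with their slide_dbl() equivalents, using a made-up input vector. Note that, as far as I know, the specialized functions spell their window arguments without the leading dot (before rather than .before), because they do not take a user-supplied .f.

```r
library(slider)

x <- c(1, 2, 3, 4, 5)

# General form: calls sum() once per window
general_sum <- slide_dbl(x, sum, .before = 2)

# Specialized form: the rolling sum is computed directly
special_sum <- slide_sum(x, before = 2)

general_sum
#> [1]  1  3  6  9 12
special_sum
#> [1]  1  3  6  9 12

# Same idea for rolling averages
general_mean <- slide_dbl(x, mean, .before = 2)
special_mean <- slide_mean(x, before = 2)
special_mean
#> [1] 1.0 1.5 2.0 3.0 4.0
```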

Installation

Install the released version from CRAN with:

install.packages("slider")

Install the development version from GitHub with:

remotes::install_github("r-lib/slider")

Examples

The help page for slide() has many examples, but here are a few:

library(slider)

The classic example would be to do a moving average. slide() handles this with a combination of the .before and .after arguments, which control the width of the window and the alignment.

# Moving average (Aligned right)
# "The current element + 2 elements before"
slide_dbl(1:5, ~mean(.x), .before = 2)
#> [1] 1.0 1.5 2.0 3.0 4.0

# Align left
# "The current element + 2 elements after"
slide_dbl(1:5, ~mean(.x), .after = 2)
#> [1] 2.0 3.0 4.0 4.5 5.0

# Center aligned
# "The current element + 1 element before + 1 element after"
slide_dbl(1:5, ~mean(.x), .before = 1, .after = 1)
#> [1] 1.5 2.0 3.0 4.0 4.5

With Inf, you can do a “cumulative slide” to compute cumulative expressions. I think of this as saying “give me everything before the current element.”

slide(1:4, ~.x, .before = Inf)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 1 2
#> 
#> [[3]]
#> [1] 1 2 3
#> 
#> [[4]]
#> [1] 1 2 3 4
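To make the “cumulative” reading concrete, here is a short sketch that compares a cumulative slide against base R's cumsum() and cummax(). The input vector is made up for illustration.

```r
library(slider)

x <- c(3, 1, 4, 1, 5)

# Each window holds the current element plus everything before it
running_total <- slide_dbl(x, sum, .before = Inf)
running_max <- slide_dbl(x, max, .before = Inf)

running_total
#> [1]  3  4  8  9 14
running_max
#> [1] 3 3 4 4 5

# These match the dedicated base R cumulative functions
all.equal(running_total, cumsum(x))
#> [1] TRUE
all.equal(running_max, cummax(x))
#> [1] TRUE
```

For sums and maxima the base functions are the better tool. The cumulative slide is more useful when .f has no built-in cumulative version.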

With .complete, you can decide whether or not .f should be evaluated on incomplete windows. In the following example, the requested window size is 3, but the first two results are computed on windows of size 1 and 2 because partial results are allowed by default. When .complete is set to TRUE, the first two results are not computed.

slide(1:4, ~.x, .before = 2)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 1 2
#> 
#> [[3]]
#> [1] 1 2 3
#> 
#> [[4]]
#> [1] 2 3 4

slide(1:4, ~.x, .before = 2, .complete = TRUE)
#> [[1]]
#> NULL
#> 
#> [[2]]
#> NULL
#> 
#> [[3]]
#> [1] 1 2 3
#> 
#> [[4]]
#> [1] 2 3 4
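With the typed variants there is no NULL to return, so incomplete windows show up as missing values instead. A small sketch using the same input as above:

```r
library(slider)

# Partial windows allowed (the default)
partial <- slide_dbl(1:4, mean, .before = 2)
partial
#> [1] 1.0 1.5 2.0 3.0

# Only complete windows are evaluated; the rest are NA
complete_only <- slide_dbl(1:4, mean, .before = 2, .complete = TRUE)
complete_only
#> [1] NA NA  2  3
```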

Data frames

Unlike purrr::map(), slide() iterates over data frames in a row-wise fashion. Interestingly, this means that slide() with its default arguments becomes a generic row-wise iterator, with nice syntax for accessing data frame columns.

There is a vignette specifically about this.

mini_cars <- cars[1:4,]

slide(mini_cars, ~.x)
#> [[1]]
#>   speed dist
#> 1     4    2
#> 
#> [[2]]
#>   speed dist
#> 1     4   10
#> 
#> [[3]]
#>   speed dist
#> 1     7    4
#> 
#> [[4]]
#>   speed dist
#> 1     7   22

slide_dbl(mini_cars, ~.x$speed + .x$dist)
#> [1]  6 14 11 29

This makes rolling regressions trivial!

library(tibble)
set.seed(123)

df <- tibble(
  y = rnorm(100),
  x = rnorm(100)
)

# Window size of 20 rows
# The current row + 19 before
# (see slide_index() for how to do this relative to a date vector!)
df$regressions <- slide(df, ~lm(y ~ x, data = .x), .before = 19, .complete = TRUE)

df[15:25,]
#> # A tibble: 11 × 3
#>         y      x regressions
#>     <dbl>  <dbl> <list>     
#>  1 -0.556  0.519 <NULL>     
#>  2  1.79   0.301 <NULL>     
#>  3  0.498  0.106 <NULL>     
#>  4 -1.97  -0.641 <NULL>     
#>  5  0.701 -0.850 <NULL>     
#>  6 -0.473 -1.02  <lm>       
#>  7 -1.07   0.118 <lm>       
#>  8 -0.218 -0.947 <lm>       
#>  9 -1.03  -0.491 <lm>       
#> 10 -0.729 -0.256 <lm>       
#> 11 -0.625  1.84  <lm>
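Once the models are stored in a list, you usually want to pull a number out of each one. The sketch below is one way to do that, on a made-up data frame where y is exactly 2 * x + 1 so the expected slope is known. The rolling_fits and rolling_slopes names are only for this example.

```r
library(slider)

# Toy data with a known relationship: y = 2 * x + 1
toy <- data.frame(x = 1:10, y = 2 * (1:10) + 1)

# Window of 5 rows: the current row + 4 before
rolling_fits <- slide(toy, ~lm(y ~ x, data = .x), .before = 4, .complete = TRUE)

# Extract the slope from each fit, using NA where the window was incomplete
rolling_slopes <- vapply(
  rolling_fits,
  function(fit) if (is.null(fit)) NA_real_ else coef(fit)[["x"]],
  numeric(1)
)

rolling_slopes
#>  [1] NA NA NA NA  2  2  2  2  2  2
```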

Index sliding

In many business settings, the value you want to compute is tied to some index, like a date vector. In these cases, you’ll probably want to compute sliding windows relative to the index, and not using the fixed window that slide() provides. You can use slide_index() to pass in both .x and an index, .i, and the window will be calculated relative to that index.

Here, when computing a “2 day window”, you probably don’t want "2019-08-16" and "2019-08-18" to be grouped together. slide() has no concept of an index, so when you specify a window size of 2, it will group these two together. slide_index(), on the other hand, will do the right thing.

x <- 1:3
i <- as.Date(c("2019-08-15", "2019-08-16", "2019-08-18"))

# slide() has no concept of an "index"
slide(x, ~.x, .before = 1)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 1 2
#> 
#> [[3]]
#> [1] 2 3

# "index aware"
slide_index(x, i, ~.x, .before = 1)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 1 2
#> 
#> [[3]]
#> [1] 3

Essentially what happens is that when we get to "2019-08-18", it “looks backwards” 1 day to set a window boundary at "2019-08-17". Since the date at position 2, "2019-08-16", is before "2019-08-17", it is not included.

Powerfully, you can pass through any object to .before that computes a value from .i - .before. This means that you could also have used a lubridate period object (which gets even more interesting when you use weeks() or months()):

slide_index(x, i, ~.x, .before = lubridate::days(1))
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 1 2
#> 
#> [[3]]
#> [1] 3
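The same difference shows up in a numeric summary. As a sketch, here is a “2 day rolling sum” over the dates above, with made-up values, computed with and without the index:

```r
library(slider)

sales <- c(10, 20, 30)
dates <- as.Date(c("2019-08-15", "2019-08-16", "2019-08-18"))

# Position based: the third window wrongly reaches back to 2019-08-16
by_position <- slide_dbl(sales, sum, .before = 1)
by_position
#> [1] 10 30 50

# Index based: the third window only covers 2019-08-17 to 2019-08-18
by_date <- slide_index_dbl(sales, dates, sum, .before = 1)
by_date
#> [1] 10 30 30
```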

Period sliding

slide_period() is different from slide_index() in that it first breaks the index into “time blocks” and then slides over .x relative to those blocks. For example, in the monthly period slide below, i is broken up into 4 monthly time blocks, and each window is “the current block of monthly data, plus one block before this one”. The locations of the elements in those blocks are then used to slice .x.

i <- as.Date(c(
  "2019-01-29", 
  "2019-01-30", 
  "2019-02-05", 
  "2019-04-01", 
  "2019-05-10"
))

slide_period(i, i, "month", ~.x, .before = 1)
#> [[1]]
#> [1] "2019-01-29" "2019-01-30"
#> 
#> [[2]]
#> [1] "2019-01-29" "2019-01-30" "2019-02-05"
#> 
#> [[3]]
#> [1] "2019-04-01"
#> 
#> [[4]]
#> [1] "2019-04-01" "2019-05-10"

One neat thing to notice is that slide_period() is aware of the distance between elements of .i in the period you specify. The practical implication of this is that in the above example, group 3 with 2019-04-01 did not include 2019-02-05 in it, because it is more than 1 month group away.
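To see the blocks act on values rather than on the dates themselves, here is a sketch of a “current month plus previous month” total. The amounts are made up; the dates are the ones from the example above.

```r
library(slider)

dates <- as.Date(c(
  "2019-01-29",
  "2019-01-30",
  "2019-02-05",
  "2019-04-01",
  "2019-05-10"
))
amounts <- c(1, 2, 3, 4, 5)

# One result per monthly block that actually appears in the index
two_month_totals <- slide_period_dbl(amounts, dates, "month", sum, .before = 1)
two_month_totals
#> [1] 3 6 4 9
```

The result has 4 elements, one per observed month, not 5. The April total is 4 rather than 7 because March has no data and February is more than one block away.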

Inspiration

This package is inspired heavily by SQL’s window functions. The API is similar, but more general because you can iterate over any kind of R object.

There have been multiple attempts at creating sliding window functions (I personally created rollify(), and worked a little bit on tsibble::slide() with Earo Wang).

  • zoo::rollapply()
  • tibbletime::rollify()
  • tsibble::slide()

I believe that slider is the next iteration of these. There are a few reasons for this:

  • To me, the API is more intuitive, and is more flexible because .before and .after let you completely control the entry point (as opposed to fixed entry points like "center", "left", etc.).

  • It is objectively faster because it is written purely in C.

  • With slide_vec() you can return any kind of object, and are not limited to the suffixed versions: _dbl, _int, etc.

  • It iterates rowwise over data frames, consistent with the vctrs framework.

  • I believe it is overall more consistent, backed by a theory that can always justify the sliding window generated by any combination of the parameters.

Earo and I have spoken, and we have mutually agreed that it would be best to deprecate tsibble::slide() in favor of slider::slide().

Additionally, data.table’s non-equi joins have been pretty much the only solution to the problem that slide_index() tries to solve. Their solution is robust and quite fast, and has been a nice benchmark for slider. slider is trying to solve a much narrower problem, so the API here is more focused.

Performance

Like purrr::map(), the core functions of slider, such as slide() and slide_index(), are optimized in C to be as fast as possible, but there is overhead involved in calling .f repeatedly. These functions are meant to be as general purpose as possible, at the cost of some performance. This means that slider can be used for more abstract computations, like rolling regressions, or any other custom function that you want to use in a rolling fashion.

slider also provides specialized functions for some of the most common use cases, such as slide_mean(), or slide_index_sum(). These compute their corresponding metric at the C level, using a specialized algorithm, and are often much faster than their slide_dbl(x, fn) equivalent.
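If you want to check the difference on your own machine, a rough sketch with base R's system.time() looks like this. The vector length and window size are arbitrary choices for illustration, and the timings will vary from run to run, so none are shown here.

```r
library(slider)

set.seed(42)
x <- rnorm(1e5)

# General purpose: calls mean() once per window
general_time <- system.time(
  general <- slide_dbl(x, mean, .before = 9)
)

# Specialized: the rolling mean is computed at the C level
special_time <- system.time(
  special <- slide_mean(x, before = 9)
)

# Both approaches agree up to floating point tolerance
all.equal(general, special)

# Compare elapsed seconds
general_time[["elapsed"]]
special_time[["elapsed"]]
```

For more careful measurements, a dedicated benchmarking package is a better choice than a single system.time() call.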

Code of Conduct

Please note that the slider project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
