Apply Mapping Functions in Parallel using Futures

furrr

Overview

The goal of furrr is to combine purrr’s family of mapping functions with future’s parallel processing capabilities. The result is a set of near drop-in replacements for purrr functions: map() and map2_dbl(), for example, can be swapped for their furrr equivalents, future_map() and future_map2_dbl(), to map in parallel.

The code draws heavily from the implementations of purrr and future.apply and this package would not be possible without either of them.

What has been implemented?

Every variant of the following functions has been implemented:

  • map()
  • map2()
  • pmap()
  • walk()
  • imap()
  • modify()

This includes atomic variants like map_dbl() through future_map_dbl() and predicate variants like map_at() through future_map_at().
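As a minimal sketch of this naming pattern, the snippet below calls a few of these variants on a small named list. The input `x` and the lambdas are made up for illustration, and the plan is left at `sequential`, so nothing here runs in parallel.

```r
library(furrr)

# Sequential is the default plan; set explicitly here for clarity.
plan(sequential)

# Hypothetical input, used only for illustration.
x <- list(a = 1, b = 2, c = 3)

# Atomic variant: returns a double vector instead of a list.
squares <- future_map_dbl(x, ~ .x^2)

# Two-input atomic variant: iterates over both vectors in step.
sums <- future_map2_dbl(c(1, 2, 3), c(10, 20, 30), ~ .x + .y)

# Index-aware variant: `.y` is the name of each element.
labels <- future_imap_chr(x, ~ paste0(.y, "=", .x))

# Predicate variant: only the element named "b" is transformed.
at_b <- future_map_at(x, "b", ~ .x * 100)
```

Each call has the same shape as its purrr counterpart with the `future_` prefix removed.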

Installation

You can install the released version of furrr from CRAN with:

install.packages("furrr")

And the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("DavisVaughan/furrr")

Learning

The easiest way to learn about furrr is to browse the website. In particular, the function reference page gives a general overview of the functions in the package, and the vignettes on the website are deep dives into various parts of furrr.

Example

furrr has been designed to behave as much like purrr as possible, so that it feels familiar right away.

library(furrr)
library(purrr)

map(c("hello", "world"), ~.x)
#> [[1]]
#> [1] "hello"
#> 
#> [[2]]
#> [1] "world"

future_map(c("hello", "world"), ~.x)
#> [[1]]
#> [1] "hello"
#> 
#> [[2]]
#> [1] "world"

The default backend for future (and through it, furrr) is a sequential one. This means that the above code will run out of the box, but it will not be in parallel. The design of future makes it incredibly easy to change this so that your code will run in parallel.

# Set a "plan" for how the code should run.
plan(multisession, workers = 2)

# This does run in parallel!
future_map(c("hello", "world"), ~.x)
#> [[1]]
#> [1] "hello"
#> 
#> [[2]]
#> [1] "world"

If you are still skeptical, here is some proof that we are running in parallel.

library(tictoc)

# This should take 6 seconds in total running sequentially
plan(sequential)

tic()
nothingness <- future_map(c(2, 2, 2), ~Sys.sleep(.x))
toc()
#> 6.08 sec elapsed

# This should take ~2 seconds running in parallel, with a little overhead
# in `future_map()` from sending data to the workers. There is generally also
# a one time cost from `plan(multisession)` setting up the workers.
plan(multisession, workers = 3)

tic()
nothingness <- future_map(c(2, 2, 2), ~Sys.sleep(.x))
toc()
#> 2.212 sec elapsed

Data transfer

It’s important to remember that data has to be passed back and forth between the main session and the workers. This means that whatever performance gain you get from parallelization can be wiped out by moving large amounts of data around. For example, if you send large data frames to the workers, fit models in parallel, and return large model objects, the data transfer can take up a large share of the total run time. Rather than returning the entire model object, consider returning only a performance metric, or the specific smaller pieces of the model that you are most interested in.
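The sketch below illustrates that advice under some assumptions: it uses the built-in `mtcars` data set, a simple `lm()` fit, and R-squared as the metric, none of which come from furrr itself. It fits one model per group twice, once returning the full model objects and once returning a single number per group.

```r
library(furrr)

plan(multisession, workers = 2)

# Illustrative input: one data frame per cylinder count.
pieces <- split(mtcars, mtcars$cyl)

# Returns every fitted model to the main session (larger objects).
models <- future_map(pieces, ~ lm(mpg ~ wt, data = .x))

# Returns only one number per group (R-squared of the same fit).
r_squared <- future_map_dbl(
  pieces,
  ~ summary(lm(mpg ~ wt, data = .x))$r.squared
)

# Compare how much data came back in each case.
size_models <- object.size(models)
size_metric <- object.size(r_squared)

# Switch back to sequential processing, which also releases the workers.
plan(sequential)
```

With data this small the difference is negligible in practice; the gap between the two sizes is what grows as the inputs and models get larger.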

This performance drop can be especially prominent when using future_pmap() to iterate over rows and return large objects at each iteration.
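A rough sketch of the row-wise case follows, assuming a hypothetical parameter table `params` with columns `n` and `mu`. The first call returns every simulated draw for each row, while the second reduces each row to a single summary value before it is sent back. `furrr_options(seed = 123)` is used so the random draws are reproducible.

```r
library(furrr)

plan(sequential)

# Hypothetical parameter grid: each row is one set of arguments.
params <- data.frame(n = c(5, 50, 500), mu = c(0, 5, 10))

# Returns the full vector of draws for every row.
draws <- future_pmap(
  params,
  function(n, mu) rnorm(n, mean = mu),
  .options = furrr_options(seed = 123)
)

# Returns one number per row instead of the full vector.
row_means <- future_pmap_dbl(
  params,
  function(n, mu) mean(rnorm(n, mean = mu)),
  .options = furrr_options(seed = 123)
)
```

The columns of `params` are matched to the function arguments by name, so each row turns into one call.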
