
pbapply: adding progress bar to '*apply' functions in R


A lightweight package that adds a progress bar to vectorized R functions (*apply). The implementation can easily be added to functions where showing the progress is useful (e.g. bootstrap). The type and style of the progress bar (with percentages or remaining time) can be set through options. The package supports several parallel processing backends, such as snow-type clusters, multicore-type forking, and future.

Versions

Install CRAN release version (recommended):

install.packages("pbapply")

Development version:

install.packages("pbapply", repos = "https://psolymos.r-universe.dev")

See user-visible changes in the NEWS file.

Use the issue tracker to report a problem, or to suggest a new feature.

How to get started?

1. You are not yet an R user

In this case, start with understanding basic programming concepts, such as data structures (matrices, data frames, indexing these), for loops and functions in R. The online version of Garrett Grolemund's Hands-On Programming with R walks you through these concepts nicely.

2. You are an R user but haven't used vectorized functions yet

Learn about vectorized functions designed to replace for loops: lapply, sapply, and apply. Here is a repository called The Road to Progress that I created to show you how to go from a for loop to lapply/sapply.


3. You are an R user familiar with vectorized functions

In this case, you can simply add pbapply::pb before your *apply functions, e.g. apply() will become pbapply::pbapply(), etc. You can guess what happens. Now if you want to speed things up a little (or a lot), try pbapply::pbapply(..., cl = 4) to use 4 cores instead of 1.

If you are a Windows user, things get a bit more complicated, but not much. Check how to work with parallel::parLapply to set up a snow-type cluster or use a suitable future backend (see some examples below). Have a look at the Road to Progress repository to see more worked examples.
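As a minimal sketch of the prefixing described above, the snippet below runs the same toy task with sapply() and with pbsapply(). The slow_square function is a hypothetical stand-in for real work, and the example assumes pbapply is installed.

```r
library(pbapply)

# Hypothetical toy task: square a number after a short pause
slow_square <- function(i) {
    Sys.sleep(0.01)
    i^2
}

res_base <- sapply(1:20, slow_square)    # base R, no progress bar
res_pb   <- pbsapply(1:20, slow_square)  # same call with the pb prefix

# Compare the two results; the progress bar should not change the values
all.equal(res_base, res_pb)
```

Only the function name changes; the arguments stay as they were in the base call.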

4. You are a seasoned R developer writing your own packages

Read on, the next section is for you.

How to add pbapply to a package

There are two ways of adding the pbapply package to another package.

1. Suggests: pbapply

Add pbapply to the Suggests field in the DESCRIPTION.

Use a conditional statement in your code to fall back on a base function in case pbapply is not installed:

out <- if (requireNamespace("pbapply", quietly = TRUE)) {
   pbapply::pblapply(X, FUN, ...)
} else {
   lapply(X, FUN, ...)
}

See a small example package here.
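If the same fallback is needed in several places, the conditional can be wrapped once in a small internal helper. The sketch below is one way to do that; the name lapply_pb is hypothetical and not part of pbapply.

```r
# Hypothetical helper: uses pbapply::pblapply() when pbapply is installed,
# otherwise falls back to base::lapply()
lapply_pb <- function(X, FUN, ...) {
    if (requireNamespace("pbapply", quietly = TRUE)) {
        pbapply::pblapply(X, FUN, ...)
    } else {
        lapply(X, FUN, ...)
    }
}

# Extra arguments (here: power) are forwarded to FUN in both branches
res <- lapply_pb(1:3, function(i, power) i^power, power = 2)
```

The rest of the package can then call lapply_pb() without caring whether pbapply is available on the user's machine.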

2. Depends/Imports: pbapply

Add pbapply to the Depends or Imports field in the DESCRIPTION.

Use the pbapply functions either as pbapply::pblapply(), or import them in the NAMESPACE (importFrom(pbapply, pblapply)) and call them as pblapply() (without the ::). If you are using roxygen2, add the tag #' @importFrom pbapply pblapply to generate that NAMESPACE entry.

Customizing the progress bar in your package

Specify the progress bar options in the zzz.R file of the package:

.onAttach <- function(libname, pkgname){
    options("pboptions" = list(
        type = if (interactive()) "timer" else "none",
        char = "-",
        txt.width = 50,
        gui.width = 300,
        style = 3,
        initial = 0,
        title = "R progress bar",
        label = "",
        nout = 100L,
        min_time = 2))
    invisible(NULL)
}

This will set the options and pbapply will not override these when loaded.

See a small example package here.

Suppressing the progress bar in your functions

Suppressing the progress bar is sometimes handy. By default, the progress bar is suppressed when !interactive(). In other instances, put this inside a function:

pbo <- pboptions(type = "none")
on.exit(pboptions(pbo), add = TRUE)
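To show the two lines above in context, here is a sketch of a complete function that silences the progress bar while it runs and restores the caller's settings afterwards. The function quiet_boot_means is a hypothetical example, and the snippet assumes pbapply is installed.

```r
library(pbapply)

# Hypothetical function that resamples quietly, whatever the user's settings
quiet_boot_means <- function(x, B = 20) {
    pbo <- pboptions(type = "none")       # turn the bar off, keep old options
    on.exit(pboptions(pbo), add = TRUE)   # restore them on exit, even on error
    pbsapply(seq_len(B), function(i) mean(sample(x, replace = TRUE)))
}

old <- pboptions(type = "txt")            # pretend the user prefers a text bar
m <- quiet_boot_means(rnorm(50))
pboptions()$type                          # inspect the setting after the call
pboptions(old)                            # put the session back as it was
```

Because pboptions() returns the previous values, the function can hand them straight back in on.exit() without knowing what they were.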

Working with a future backend

The future backend might require additional arguments to be set by package developers to avoid warnings for end users. Most notably, you will have to determine how to handle random number generation as part of parallel evaluation. You can pass the future.seed argument directly through .... In general, pass any additional arguments to FUN immediately after the FUN argument, and any additional arguments to the future backend after the cl = "future" argument:

pblapply(1:2, FUN = my_fcn, {additional my_fcn args}, cl = "future", {additional future args})

See this issue for a discussion.
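A concrete version of the template above might look like the following sketch. The worker function draw and its sd argument are hypothetical; the example assumes the future and future.apply packages are installed alongside pbapply, and uses future.seed = TRUE to request parallel-safe random numbers.

```r
library(pbapply)
library(future)

# Hypothetical worker function with one extra argument (sd)
draw <- function(i, sd) rnorm(1, mean = i, sd = sd)

plan(multisession, workers = 2)

res <- pblapply(1:4, FUN = draw,
    sd = 0.1,             # extra argument for draw(), right after FUN
    cl = "future",
    future.seed = TRUE)   # extra argument for the future backend, after cl

plan(sequential)          # reset the plan when done
```

Whether TRUE or a fixed seed is the better choice depends on how reproducible the results need to be, so treat the value here as a placeholder.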

Examples

The following pb* functions are available in the pbapply package:

base        pbapply       works in parallel
apply       pbapply       ✅
by          pbby          ✅
eapply      pbeapply      ✅
lapply      pblapply      ✅
.mapply     pb.mapply     ❌
mapply      pbmapply      ❌
Map         pbMap         ❌
replicate   pbreplicate   ✅
sapply      pbsapply      ✅
tapply      pbtapply      ✅
vapply      pbvapply      ✅
❌          pbwalk        ✅

Command line usage

library(pbapply)
set.seed(1234)
n <- 2000
x <- rnorm(n)
y <- rnorm(n, model.matrix(~x) %*% c(0,1), sd=0.5)
d <- data.frame(y, x)
## model fitting and bootstrap
mod <- lm(y~x, d)
ndat <- model.frame(mod)
B <- 500
bid <- sapply(1:B, function(i) sample(nrow(ndat), nrow(ndat), TRUE))
fun <- function(z) {
    if (missing(z))
        z <- sample(nrow(ndat), nrow(ndat), TRUE)
    coef(lm(mod$call$formula, data=ndat[z,]))
}

## standard '*apply' functions
system.time(res1 <- lapply(1:B, function(i) fun(bid[,i])))
#    user  system elapsed
#   1.096   0.023   1.127
system.time(res2 <- sapply(1:B, function(i) fun(bid[,i])))
#    user  system elapsed
#   1.152   0.017   1.182
system.time(res3 <- apply(bid, 2, fun))
#    user  system elapsed
#   1.134   0.010   1.160
system.time(res4 <- replicate(B, fun()))
#    user  system elapsed
#   1.141   0.022   1.171

## 'pb*apply' functions
## try different settings:
## "none", "txt", "tk", "win", "timer"
op <- pboptions(type="timer") # default
system.time(res1pb <- pblapply(1:B, function(i) fun(bid[,i])))
#    |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% ~00s
#    user  system elapsed
#   1.539   0.046   1.599
pboptions(op)

pboptions(type="txt")
system.time(res2pb <- pbsapply(1:B, function(i) fun(bid[,i])))
#   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
#    user  system elapsed
#   1.433   0.045   1.518
pboptions(op)

pboptions(type="txt", style=1, char="=")
system.time(res3pb <- pbapply(bid, 2, fun))
# ==================================================
#    user  system elapsed
#   1.389   0.032   1.464
pboptions(op)

pboptions(type="txt", char=":")
system.time(res4pb <- pbreplicate(B, fun()))
#   |::::::::::::::::::::::::::::::::::::::::::::::::::| 100%
#    user  system elapsed
#   1.427   0.040   1.481
pboptions(op)

Parallel backends

You have a few different options to choose from as a backend. This all comes down to the cl argument in the pb* functions.

  • cl = NULL (default): sequential execution
  • cl is of class cluster: this implies that you used cl = parallel::makeCluster(n) or something similar (n being the number of worker nodes)
  • cl is a positive integer (usually > 1): forking type parallelism is used in this case
  • cl = "future": you are using one of the future plans and parallelism is defined outside of the pb* call.

Note that on Windows the forking type is not available and pb* functions will fall back to sequential evaluation.

Some examples:

f <- function(i) Sys.sleep(1)

## sequential
pblapply(1:2, f)

## cluster
cl <- parallel::makeCluster(2)
pblapply(1:2, f, cl = cl)
parallel::stopCluster(cl)

## forking
pblapply(1:2, f, cl = 2)

## future
library(future)

cl <- parallel::makeCluster(2)
plan(cluster, workers = cl)
r2 <- pblapply(1:2, f, cl = "future")
parallel::stopCluster(cl)

plan(multisession, workers = 2)
pblapply(1:2, f, cl = "future")

plan(sequential)
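Since forking is not available on Windows, code that has to run on every platform can pick the backend at run time. The following is a sketch of one possible approach, assuming pbapply is installed; the on_windows variable and the choice of 2 workers are illustrative only.

```r
library(pbapply)

f <- function(i) {
    Sys.sleep(0.1)
    i * 2
}

# Forking where available, a snow-type cluster on Windows
on_windows <- .Platform$OS.type == "windows"
cl <- if (on_windows) parallel::makeCluster(2) else 2L

res <- pblapply(1:4, f, cl = cl)

# Only cluster objects need to be shut down
if (inherits(cl, "cluster")) parallel::stopCluster(cl)
```

The pblapply() call itself is the same on both platforms; only the value passed to cl differs.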

Progress with Shiny

library(shiny)
library(pbapply)

pboptions(
    type = "shiny",
    title = "Shiny progress",
    label = "Almost there ...")

ui <- fluidPage(
    plotOutput("plot")
)

server <- function(input, output, session) {
    output$plot <- renderPlot({
        pbsapply(1:15, function(z) Sys.sleep(0.5))
        plot(cars)
    })
}

shinyApp(ui, server)
