very fast scatterplots for R

scattermore

Scatterplots with more datapoints. If you want to plot bazillions of points without much waiting, use this.

If you use scattermore in a scientific project and want to cite it consistently: scattermore has been peer-reviewed as a part of a larger package for interactive cytometry data analysis (ShinySOM). Links here: PubMed 32049322, OUP, doi:10.1093/bioinformatics/btaa091

Installation

  • from CRAN repositories (recommended): install.packages('scattermore')
  • from GitHub (development version): devtools::install_github('exaexa/scattermore')

Quick How-To

Function scattermoreplot is meant to behave roughly like the standard plot:

library(scattermore)
scattermoreplot(rnorm(1e7),
                rnorm(1e7),
                col=heat.colors(1e7, alpha=.1),
                main='Scattermore demo')

If you use ggplot2, you can use geom_scattermore instead of geom_point to rasterize the graphics (e.g. to reduce PDF size):

ggplot(....) + geom_scattermore()

(Note that ggplot's own processing of the data is usually the slow part; use geom_scattermost to dodge that.)
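
As a minimal sketch of the two ggplot2 layers mentioned above: geom_scattermore is used like geom_point with the usual aesthetics, while geom_scattermost takes the coordinates as a plain matrix. The data frame `df`, the number of points, and the `pointsize` and `pixels` values are arbitrary choices made for illustration.

```r
library(ggplot2)
library(scattermore)

# illustrative data: 1 million points (the size is an arbitrary choice)
n <- 1e6
df <- data.frame(x = rnorm(n), y = rnorm(n))

# used in place of geom_point(); the points end up in a raster layer
p_more <- ggplot(df, aes(x, y)) +
  geom_scattermore(pointsize = 1, pixels = c(512, 512))

# geom_scattermost() receives the coordinates as a 2-column matrix,
# so ggplot does not have to process the individual points
p_most <- ggplot() +
  geom_scattermost(as.matrix(df), pointsize = 1, pixels = c(512, 512))
```

Printing `p_more` or `p_most` (or passing them to ggsave) renders the plots.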

Advanced usage

Function scattermore only creates the raster graphics for the plots; this can be plotted out afterwards (or processed in any other weird ways). Let's try a manual benchmark:

# create 10 million 2D datapoints
data <- cbind(rnorm(1e7),rnorm(1e7))

# prepare empty plot
par(mar=rep(0,4))

# plot the datapoints and see how long it takes
system.time(plot(scattermore(data, rgba=c(64,128,192,10), xlim=c(-3,3), ylim=c(-3,3))))

   user  system elapsed
  0.413   0.044   0.461

You should immediately see quite a heap of tiny points:

(Figure: resulting scatterplot)

Now, how fast would the standard plot() do?

# compare with the usual plot function on x11/cairo
system.time(plot(data, pch='.', xlim=c(-3,3), ylim=c(-3,3), col=rgb(0.25,0.5,0.75,0.04)))

   user  system elapsed
  9.752   0.023   9.794

So the 0.46 seconds of scattermore is a nice ~20x speedup over the 9.8 seconds of plot on my laptop. Moreover, if you use a different plotting setup (basically any non-Cairo grDevices backend, say a windows- or quartz-based one), you will quite possibly see much greater speedups: Cairo itself is sometimes more than 10x faster than the other backends, which means scattermore may be over 200x faster in total.
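
The arithmetic behind these figures, using the elapsed times printed above; the factor of 10 for non-Cairo backends is the rough estimate from the text, not a measurement:

```r
# elapsed seconds copied from the two system.time() runs above
t_scattermore <- 0.461
t_plot        <- 9.794

speedup <- t_plot / t_scattermore
round(speedup, 1)                  # about 21, i.e. the "~20x" quoted above

# assumption from the text: other backends may be ~10x slower than Cairo
backend_factor <- 10
round(speedup * backend_factor)    # over 200
```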

How does it work?

  1. Points and colors get converted to vectors and passed to C
  2. C code rasterizes the whole thing to a prepared bitmap. This is already quite fast, but some low-level optimization can probably speed it up several more times. Volunteers/pull requests welcome. (Is there a way to push a raw uint8_t array into C from R?)
  3. The resulting array gets converted to R raster using as.raster, which can get plotted. (Fun fact: When plotting fewer than roughly 1 million points, most of the computational time is spent on this conversion alone!)
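
To illustrate the three steps, here is a rough pure-R sketch. It is not the package's C code: `rasterize_points` is a hypothetical helper that only counts points per pixel and ignores colors, alpha and point size, so it shows the idea rather than the actual implementation.

```r
# Illustrative pure-R rasterizer (hypothetical helper, not scattermore's C code).
rasterize_points <- function(xy, size = c(512, 512),
                             xlim = range(xy[, 1]), ylim = range(xy[, 2])) {
  # step 1: keep the points inside the limits, map them to pixel indices
  inside <- xy[, 1] >= xlim[1] & xy[, 1] <= xlim[2] &
            xy[, 2] >= ylim[1] & xy[, 2] <= ylim[2]
  x <- xy[inside, 1]
  y <- xy[inside, 2]
  px <- pmin(floor((x - xlim[1]) / diff(xlim) * size[1]) + 1, size[1])
  py <- pmin(floor((ylim[2] - y) / diff(ylim) * size[2]) + 1, size[2])  # row 1 is the top

  # step 2: accumulate the hits per pixel in a prepared bitmap
  # (column-major index of row py, column px in a size[2] x size[1] matrix)
  counts <- matrix(tabulate((px - 1) * size[2] + py,
                            nbins = size[1] * size[2]),
                   nrow = size[2], ncol = size[1])

  # step 3: convert the array to an R raster (darker = more points)
  as.raster(1 - counts / max(counts, 1))
}

xy <- cbind(rnorm(1e5), rnorm(1e5))
r <- rasterize_points(xy, size = c(200, 100), xlim = c(-3, 3), ylim = c(-3, 3))
dim(r)    # 100 rows, 200 columns
plot(r)
```

Doing step 2 in C is what makes the real package fast; the R version above is only meant to make the data flow visible.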

How fast is it?

Let us measure the same example as above, with the number of points limited to different sizes (i.e. in the first case, scattermore receives data[1:1e4,]):

points  .  average time (s)
--------+------------------
1e4     .  0.037
3e4     .  0.039
1e5     .  0.042
3e5     .  0.051
1e6     .  0.076     -- ~50% of the time is R raster conversion overhead
3e6     .  0.170     -- caches start to overflow here
1e7     .  0.460

(Multicolor plotting is slightly slower (usually 2x), because the reading and transporting of the relatively large color matrix eats quite a lot of cache.)
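
A measurement like the one in the table could be scripted roughly as follows. `time_by_size` is a hypothetical helper written for this sketch, the number of repetitions is arbitrary, and the sizes stop at 1e6 to keep the run short; the resulting times depend on the machine and will not match the table exactly.

```r
library(scattermore)

# hypothetical helper: average elapsed time of `render` on the first n rows
time_by_size <- function(data, sizes, render, reps = 5) {
  sapply(sizes, function(n) {
    d <- data[seq_len(n), , drop = FALSE]
    mean(replicate(reps, system.time(render(d))[["elapsed"]]))
  })
}

data <- cbind(rnorm(1e6), rnorm(1e6))
sizes <- c(1e4, 3e4, 1e5, 3e5, 1e6)

# same scattermore call as in the benchmark above
times <- time_by_size(data, sizes, function(d)
  plot(scattermore(d, rgba = c(64, 128, 192, 10),
                   xlim = c(-3, 3), ylim = c(-3, 3))))

data.frame(points = sizes, seconds = times)
```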

How nice is it?

Custom rasterization gives a bit of extra features. These are the two most obvious:

  1. The gazillions of points are present as a raster, even in vector output. That might be a problem sometimes (remember to use sufficient raster size to get the desired DPI!), but makes vector output smaller and much more easily processed by other tools. (Remember the huge PDFs with scatterplots that take a minute to load?)
  2. The rasterization does not have to work in limited memory, as it does in the usual plotting libraries; we use that to gain a bit of extra precision in color mixing. This is most visible when plotting a ton of low-alpha points, where the usual blending methods produce ugly rounding artifacts.

The following example compares geom_point with geom_scattermost:

library(ggplot2)
library(scattermore)

# data
d <- cbind(rnorm(1e6),runif(1e6))

# first plot (geom_point)
ggsave('point.png', units='in', width=3, height=3,
  ggplot(data.frame(x=d[,1],y=d[,2])) +
  geom_point(shape='.', alpha=.05, aes(x,y,color=y)) +
  scale_color_viridis_c(guide='none') +
  ggtitle("geom_point"))

# second plot (geom_scattermost)
ggsave('scattermore.png', units='in', width=3, height=3,
  ggplot() +
  geom_scattermost(
    d,
    col=viridisLite::viridis(100, alpha=0.05)[1+99*d[,2]],
    pointsize=2,
    pixels=c(700,700)) +
  ggtitle("geom_scattermost"))
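
On the raster-size remark in point 1: the number of pixels needed for a given resolution is simply inches times DPI. `pixels_for_dpi` below is a hypothetical helper for this sketch; since the plot panel is smaller than the whole figure, the result is an upper bound for the panel raster.

```r
# hypothetical helper: raster size (in pixels) for a target resolution
pixels_for_dpi <- function(width_in, height_in, dpi = 300) {
  ceiling(c(width_in, height_in) * dpi)
}

pixels_for_dpi(3, 3, dpi = 300)   # 900 900

# conversely, a 700-pixel raster stretched over 3 inches
700 / 3                           # about 233 DPI
```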

Plot with geom_point
Plot with geom_scattermore
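
The rounding artifacts from point 2 can be reproduced with a toy calculation. This is a simplified model (a single grey channel, black points, rounding to an integer after every point), assumed here for illustration; it does not describe the internals of scattermore or of any particular graphics backend.

```r
# "over" blending of a black point with opacity `alpha` onto a grey level 0..255
blend <- function(bg, alpha) bg * (1 - alpha)

alpha <- 0.003      # very transparent points (arbitrary illustrative value)
n     <- 2000       # number of points overlapping in one pixel

bg8 <- 255          # 8-bit buffer: rounded to an integer after every point
bgf <- 255          # high-precision buffer: rounded only once at the end
for (i in seq_len(n)) {
  bg8 <- round(blend(bg8, alpha))
  bgf <- blend(bgf, alpha)
}

# the 8-bit value gets stuck at 166, the high-precision one is almost black
c(eight_bit = bg8, high_precision = round(bgf))
```

Once a single point changes the pixel by less than half a grey level, the rounded buffer stops getting darker no matter how many more points land on it, which shows up as flat, banded areas in dense regions.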
