

Column scatter / beeswarm-style plots in ggplot2

Beeswarm-style plots with ggplot2


Introduction

Beeswarm plots (also known as column scatter plots or violin scatter plots) plot points that would ordinarily overlap so that they fall next to each other instead. In addition to reducing overplotting, this helps visualize the density of the data at each point (similar to a violin plot) while still showing each data point individually.

ggbeeswarm provides two different methods to create beeswarm-style plots using ggplot2. It does this by adding two new ggplot geom objects:

  • geom_quasirandom: Uses a van der Corput sequence or Tukey texturing (Tukey and Tukey, “Strips displaying empirical distributions: I. textured dot strips”) to space the dots and avoid overplotting. This relies on the vipor package (sherrillmix/vipor).

  • geom_beeswarm: Uses the beeswarm package to offset points based on their size.
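
As a rough illustration of the van der Corput sequence mentioned above, the base R sketch below generates its first few terms. The function name `van_der_corput` is made up for this example and is not claimed to match vipor's implementation. It only shows why the sequence is useful for spacing points: successive values tend to fall in the gaps left by earlier values, so points given these offsets spread out evenly instead of clumping.

```r
# Illustrative only: a hypothetical helper, not part of ggbeeswarm or vipor.
# The n-th term is obtained by writing n in the given base and mirroring
# its digits around the radix point.
van_der_corput <- function(n, base = 2) {
  vapply(n, function(i) {
    value <- 0
    denom <- 1
    while (i > 0) {
      denom <- denom * base
      value <- value + (i %% base) / denom  # next digit, moved behind the point
      i <- i %/% base                       # drop the digit just used
    }
    value
  }, numeric(1))
}

# First eight terms in base 2; all of them lie in [0, 1)
offsets <- van_der_corput(1:8)
print(offsets)
```

Shifting and scaling such values (for example `(offsets - 0.5) * width`, with `width` chosen by the caller) would give horizontal offsets centred on a group's position.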

Features:

  • Can handle categorical variables on the y-axis (thanks @smsaladi, @koncina)
  • Automatically dodges if a grouping variable is categorical and dodge.width is specified (thanks @josesho)

See the examples below.

Installation

This package is on CRAN, so installation should be as simple as:

install.packages('ggbeeswarm')

To install the development version from GitHub instead:

devtools::install_github("eclarke/ggbeeswarm")

Examples

Here is a comparison between geom_jitter and geom_quasirandom on the iris dataset:

set.seed(12345)
library(ggplot2)
library(ggbeeswarm)
#compare to jitter
ggplot(iris,aes(Species, Sepal.Length)) + geom_jitter()

ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom()

geom_quasirandom()

Using geom_quasirandom:

#default geom_quasirandom
ggplot(mpg,aes(class, hwy)) + geom_quasirandom()

# With categorical y-axis
ggplot(mpg,aes(hwy, class)) + geom_quasirandom(groupOnX=FALSE)

# Some groups may have only a few points. Use `varwidth=TRUE` to adjust width dynamically.
ggplot(mpg,aes(class, hwy)) + geom_quasirandom(varwidth = TRUE)

# Automatic dodging
sub_mpg <- mpg[mpg$class %in% c("midsize", "pickup", "suv"),]
ggplot(sub_mpg, aes(class, displ, color=factor(cyl))) + geom_quasirandom(dodge.width=1)

Alternative methods

geom_quasirandom can also use several other methods to distribute points. For example:

ggplot(iris, aes(Species, Sepal.Length)) + geom_quasirandom(method = "tukey") + ggtitle("Tukey texture")

ggplot(iris, aes(Species, Sepal.Length)) + geom_quasirandom(method = "tukeyDense") +
    ggtitle("Tukey + density")

ggplot(iris, aes(Species, Sepal.Length)) + geom_quasirandom(method = "frowney") +
    ggtitle("Banded frowns")

ggplot(iris, aes(Species, Sepal.Length)) + geom_quasirandom(method = "smiley") +
    ggtitle("Banded smiles")

ggplot(iris, aes(Species, Sepal.Length)) + geom_quasirandom(method = "pseudorandom") +
    ggtitle("Jittered density")

ggplot(iris, aes(Species, Sepal.Length)) + geom_beeswarm() + ggtitle("Beeswarm")

geom_beeswarm()

Using geom_beeswarm:

ggplot(iris,aes(Species, Sepal.Length)) + geom_beeswarm()

ggplot(iris,aes(Species, Sepal.Length)) + geom_beeswarm(side = 1L)

ggplot(mpg,aes(class, hwy)) + geom_beeswarm(size=.5)

# With categorical y-axis
ggplot(mpg,aes(hwy, class)) + geom_beeswarm(size=.5)

# Also watch out for points escaping from the plot with geom_beeswarm
ggplot(mpg,aes(hwy, class)) + geom_beeswarm(size=.5) + scale_y_discrete(expand=expansion(add=c(0.5,1)))

ggplot(mpg,aes(class, hwy)) + geom_beeswarm(size=1.1)

# With automatic dodging
ggplot(sub_mpg, aes(class, displ, color=factor(cyl))) + geom_beeswarm(dodge.width=0.5)

Alternative methods

df <- data.frame(
  x = "A",
  y = sample(1:100, 200, replace = TRUE)
)
ggplot(df, aes(x = x, y = y)) + geom_beeswarm(cex = 2.5, method = "swarm") + ggtitle('method = "swarm" (default)')

ggplot(df, aes(x = x, y = y)) + geom_beeswarm(cex = 2.5, method = "compactswarm") + ggtitle('method = "compactswarm"')

ggplot(df, aes(x = x, y = y)) + geom_beeswarm(cex = 2.5, method = "hex") + ggtitle('method = "hex"')

ggplot(df, aes(x = x, y = y)) + geom_beeswarm(cex = 2.5, method = "square") + ggtitle('method = "square"')

ggplot(df, aes(x = x, y = y)) + geom_beeswarm(cex = 2.5, method = "center") + ggtitle('method = "center"')

Different point distribution priority

#With different beeswarm point distribution priority
dat<-data.frame(x=rep(1:3,c(20,40,80)))
dat$y<-rnorm(nrow(dat),dat$x)
ggplot(dat,aes(x,y)) + geom_beeswarm(cex=2) + ggtitle('Default (ascending)') + scale_x_continuous(expand=expansion(add=c(0.5,.5)))

ggplot(dat,aes(x,y)) + geom_beeswarm(cex=2,priority='descending') + ggtitle('Descending') + scale_x_continuous(expand=expansion(add=c(0.5,.5)))

ggplot(dat,aes(x,y)) + geom_beeswarm(cex=2,priority='density') + ggtitle('Density') + scale_x_continuous(expand=expansion(add=c(0.5,.5)))

ggplot(dat,aes(x,y)) + geom_beeswarm(cex=2,priority='random') + ggtitle('Random') + scale_x_continuous(expand=expansion(add=c(0.5,.5)))

Corral runaway points

set.seed(1995)
df2 <- data.frame(
  y = rnorm(1000),
  id = sample(c("G1", "G2", "G3"), size = 1000, replace = TRUE)
)
p <- ggplot(df2, aes(x = id, y = y, colour = id))

# use corral.width to control corral width
p + geom_beeswarm(cex = 2.5, corral = "none", corral.width = 0.9) + ggtitle('corral = "none" (default)')

p + geom_beeswarm(cex = 2.5, corral = "gutter", corral.width = 0.9) + ggtitle('corral = "gutter"')

p + geom_beeswarm(cex = 2.5, corral = "wrap", corral.width = 0.9) + ggtitle('corral = "wrap"')

p + geom_beeswarm(cex = 2.5, corral = "random", corral.width = 0.9) + ggtitle('corral = "random"')

p + geom_beeswarm(cex = 2.5, corral = "omit", corral.width = 0.9) + ggtitle('corral = "omit"')
## Warning: Removed 303 rows containing missing values (geom_point).
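
To make the corral options above more concrete, here is a small base R sketch of how each option could treat horizontal offsets that fall outside the corral. The helper `corral_offsets` and its arguments are hypothetical and written only for illustration. The behaviour of each option follows the descriptions in the beeswarm package documentation, and the package's own code may differ in detail.

```r
# Illustrative only: a hypothetical helper, not the ggbeeswarm/beeswarm API.
# `offset` holds horizontal offsets relative to a group's centre;
# `width` plays the role of corral.width.
corral_offsets <- function(offset,
                           method = c("none", "gutter", "wrap", "random", "omit"),
                           width = 0.9) {
  method <- match.arg(method)
  half <- width / 2
  runaway <- abs(offset) > half  # points outside the corral

  switch(method,
    # leave runaway points where they are
    none = offset,
    # collect runaway points along the corral boundary
    gutter = pmin(pmax(offset, -half), half),
    # periodic boundaries: points leaving one side re-enter on the other
    wrap = {
      offset[runaway] <- ((offset[runaway] + half) %% width) - half
      offset
    },
    # place runaway points at random positions inside the corral
    random = {
      offset[runaway] <- runif(sum(runaway), -half, half)
      offset
    },
    # drop runaway points (NA values are what trigger the removal warning)
    omit = {
      offset[runaway] <- NA_real_
      offset
    }
  )
}

x <- c(-0.6, -0.2, 0, 0.3, 0.7)
print(corral_offsets(x, "gutter"))
print(corral_offsets(x, "wrap"))
print(corral_offsets(x, "omit"))
```

In this sketch, "omit" turns runaway offsets into missing values, which is consistent with the "Removed ... rows containing missing values" warning shown for `corral = "omit"` above.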


Authors: Erik Clarke, Scott Sherrill-Mix, and Charlotte Dawson
