  • Stars: 588
  • Rank: 76,022 (Top 2 %)
  • Language: R
  • License: Other
  • Created almost 7 years ago
  • Updated 3 months ago

Repository Details

A grammar of graphics for comparative genomics

gggenomes

gggenomes is a versatile graphics package for comparative genomics. It extends the popular R visualization package ggplot2 by adding dedicated plot functions for genes, syntenic regions, etc., and verbs to manipulate the plot, for example to quickly zoom in on gene neighborhoods.

A realistic use case comparing six viral genomes

gggenomes makes it easy to combine data and annotations from different sources into one comprehensive and elegant plot. Here we compare the genomic architecture of six viral genomes initially described in Hackl et al.: Endogenous virophages populate the genomes of a marine heterotrophic flagellate.

library(gggenomes)

# to inspect the example data shipped with gggenomes
data(package="gggenomes")

gggenomes(emale_genes, emale_seqs, emale_tirs, emale_ava) %>%
  add_feats(ngaros=emale_ngaros, gc=emale_gc) %>%
  add_sublinks(emale_prot_ava) %>%
  flip_by_links() +
  geom_feat(position="identity", size=6) +
  geom_seq() +
  geom_link(data=links(2)) +
  geom_bin_label() +
  geom_gene(aes(fill=name)) +
  geom_gene_tag(aes(label=name), nudge_y=0.1, check_overlap = TRUE) +
  geom_feat(data=feats(ngaros), alpha=.3, size=10, position="identity") +
  geom_feat_note(aes(label="Ngaro-transposon"), feats(ngaros),
      nudge_y=.1, vjust=0) +
  geom_ribbon(aes(x=(x+xend)/2, ymax=y+.24, ymin=y+.38-(.4*score),
      group=seq_id, linetype="GC-content"), feats(gc),
      fill="lavenderblush4", position=position_nudge(y=-.1)) +
  scale_fill_brewer("Genes", palette="Dark2", na.value="cornsilk3")

ggsave("man/figures/emales.png", width=8, height=4)

For a reproducible recipe, have a look at From a few sequences to a complex map in minutes. It describes the full evolution of an earlier version of this plot, made with an older version of gggenomes, starting from a mere set of contigs and including the bioinformatics analysis workflow.

Motivation & concept

Visualization is a cornerstone of both exploratory analysis and science communication. Bioinformatics workflows, unfortunately, tend to generate a plethora of data products, often in adventurous formats, which makes it quite difficult to integrate and co-visualize the results. Instead of trying to cater to all these different formats explicitly, gggenomes embraces the simple tidyverse-inspired credo:

  • Any data set can be transformed into one (or a few) tidy data tables
  • Any data set in a tidy data table can be easily and elegantly visualized

As a result, gggenomes helps bridge the gap between data generation, visual exploration, interpretation and communication, thereby accelerating biological research.
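To make the first half of the credo concrete, here is a minimal sketch in base R that turns a few GFF-like records into a tidy gene table. The records, the table layout and the column names (seq_id, start, end, strand, name) are made up for illustration; this is not a parser shipped with gggenomes.

```r
# Three made-up, tab-separated GFF-like records (hypothetical data).
gff_text <- "a\ttoy\tCDS\t50\t250\t.\t+\t0\tID=gene1
a\ttoy\tCDS\t350\t500\t.\t-\t0\tID=gene2
b\ttoy\tCDS\t80\t450\t.\t+\t0\tID=gene3"

# Read the records into a data frame with one row per feature.
gff <- read.table(
  text = gff_text, sep = "\t", quote = "", stringsAsFactors = FALSE,
  col.names = c("seq_id", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")
)

# Keep only the columns needed for plotting and pull the ID out of the
# attributes field: one tidy row per gene, one column per variable.
is_cds <- gff$type == "CDS"
genes <- gff[is_cds, c("seq_id", "start", "end", "strand")]
genes$name <- sub("^ID=", "", gff$attributes[is_cds])

print(genes)
```

Once the data is in this shape, each column can be mapped to a plot aesthetic in the usual ggplot2 way.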

Under the hood, gggenomes uses a light-weight track system to accommodate a mix of related data sets, essentially implementing ggplot2 with multiple tidy tables instead of just one. The data in the different tables are tied together through a global genome layout that is automatically computed from the input and defines the positions of genomic sequences (chromosomes/contigs) and their associated features in the plot.
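The idea of a shared layout can be sketched in a few lines of base R. This is a toy illustration of the concept, not gggenomes' internal implementation: the tables, the offset column and the rule of one row per sequence are assumptions made for the example.

```r
# Two tidy tables: sequences and the genes located on them (made-up values).
seqs <- data.frame(seq_id = c("a", "b"), length = c(600, 550))
genes <- data.frame(
  seq_id = c("a", "a", "b"),
  start  = c(50, 350, 80),
  end    = c(250, 500, 450)
)

# Toy global layout: one row (y) per sequence, first sequence on top,
# and a hypothetical horizontal offset (0 here) for each sequence.
layout <- data.frame(
  seq_id = seqs$seq_id,
  y      = rev(seq_len(nrow(seqs))),
  offset = 0
)
layout$xend <- layout$offset + seqs$length

# Tie the gene table to the layout via seq_id to obtain plot coordinates.
genes_xy <- merge(genes, layout[, c("seq_id", "y", "offset")], by = "seq_id")
genes_xy$x    <- genes_xy$offset + genes_xy$start
genes_xy$xend <- genes_xy$offset + genes_xy$end

print(genes_xy)
```

Every additional feature table would be joined to the same layout in the same way, which is what keeps all tracks aligned. In gggenomes itself this step is automatic: the tables are passed to gggenomes(), as in the use case above, and the layout is computed from them.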

Inspiration

gggenomes stands on the shoulders of giants. It was born out of admiration for David Wilkins' gggenes package, draws from other ggplot2 extensions such as Guangchuang Yu's ggtree, and is fundamentally inspired by Thomas Lin Pedersen's incredibly rich ggraph package.

Installation

gggenomes is at this point still in an alpha release state and is therefore only available as a development version.

# install ggtree
# https://bioconductor.org/packages/release/bioc/html/ggtree.html
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ggtree")

# install.packages("devtools")
devtools::install_github("thackl/thacklr")
devtools::install_github("thackl/gggenomes")

More Repositories

  1. minidot (R, 74 stars): Fast and pretty dotplots for whole genomes assemblies using minimap and R/ggplot2
  2. proovframe (Perl, 30 stars): frame-shift correction for long-read (meta)genomics
  3. seq-scripts (Perl, 25 stars): scripts for sequence and feature conversion, annotation, analysis ...
  4. treemmer-animate (R, 10 stars)
  5. detectEVE (Python, 9 stars): Find endogenous viral elements in genomes
  6. cross-species-scaffolding (Shell, 8 stars): Super-scaffolding of draft genome assemblies with in silico mate-pair libraries derived from (closely) related references
  7. ggworldmap (R, 7 stars): Visualize Global-Scale Quantitative Data on World Maps with Projection and Shifted Central Meridian
  8. thacklr (R, 6 stars): A Collection of R Functions and Snippets I Often Use
  9. host-phage-tRNA-att-finder (Perl, 6 stars): Link phages and potential host by finding shared attachment sites for prophage integration
  10. pro-tycheposons (R, 5 stars): Supplementary code and data for mobile genetic elements in Prochlorococcus
  11. kmer-scripts (R, 2 stars): scripts for working with kmers - plots, filter, ...
  12. thackl.github.io (HTML, 2 stars): Source code of my homepage & blog
  13. lecture-variant-calling (1 star): Variant-Calling-Example
  14. proovframe-benchmark (Shell, 1 star): Benchmark data sets used to assess proovframe - frame-shift correction for long-read (meta)genomics
  15. plot-scripts (R, 1 star): Quick plots directly in the terminal window
  16. google-sync (R, 1 star): Scripts for syncing local tsv files with google spreadsheets
  17. dropbox-dual-account-setup (Shell, 1 star): Multiple dropbox accounts on linux
  18. simons-metagenome-data-descriptor-map-figure (Jupyter Notebook, 1 star)