• Stars
    star
    129
  • Rank 279,262 (Top 6 %)
  • Language
    R
  • Created about 4 years ago
  • Updated 8 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Generate IGV style locus tracks from bigWig files in R

trackplot - Fast and easy visualisation of bigWig files in R

GitHub closed issues

Introduction

trackplot.R is an ultra-fast, simple, and minimal dependency R script to generate IGV style track plots (aka locus plots), profile plots and heatmaps from bigWig files.

Installation

trackplot.R is a standalone R script and requires no installation. Just source it and you're good to go!

source("https://github.com/PoisonAlien/trackplot/blob/master/R/trackplot.R?raw=true")

# OR

download.file(url = "https://raw.githubusercontent.com/PoisonAlien/trackplot/master/R/trackplot.R", destfile = "trackplot.R")
source('trackplot.R') 

# OR If you prefer to have it as package

remotes::install_github(repo = "poisonalien/trackplot")

Features

Why trackplot?

  • It's extremely fast since most of the heavy lifting is done by bwtool. >15X faster than deeptools for equivalent profileplots and heatmaps
  • Lightweight and minimal dependency
    • data.table and bwtool are the only requirements. Similar R packages GViz and karyoploteR has over 150 dependencies.
    • Plots are generated in pure base R graphics (no ggplot2 or tidyverse packages)
  • Automatically queries UCSC genome browser for gene models, cytobands, and chromHMM tracks - making analysis reproducible.
  • Supports GTF and standard UCSC gene formats as well.
  • Customization: Each plot can be customized for color, scale, height, width, etc.
  • Tracks can be summarized per condition (by mean, median, max, min)
  • PCA and, optional differential peak analysis with limma when using uniformly processed, normalized bigWig files.

Dependencies

  1. data.table R package - which itself has no dependency.
  2. bwtool - a command line tool for processing bigWig files. Install and move the binary to a PATH (e.g; /usr/local/bin) or a directory under the PATH.
  • For macOS: Please download the pre-build binary from here. Make it executable with chmod +x bwtool. macOS gatekeeper might complain that it can't run the binary downloaded from the internet. If so, allow it in the security settings.

  • For centOS or debian: Follow these compilation instructions.

Usage

Simple usage - Make a table of all the bigWig files to be analysed with read_coldata() and pass it to the downstream functions.

flowchart TD
    a[bigWig file list] -->A{read_coldata}
    A --> B{track_extract}
    B --> B1[track_plot]
    A --> C{profile_extract}
    C --> C1[profile_summarize]
    C --> C3[profile_heatmap]
    C1 --> C2[profile_plot]
    A --> D{extract_summary}
    D --> D1[pca_plot]
    D --> D2[diffpeak]
    D2 --> D3[volcano_plot]
#Path to bigWig files
bigWigs = c("H1_Oct4.bw", "H1_Nanog.bw", "H1_k4me3.bw", 
            "H1_k4me1.bw", "H1_k27ac.bw", "H1_H2az.bw", "H1_Ctcf.bw")

#Make a table of bigWigs along with ref genome build
bigWigs = read_coldata(bws = bigWigs, build = "hg19")

trackplots

track_extract() and track_plot() are two functions to generate IGV style track plots (aka locus plots) from bigWig files. Additionally, track_summarize can summarize tracks by condition.

Step-1: Extract signal from bigWig files

#Region to plot
oct4_loci = "chr6:31125776-31144789"

#Extract bigWig signal for a loci of interest
t = track_extract(colData = bigWigs, loci = oct4_loci)

#Or you can also specifiy a gene name instead of a loci 
# - loci and gene models will be automatically extracted from UCSC genome browser
t = track_extract(colData = bigWigs, gene = "POU5F1")

Step-2: Plot

Basic plot

track_plot(summary_list = t)

Add cytoband and change colors for each track

track_cols = c("#d35400","#d35400","#2980b9","#2980b9","#2980b9", "#27ae60","#27ae60")
track_plot(summary_list = t, 
          col = track_cols, 
          show_ideogram = TRUE)

Heighilight regions of interest (any bed files would do)

oct4_nanog_peaks = c("H1_Nanog.bed","H1_Oct4.bed") #Peak files 
track_plot(summary_list = t, 
          col = track_cols, 
          show_ideogram = TRUE, 
          peaks = oct4_nanog_peaks)

Add some chromHMM tracks to the bottom

chromHMM data should be a bed file with the 4th column containing chromatin state. See here for an example file.

Note that the color code for each of the 15 states are as described here. In case if it is different for your data, you will have to define your own color codes for each state and pass it to the argument chromHMM_cols

chromHMM_peaks = "H1_chromHMM.bed"

track_plot(summary_list = t, 
          col = track_cols, 
          show_ideogram = TRUE, 
          peaks = oct4_nanog_peaks, chromHMM = chromHMM_peaks)

Add some chromHMM tracks from UCSC

UCSC has 9 cell lines for which chromHMM data is available. These can be added automatically in case if you dont have your own data. In this case, use the argument ucscChromHMM with any values from TableName column of the below table.

                    TableName    cell                      Description              Tissue Karyotype
1: wgEncodeBroadHmmGm12878HMM GM12878     B-lymphocyte, lymphoblastoid               blood    normal
2:  wgEncodeBroadHmmH1hescHMM H1-hESC             embryonic stem cells embryonic stem cell    normal
3:   wgEncodeBroadHmmHepg2HMM   HepG2         hepatocellular carcinoma               liver    cancer
4:   wgEncodeBroadHmmHepg2HMM    HMEC         mammary epithelial cells              breast    normal
5:    wgEncodeBroadHmmHsmmHMM    HSMM        skeletal muscle myoblasts              muscle    normal
6:   wgEncodeBroadHmmHuvecHMM   HUVEC umbilical vein endothelial cells        blood vessel    normal
track_plot(summary_list = t, 
          col = track_cols, 
          show_ideogram = TRUE, 
          peaks = oct4_nanog_peaks, 
          ucscChromHMM = c("wgEncodeBroadHmmH1hescHMM", "wgEncodeBroadHmmNhlfHMM"))

Re-organize tracks

By default tracks are organized from top to bottom as c("p", "b", "h", "g", "c") corresponding to peaks track, bigWig track, chromHmm track, gene track, and cytoband track. This can be changes with the argument layout_ord. Furthermore, bigWig tracks themselves can be ordered with the argument bw_ord which accepts the names of the bigWig tracks as input and plots them in the given order.

#Draw only NANOG, OCT4 bigWigs in that order. Re-organize the layout in the order chromHMM track, gene track, cytoband track. Rest go to the end.
track_plot(
  summary_list = t,
  col = track_cols,
  show_ideogram = TRUE,
  genename = c("POU5F1", "TCF19"),
  peaks = oct4_nanog_peaks,
  peaks_track_names = c("NANOG", "OCT4"),
  groupAutoScale = FALSE, ucscChromHMM = "wgEncodeBroadHmmH1hescHMM", y_min = 0, 
  bw_ord = c("NANOG", "OCT4"), 
  layout_ord = c("h", "g", "c")
)

narrowPeaks and broadPeaks

All of the above plots can also be generated with narrowPeak or broadPeak files as input. Here, 5th column containing scores are plotted as intensity. Color coding and binning of scores are as per UCSC convention

narrowPeak is one of the output from macs2 peak caller and are easier to visualize in the absence of bigWig files.

narrowPeaks = c("H1_Ctcf.bed", "H1_H2az.bed", "H1_k27ac.bed", 
                "H1_k4me1.bed", "H1_k4me3.bed", "H1_Nanog.bed", 
                "H1_Oct4.bed", "H1_Pol2.bed")

#Use peak as input_type
narrowPeaks = read_coldata(narrowPeaks, build = "hg19", input_type = "peak")

oct4_loci = "chr6:30,818,383-31,452,182" #633Kb region for example

narrowPeaks_track = track_extract(colData = narrowPeaks, loci = oct4_loci)

#Rest plotting is same
track_plot(summary_list = narrowPeaks_track, 
          show_ideogram = TRUE, 
          peaks = oct4_nanog_peaks, 
          ucscChromHMM = c("wgEncodeBroadHmmH1hescHMM", "wgEncodeBroadHmmNhlfHMM"))

image

profileplots

profile_extract() -> profile_summarize() -> profile_plot() are functions to generate density based profile-plots from bigWig files.

  • Below example for summarizing approx. 3,671 peaks for 3 bigWig files takes ca. 3 seconds on my 5 year old macbook Pro. This includes generating signal matrix, summarizing, and plotting. Equivalent deeptools commands takes 20 seconds.
  • Optionally, it can also query UCSC genome browser for refseq transcripts of desired assembly and summarize around TSS regions
  • Replicates can be collapsed into single value per condition

Example data from GSE99183 where U87 glioma cell lines are treated with a DMSO and a BRD4 degradaer.

bws = c("GSM2634756_U87_BRD4.bw", "GSM2634757_U87_BRD4_dBET_24h.bw", "GSM2634758_U87_BRD4_dBET_2h.bw")
bws = read_coldata(bws = bws, 
                  sample_names = c("BRD4", "BRD4_dBET_24h", "BRD4_dBET_2h"), 
                  build = "hg19")

Refseq transcripts

#Extract signals from bigWig files around refseq transcripts
pe_refseq = profile_extract(colData = bws, ucsc_assembly = TRUE, 
                            startFrom = 'start', up = 1500, down = 1500)

#Estimate mean signal
ps_refseq = profile_summarize(sig_list = pe_refseq) 

#Plot
profile_plot(ps_refseq)

Custom BED regions

#BRD4 binding sites 
bed = "GSM2634756_U87_BRD4_peaks.narrowPeak.gz"

#Center and extend 1500 both ways from the peak center
pe_bed = profile_extract(colData = bws, bed = bed, startFrom = "center", 
                          up = 1500, down = 1500, nthreads = 4)

#Estimate mean signal
ps_bed = profile_summarize(sig_list = pe_bed) 

#Plot
profile_plot(ps_bed)

heatmap

Output from profile_extract can be used to draw a heatmap with profile_heatmap

profile_heatmap(mat_list = pe_bed, top_profile = TRUE, zmaxs = 0.8)

PSA If you find the tool useful, consider starring this repository or up voting this Biostars thread so that more poeple can find it :)

Caveat

  • Windows OS is not supported

Citation

If you find the script useful consider citing trackplot and bwtool

Mayakonda, Anand, and Frank Westermann. Trackplot: a fast and lightweight R script for epigenomic enrichment plots. Bioinformatics advances vol. 4,1 vbae031. 28 Feb. 2024. PMID: 38476298

Pohl A, Beato M. bwtool: a tool for bigWig files. Bioinformatics. 2014 Jun 1;30(11):1618-9. doi: 10.1093/bioinformatics/btu056. Epub 2014 Jan 30. PMID: 24489365

Acknowledgements

Joschka Hey for all the cool suggestions :)