scATAC-seq-analysis-notes
My notes for scATAC-seq analysis
Papers to read
- Benchmarking computational methods for single-cell chromatin data analysis
- Single-cell transcriptomics in cancer: computational challenges and opportunities by Jean, Kamil and Fan.
- Review: Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation
- Prospective, brain-wide labeling of neuronal subclasses with enhancer-driven AAVs Neuroscientists need ever-more specific methods to restrict expression of tools to defined neuronal subpopulations. This amazing study from @AllenInstitute uses scATAC-seq & scRNA-seq to discover enhancers that are even more specific than driver lines.
ATAC-seq QC
protocols
An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues
Fragment length distribution
A blog post by Xi Chen
The successful construction of an ATAC library requires a proper pair of Tn5 transposase cutting events at the ends of the DNA fragment. In nucleosome-free open chromatin regions, many molecules of Tn5 can get in and chop the DNA into small pieces; around nucleosome-occupied regions, Tn5 can only access the linker regions. Therefore, in a normal ATAC-seq library, you should expect to see a sharp peak in the <100 bp region (open chromatin), a peak in the ~200 bp region (mono-nucleosome), and other larger peaks (multi-nucleosomes).
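A minimal Python sketch of checking that expectation on a list of fragment lengths. The function name, the size cutoffs (`nfr_max`, `mono_range`) and the toy lengths are all illustrative assumptions, not values from the blog post; in practice the lengths could come from the absolute TLEN field of properly paired alignments.

```python
from collections import Counter


def summarize_fragment_lengths(lengths, nfr_max=100, mono_range=(150, 300)):
    """Return a length histogram and the fraction of fragments per size class.

    nfr_max and mono_range are illustrative cutoffs (assumptions); adjust
    them after looking at the actual distribution.
    """
    total = len(lengths)
    if total == 0:
        raise ValueError("no fragments given")
    histogram = Counter(lengths)
    # Fragments shorter than nfr_max are treated as nucleosome-free.
    nfr = sum(n for size, n in histogram.items() if size < nfr_max)
    # Fragments inside mono_range are treated as mono-nucleosomal.
    mono = sum(
        n for size, n in histogram.items()
        if mono_range[0] <= size <= mono_range[1]
    )
    return {
        "histogram": histogram,
        "nucleosome_free_fraction": nfr / total,
        "mono_nucleosome_fraction": mono / total,
        "other_fraction": (total - nfr - mono) / total,
    }


# Toy fragment lengths (hypothetical numbers, not real data).
toy_lengths = [45, 60, 72, 88, 95, 190, 200, 210, 390, 410]
summary = summarize_fragment_lengths(toy_lengths)
print(summary["nucleosome_free_fraction"], summary["mono_nucleosome_fraction"])
```

A library dominated by the short class with a visible mono-nucleosome shoulder would match the pattern described above; plotting `summary["histogram"]` gives the usual fragment-size plot.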
There might be some artifact in how the aligner deals with fragments where the forward read and reverse read are exact reverse complements of each other. I know that Bowtie (1 but not 2) has some issue with those reads.
ATAC-seq
Some may notice that the peaks produced look both like peaks produced from the TF ChIP-seq pipeline as well as the histone ChIP-seq pipeline. This is intentional, as ATAC-seq data looks both like TF data (narrow peaks of signal) as well as histone data (broader regions of openness).
- ATACseqQC a bioconductor package for quality control of ATAC-seq data.
- RASQUAL (Robust Allele Specific QUAntification and quality controL) maps QTLs for sequence-based cellular traits by combining population and allele-specific signals. Paper: Fine-mapping cellular QTLs with RASQUAL and ATAC-seq
- ATAC-seq Forum
- Single-cell ATAC-Seq
- A rapid and robust method for single cell chromatin accessibility profiling
- Global Prediction of Chromatin Accessibility Using RNA-seq from Small Number of Cells: from RNA-seq to DNA accessibility. Tool on GitHub.
- NucleoATAC: Python package for calling nucleosomes using ATAC-Seq data
- chromVAR: Inferring transcription factor variation from single-cell epigenomic data scATAC-seq
- ENCODE ATAC-seq guidelines
- Brockman is a suite of command line tools and R functions to convert genomics data into DNA k-mer words representing the regions associated with a chromatin mark, and then analyzing these k-mer sets to see how samples differ from each other. This approach is primarily intended for single cell genomics data, and was tested most extensively on single cell ATAC-seq data
- Reproducible inference of transcription factor footprints in ATAC-seq and DNase-seq datasets via protocol-specific bias modeling
- msCentipede is an algorithm for accurately inferring transcription factor binding sites using chromatin accessibility data (DNase-seq, ATAC-seq) and is written in Python 2.x and Cython.
- The Differential ATAC-seq Toolkit (DAStk) is a set of scripts to aid in analyzing differential ATAC-seq data.
- Identification of Transcription Factor Binding Sites using ATAC-seq. We propose HINT-ATAC, a footprinting method that addresses ATAC-seq-specific protocol artifacts.
- HMMRATAC splits a single ATAC-seq dataset into nucleosome-free and nucleosome-enriched signals, learns the unique chromatin structure around accessible regions, and then predicts accessible regions across the entire genome. We show that HMMRATAC outperforms the popular peak-calling algorithms on published human and mouse ATAC-seq datasets.
- ChromA: Chromatin Landscape Annotation Tool. ChromA is a probabilistic model to annotate chromatin regions into accessible or inaccessible, open or closed, based on their ATACseq profile. ChromA can process bulk datasets, single-cell or integrate information from a combination of both. Even more, ChromA can integrate information from different replicates or different cellular populations to create a consensus representation of chromatin accessibility.
peak calling
--shift -100 --extsize 200 will amplify the 'cutting sites' enrichment from ATAC-seq data, so in the end the 'peak' is where Tn5 transposase likes to attack. Although much information, such as the insertion length and the alignment of the other mate, is ignored, the result is still usable. Especially when the short-fragment population is extremely dominant, the final output won't be off by much.
macs2 callpeak --nomodel --keep-dup all --shift -100 --extsize 200
macs2 callpeak -f BAMPE
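To see why the shift/extsize combination centers signal on the cut site, here is a small Python sketch of the coordinate arithmetic as I understand it from the MACS2 documentation: the read's 5' end is moved by `shift` along the read's own 5'-to-3' direction (negative means backwards), then extended by `extsize` in the 5'-to-3' direction. `shifted_window` is a hypothetical helper, not part of MACS2, and it ignores chromosome boundaries and 0-based/1-based details.

```python
def shifted_window(five_prime, strand, shift=-100, extsize=200):
    """Return (start, end) of the window a read contributes to the pileup.

    five_prime: genomic coordinate of the read's 5' end (the Tn5 cut site).
    strand: "+" or "-".
    This mirrors the documented behaviour only approximately (assumption).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the minus strand, 5'-to-3' runs towards smaller coordinates.
    direction = 1 if strand == "+" else -1
    moved = five_prime + direction * shift
    far_end = moved + direction * extsize
    return (min(moved, far_end), max(moved, far_end))


plus_window = shifted_window(1000, "+")
minus_window = shifted_window(1000, "-")
print(plus_window, minus_window)
```

With shift equal to minus half of extsize, reads on both strands give the same window centered on the 5' end, which is why the resulting peaks mark where Tn5 cuts rather than where the fragment lies.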
- Genrich, written by John, a previous labmate at Harvard FAS Informatics. Will take a look!
motif analysis
- marge: An API for Analysis of Motifs Using HOMER in R: https://www.biorxiv.org/content/10.1101/249268v1
- homerkit
Dimension Reduction
- Clustering scATACseq data: the TF-IDF way. My effort to replicate some of the studies.
- Dimensionality Reduction for scATAC Data. A blog post by Andrew Hill. Very informative.
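As a reminder of what the TF-IDF step does, here is a minimal pure-Python sketch on a toy peak-by-cell matrix. It uses one common variant (binarize, term frequency = 1 / number of open peaks in the cell, IDF = log(1 + n_cells / number of cells where the peak is open)); other tools use slightly different formulas, so treat the exact weighting, the function name `tfidf` and the toy counts as assumptions. The posts above then run SVD/LSI on the transformed matrix, which is not shown here.

```python
import math


def tfidf(counts):
    """TF-IDF transform of a peaks x cells count matrix (list of rows).

    One common variant, chosen for illustration (assumption):
    tf = binarized count / open peaks in that cell,
    idf = log(1 + n_cells / cells in which the peak is open).
    """
    n_peaks, n_cells = len(counts), len(counts[0])
    binary = [[1 if value > 0 else 0 for value in row] for row in counts]
    peaks_per_cell = [
        sum(binary[p][c] for p in range(n_peaks)) for c in range(n_cells)
    ]
    cells_per_peak = [sum(row) for row in binary]
    if 0 in peaks_per_cell or 0 in cells_per_peak:
        raise ValueError("filter out empty cells and peaks first")
    weights = []
    for p in range(n_peaks):
        idf = math.log(1 + n_cells / cells_per_peak[p])
        weights.append(
            [binary[p][c] / peaks_per_cell[c] * idf for c in range(n_cells)]
        )
    return weights


# Toy matrix (hypothetical): 6 peaks x 4 cells, two cell groups,
# and a last peak that is open in every cell.
toy_counts = [
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [3, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 2, 1],
    [1, 1, 1, 1],
]
weights = tfidf(toy_counts)
print(weights[0][0], weights[5][0])
```

The peak that is open in every cell gets a lower weight than the group-specific peaks, which is the point of the IDF term: ubiquitous peaks contribute less to the downstream embedding.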
clustering
- cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data github https://github.com/aertslab/cistopic
copy-number
- Alleloscope is a method for allele-specific copy number estimation that can be applied to single cell DNA and ATAC sequencing data (separately or in combination), allowing for integrative multi-omic analysis of allele-specific copy number and chromatin accessibility for the same cell.
footprint
- HINT tutorial: https://www.regulatory-genomics.org/hint/tutorial/ paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1642-2 single cell ATAC tutorial is on the way.
- seqOutBias: removes Tn5 cut bias.
- scOpen: chromatin-accessibility estimation of single-cell ATAC data tutorial https://www.regulatory-genomics.org/hint/tutorial-differential-footprints-on-scatac-seq/
- Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal: TOBIAS. Very good according to Andrew. https://github.com/loosolab/TOBIAS has all you need for bias correction, differential footprint and plotting. will try for sure!
nucleosome positioning
pipelines
- MAESTRO Single-cell Transcriptome and Regulome Analysis Pipeline from Shirley Liu lab.
- scATAC-pro: a comprehensive workbench for single-cell chromatin accessibility sequencing data https://github.com/wbaopaul/scATAC-pro
- ArchR : Analysis of Regulatory Chromatin in R (biorxiv link) (github link) (Full documentation see here)
- SnapATAC: much faster than cellranger-atac.
integrate scATAC and scRNAseq
- SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks paper https://www.biorxiv.org/content/10.1101/2022.08.19.504505v1
- MOFA+: analysis of matching scRNA-seq and scATAC-seq data
- Buenrostro et al., Integrated Single-Cell Analysis Maps the Continuous Regulatory Landscape of Human Hematopoietic Differentiation, Cell (2018), https://doi.org/10.1016/j.cell.2018.03.074
- Duren et al., Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations, PNAS (2018), https://doi.org/10.1073/pnas.1805681115
- Prospective, brain-wide labeling of neuronal subclasses with enhancer-driven AAVs
- Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain Jean Fan kindly shared her scripts for the paper at https://github.com/JEFworks/Supplementary-Code/tree/master/snDropSeq_scTHSseq. will need to ask her when I have questions.
- Building gene regulatory networks from single-cell ATAC-seq and RNA-seq using Linked Self-Organizing Maps github link https://github.com/csjansen/SOMatic
- Comprehensive integration of single cell data: integrating scATAC and scRNAseq in the Seurat v3 R package.
- Integrative analysis of single cell genomics data by coupled nonnegative matrix factorizations
- Multi-Omics Factor Analysis (MOFA) http://bioconductor.org/packages/release/bioc/html/MOFA.html integrate scATAC and scRNAseq?
- Garnett automatic annotation of scRNAseq and scATACseq data sets.
- Fluent genomics with plyranges and tximeta integrate bulk RNAseq and ATACseq data using bioconductor packages and using plyranges (dplyr for GRanges)!
predicting ATAC peak target gene
- Cicero
- ChIA-Drop: Multiplex chromatin interactions with single-molecule precision by Yijun Ruan at JAX lab. His office is next to my previous postdoc advisor Roel Verhaak's.