RNA-seq analysis
General sequencing data analysis materials
- Next-Gen Sequence Analysis Workshop (2015) held by Titus Brown (now at UC Davis)
- Fall 2015, BMMB 852: Applied Bioinformatics by Istvan Albert from Penn State University. He developed the ever-popular Biostars.
- Stephen Turner at UVA is maintaining a list of training opportunities for genomic data analysis
- Jeff Leek group's recommended genomic papers
- awesome tutorial for NGS file format
- UVA Bioconnector Workshops
- Explaining your errors QC fail
- EMBL-EBI has a very comprehensive list of courses for online training
RNA-seq specific
- RNA sequencing: the teenage years A nice review.
- RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis
- Introduction to RNA-seq analysis youtube video
- RNAseq differential expression analysis – NGS2015
- Kallisto and sleuth tutorial: blazing fast RNA-seq analysis by Lior Pachter's lab. A sleuth for RNA-Seq
- pathway analysis using GAGE
- Tutorial: RNA-seq differential expression & pathway analysis with Sailfish, DESeq2, GAGE, and Pathview
- A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis
- RNA-seq tutorial wiki Informatics for RNA-seq: A web resource for analysis on the cloud.
- RNA-seqlopedia Great introduction of RNA-seq from sample preparation to data analysis
- RNAseq data analysis from data carpentry
- paper: Isoform prefiltering improves performance of count-based methods for analysis of differential transcript usage
- paper: A survey of best practices for RNA-seq data analysis
- paper: Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories
- paper: Cross-platform normalization of microarray and RNA-seq data for machine learning applications. Tool
- review: Translating RNA sequencing into clinical diagnostics: opportunities and challenges
- paper: Thousands of large-scale RNA sequencing experiments yield a comprehensive new human gene list and reveal extensive transcriptional noise
RNA-seq experimental design
- Thinking about Designing RNA Seq Experiments to Measure Differential Gene Expression: The Basics a blog post
- Tutorial: Rna Seq Experimental Design For Measuring Differential Gene Expression from biostars
- Scotty - Power Analysis for RNA Seq Experiments
- Experimental Design in Differential Abundance analysis web server
- Experimental Design in Differential Abundance analysis bioconductor package
Quality Control
- QoRTs: a comprehensive toolset for quality control and data processing of RNA-Seq experiments
- QUaCRS
- RSeQC RNA-seq data QC
- RNA-SeQC
Normalization, quantification, and differential expression
Normalization is essential for RNA-seq analysis. However, one needs to understand the underlying assumptions of each method. Most methods (e.g. TMM normalization) assume that there are no global changes between conditions. This may not hold when a global effect occurs. For example, if you delete a gene that controls transcription, you expect to see a global reduction in gene expression. In that case, other normalization methods (e.g. spike-in controls) need to be considered. The same principle applies to other high-throughput sequencing data such as ChIP-seq.
Read this very important paper by Rafael A. Irizarry: Genome-wide repressive capacity of promoter DNA methylation is revealed through epigenomic manipulation
DESeq2 normalization by Simon Anders:
To estimate the library size, simply taking the total number of (mapped or unmapped) reads is, in our experience, not a good idea. Sometimes, a few very strongly expressed genes are differentially expressed, and as they make up a good part of the total counts, they skew this number. After you divide by total counts, these few strongly expressed genes become equal, and the whole rest looks differentially expressed.
The following simple alternative works much better:
- Construct a "reference sample" by taking, for each gene, the geometric mean of the counts in all samples.
- To get the sequencing depth of a sample relative to the reference, calculate for each gene the quotient of the counts in your sample divided by the counts of the reference sample. Now you have, for each gene, an estimate of the depth ratio.
- Simply take the median of all the quotients to get the relative depth of the library.
This is what the estimateSizeFactors function of our DESeq package does.
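The median-of-ratios recipe quoted above can be sketched in a few lines of plain Python. This is only an illustration of the three steps, not the DESeq code itself; the function name `size_factors` and the toy counts are made up for this example, and genes with a zero count in any sample are skipped because their geometric mean is zero.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: dict mapping gene name -> list of raw counts, one per sample.
    Returns one size factor per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        # A zero count makes the geometric mean zero, so such genes
        # cannot contribute a ratio and are skipped.
        if min(row) <= 0:
            continue
        # Step 1: log of the geometric mean across samples (the "reference sample").
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        # Step 2: per-gene ratio of each sample to the reference, on the log scale.
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Step 3: the median ratio over genes is the relative depth of each library.
    return [math.exp(median(r)) for r in log_ratios]


# Toy data (made up): sample 2 is sequenced twice as deep as sample 1,
# and geneD is strongly up-regulated in sample 2.
counts = {
    "geneA": [100, 200],
    "geneB": [50, 100],
    "geneC": [30, 60],
    "geneD": [10, 2000],
    "geneE": [0, 5],
}
factors = size_factors(counts)
totals = [sum(row[j] for row in counts.values()) for j in range(2)]
print("median-of-ratios depth ratio:", factors[1] / factors[0])
print("total-count depth ratio:", totals[1] / totals[0])
```

In this toy case the median-of-ratios estimate recovers the two-fold depth difference, while the total-count ratio is inflated to roughly twelve-fold by the single strongly expressed gene, which is the skew the quote warns about.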
If one wants to use a set of genes that are not affected by the global change, do
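library(DESeq2)
# CountTable: gene-by-sample count matrix; Design: sample table with a "condition" column
dds <- DESeqDataSetFromMatrix(countData = CountTable, colData = Design, design = ~ condition)
# default: size factors estimated from all genes
dds_global <- estimateSizeFactors(dds)
dds_global <- DESeq(dds_global)
res_global <- results(dds_global)
# alternative: size factors estimated only from the unaffected genes listed in norm_genes
dds <- estimateSizeFactors(dds, controlGenes = rownames(dds) %in% norm_genes)
dds <- DESeq(dds)  # DESeq() keeps the size factors set above
res <- results(dds)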
or give self-defined size factors.
sizeFactors(dds) = c(my_Values)
- A Comparison of Methods: Normalizing High-Throughput RNA Sequencing Data
- Errors in RNA-Seq quantification affect genes of relevance to human disease
- A comprehensive evaluation of ensembl, RefSeq, and UCSC annotations in the context of RNA-seq read mapping and gene quantification
- Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data
- paper: Union Exon Based Approach for RNA-Seq Gene Quantification: To Be or Not to Be?
- paper: The impact of amplification on differential expression analyses by RNA-seq Computational removal of read duplicates is not recommended for differential expression analysis.
- paper: Normalization of RNA-seq data using factor analysis of control genes or samples: About spike-ins control and R normalization strategy - remove unwanted variation (RUV).
- NVT - an R package for the assessment of RNA-Seq normalization methods.
- paper: Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions A must read for RNA-seq normalization and to understand the assumptions for each normalization method!
- YARN bioc package: Robust Multi-Condition RNA-Seq Preprocessing and Normalization.
- Smooth quantile normalization or qsmooth is a generalization of quantile normalization, which is an average of the two types of assumptions about the data generation process: quantile normalization and quantile normalization between groups.
Traditional way of RNA-seq analysis
- Two Nature Protocols papers for RNA-seq analysis:
  Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Based on DESeq and edgeR.
  Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.
- A nice tutorial from F1000Research: RNA-Seq workflow: gene-level exploratory analysis and differential expression, from Michael Love, who is the author of DESeq2.
- F1000 Bioconductor workflow: RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR
- F1000: From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline by Gordon Smyth.
A post from Nextgeneseek
The three papers kind of replaces earlier tools from Salzberg’s group (Bowtie/TopHat, Cufflinks, and Cuffmerge)
they offer a totally new way to go from raw RNA-seq reads to differential expression analysis:
align RNA-seq reads to genome (HISAT instead of Bowtie/TopHat, STAR),
assemble transcripts and estimate expression (StringTie instead of Cufflinks), and
perform differential expression analysis (Ballgown instead of Cuffmerge).
Simulation-based comprehensive benchmarking of RNA-seq aligners A Nature Methods paper.
We found that performance varied by genome complexity, and accuracy and popularity were poorly correlated. The most widely cited tool underperforms for most metrics, particularly when using default settings
RapMap: A Rapid, Sensitive and Accurate Tool for Mapping RNA-seq Reads to Transcriptomes. From Sailfish group.
- BitSeq Transcript isoform level expression and differential expression estimation for RNA-seq
- Dumpster diving in RNA-sequencing to find the source of every last read ROP is a computational protocol aimed to discover the source of all reads, which originated from complex RNA molecules, recombinant antibodies and microbial communities.
For mapping based methods, usually the raw reads are mapped to transcriptome or genome (need to model gaps by exon-exon junction), and then a gene/transcript level counts are obtained by:
- HTSeq-count: one of the most popular counting tools, but it is slow.
- featureCounts: much faster, uses multiple threads.
- VERSE: built on featureCounts, integrates HTSeq.
- eXpress.
Finally, differential expression is carried out by
- EBseq An R package for gene and isoform differential expression analysis of RNA-seq data
- JunctionSeq differential usage of exons and splice junctions in High-Throughput, Next-Generation RNA-Seq datasets. The methodology is heavily based on the DEXSeq bioconductor package. The core advantage of JunctionSeq over other similar tools is that it provides powerful automated tools for generating readable and interpretable plots and tables to facilitate the interpretation of the results. An example results report is available here.
- MetaSeq Meta-analysis of RNA-Seq count data in multiple studies
- derfinder Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution
- DGEclust is a program for clustering and differential expression analysis of expression data generated by next-generation sequencing assays, such as RNA-seq, CAGE and others
- Degust: Perform RNA-seq analysis and visualisation. Simply upload a CSV file of read counts for each replicate; then view your DGE data.
- Vennt Dynamic Venn diagrams for Differential Gene Expression.
- Glimma Interactive HTML graphics for RNA-seq data.
Extra Notes
- In RNA-Seq, 2 != 2: Between-sample normalization
- RPKM/FPKM, TPM and raw counts for RNA-seq
- Youtube video counts vs TPM
- Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols
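As a quick reminder of what the units in the posts above mean, here is a minimal sketch of the usual textbook definitions of RPKM/FPKM and TPM. The function names and the toy numbers are made up for this example, and it uses plain annotated transcript length; real tools typically use an effective length instead.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    # count / (length in kb) / (library size in millions)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r * 1e6 / scale for r in rates]


# Toy example (made-up numbers): three features in one sample.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 500]

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)

# TPM is RPKM rescaled so that every sample sums to one million.
tpm_from_rpkm = [v * 1e6 / sum(rpkm_values) for v in rpkm_values]
print(rpkm_values)
print(tpm_values)
```

TPM values always sum to one million within a sample, while the sum of RPKM values differs from sample to sample. Neither unit corrects for composition differences between samples, which is the point of the between-sample normalization posts listed above.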
Benchmarking
bcbio.rnaseq
RNAseqGUI. I have used it several times and it looks good.
compcodeR
paper: Benchmark Analysis of Algorithms for Determining and Quantifying Full-length mRNA Splice Forms from RNA-Seq Data
paper: Comparative evaluation of isoform-level gene expression estimation algorithms for RNA-seq and exon-array platforms
paper: A benchmark for RNA-seq quantification pipelines
Map free
- RNASkim
- Salmon: Accurate, Versatile and Ultrafast Quantification from RNA-seq Data using Lightweight-Alignment. It is the successor of Sailfish. I have used Sailfish once, and it is super-fast! Salmon is supposed to be even better. tutorial
- Kallisto from Lior Pachter's lab. paper: Near-optimal probabilistic RNA-seq quantification
- sleuth works with Kallisto for differential expression.
- NORA: A tool for transcript quantification where accuracy matters claims to be more accurate than RSEM, Salmon and Kallisto.
- paper: Differential analysis of RNA-Seq incorporating quantification uncertainty by Sleuth from Lior Pachter group.
- Differential analysis of RNA-Seq incorporating quantification uncertainty: sleuth
- Reanalysis of published RNA-Seq data using kallisto and sleuth based on shiny.
- tximport: import and summarize transcript-level estimates for gene-level analysis now on bioconductor
- tximeta by Mike Love. Import transcript quantification into R/Bioconductor with automatic annotation metadata.
- f1000 research paper Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences from Mike Love et al.
- RATS: Relative Abundance of Transcripts: An R package for the detection of Differential Transcript isoform Usage.
It provides a method to detect changes in the relative abundance of the alternative transcripts (isoforms) of genes. This is called Differential Transcript Usage (DTU).
Detecting DTU is supplementary to the quantification of transcripts by tools like Salmon, Sailfish and Kallisto and the detection of Differential Transcript Expression (DTE) by tools such as Sleuth.
I particularly like the figure in the tutorial showing the differences among DTU, DTE and DEG. The paper transcript-level estimates improve gene-level inferences above also talks about the differences:
- differential gene expression (DGE) studies, where the overall transcriptional output of each gene is compared between conditions;
- differential transcript/exon usage (DTU/DEU) studies, where the composition of a gene’s isoform abundance spectrum is compared between conditions; or
- differential transcript expression (DTE) studies, where the interest lies in whether individual transcripts show differential expression between conditions.
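A toy example may help keep the three questions apart. The sketch below uses made-up abundances for a hypothetical gene with two isoforms, and the helper name `summarize` is mine; it only computes the quantities each type of study looks at and does no statistical testing.

```python
def summarize(isoform_abundance):
    """Return (gene-level total, per-isoform usage fractions) for one gene.

    isoform_abundance: dict mapping isoform name -> abundance (e.g. TPM)
    in one condition.
    """
    total = sum(isoform_abundance.values())
    usage = {iso: a / total for iso, a in isoform_abundance.items()}
    return total, usage


# Made-up abundances for one hypothetical gene with two isoforms.
control = {"iso1": 80.0, "iso2": 20.0}
treated = {"iso1": 20.0, "iso2": 80.0}

total_c, usage_c = summarize(control)
total_t, usage_t = summarize(treated)

# DGE looks at the gene-level total: unchanged here.
print("gene total:", total_c, "->", total_t)
# DTU looks at the isoform proportions: they flip here.
print("iso1 usage:", usage_c["iso1"], "->", usage_t["iso1"])
# DTE looks at each transcript on its own: both isoforms change here.
print("iso1 abundance:", control["iso1"], "->", treated["iso1"])
```

In this isoform-switch scenario a gene-level analysis would see nothing, while both DTU and DTE analyses would flag the gene.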
- MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data. The statistical model of MATS calculates the P-value and false discovery rate that the difference in the isoform ratio of a gene between two conditions exceeds a given user-defined threshold.
Blog posts on Kallisto/Salmon
- Comparing unpublished RNA-Seq gene expression quantifiers
- Kallisto, a new ultra fast RNA-seq quantitation method from Next GEN SEEK
- kallisto paper summary: Near-optimal RNA-seq quantification from Next GEN SEEK
- Not-quite alignments: Salmon, Kallisto and Efficient Quantification of RNA-Seq data
- Using Kallisto for gene expression analysis of published RNAseq data
- How accurate is Kallisto? from Mark Ziemann
- ALIGNMENT FREE TRANSCRIPTOME QUANTIFICATION
- A sleuth for RNA-seq
- Using Salmon, Sailfish and Sleuth for differential expression
- Road-testing Kallisto
- Why you should use alignment-independent quantification for RNA-Seq
A biostar post: Do not feed rounded estimates of gene counts from kallisto into DESeq2 (please make sure you read through all the comments, and now there is a suggested workflow for feeding rounded estimates of gene counts to DESeq etc)
There is some confusion in the answers to this question that hopefully I can clarify with the three comments below:
- kallisto produces estimates of transcript level counts, and therefore to obtain an estimate of the number of reads from a gene the correct thing to do is to sum the estimated counts from the constituent transcripts of that gene. Of note in the language above is the word "estimate", which is necessary because in many cases reads cannot be mapped uniquely to genes. However insofar as obtaining a good estimate, the approach of kallisto (and before it Cufflinks, RSEM, eXpress and other "transcript level quantification tools") is superior to naïve "counting" approaches for estimating the number of reads originating from a gene. This point has been argued in many papers; among my own papers it is most clearly explained and demonstrated in Trapnell et al. 2013.
- Although estimated counts for a gene can be obtained by summing the estimated counts of the constituent transcripts from tools such as kallisto, and the resulting numbers can be rounded to produce integers that are of the correct format for tools such as DESeq, the numbers produced by such an approach do not satisfy the distributional assumptions made in DESeq and related tools. For example, in DESeq2, counts are modeled "as following a negative binomial distribution". This assumption is not valid when summing estimated counts of transcripts to obtain gene level counts, hence the justified concern of Michael Love that plugging in sums of estimated transcript counts could be problematic for DESeq2. In fact, even the estimated transcript counts themselves are not negative binomial distributed, and therefore also those are not appropriate for plugging into DESeq2. His concern is equally valid with many other "count based" differential expression tools.
- Fortunately there is a solution for performing valid statistical testing of differential abundance of individual transcripts, namely the method implemented in sleuth. The approach is described here. To test for differential abundance of genes, one must first address the question of what that means. E.g. is a gene differential if at least one isoform is? or if all the isoforms are? The tests of sleuth are performed at the granularity of transcripts, allowing for downstream analysis that can capture the varied questions that might make biological sense in specific contexts.
In summary, please do not plug in rounded estimates of gene counts from kallisto into DESeq2 and other tools. While it is technically possible, it is not statistically advisable. Instead, you should use tools that make valid distributional assumptions about the estimates.
However, Charlotte Soneson, Mike Love and Mark Robinson showed in a f1000 paper: Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences that rounded values from transcript level can be fed into DESeq2 etc for gene-level differential expression, and it is valid and preferable in many ways.
Thanks Rob Patro for pointing it out!
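The "sum the estimated counts from the constituent transcripts" step discussed above is simple to sketch. The snippet below is an illustration in plain Python with made-up transcript and gene IDs and a hypothetical helper name `gene_level_counts`; in practice the tximport package listed earlier does this in R and also handles length offsets, which this sketch ignores.

```python
from collections import defaultdict


def gene_level_counts(transcript_counts, tx2gene):
    """Sum transcript-level estimated counts into gene-level estimates.

    transcript_counts: dict transcript ID -> estimated count (may be fractional).
    tx2gene: dict transcript ID -> gene ID.
    """
    gene_counts = defaultdict(float)
    for tx, est in transcript_counts.items():
        gene_counts[tx2gene[tx]] += est
    return dict(gene_counts)


# Made-up IDs and estimates for illustration only.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
est_counts = {"tx1": 100.5, "tx2": 49.5, "tx3": 7.25}

gene_counts = gene_level_counts(est_counts, tx2gene)
print(gene_counts)
```

Note that the gene-level values are still estimates and can be fractional, which is exactly why the discussion above is about whether and how they may be passed to count-based tools.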
- artemis: RNAseq analysis, from raw reads to pathways, typically in a few minutes. Mostly by wrapping Kallisto and caching everything we possibly can.
- isolator: Rapid and robust analysis of RNA-Seq experiments.
Isolator has a particular focus on producing stable, consistent estimates. Maximum likelihood approaches produce unstable point estimates: small changes in the data can result in drastically different results, conflating downstream analysis like clustering or PCA. Isolator produces estimates that are in general, simultaneously more stable and more accurate than other methods.
Circular RNA
- a bunch of tools from Dieterich Lab in github.
Batch effects
- ComBat-seq is a batch effect adjustment tool for bulk RNA-seq count data. It is an improved model based on the popular ComBat. "In addition, ComBat-seq provides adjusted data which preserves the integer nature of counts, so that the adjusted data are compatible with the assumptions of state-of-the-art differential expression software (e.g. edgeR, DESeq2, which specifically request untransformed count data)"
- TACKLING BATCH EFFECTS AND BIAS IN TRANSCRIPT EXPRESSION by Mike Love
- paper: Tackling the widespread and critical impact of batch effects in high-throughput data by Jeffrey T. Leek in Rafael A. Irizarry's lab.
- A reanalysis of mouse ENCODE comparative gene expression data
- Is it species or is it batch? They are confounded, so we can't know
- Mouse / Human Transcriptomics and Batch Effects
- Meta-analysis of RNA-seq expression data across species, tissues and studies: Interspecies clustering by tissue is the predominantly observed pattern among various studies under various distance metrics and normalization methods
- Surrogate Variable Analysis: SVA bioconductor
- Paper Summary: Systematic bias and batch effects in single-cell RNA-Seq data
- Modeling and correcting fragment sequence bias for RNA-seq: alpine bioconductor package from Mike Love.
- BatchQC: interactive software for evaluating sample and batch effects in genomic data.
- A framework for RNA quality correction in differential expression analysis
Databases
- Fetch run information from the European Nucleotide Archive (ENA) A command line tool from Lior Pachter lab.
- Curation of over 10,000 transcriptomic studies to enable data reuse
- MiPanda is an online resource for the interrogation and visualization of gene expression data from the myriad of publicly available cancer and normal next generation sequencing datasets.
- KnockTF a comprehensive human gene expression profile database with knockdown/knockout of transcription factors
- BioJupies Automatically Generates RNA-seq Data Analysis Notebooks With BioJupies you can produce in seconds a customized, reusable, and interactive report from your own raw or processed RNA-seq data through a simple user interface
- RNA meta analysis has ~26,700 studies (5,717 RNA-Seq and 20,955 Microarray). https://rnama.com/ Based on 750 manually labeled studies, our clustering algorithm correctly identifies 91% of sample groups.
- refine.bio will have harmonized over 60,000 gene expression experiments
- ReCount is an online resource consisting of RNA-seq gene count datasets built using the raw data from 18 different studies. Updated version here
- Recount2-FANTOM Recounting the FANTOM Cage Associated Transcriptome. Long non-coding RNAs (lncRNAs).
- The conquer (consistent quantification of external rna-seq data) repository is developed by Charlotte Soneson and Mark D Robinson at the University of Zurich, Switzerland. Single-cell RNA-seq data sets.
- The Lair: a resource for exploratory analysis of published RNA-Seq data. From Lior Pachter group!
- The Digital Expression Explorer (dee) The Digital Expression Explorer (DEE) is a repository of digital gene expression profiles mined from public RNA-seq data sets. These data are obtained from NCBI Short Read Archive. blog post for it
- dee2 Digital Expression Explorer 2 (DEE2) is a repository of uniformly processed RNA-seq data mined from public data obtained from NCBI Short Read Archive. By Mark Ziemann et al. Version 2 of dee.
- Extracting allelic read counts from 250,000 human sequencing runs in Sequence Read Archive data
- SHARQ Search public, human, RNA-seq experiments by cell, tissue type, and other features | Indexing 19807 files
- MetaSRA: normalized sample-specific metadata for the Sequence Read Archive
- ARCHS4: Massive Mining of Publicly Available RNA-seq Data from Human and Mouse ARCHS4 provides access to gene counts from HiSeq 2000, HiSeq 2500 and NextSeq 500 platforms for human and mouse experiments from GEO and SRA.
- iDEP-reads: Uniformly processed public RNA-Seq data Read counts data for 5,470 human and mouse datasets from ARCHS4 v6 and 12,670 datasets from DEE2 for 9 model organisms by Steven Ge.
- SRA-explorer This tool aims to make datasets within the Sequence Read Archive more accessible.
- OmicIDX on BigQuery by Sean Davis who developed SRAdb at NIH. In practice, the OmicIDX mines data from the NCBI Sequence Read Archive (SRA) and NCBI Biosample databases (updated daily).
- RESTful RNA-seq Analysis API A simple RESTful API to access analysis results of all public RNAseq data for nearly 200 species in European Nucleotide Archive.
- intropolis is a list of exon-exon junctions found across 21,504 human RNA-seq samples on the Sequence Read Archive (SRA) from spliced read alignment to hg19 with Rail-RNA. Two files are provided:
- ExpressionAtlas bioconductor package:
This package is for searching for datasets in EMBL-EBI Expression Atlas, and downloading them into R for further analysis. Each Expression Atlas dataset is represented as a SimpleList object with one element per platform. Sequencing data is contained in a SummarizedExperiment object, while microarray data is contained in an ExpressionSet or MAList object.
- GTEx Resources in the UCSC Browser signal track on trackhub
- Batch recompute of ~20,000 RNA-seq samples from large sequencing projects such as TCGA, TARGET and GTEx. Used hg38 and gencode v21 as annotation.
- A cloud-based workflow to quantify transcript-expression levels in public cancer compendia used kallisto for TCGA/CCLE datasets and gencode v24 as annotation.
- OMics Compendia Commons (OMiCC) OMiCC is a community-based, biologist-friendly web platform for creating and (meta-) analyzing annotated gene-expression data compendia across studies and technology platforms for more than 24,000 human and mouse studies from Gene Expression Omnibus (GEO)
- GEPIA interactively explore TCGA expression data, survival etc
- GEOdiver An easy to use web tool for analysing GEO datasets.
- ScanGEO - parallel mining of high-throughput gene expression data
- shinyGEO a web-based application for performing differential expression and survival analysis on Gene Expression Omnibus datasets.
- GREIN: An interactive web platform for re-analyzing GEO RNA-seq data
- ImaGEO Integrative Meta-Analysis of GEO Data.
- Expression Atlas update--an integrated database of gene and protein expression in humans, animals and plants It consists of selected microarray and RNA-sequencing studies from ArrayExpress, which have been manually curated, annotated with ontology terms, checked for high quality and processed using standardised analysis methods. Since the last update, Atlas has grown seven-fold (1572 studies as of August 2015), and incorporates baseline expression profiles of tissues from Human Protein Atlas, GTEx and FANTOM5, and of cancer cell lines from ENCODE, CCLE and Genentech projects.
- DCTD Releases A New Resource for Exploring Cell Line Transcriptional Responses to Anti-Cancer Agents: The NCI Transcriptional Pharmacodynamics Workbench
- scRNASeqDB a database for gene expression profiling in human single cell by RNA-seq
- JingleBells - A repository of standardized single cell RNA-Seq datasets for analysis and visualization in IGV of the raw reads at the single cell level. Currently focused on immune cells. (http://www.jimmunol.org/content/198/9/3375.long)
Gene Set enrichment analysis
- Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline
- Tutorial: RNA-seq differential expression & pathway analysis with Sailfish, DESeq2, GAGE, and Pathview
- Metascape a web server for gene-set analysis.
- GSEA from Broad Institute.
- singscore is an R/Bioconductor package which implements the simple single-sample gene-set (or gene-signature) scoring method proposed by Foroutan et al. (2018). It uses rank-based statistics to analyze each sample's gene expression profile and scores the expression activities of gene sets at a single-sample level.
Pathway analysis
- Statistical analysis and visualization of functional profiles for gene and gene clusters: bioconductor clusterProfiler by GuangChuang Yu from the University of Hong Kong. It can do many jobs and draw GSEA-like figures. It is very useful and I will give it a try besides GAGE.
- DAVID: The Database for Annotation, Visualization and Integrated Discovery (DAVID). Updated in 2016!
- EGSEA Ensemble of Gene Set Enrichment Analyses. By Gordon Smith. take a look!
- DESeq to fgsea tutorial by Stephen Turner.
- Lightweight Iterative Gene set Enrichment in R (LIGER) by Jean Fan.
- Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap 2019 Nature Protocol
Fusion gene detection
- Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods Overall, STAR-Fusion, Arriba, and STAR-SEQR are the most accurate and fastest for fusion detection on cancer transcriptomes.
- MINTIE: identifying novel structural and splice variants in transcriptomes using RNA-seq data
- arriba Fast and accurate gene fusion detection from RNA-Seq data. Top performer of the ICGC-TCGA DREAM competition.
- fusioncatcher
- PRADA from our lab
- Fusion Matcher: Match predicted fusions according to chromosomal location or gene annotation(s)
- paper: Comprehensive evaluation of fusion transcript detection algorithms and a meta-caller to combine top performing methods in paired-end RNA-seq data
- paper: Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data
- chimera A package for secondary analysis of fusion products.
- Pegasus Fusion Annotation and Prediction.
- Oncofuse is a framework designed to estimate the oncogenic potential of de-novo discovered gene fusions. It uses several hallmark features and employs a bayesian classifier to provide the probability of a given gene fusion being a driver mutation.
- chimeraviz Visualization tools for gene fusions.
- SQUID: Transcriptomic Structural Variation Detection from RNA-seq
Alternative splicing
- SplicePlot: a tool for visualizing alternative splicing Sashimi plots
- Multivariate Analysis of Transcript Splicing (MATS)
- SNPlice is a software tool to find and evaluate the co-occurrence of single-nucleotide-polymorphisms (SNP) and altered splicing in next-gen mRNA sequence reads. SNPlice requires, as input: genome aligned reads, exon-intron-exon junctions, and SNPs. exon-intron-exon junctions and SNPs may be derived from the reads directly, using, for example, TopHat2 and samtools, or they may be derived from independent sources
- Visualizing Alternative Splicing github page
- spladder Tool for the detection and quantification of alternative splicing events from RNA-Seq data
- SUPPA This tool generates different Alternative Splicing (AS) events and calculates the PSI ("Percentage Spliced In") value for each event exploiting the fast quantification of transcript abundances from multiple samples.
- IntSplice: Upload a VCF (variant call format) file to predict if an SNV (single nucleotide variation) from intronic positions -50 to -3 is pathogenic or not.
- Whippet: an efficient method for the detection and quantification of alternative splicing reveals extensive transcriptomic complexity
- RNA splicing analysis using heterogeneous and large RNA-seq datasets MAJIQ v2.
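For the PSI ("Percentage Spliced In") value mentioned under SUPPA above, the idea of deriving it from transcript abundances can be sketched as follows. This is my own minimal illustration, not SUPPA's code: it assumes PSI is the summed abundance of the transcripts that include the event divided by the summed abundance of all transcripts that either include or skip it, and the transcript IDs and TPM values are made up.

```python
def psi(tpm, inclusion_transcripts, total_transcripts):
    """Percent spliced in for one event, from transcript abundances.

    tpm: dict transcript ID -> abundance in one sample.
    inclusion_transcripts: transcripts that contain the event (e.g. the exon).
    total_transcripts: all transcripts that define the event
        (those that contain it plus those that skip it).
    """
    included = sum(tpm[t] for t in inclusion_transcripts)
    total = sum(tpm[t] for t in total_transcripts)
    if total == 0:
        # PSI is undefined when none of the event's transcripts is expressed.
        return None
    return included / total


# Made-up abundances: tx1 includes a cassette exon, tx2 skips it,
# tx3 does not overlap the event and is ignored.
sample_tpm = {"tx1": 30.0, "tx2": 10.0, "tx3": 55.0}

event_psi = psi(sample_tpm, ["tx1"], ["tx1", "tx2"])
print("PSI:", event_psi)
```

Because it only needs transcript-level abundances, this kind of calculation can reuse the output of fast quantifiers such as Salmon or Kallisto across many samples.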
microRNAs and non-coding RNAs
- miARma-Seq workflow miRNA-Seq And RNA-Seq Multiprocess Analysis tool, a comprehensive pipeline analysis suite designed for mRNA, miRNA and circRNA identification and differential expression analysis, applicable to any sequenced organism.
- All the tools you need to analyse your miRNAs: tools4miRNAs
- paper Evaluation of microRNA alignment techniques
- protocol: Analysis RNA-seq and Noncoding RNA
transcriptional pausing
- GRO-seq
- RNApol2 ChIP-seq
- iRNA-seq: computational method for genome-wide assessment of acute transcriptional regulation from total RNA-seq data
intron retention
- IRFinder
- iread read this paper comparing IRFinder https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-6541-0
- S-IRFindeR: stable and accurate measurement of intron retention
Allele-specific expression
- paper Tools and best practices for data processing in allelic expression analysis
- paper Characterizing noise structure in single-cell RNA-seq distinguishes genuine from technical stochastic allelic expression
immune related
- ImReP is a computational method for rapid and accurate profiling of the adaptive immune repertoire from regular RNA-Seq data.
- Comprehensive analyses of tumor immunity: implications for cancer immunotherapy
- pVAC-Seq is a cancer immunotherapy pipeline for the identification of personalized Variant Antigens by Cancer Sequencing (pVAC-Seq) that integrates tumor mutation and expression data (DNA- and RNA-Seq). It enables cancer immunotherapy research by using massively parallel sequence data to predict tumor-specific mutant peptides (neoantigens) that can elicit anti-tumor T cell immunity.
- JingleBells (http://jinglebells.bgu.ac.il/) - A repository of standardized single cell RNA-Seq datasets for analysis and visualization in IGV of the raw reads at the single cell level. Currently focused on immune cells. (http://www.jimmunol.org/content/198/9/3375.long)
- immunedeconv - an R package for unified access to computational methods for estimating immune cell fractions from bulk RNA sequencing data.
Reads from xenografts
- Xenosplit XenoSplit is a fast computational tool to detect the true origin of the graft RNA-Seq and DNA-Seq libraries prior to profiling of patient-derived xenografts (PDXs).
single cell tutorials
- Course material in notebook format for learning about single cell bioinformatics methods
- Analysis of single cell RNA-seq data course, Cambridge University Great tutorial!
- f1000 workflow paper A step-by-step workflow for low-level analysis of single-cell RNA-seq data by Aaron Lun, the author of diffHiC, GenomicInteractions and csaw.
- 2016 Bioconductor workshop: Analysis of single-cell RNA-seq data with R and Bioconductor
- paper: Single-Cell Transcriptomics Bioinformatics and Computational Challenges
single cell RNA-seq normalization
- paper: Assessment of single cell RNA-seq normalization methods
- paper: A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications
- Normalizing single-cell RNA sequencing data: challenges and opportunities (Nature Methods)
- SinQC: A Method and Tool to Control Single-cell RNA-seq Data Quality.
- Scone Single-Cell Overview of Normalized Expression data
single cell batch effect
- Overcoming confounding plate effects in differential expression analyses of single-cell RNA-seq data
Single cell RNA-seq
- a collection of single cell RNA-seq tools by Sean Davis
- paper: Design and computational analysis of single-cell RNA-sequencing experiments
- paper by Mark Robinson: Bias, Robustness And Scalability In Differential Expression Analysis Of Single-Cell RNA-Seq Data
Considerable differences are found between the methods in terms of the number and characteristics of the genes that are called differentially expressed. Pre-filtering of lowly expressed genes can have important effects on the results, particularly for some of the methods originally developed for analysis of bulk RNA-seq data. Generally, however, methods developed for bulk RNA-seq analysis do not perform notably worse than those developed specifically for scRNA-seq.
- paper: Power Analysis of Single Cell RNA‐Sequencing Experiments
- paper: The contribution of cell cycle to heterogeneity in single-cell RNA-seq data
- paper: Batch effects and the effective design of single-cell gene expression studies
- On the widespread and critical impact of systematic bias and batch effects in single-cell RNA-Seq data
- paper: Comparison of methods to detect differentially expressed genes between single-cell populations
- review: Single-cell genome sequencing: current state of the science
- Ginkgo A web tool for analyzing single-cell sequencing data.
- SingleCellExperiment bioc package. Defines an S4 class for storing data from single-cell experiments. This includes specialized methods to store and retrieve spike-in information, dimensionality reduction coordinates and size factors for each cell, along with the usual metadata for genes and libraries.
- ASAP: a Web-based platform for the analysis and interactive visualization of single-cell RNA-seq data
- FASTGenomics, an online platform to share single-cell RNA sequencing data and perform analyses using reproducible workflows. Users can upload their own data and use standard or customized workflows for the exploration and analysis of gene expression data (Scholz et al. 2018).
- Seurat is an R package designed for the analysis and visualization of single cell RNA-seq data. It contains easy-to-use implementations of commonly used analytical techniques, including the identification of highly variable genes, dimensionality reduction (PCA, ICA, t-SNE), standard unsupervised clustering algorithms (density clustering, hierarchical clustering, k-means), and the discovery of differentially expressed genes and markers.
- R package for the statistical assessment of cell state hierarchies from single-cell RNA-seq data
- Monocle Differential expression and time-series analysis for single-cell RNA-Seq and qPCR experiments.
- Single Cell Differential Expression: bioconductor package scde
- Sincera: A Computational Pipeline for Single Cell RNA-Seq Profiling Analysis. Bioconductor package will be available soon.
- MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data
- scDD: A statistical approach for identifying differential distributions in single-cell RNA-seq experiments
- Inference and visualisation of Single-Cell RNA-seq data as a hierarchical tree structure: bioconductor CellTree
- Fast and accurate single-cell RNA-Seq analysis by clustering of transcript-compatibility counts by Lior Pachter et al.
- cellity: Classification of low quality cells in scRNA-seq data using R.
- bioconductor: using scran to perform basic analyses of single-cell RNA-seq data
- scater: single-cell analysis toolkit for expression with R
- Monovar: single-nucleotide variant detection in single cells
- Single-cell mRNA quantification and differential analysis with Census
- CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data
- CellView: Interactive Exploration Of High Dimensional Single Cell RNA-Seq Data
single cell RNA-seq clustering
- Geometry of the Gene Expression Space of Individual Cells
- pcaReduce: Hierarchical Clustering of Single Cell Transcriptional Profiles.
- CountClust: Clustering and Visualizing RNA-Seq Expression Data using Grade of Membership Models. Fits grade of membership models (GoM, also known as admixture models) to cluster RNA-seq gene expression count data, identifies characteristic genes driving cluster memberships, and provides a visual summary of the cluster memberships
- FastProject: A Tool for Low-Dimensional Analysis of Single-Cell RNA-Seq Data
- SNN-Cliq Identification of cell types from single-cell transcriptomes using a novel clustering method
- Compare clusterings for single-cell sequencing bioconductor package. The goal of this package is to encourage the user to try many different clustering algorithms in one package structure. It gives tools for running many different clusterings and choices of parameters, and provides visualization to compare many different clusterings and algorithm tools to find common shared clustering patterns.
- CIDR: Ultrafast and accurate clustering through imputation for single cell RNA-Seq data
- SC3: consensus clustering of single-cell RNA-Seq data. SC3 achieves high accuracy and robustness by consistently integrating different clustering solutions through a consensus approach. Tests on twelve published datasets show that SC3 outperforms five existing methods while remaining scalable, as shown by the analysis of a large dataset containing 44,808 cells. Moreover, an interactive graphical implementation makes SC3 accessible to a wide audience of users, and SC3 aids biological interpretation by identifying marker genes, differentially expressed genes and outlier cells.
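
A rough illustration of the consensus idea mentioned for SC3 above: several clusterings of the same cells are combined into a consensus matrix (the fraction of clusterings in which two cells share a cluster), and cells are then grouped from that matrix. This is only a minimal sketch and not SC3's own implementation; the function names `consensus_matrix` and `consensus_clusters`, the toy labels, and the final grouping step (thresholding plus connected components) are assumptions made for illustration.

```python
def consensus_matrix(labelings):
    """Fraction of clusterings in which each pair of cells shares a cluster.

    `labelings` is a list of label lists, one per clustering run,
    all of the same length (one label per cell). Hypothetical helper.
    """
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(labelings)
    return [[c / runs for c in row] for row in counts]


def consensus_clusters(labelings, threshold=0.5):
    """Group cells linked by pairwise consensus above `threshold`.

    Assumption: connected components of the thresholded consensus
    matrix stand in for whatever final clustering step a real tool uses.
    """
    m = consensus_matrix(labelings)
    n = len(m)
    assigned = [-1] * n
    next_id = 0
    for start in range(n):
        if assigned[start] != -1:
            continue
        assigned[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if assigned[j] == -1 and m[i][j] > threshold:
                    assigned[j] = next_id
                    stack.append(j)
        next_id += 1
    return assigned


# Toy example: three clusterings of six cells (made-up labels).
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition, labels permuted
    [0, 0, 1, 1, 2, 2],  # a noisier solution
]
print(consensus_clusters(runs))
```

Because the consensus matrix only records whether two cells co-cluster, it does not matter that the individual runs use different label names for the same groups.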
nanostring
advances in scRNA-seq tech
- Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. No isolation of single cells needed!