  • Stars: 133
  • Rank: 272,600 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created about 5 years ago
  • Updated about 2 months ago

Repository Details

Pipeline to find aberrant events in RNA-Seq data, useful for diagnosis of rare disorders

Detection of RNA Outlier Pipeline

The Detection of RNA Outliers Pipeline (DROP) is an integrative workflow that detects aberrant expression, aberrant splicing, and mono-allelic expression from raw sequencing files.

The manuscript is available in Nature Protocols. SharedIt link.

What's new

Versions 1.3.3, 1.3.2 and 1.3.1 fix some bugs. Version 1.3.0 introduces the option to use FRASER 2.0, an improved version of FRASER that uses the Intron Jaccard Index metric instead of percent spliced in and splicing efficiency to quantify and later call aberrant splicing. To run FRASER 2.0, modify the FRASER_version parameter in the aberrantSplicing dictionary in the config file and adapt the quantileForFiltering and deltaPsiCutoff parameters. See the config template for more details. When switching between FRASER versions, we recommend running DROP in a separate folder for each version.

Moreover, DROP now allows users to provide lists of genes to focus on, so that the multiple testing correction is restricted to those genes instead of being applied transcriptome-wide. Refer to the documentation.
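The FRASER 2.0 config change described above can be sketched in Python on an in-memory copy of the config. The key names (aberrantSplicing, FRASER_version, quantileForFiltering, deltaPsiCutoff) come from the text; the value "FRASER2", both numeric cutoffs and the groups entry are assumptions made for illustration, so take the actual values from the config template.

```python
import copy

# Assumed example values; the authoritative ones are in the DROP config template.
FRASER2_OVERRIDES = {
    "FRASER_version": "FRASER2",   # assumption: identifier that selects FRASER 2.0
    "quantileForFiltering": 0.75,  # assumption: illustrative value
    "deltaPsiCutoff": 0.1,         # assumption: illustrative value
}


def switch_to_fraser2(config, overrides=FRASER2_OVERRIDES):
    """Return a copy of `config` whose aberrantSplicing section uses FRASER 2.0."""
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    section = updated.setdefault("aberrantSplicing", {})
    section.update(overrides)        # other keys in the section are preserved
    return updated


# Hypothetical starting point resembling a FRASER 1 setup.
old_config = {
    "aberrantSplicing": {
        "FRASER_version": "FRASER",
        "quantileForFiltering": 0.95,
        "deltaPsiCutoff": 0.3,
        "groups": ["fibroblasts"],
    }
}
new_config = switch_to_fraser2(old_config)
print(new_config["aberrantSplicing"])
```

Returning a modified copy keeps the original settings available, which fits the recommendation to keep a separate project folder (and config) per FRASER version.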

Snakemake v7.8 changed its rerun behavior so that changes in parameters can cause rules to be re-executed. More info here. This affects DROP: certain rules in the AS and QC modules are triggered even if they have already completed and neither the sample annotation nor the scripts have changed. The workaround is to run DROP with the parameter --rerun-triggers mtime, e.g. snakemake -n --rerun-triggers mtime or snakemake --cores 10 --rerun-triggers mtime. We will investigate the rules in DROP to fix this.
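For users who launch DROP from scripts, the workaround above can be wrapped in a small helper so the flag is never forgotten. This is a minimal sketch: the function name snakemake_command and its parameters are hypothetical, and only the flags quoted in the text (-n, --cores, --rerun-triggers mtime) are used.

```python
import shlex


def snakemake_command(cores=None, dry_run=False, target=None, mtime_only=True):
    """Build a snakemake argument list for use inside a DROP project directory."""
    cmd = ["snakemake"]
    if target:
        cmd.append(target)  # optional single workflow, e.g. aberrantExpression
    if dry_run:
        cmd.append("-n")
    if cores is not None:
        cmd += ["--cores", str(cores)]
    if mtime_only:
        # Workaround from the note above: rerun only on file modification times.
        cmd += ["--rerun-triggers", "mtime"]
    return cmd


dry = snakemake_command(dry_run=True)
full = snakemake_command(cores=10)
print(shlex.join(dry))   # snakemake -n --rerun-triggers mtime
print(shlex.join(full))  # snakemake --cores 10 --rerun-triggers mtime
```

The list form can be passed directly to subprocess.run without shell quoting issues.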

Version 1.2.3 simplifies the plots in the AE Summary Script. In addition, a new heatmap in the sampleQC Summary makes it easier to identify DNA-RNA mismatches.

As of version 1.2.1, DROP has a new module that performs RNA-seq variant calling. The input is a set of BAM files, and the output is either a single-sample or a multi-sample VCF file (as chosen by the user), optionally annotated with allele frequencies from gnomAD. The sample annotation table does not need to be changed, but several new parameters have to be added to the config file and tuned. For more info, refer to the documentation.

Also as of version 1.2.1, external split and non-split counts can be integrated to detect aberrant splicing. Simply specify the directory containing the counts in a new column of the sample annotation. For more info, refer to the documentation.

Quickstart

DROP is available on bioconda. We recommend using a dedicated conda environment (drop_env in this example). Installation time: about 10 min.

mamba create -n drop_env -c conda-forge -c bioconda drop --override-channels

In case of trouble with mamba/conda, we recommend using the fixed DROP_<version>.yaml installation file that we make available on our public server. Download the file for the current version and use its full path in the following command to create the conda environment drop_env:

mamba env create -f DROP_1.3.3.yaml

Test installation with demo project

conda activate drop_env
mkdir ~/drop_demo
cd ~/drop_demo
drop demo

The pipeline can be run using snakemake commands

snakemake -n # dryrun
snakemake --cores 1

Expected runtime: 25 min

For more information on different installation options, refer to the documentation

Set up a custom project

Install the drop module as described in the installation instructions, then initialize the project in a custom project directory.

Prepare the input data

Create a sample annotation that contains the sample IDs, file locations and other information necessary for the pipeline. Edit the config file to set the path to the sample annotation and the locations of the input files that are not sample-specific. The requirements are described in the documentation.
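Before starting a run, it can help to sanity-check the sample annotation. The sketch below is not part of DROP: it assumes a tab-separated file, and the column names RNA_ID and RNA_BAM_FILE (as well as DROP_GROUP in the demo data) are assumptions that should be checked against the documentation. It only verifies that the expected columns are present and that every row has a sample ID.

```python
import csv
import io


def check_sample_annotation(handle, id_column="RNA_ID",
                            required=("RNA_ID", "RNA_BAM_FILE")):
    """Return a list of human-readable problems found in a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in required if name not in header]
    if problems:
        return problems  # row checks are not meaningful without the columns
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not (row.get(id_column) or "").strip():
            problems.append(f"line {line_no}: empty {id_column}")
    return problems


# Hypothetical two-sample table; the second row lacks a sample ID.
demo = (
    "RNA_ID\tRNA_BAM_FILE\tDROP_GROUP\n"
    "S1\t/data/S1.bam\tgroupA\n"
    "\t/data/S2.bam\tgroupA\n"
)
problems = check_sample_annotation(io.StringIO(demo))
print(problems)
```

For a real project, pass an open file handle instead of the in-memory demo table, and extend the required tuple with whatever columns the modules in use need.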

Execute the pipeline

Once these files are set up, you can execute a dry run from your project directory

snakemake -n

This shows you the rules of all subworkflows. Omit -n and specify the number of cores with --cores once you are sure that you want to execute all printed rules. You can also invoke a single workflow explicitly, e.g. for aberrant expression:

snakemake aberrantExpression --cores 10
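The dry-run-then-execute sequence described above can also be scripted. The following is a minimal sketch under stated assumptions: run_workflow and its runner parameter are hypothetical names, and the demo uses a fake runner so the example works without snakemake installed. With the default runner it calls subprocess.run in the current directory, which should be the project directory.

```python
import subprocess


def run_workflow(target=None, cores=1, runner=subprocess.run):
    """Dry-run first; execute only if the dry run exits with status 0."""
    base = ["snakemake"] + ([target] if target else [])
    dry = runner(base + ["-n"], check=False)
    if dry.returncode != 0:
        return False  # dry run failed, so nothing is executed
    real = runner(base + ["--cores", str(cores)], check=False)
    return real.returncode == 0


class FakeResult:
    """Stand-in for subprocess.CompletedProcess in the demo below."""

    def __init__(self, returncode):
        self.returncode = returncode


calls = []


def fake_runner(cmd, check=False):
    calls.append(cmd)  # record the command instead of running it
    return FakeResult(0)


ok = run_workflow("aberrantExpression", cores=10, runner=fake_runner)
print(ok, calls)
```

Gating the real run on the dry run's exit status catches configuration errors before any cores are committed.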

Datasets

The following publicly-available datasets of gene counts can be used as controls. Please cite as instructed for each dataset.

  • 154 non strand-specific fibroblasts, build hg19, Technical University of Munich: DOI

  • 135 strand-specific fibroblasts, build hg19, high seq depth (116 million mapped reads), Technical University of Munich: DOI

  • 127 strand-specific fibroblasts, build hg19, low seq depth (70 million mapped reads), Technical University of Munich: DOI

  • 49 tissues, each containing hundreds of samples, non strand-specific, build hg19, GTEx: DOI

  • 49 tissues, each containing hundreds of samples, non strand-specific, build hg38, GTEx: DOI

  • 139 strand-specific fibroblasts, build hg19, Baylor College of Medicine: DOI

  • 125 strand-specific blood, build hg19, Baylor College of Medicine: DOI

  • 330 strand-specific induced pluripotent stem cells (iPSCs), build hg19, EMBL: DOI

  • 56 non strand-specific amniotic fluid cells, build hg19, The University of Hong Kong: DOI

If you want to contribute with your own count matrices, please contact us: yepez at in.tum.de

Citation

If you use DROP in research, please cite our manuscript.

Furthermore, if you use the aberrant expression module, also cite OUTRIDER; if you use the aberrant splicing module, also cite FRASER; and if you use the MAE module, also cite the Kremer, Bader et al. study and DESeq2.

For the complete set of tools used by DROP (e.g. for counting), see the manuscript.

Acknowledgements and Funding

The DROP team is composed of members from the Gagneur lab at the Department of Informatics and School of Medicine of the Technical University of Munich (TUM) and The German Human Genome-Phenome Archive (GHGA). The team has been funded by the German Bundesministerium für Bildung und Forschung (BMBF) through the e:Med Networking fonds AbCD-Net, Medical Informatics Initiative CORD-MI, and ERA PerMed project PerMiM. We would like to thank all the users for their feedback.
