Language: Nextflow
License: MIT License

Assembly and intrahost/low-frequency variant calling for viral samples

nf-core/viralrecon

Introduction

nf-core/viralrecon is a bioinformatics analysis pipeline used to perform assembly and intra-host/low-frequency variant calling for viral samples. The pipeline supports both Illumina and Nanopore sequencing data. For Illumina short-reads the pipeline is able to analyse metagenomics data typically obtained from shotgun sequencing (e.g. directly from clinical samples) and enrichment-based library preparation methods (e.g. amplicon-based: ARTIC SARS-CoV-2 enrichment protocol; or probe-capture-based). For Nanopore data the pipeline only supports amplicon-based analysis obtained from primer sets created and maintained by the ARTIC Network.
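The platform/protocol combinations described above can be summarised as a small lookup. This is a hypothetical Python sketch, not code from the pipeline. The protocol names metagenomic and amplicon are taken from the --protocol values used in the Quick Start commands; how probe-capture data maps onto them is not covered here.

```python
# Hypothetical sketch: which --platform/--protocol pairs the README describes
# as supported. This is not the pipeline's own validation code.
SUPPORTED_PROTOCOLS = {
    "illumina": {"metagenomic", "amplicon"},  # shotgun metagenomics and amplicon data
    "nanopore": {"amplicon"},                 # ARTIC primer sets only
}


def is_supported(platform: str, protocol: str) -> bool:
    """Return True if the platform/protocol pair is described as supported."""
    return protocol.lower() in SUPPORTED_PROTOCOLS.get(platform.lower(), set())


print(is_supported("illumina", "metagenomic"))  # True
print(is_supported("nanopore", "metagenomic"))  # False
```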

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults for real-world datasets, and permits persistent storage of results for benchmarking between pipeline releases and against other analysis sources. The results of the full-sized tests, which are run separately for each --platform option, can be viewed on the nf-core website; the output directories are named accordingly, i.e. platform_illumina/ and platform_nanopore/.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

Pipeline summary

The pipeline has numerous options to allow you to run only specific aspects of the workflow if you so wish. For example, for Illumina data you can skip the host read filtering step with Kraken 2 with --skip_kraken2 or you can skip all of the assembly steps with the --skip_assembly parameter. See the usage and parameter docs for all of the available options when running the pipeline.
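As noted in the Quick Start, parameters can also be supplied through Nextflow's -params-file option, which reads a JSON or YAML file whose keys are parameter names without the leading dashes. The Python sketch below assumes JSON: it collects the skip switches named above alongside other parameters and builds the matching command. The helper names build_params, to_params_json and nextflow_command are hypothetical.

```python
import json
import shlex


def build_params(skip_kraken2=False, skip_assembly=False, **other):
    """Collect pipeline parameters (names without leading dashes) into a dict."""
    params = dict(other)
    # Only switched-on skip flags are written; omitted flags keep the pipeline default.
    if skip_kraken2:
        params["skip_kraken2"] = True
    if skip_assembly:
        params["skip_assembly"] = True
    return params


def to_params_json(params):
    """Serialise parameters as JSON text for use with -params-file."""
    return json.dumps(params, indent=2, sort_keys=True)


def nextflow_command(params_file, profile):
    """Build the argument list for `nextflow run` using a params file."""
    return ["nextflow", "run", "nf-core/viralrecon",
            "-profile", profile,
            "-params-file", params_file]


params = build_params(input="samplesheet.csv", outdir="results",
                      platform="illumina", protocol="metagenomic",
                      genome="MN908947.3", skip_assembly=True)
print(to_params_json(params))
print(shlex.join(nextflow_command("params.json", "docker")))
```

Writing the JSON text to params.json and running the printed command would be equivalent to passing the same options as --flags on the command line.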

The SRA download functionality has been removed from the pipeline (from v2.1 onwards) and ported to an independent workflow called nf-core/fetchngs. You can provide --nf_core_pipeline viralrecon when running nf-core/fetchngs to download publicly available samples and auto-create a samplesheet that can be passed directly to the Illumina processing mode of nf-core/viralrecon.

A number of improvements were made to the pipeline in v2.3, mainly with regard to variant calling. Please see Major updates in v2.3 for a more detailed description.

Illumina

nf-core/viralrecon Illumina metro map

  1. Merge re-sequenced FastQ files (cat)
  2. Read QC (FastQC)
  3. Adapter trimming (fastp)
  4. Removal of host reads (Kraken 2; optional)
  5. Variant calling
    1. Read alignment (Bowtie 2)
    2. Sort and index alignments (SAMtools)
    3. Primer sequence removal (iVar; amplicon data only)
    4. Duplicate read marking (Picard; optional)
    5. Alignment-level QC (Picard, SAMtools)
    6. Genome-wide and amplicon coverage QC plots (mosdepth)
    7. Choice of multiple variant callers (iVar variants; default for amplicon data || BCFtools; default for metagenomics data)
    8. Choice of multiple consensus callers (BCFtools, BEDTools; default for both amplicon and metagenomics data || iVar consensus)
      • Consensus assessment report (QUAST)
      • Lineage analysis (Pangolin)
      • Clade assignment, mutation calling and sequence quality checks (Nextclade)
    9. Create variants long format table collating per-sample information for individual variants (BCFtools), functional effect prediction (SnpSift) and lineage analysis (Pangolin)
  6. De novo assembly
    1. Primer trimming (Cutadapt; amplicon data only)
    2. Choice of multiple assembly tools (SPAdes || Unicycler || minia)
  7. Present QC and visualisation for raw read, alignment, assembly and variant calling results (MultiQC)
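Because the defaults in steps 5.7 and 5.8 depend on the protocol, they can be expressed as a simple decision. The sketch below is hypothetical: it restates the defaults from the list above in Python and is not the pipeline's implementation. The dictionary keys are illustrative names; see the parameter docs for the options that actually select a caller.

```python
# Hypothetical sketch of the Illumina defaults listed above. The keys
# "variant_caller" and "consensus_caller" are illustrative names only.
def default_callers(protocol: str) -> dict:
    """Return the default variant and consensus callers for an Illumina protocol."""
    if protocol == "amplicon":
        variant_caller = "ivar"        # iVar variants is the amplicon default
    elif protocol == "metagenomic":
        variant_caller = "bcftools"    # BCFtools is the metagenomics default
    else:
        raise ValueError(f"unknown protocol: {protocol!r}")
    # BCFtools/BEDTools consensus is the default for both protocols.
    return {"variant_caller": variant_caller, "consensus_caller": "bcftools"}


for protocol in ("amplicon", "metagenomic"):
    print(protocol, default_callers(protocol))
```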

Nanopore

nf-core/viralrecon Nanopore metro map

  1. Sequencing QC (pycoQC)
  2. Aggregate pre-demultiplexed reads from MinKNOW/Guppy (artic guppyplex)
  3. Read QC (NanoPlot)
  4. Align reads, call variants and generate consensus sequence (artic minion)
  5. Remove unmapped reads and obtain alignment metrics (SAMtools)
  6. Genome-wide and amplicon coverage QC plots (mosdepth)
  7. Downstream variant analysis:
    • Count metrics (BCFtools)
    • Variant annotation (SnpEff, SnpSift)
    • Consensus assessment report (QUAST)
    • Lineage analysis (Pangolin)
    • Clade assignment, mutation calling and sequence quality checks (Nextclade)
    • Individual variant screenshots with annotation tracks (ASCIIGenome)
    • Create variants long format table collating per-sample information for individual variants (BCFtools), functional effect prediction (SnpSift) and lineage analysis (Pangolin)
  8. Present QC, visualisation and custom reporting for sequencing, raw reads, alignment and variant calling results (MultiQC)
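Step 2 relies on MinKNOW/Guppy having already demultiplexed the reads. As a rough illustration, assuming the common layout in which demultiplexed reads are written to one fastq_pass/barcodeNN/ folder per barcode, the Python sketch below groups FastQ files by barcode number. Inside the pipeline, artic guppyplex performs the real aggregation; the helper reads_per_barcode is hypothetical and only shows the expected input structure.

```python
import re
from pathlib import Path

# Assumption: demultiplexed reads live in <fastq_dir>/barcodeNN/ subfolders.
BARCODE_DIR = re.compile(r"^barcode(\d+)$")
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def reads_per_barcode(fastq_dir):
    """Map barcode number -> sorted list of FastQ files in that barcode folder."""
    groups = {}
    for sub in sorted(Path(fastq_dir).iterdir()):
        match = BARCODE_DIR.match(sub.name)
        if not (sub.is_dir() and match):
            continue  # skip loose files and folders such as "unclassified"
        files = sorted(p for p in sub.iterdir() if p.name.endswith(FASTQ_SUFFIXES))
        if files:
            groups[int(match.group(1))] = files
    return groups
```

For example, reads_per_barcode("fastq_pass/") would return a dictionary keyed by barcode number, with barcodes that contain no FastQ files left out.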

Quick Start

  1. Install Nextflow (>=22.10.1)

  2. Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility. You can also use Conda, both to install Nextflow itself and to manage software within pipelines, but please only use it within pipelines as a last resort; see docs.

  3. Download the pipeline and test it on a minimal dataset with a single command:

    nextflow run nf-core/viralrecon -profile test,YOURPROFILE --outdir <OUTDIR>

    Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (YOURPROFILE in the example command above). You can chain multiple config profiles in a comma-separated string.

    • The pipeline comes with config profiles called docker, singularity, podman, shifter, charliecloud and conda which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
    • Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
    • If you are using singularity, please use the nf-core download command to download images first, before running the pipeline. Setting the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
    • If you are using conda, it is highly recommended to use the NXF_CONDA_CACHEDIR or conda.cacheDir settings to store the environments in a central location for future pipeline runs.
  4. Start running your own analysis!

    • Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
    • Typical command for Illumina shotgun analysis:

      nextflow run nf-core/viralrecon \
          --input samplesheet.csv \
          --outdir <OUTDIR> \
          --platform illumina \
          --protocol metagenomic \
          --genome 'MN908947.3' \
          -profile <docker/singularity/podman/conda/institute>
    • Typical command for Illumina amplicon analysis:

      nextflow run nf-core/viralrecon \
          --input samplesheet.csv \
          --outdir <OUTDIR> \
          --platform illumina \
          --protocol amplicon \
          --genome 'MN908947.3' \
          --primer_set artic \
          --primer_set_version 3 \
          --skip_assembly \
          -profile <docker/singularity/podman/conda/institute>
    • Typical command for Nanopore amplicon analysis:

      nextflow run nf-core/viralrecon \
          --input samplesheet.csv \
          --outdir <OUTDIR> \
          --platform nanopore \
          --genome 'MN908947.3' \
          --primer_set_version 3 \
          --fastq_dir fastq_pass/ \
          --fast5_dir fast5_pass/ \
          --sequencing_summary sequencing_summary.txt \
          -profile <docker/singularity/podman/conda/institute>
    • An executable Python script called fastq_dir_to_samplesheet.py has been provided if you are using --platform illumina and would like to auto-create an input samplesheet based on a directory containing FastQ files before you run the pipeline (requires Python 3 installed locally) e.g.

      wget https://raw.githubusercontent.com/nf-core/viralrecon/master/bin/fastq_dir_to_samplesheet.py
      python3 fastq_dir_to_samplesheet.py <FASTQ_DIR> samplesheet.csv
    • You can find the default keys used to specify --genome in the genomes config file. This provides default options for:

      • Reference genomes (including SARS-CoV-2)

      • Genome-associated primer sets

      • Nextclade datasets

        The Pangolin and Nextclade lineage and clade definitions change regularly as new SARS-CoV-2 lineages are discovered. For instructions on using more recent versions of lineage analysis tools such as Pangolin and Nextclade, please refer to the updating containers section in the usage docs.

      Where possible we are trying to collate links and settings for standard primer sets to make it easier to run the pipeline with standard keys; see usage docs.
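For --platform illumina, the fastq_dir_to_samplesheet.py script mentioned above is the supported way to build the samplesheet. To show the kind of output it produces, here is a minimal, hypothetical Python re-implementation; build_samplesheet is not the script's real code. It assumes paired files named like <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz (or .fq.gz, with or without _001) and a samplesheet with the columns sample, fastq_1 and fastq_2, leaving fastq_2 empty for single-end samples. Check both assumptions against the usage docs before relying on them.

```python
import csv
import re
from pathlib import Path

# Assumed naming convention: <sample>_R1[_001].fastq.gz or .fq.gz (and R2).
FASTQ_NAME = re.compile(r"^(?P<sample>.+?)_R(?P<read>[12])(?:_001)?\.f(?:ast)?q\.gz$")


def build_samplesheet(fastq_dir, out_csv):
    """Pair R1/R2 files per sample and write a sample,fastq_1,fastq_2 CSV."""
    reads = {}
    for path in sorted(Path(fastq_dir).iterdir()):
        match = FASTQ_NAME.match(path.name)
        if match:
            reads.setdefault(match["sample"], {})[match["read"]] = str(path.resolve())

    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "fastq_1", "fastq_2"])
        for sample, pair in sorted(reads.items()):
            if "1" not in pair:
                continue  # an R2 file without its R1 mate is skipped
            writer.writerow([sample, pair["1"], pair.get("2", "")])
```

Usage: build_samplesheet("fastq/", "samplesheet.csv"). Absolute paths are written so the samplesheet stays valid wherever the pipeline is launched from.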

Documentation

The nf-core/viralrecon pipeline comes with documentation about the pipeline usage, parameters and output.

Credits

These scripts were originally written by Sarai Varona, Miguel Juliá, Erika Kvalem and Sara Monzon from BU-ISCIII and co-ordinated by Isabel Cuesta for the Institute of Health Carlos III, Spain. Through collaboration with the nf-core community the pipeline has now been updated substantially to include additional processing steps, to standardise inputs/outputs and to improve pipeline reporting; implemented and maintained primarily by Harshil Patel (@drpatelh) from Seqera Labs, Spain.

The key steps in the Nanopore implementation of the pipeline are carried out using the ARTIC Network's field bioinformatics pipeline and were inspired by the amazing work carried out by contributors to the connor-lab/ncov2019-artic-nf pipeline originally written by Matt Bull for use by the COG-UK project. Thank you for all of your incredible efforts during this pandemic!

Many thanks to others who have helped out and contributed along the way too, including (but not limited to)*:

Name | Affiliation
Aengus Stewart | The Francis Crick Institute, UK
Alexander Peltzer | Boehringer Ingelheim, Germany
Alison Meynert | University of Edinburgh, Scotland
Anthony Underwood | Centre for Genomic Pathogen Surveillance
Anton Korobeynikov | Saint Petersburg State University, Russia
Artem Babaian | University of British Columbia, Canada
Dmitry Meleshko | Saint Petersburg State University, Russia
Edgar Garriga Nogales | Centre for Genomic Regulation, Spain
Erik Garrison | UCSC, USA
Gisela Gabernet | QBiC, University of Tübingen, Germany
Jerome Nicod | The Francis Crick Institute, UK
Joao Curado | Flomics Biotech, Spain
Jose Espinosa-Carrasco | Centre for Genomic Regulation, Spain
Katrin Sameith | DRESDEN-concept Genome Center, Germany
Kevin Menden | QBiC, University of Tübingen, Germany
Lluc Cabus | Flomics Biotech, Spain
Marta Pozuelo | Flomics Biotech, Spain
Maxime Garcia | Seqera Labs, Spain
Michael Heuer | UC Berkeley, USA
Phil Ewels | SciLifeLab, Sweden
Richard Mitter | The Francis Crick Institute, UK
Robert Goldstone | The Francis Crick Institute, UK
Simon Heumos | QBiC, University of Tübingen, Germany
Stephen Kelly | Memorial Sloan Kettering Cancer Center, USA
Thanh Le Viet | Quadram Institute, UK

* Listed in alphabetical order

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #viralrecon channel (you can join with this invite).

Citations

If you use nf-core/viralrecon for your analysis, please cite it using the following doi: 10.5281/zenodo.3901628

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
