  • Language: Nextflow
  • License: MIT License

nf-core/eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline.

Important

nf-core/eager versions 2.* are only compatible with Nextflow versions up to 22.10.6!
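
The version constraint above can be checked before launching a run. The following is a minimal sketch, assuming a bash shell and a `sort` that supports version sorting (`-V`); the function names `version_le` and `nxf_version_ok` are hypothetical helpers and not part of nf-core/eager or Nextflow. The lower bound is taken from the Quick Start section of this README.

```shell
#!/usr/bin/env bash
# Sketch: test whether a Nextflow version string lies within the range
# stated in this README (>=20.07.1 and <=22.10.6).
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.

MIN_NXF="20.07.1"
MAX_NXF="22.10.6"

# version_le A B: succeeds if version A is lower than or equal to version B.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# nxf_version_ok VERSION: succeeds if MIN_NXF <= VERSION <= MAX_NXF.
nxf_version_ok() {
    version_le "$MIN_NXF" "$1" && version_le "$1" "$MAX_NXF"
}
```

For example, `nxf_version_ok "22.10.6" && echo "compatible"` prints `compatible`, while `nxf_version_ok "23.04.0"` returns a non-zero status. If a newer Nextflow is installed, one option is to pin the version for a single run with Nextflow's `NXF_VER` environment variable, e.g. `NXF_VER=22.10.6 nextflow run nf-core/eager ...`.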

Introduction

nf-core/eager is a scalable and reproducible bioinformatics best-practice processing pipeline for genomic NGS data, with a focus on ancient DNA (aDNA) data. It is ideal for the (palaeo)genomic analysis of humans, animals, plants, microbes and even microbiomes.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. The pipeline pre-processes raw data from FASTQ inputs, or preprocessed BAM inputs. It can align reads and perform extensive general NGS and aDNA-specific quality control on the results. It comes with Docker and Singularity containers, as well as Conda environments, making installation trivial and results highly reproducible.

[Figure: nf-core/eager schematic workflow]

Quick Start

  1. Install Nextflow (>=20.07.1 and <=22.10.6)

  2. Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please only use Conda as a last resort; see docs)

  3. Download the pipeline and test it on a minimal dataset with a single command:

    nextflow run nf-core/eager -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>

    Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.

  4. Start running your own analysis!

    nextflow run nf-core/eager -profile <docker/singularity/podman/conda/institute> --input '*_R{1,2}.fastq.gz' --fasta '<your_reference>.fasta'

  5. Once your run has completed successfully, clean up the intermediate files.

    nextflow clean -f -k
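
The `--input` pattern in step 4 is quoted, which keeps the shell from expanding it before Nextflow receives it. Before launching, it can help to preview which paired-end files such a pattern is meant to capture. The sketch below assumes bash and files named with the `_R1.fastq.gz` / `_R2.fastq.gz` suffixes from the example command; `list_fastq_pairs` is a hypothetical helper, not part of nf-core/eager.

```shell
#!/usr/bin/env bash
# Sketch: list the R1/R2 FASTQ pairs in a directory that a pattern such as
# '*_R{1,2}.fastq.gz' is intended to capture, and flag any R1 without an R2.
# Assumption: files follow the <name>_R1.fastq.gz / <name>_R2.fastq.gz scheme.

list_fastq_pairs() {
    local dir="${1:-.}" r1 r2 status=0
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue          # the glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ -e "$r2" ]; then
            printf 'paired\t%s\t%s\n' "$(basename "$r1")" "$(basename "$r2")"
        else
            printf 'unpaired\t%s\n' "$(basename "$r1")"
            status=1                      # report missing mates via exit status
        fi
    done
    return "$status"
}
```

Running `list_fastq_pairs /path/to/fastqs` prints one tab-separated line per R1 file and returns a non-zero status if any mate is missing, so it can be used as a guard in front of the `nextflow run` command.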

See usage docs for all of the available options when running the pipeline.

N.B. You can see an overview of the run in the MultiQC report located at ./results/MultiQC/multiqc_report.html

Modifications to the default pipeline are easily made using various options as described in the documentation.

Pipeline Summary

Default Steps

By default the pipeline currently performs the following:

  • Create reference genome indices for mapping (bwa, samtools, and picard)
  • Sequencing quality control (FastQC)
  • Sequencing adapter removal, paired-end data merging (AdapterRemoval)
  • Read mapping to reference (bwa aln, bwa mem, CircularMapper, or bowtie2)
  • Post-mapping processing, statistics and conversion to bam (samtools)
  • Ancient DNA C-to-T damage pattern visualisation (DamageProfiler or mapDamage)
  • PCR duplicate removal (DeDup or MarkDuplicates)
  • Post-mapping statistics and BAM quality control (Qualimap)
  • Library Complexity Estimation (preseq)
  • Overall pipeline statistics summaries (MultiQC)

Additional Steps

Additional functionality contained by the pipeline currently includes:

Input

  • Automatic merging of complex sequencing setups (e.g. multiple lanes, sequencing configurations, library types)

Preprocessing

  • Illumina two-coloured sequencer poly-G tail removal (fastp)
  • Post-AdapterRemoval trimming of FASTQ files prior to mapping (fastp)
  • Automatic conversion of unmapped reads to FASTQ (samtools)
  • Host DNA (mapped reads) stripping from input FASTQ files (for sensitive samples)

aDNA Damage manipulation

  • Damage removal/clipping for UDG+/UDG-half treatment protocols (BamUtil)
  • Damaged reads extraction and assessment (PMDTools)
  • Nuclear DNA contamination estimation of human samples (angsd)

Genotyping

  • Creation of VCF genotyping files (GATK UnifiedGenotyper, GATK HaplotypeCaller and FreeBayes)
  • Creation of EIGENSTRAT genotyping files (pileupCaller)
  • Creation of Genotype Likelihood files (angsd)
  • Consensus sequence FASTA creation (VCF2Genome)
  • SNP Table generation (MultiVCFAnalyzer)

Biological Information

  • Mitochondrial to Nuclear read ratio calculation (MtNucRatioCalculator)
  • Statistical sex determination of human individuals (Sex.DetERRmine)

Metagenomic Screening

  • Low sequence complexity filtering (BBduk)
  • Taxonomic binner with alignment (MALT)
  • Taxonomic binner without alignment (Kraken2)
  • aDNA characteristic screening of taxonomically binned data from MALT (MaltExtract)

Functionality Overview

A graphical overview of suggested routes through the pipeline depending on context can be seen below.

[Figure: nf-core/eager metro map]

Documentation

The nf-core/eager pipeline comes with documentation about the pipeline: usage and output.

  1. Nextflow installation
  2. Pipeline configuration
  3. Running the pipeline
    • This includes tutorials, FAQs, and troubleshooting instructions
  4. Output and how to interpret the results

Credits

This pipeline was mostly written by Alexander Peltzer (apeltzer) and James A. Fellows Yates, with contributions from Stephen Clayton, Thiseas C. Lamnidis, Maxime Borry, Zandra Fagernäs, Aida Andrades Valtueña and Maxime Garcia and the nf-core community.

We thank the following people for their extensive assistance in the development of this pipeline:

Authors (alphabetical)

Additional Contributors (alphabetical)

Those who have provided conceptual guidance, suggestions, bug reports etc.

If you've contributed and you're missing in here, please let us know and we will add you in of course!

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #eager channel (you can join with this invite).

Citations

If you use nf-core/eager for your analysis, please cite the eager publication as follows:

Fellows Yates JA, Lamnidis TC, Borry M, Andrades Valtueña A, Fagernäs Z, Clayton S, Garcia MU, Neukamm J, Peltzer A. 2021. Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager. PeerJ 9:e10947. DOI: 10.7717/peerj.10947.

You can cite the eager Zenodo record for a specific version using the following DOI: 10.5281/zenodo.3698082

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

In addition, references of tools and data used in this pipeline are as follows:

Data References

This repository uses test data from the following studies:

  • Fellows Yates, J. A. et al. (2017) ‘Central European Woolly Mammoth Population Dynamics: Insights from Late Pleistocene Mitochondrial Genomes’, Scientific reports, 7(1), p. 17714. doi: 10.1038/s41598-017-17723-1.
  • Gamba, C. et al. (2014) ‘Genome flux and stasis in a five millennium transect of European prehistory’, Nature communications, 5, p. 5257. doi: 10.1038/ncomms6257.
  • Star, B. et al. (2017) ‘Ancient DNA reveals the Arctic origin of Viking Age cod from Haithabu, Germany’, Proceedings of the National Academy of Sciences of the United States of America, 114(34), pp. 9152–9157. doi: 10.1073/pnas.1710186114.
  • de Barros Damgaard, P. et al. (2018) ‘137 ancient human genomes from across the Eurasian steppes’, Nature, 557(7705), pp. 369–374. doi: 10.1038/s41586-018-0094-2.
