• Stars: 351
• Rank: 120,160 (Top 3 %)
• Language: C++
• License: GNU General Publi...
• Created almost 8 years ago
• Updated over 2 years ago

Repository Details

Strelka2 germline and somatic small variant caller

Strelka2 Small Variant Caller

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. The germline caller employs an efficient tiered haplotype model to improve accuracy and provide read-backed phasing, adaptively selecting between assembly and a faster alignment-based haplotyping approach at each variant locus. The germline caller also analyzes input sequencing data using a mixture-model indel error estimation method to improve robustness to indel noise. The somatic calling model improves on the original Strelka method for liquid and late-stage tumor analysis by accounting for possible tumor cell contamination in the normal sample. A final empirical variant re-scoring step using random forest models trained on various call quality features has been added to both callers to further improve precision.

Compared with submissions to the recent PrecisionFDA Consistency and Truth challenges, the average indel F-score for Strelka2 running in its default configuration is 3.1% and 0.08% higher, respectively, than the best challenge submissions. Runtime on a 28-core server is ~40 minutes for 40x WGS germline analysis and ~3 hours for a 110x/40x WGS tumor-normal somatic analysis. More details on Strelka2 methods and benchmarking for both germline and somatic calling are described in:
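As a reminder of what the F-score figures summarize, here is a minimal sketch assuming the usual F1 definition (the harmonic mean of precision and recall). The function name `f_score` and the counts in the usage line are illustrative only and are not taken from the PrecisionFDA results.

```python
def f_score(tp, fp, fn):
    """F1 score from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        # With no true positives, precision and recall are zero or undefined;
        # report 0.0 rather than dividing by zero.
        return 0.0
    precision = tp / (tp + fp)  # fraction of reported calls that are correct
    recall = tp / (tp + fn)     # fraction of true variants that were reported
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration, not benchmark data.
print(f_score(tp=90, fp=10, fn=10))
```

Because the F-score combines precision and recall, a caller cannot raise it by trading one entirely for the other.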

Kim, S., Scheffler, K. et al. (2018) Strelka2: fast and accurate calling of germline and somatic variants. Nature Methods, 15, 591-594. doi:10.1038/s41592-018-0051-x

...and the corresponding open-access pre-print

Strelka accepts input read mappings from BAM or CRAM files, and optionally candidate and/or forced-call alleles from VCF. It reports all small variant predictions in VCF 4.1 format. Germline variant reporting uses the gVCF conventions to represent both variant and reference call confidence. For best somatic indel performance, Strelka is designed to be run with the Manta structural variant and indel caller, which provides additional indel candidates up to a given maximum indel size (49 by default). By design, Manta and Strelka run together with default settings provide complete coverage over all indel sizes (in addition to SVs and SNVs). See the user guide for a full description of capabilities and limitations.
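The joint Manta and Strelka somatic run described above could be laid out roughly as follows. This is a sketch, not the documented procedure: the script names, option names and the location of Manta's candidate indel file are written as recalled from the Manta and Strelka user guides and should be checked against the installed versions. The helper `manta_then_strelka_somatic` and all file paths are hypothetical. The code only builds the command lines; it does not run anything.

```python
from pathlib import Path


def manta_then_strelka_somatic(normal_bam, tumor_bam, reference_fasta,
                               manta_run_dir, strelka_run_dir, jobs=8):
    """Return the four commands for a joint Manta + Strelka somatic run.

    Nothing is executed here; each command is a list of arguments that
    could be passed to subprocess.run(cmd, check=True) in order.
    """
    manta_run_dir = Path(manta_run_dir)
    strelka_run_dir = Path(strelka_run_dir)

    # Assumption: Manta writes its small-indel candidates to this path
    # inside its run directory.
    candidates = (manta_run_dir / "results" / "variants"
                  / "candidateSmallIndels.vcf.gz")

    # Both configuration scripts take the same sample and reference inputs.
    shared = ["--normalBam", str(normal_bam),
              "--tumorBam", str(tumor_bam),
              "--referenceFasta", str(reference_fasta)]

    return [
        # 1. Configure and 2. run Manta to produce indel candidates.
        ["configManta.py", *shared, "--runDir", str(manta_run_dir)],
        [str(manta_run_dir / "runWorkflow.py"), "-m", "local", "-j", str(jobs)],
        # 3. Configure Strelka with Manta's candidates, then 4. run it.
        ["configureStrelkaSomaticWorkflow.py", *shared,
         "--indelCandidates", str(candidates),
         "--runDir", str(strelka_run_dir)],
        [str(strelka_run_dir / "runWorkflow.py"), "-m", "local", "-j", str(jobs)],
    ]


# Hypothetical file names, shown only to illustrate the call.
for cmd in manta_then_strelka_somatic("normal.bam", "tumor.bam", "genome.fa",
                                      "manta_run", "strelka_run"):
    print(" ".join(cmd))
```

Keeping the commands as data makes the ordering explicit: Strelka's configuration step depends on a file that exists only after the Manta workflow has finished.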

Getting Started

To get started installing and using Strelka, please consult the quick start guide.

Data Analysis and Interpretation

After completing installation and reviewing the quick start guide, see the Strelka user guide for full instructions on how to run Strelka, interpret results and estimate hardware requirements/compute cost, in addition to a high-level methods overview.
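A first look at results often starts with a tally of how many records passed filtering. The sketch below relies only on the generic VCF layout, in which FILTER is the seventh tab-separated column; it is not a Strelka API. The function name `count_by_filter` and the example records, including the filter label, are made up for illustration.

```python
from collections import Counter


def count_by_filter(vcf_lines):
    """Count VCF records by the value of their FILTER column."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header row
        fields = line.rstrip("\n").split("\t")
        counts[fields[6]] += 1  # FILTER is the 7th tab-separated column
    return counts


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\n",
    "chr1\t200\t.\tC\tCT\t.\tLowDepth\t.\n",
    "chr1\t300\t.\tG\tT\t.\tPASS\t.\n",
]
print(count_by_filter(example))
```

For a compressed output file, the same function can be fed the lines of `gzip.open(path, "rt")`.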

License

Strelka source code is provided under the GPLv3 license. Strelka includes several third-party packages provided under other open source licenses; please see COPYRIGHT.txt for additional details.

Strelka Code Development

For Strelka code development and debugging details, see the Strelka developer guide. This includes details on Strelka's development protocols, special build instructions, recommended workflows for investigating calls, and internal documentation details.

More Repositories

1. hap.py (C++, 401 stars): Haplotype VCF comparison tools
2. manta (C++, 391 stars): Structural variant and indel caller for mapped sequencing data
3. SpliceAI (Python, 388 stars): A deep learning-based tool to identify splice variants
4. ExpansionHunter (C++, 175 stars): A tool for estimating repeat sizes
5. Nirvana (C#, 167 stars): The nimble & robust variant annotator
6. DRAGMAP (C++, 153 stars): DRAGEN open-source mapper
7. paragraph (C++, 147 stars): Graph realignment tools for structural variants
8. pyflow (Python, 143 stars): A lightweight parallel task engine
9. canvas (C#, 121 stars): Canvas - Copy number variant (CNV) calling from DNA sequencing data
10. PrimateAI (Python, 110 stars): deep residual neural network for classifying the pathogenicity of missense mutations.
11. Pisces (C#, 93 stars): Somatic and germline variant caller for amplicon data. Recommended caller for tumor-only workflows.
12. PlatinumGenomes (84 stars): The Platinum Genomes Truthset
13. ExpansionHunterDenovo (C++, 77 stars): A suite of tools for detecting expansions of short tandem repeats
14. interop (C++, 75 stars): C++ Library to parse Illumina InterOp files
15. REViewer (C++, 75 stars): A tool for visualizing alignments of reads in regions containing tandem repeats
16. akt (C++, 68 stars): Ancestry and Kinship Tools
17. PrimateAI-3D (Python, 55 stars)
18. Polaris (52 stars): Data and information about the Polaris study
19. SMNCopyNumberCaller (Python, 49 stars): A copy number caller for SMN1 and SMN2 to enable SMA diagnosis and carrier screening with WGS
20. Cyrius (Python, 46 stars): A tool to genotype CYP2D6 with WGS data
21. BeadArrayFiles (Python, 45 stars): Python library to parse file formats related to Illumina bead arrays
22. GTCtoVCF (Python, 41 stars): Script to convert GTC/BPM files to VCF
23. GraphAlignmentViewer (Python, 33 stars)
24. gvcfgenotyper (C++, 31 stars): A utility for merging and genotyping Illumina-style GVCFs.
25. witty.er (C#, 27 stars): What is true, thank you, ernestly. A large variant benchmarking tool analogous to hap.py for small variants.
26. isaac2 (C++, 21 stars): Aligner for sequencing data
27. Gauchian (Python, 20 stars): A variant caller for the GBA gene using WGS data
28. BaseSpace_Clarity_LIMS (Python, 18 stars): API libraries, application examples, and custom tools for BaseSpace Clarity LIMS
29. Isaac3 (C++, 18 stars): Aligner for sequencing data
30. RepeatCatalogs (17 stars)
31. Isaac4 (C++, 16 stars): Isaac aligner version 4
32. happyR (R, 15 stars): R tools to interact with hap.py output
33. agg (12 stars): gvcf aggregation tool
34. tHapMix (Python, 10 stars): Haplotype-based somatic genome simulator
35. happyCompare (R, 7 stars): Reporting toolbox for happy output
36. zippy (Python, 5 stars): The ZIPPY pipeline prototyping system
37. MarViN (C++, 5 stars)
38. ica-sdk-python (Python, 4 stars)
39. NirvanaDocumentation (MDX, 4 stars)
40. novaseq-lims-api (C#, 3 stars): Documentation and tools for users of the NovaSeq LIMS API
41. NeoMutalyzer (C#, 3 stars): Inspired by Mutalyzer and frustrated by RefSeq, we created this transcript annotation validator
42. dragen-azure-quickstart (HTML, 3 stars)
43. Pelops (Python, 3 stars)
44. licenses (2 stars)
45. BlockCompression (C++, 2 stars): Block compression library used by Nirvana
46. dragen-aws-batch-quickstart (HTML, 1 star)