• This repository has been archived on 15/May/2020

Canvas - Copy number variant (CNV) calling from DNA sequencing data

Canvas Copy Number Variant Caller

Canvas is a tool for calling copy number variants (CNVs) from human DNA sequencing data. It can work either with germline data, or paired tumor/normal samples. Its primary input is aligned reads (in .bam format), and its primary output is a report (in a .vcf file) giving the copy number status of the genome.

Canvas is used as the copy number caller in the Isaac Whole Genome Sequencing workflow in BaseSpace (https://basespace.illumina.com), and in HiSeq Analysis Software (HAS) (http://support.illumina.com/sequencing/sequencing_software/hiseq-analysis-software.html).

Canvas is written in C# and runs under a recent version of Mono (e.g. 3.10.0), .NET 4.5.1, or .NET Core 1.1.

For more information about Canvas and the algorithms it uses see the software design description.

Note: Germline-WGS mode has been deprecated. Use SmallPedigree-WGS even for a single sample analysis.

Publications

Canvas was first described in the publication Canvas: versatile and scalable detection of copy number variants in the journal OUP Bioinformatics:

Publication: http://dx.doi.org/10.1093/bioinformatics/btw163 Preprint: https://doi.org/10.1101/036194

The Canvas Small Pedigree Workflow was published in the journal OUP Bioinformatics: Canvas SPW: calling de novo copy number variants in pedigrees

Publication: https://doi.org/10.1093/bioinformatics/btx618 Preprint: https://doi.org/10.1101/121939

License

Copyright (c) 2013-2017 Illumina, Inc. All rights reserved.

This software is provided under the terms and conditions of the GNU GENERAL PUBLIC LICENSE Version 3.

You should have received a copy of the GNU GENERAL PUBLIC LICENSE Version 3 along with this program. If not, see https://github.com/illumina/licenses/.

Canvas includes several third-party packages provided under other open source licenses; please see COPYRIGHT.txt for additional details.

Build instructions

Binaries:

It is recommended to start from one of the binary distributions on the Canvas releases page if a suitable version is available. Executables can be run under either .NET Core or Mono. The .NET Core environment is recommended, as it provides higher speed and lower RAM usage.

Source code organization:

Canvas consists of several projects, all built from one solution file (Src/Canvas/Canvas/Canvas.sln). The main Canvas project is a command-line tool for launching the various workflows. Additionally, there is a project for each Canvas module (e.g. CanvasBin counts coverage for each bin, and CanvasSomaticCaller makes CNV calls for tumor/normal data), as well as some shared libraries with utility functions (math functions, file I/O for various formats, etc.).

Linux

Canvas was tested under Linux using .NET Core 2.1.

See #99

Windows

Canvas is known to run on Windows 7 or Windows 8 systems using .NET Core 2.0.

Run instructions

Canvas can be run on a variety of sequencing inputs. See the help information from the Canvas.exe command line executable for the supported workflows and required input files:

$ Canvas.exe --help (or Canvas.dll --help under .NET Core)
Canvas Canvas 1.25.0.49+master Copyright © Illumina 2017-03-23
Usage: Canvas.exe [MODE] [OPTIONS]+
Available modes:

  • Germline-WGS - CNV calling of a germline sample from whole genome sequencing data
  • Somatic-Enrichment - CNV calling of a somatic sample from targeted sequencing data
  • Somatic-WGS - CNV calling of a somatic sample from whole genome sequencing data
  • Tumor-normal-enrichment - CNV calling of a tumor/normal pair from targeted sequencing data
  • SmallPedigree-WGS - CNV calling of a small pedigree from whole genome sequencing data

Options:
-h, --help show this message and exit
-v, --version print version and exit
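
Since the mode is the first positional argument, a wrapper script can check it before launching a long run. The following is a minimal bash sketch, not part of Canvas: the function name canvas_check_mode is hypothetical, the mode names are copied from the help output above, and the check is assumed to be case-sensitive.

```shell
# Validate a Canvas mode name against the modes listed in the help output.
# Returns 0 for a known mode and 1 otherwise; messages go to stderr.
canvas_check_mode() {
    case "$1" in
        Germline-WGS)
            # Accepted, but flagged: the README marks this mode as deprecated.
            echo "warning: Germline-WGS is deprecated; use SmallPedigree-WGS" >&2
            return 0
            ;;
        Somatic-Enrichment|Somatic-WGS|Tumor-normal-enrichment|SmallPedigree-WGS)
            return 0
            ;;
        *)
            echo "error: unknown Canvas mode '$1'" >&2
            return 1
            ;;
    esac
}
```

For example, `canvas_check_mode SmallPedigree-WGS && echo ok` prints `ok`, while an unrecognized name returns a non-zero status.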

Reference genome

The required input files for Human reference genome builds GRCh37, hg19, and GRCh38 can be downloaded from S3 http://canvas-cnv-public.s3.amazonaws.com/. You can use wget to download any of the files listed there. For example, to download the hg19 GenomeSize.xml file run:

wget http://canvas-cnv-public.s3.amazonaws.com/hg19/WholeGenomeFasta/GenomeSize.xml
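
When several reference files are needed, the same wget call can be wrapped in a small loop. This is a sketch under assumptions: the function names and the DRY_RUN switch are hypothetical, and the relative paths passed in must be taken from the bucket listing itself (only hg19/WholeGenomeFasta/GenomeSize.xml is confirmed above).

```shell
# Base URL of the public bucket named in this README.
CANVAS_S3_BASE="http://canvas-cnv-public.s3.amazonaws.com"

# Print the full URL for a path relative to the bucket root.
canvas_ref_url() {
    printf '%s/%s\n' "$CANVAS_S3_BASE" "$1"
}

# fetch_canvas_refs DEST REL_PATH...
# Download each relative path into DEST, keeping the directory layout.
# With DRY_RUN=1 the wget commands are only printed, not executed.
fetch_canvas_refs() {
    local dest=$1 rel out
    shift
    for rel in "$@"; do
        out="$dest/$rel"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "wget -O $out $(canvas_ref_url "$rel")"
        else
            mkdir -p "$(dirname "$out")" || return 1
            wget -O "$out" "$(canvas_ref_url "$rel")" || return 1
        fi
    done
}
```

A dry run such as `DRY_RUN=1 fetch_canvas_refs /tmp/canvas-ref hg19/WholeGenomeFasta/GenomeSize.xml` shows the command that would be issued without touching the network.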

When using a custom reference genome the equivalent files need to be created. Use the FlagUniqueKmers project to generate the annotated fasta file (kmer.fa) for a custom reference genome.

Installation

The easiest way to install Canvas is to use the latest pre-compiled binaries from releases: https://github.com/Illumina/canvas/releases (just download and uncompress).

.NET Core

See #99

DEMO (SmallPedigree-WGS workflow)

Here we provide an example of how to run Canvas SPW (Small Pedigree Workflow) on a simulated trio (bam files of 60x coverage) and then use EvaluateCNV (under Tools) to estimate performance metrics. This demo works with Canvas release v1.25 and above. An Amazon AWS m4.4xlarge instance was used to create this demo. The recommended amount of RAM per core is 4G. More information on input options and output formats can be found on the Canvas wiki and in the software design document.

Data and binaries

  1. Install .NET Core and download the Canvas binary (CanvasDIR)
  2. Add BaseSpace project https://basespace.illumina.com/s/f1ganFhSPsBo with simulation bams to your account (you might need to register first).
  3. Install BaseMount and load the canvas-spw project
sudo bash -c "$(curl -L https://basemount.basespace.illumina.com/install/)"
mkdir /tmp/BaseSpace
basemount --scopes="Create Global, Browse Global, Create Projects, Read Global" /tmp/BaseSpace
cd /tmp/BaseSpace
  4. This should show the following folders under canvas-spw/AppResults
- bams = simulated trio bams of 60x coverage aligned with Isaac
- canvasdata = hg19 genome reference files for running Canvas (can also be downloaded from S3 http://canvas-cnv-public.s3.amazonaws.com/) 
- snvvcf = SNV vcf files to accompany bams (joint germline small-variant calls made with Strelka2 https://github.com/Illumina/strelka)
- simdata = bed files with simulated inherited and de novo variants

Running Canvas

  1. In this example we are accessing files through basemount (Canvas should be run as a regular user rather than as root via sudo). Files could also be copied to a local drive and run from there.
  2. Issue the following command (output directory - /tmp/gHapMixDemo)
dotnet /CanvasDIR/Canvas.dll SmallPedigree-WGS --bam=/basespace/Projects/canvas/AppResults/bams/Files/father.bam --bam=/basespace/Projects/canvas/AppResults/bams/Files/mother.bam --bam=/basespace/Projects/canvas/AppResults/bams/Files/child1.bam --mother=mother --father=father --proband=child1 -r /basespace/Projects/canvas/AppResults/canvasdata/Files/kmer.fa -g /basespace/Projects/canvas/AppResults/canvasdata/Files/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta --sample-b-allele-vcf /basespace/Projects/canvas/AppResults/snvvcf/Files/Pedigree.vcf.gz -f /basespace/Projects/canvas/AppResults/canvasdata/Files/filter13.bed -o /tmp/gHapMixDemo --ploidy-vcf="/basespace/Projects/canvas/AppResults/snvvcf/Files/MultiSamplePloidy.vcf"
  3. The runtime depends on the number of available CPUs and on whether the bam files were copied to a local drive. The run on a bare Amazon m4.4xlarge instance (16 CPUs and 64G RAM) with network I/O took 03h34m. Results are available as VCF files: either a multi-sample VCF under gHapMixDemo or single-sample equivalents under the gHapMixDemo/TempCNV folders. Here we will use the EvaluateCNV tool supplied with the Canvas distribution to calculate various performance metrics for inherited and de novo CNVs.
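
The long command in step 2 repeats the same directories several times, so it can help to assemble it from a few variables. The bash sketch below only prints the command. The function name and argument order are hypothetical, the flags and file names are the ones used in the demo command, and it assumes each BAM is named after its sample (father.bam for sample father, and so on).

```shell
# build_spw_command CANVAS_DIR BAM_DIR DATA_DIR VCF_DIR OUT_DIR FATHER MOTHER PROBAND
# Print (do not run) a SmallPedigree-WGS command line for one trio.
build_spw_command() {
    local canvas_dir=$1 bam_dir=$2 data_dir=$3 vcf_dir=$4 out_dir=$5
    local father=$6 mother=$7 proband=$8
    local sample
    local cmd=(dotnet "$canvas_dir/Canvas.dll" SmallPedigree-WGS)

    # One --bam argument per family member (assumes <sample>.bam naming).
    for sample in "$father" "$mother" "$proband"; do
        cmd+=("--bam=$bam_dir/$sample.bam")
    done

    cmd+=("--mother=$mother" "--father=$father" "--proband=$proband"
          -r "$data_dir/kmer.fa"
          -g "$data_dir/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta"
          --sample-b-allele-vcf "$vcf_dir/Pedigree.vcf.gz"
          -f "$data_dir/filter13.bed"
          -o "$out_dir"
          "--ploidy-vcf=$vcf_dir/MultiSamplePloidy.vcf")

    # %q quotes each argument so the printed line can be pasted into a shell.
    printf '%q ' "${cmd[@]}"
    printf '\n'
}
```

Printing rather than executing keeps the sketch safe to try; the output can be reviewed and then pasted into a terminal or piped to `bash`.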

Running EvaluateCNV

  1. First, we can run EvaluateCNV to produce recall and precision metrics for inherited Canvas CNV calls using truth variant files.
# Remove REF calls
zcat /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf.gz | grep -v ":REF:" > /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf
dotnet /CanvasDIR/Tools/EvaluateCNV/EvaluateCNV.dll /ihart/BaseSpace/Projects/CanvasSPW/AppResults/simdata/Files/child1_truth.bed /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf /CanvasDIR/Tools/EvaluateCNV/generic.cnaqc.excluded_regions.bed inheritedCNVs.txt

This gives us for PASS variants: 
Recall  97.46
Precision       93.85
  2. Next, we run a similar command, this time using the de novo variant truth file and the -q 20 argument to extract variants with DQ20.
dotnet /CanvasDIR/Tools/EvaluateCNV/EvaluateCNV.dll /ihart/BaseSpace/Projects/CanvasSPW/AppResults/simdata/Files/child1_truth.bed /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf.gz /CanvasDIR/Tools/EvaluateCNV/generic.cnaqc.excluded_regions.bed -q 20 denovoCNVs.txt

This gives us for PASS variants:
Recall  97.98
Precision       96.51
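
The REF-call filtering used in step 1 can be written as a small reusable filter. This sketch differs slightly from the plain `grep -v ":REF:"` in the demo: it always keeps VCF header lines (those starting with `#`) and drops only data records containing `:REF:`. The function name is hypothetical.

```shell
# Read an uncompressed VCF on stdin and write it to stdout, keeping all
# header lines and every record that does not contain ":REF:".
strip_ref_calls() {
    awk '/^#/ || !/:REF:/'
}
```

A typical call, mirroring the demo, would be `zcat CNV.vcf.gz | strip_ref_calls > CNV.vcf`.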

DEMO (Tumor-normal-enrichment workflow)

This demo runs Canvas on exome data for the HCC2218 breast carcinoma cell lines and compares the results with a previously curated ground truth set. The demo assumes the Mono runtime and that the binary files were installed to WORKDIR/canvas/canvas-1.3.4_x64/.

Data

To download the demo data, add the BaseSpace project https://basespace.illumina.com/s/DcPnOqHmtPNB to your account (you might need to register first). The actual files can then be downloaded from the following subdirectories:

https://basespace.illumina.com/analyses/30697313/files/28317292?projectId=26760736
https://basespace.illumina.com/analyses/30697313/files/28296383?projectId=26760736

As an alternative to manual download, the command-line tool BaseMount (https://basemount.basespace.illumina.com) can be used for file transfer. To install BaseMount, run

sudo bash -c "$(curl -L https://basemount.basespace.illumina.com/install/)"
mkdir /tmp/BaseSpace
basemount  /tmp/BaseSpace
cd /tmp/BaseSpace

BaseSpace files are now available under your current directory. To run the demo, transfer the following files into WORKDIR/testing/files/

"Projects/HiSeq 2500 RR: NRC Exome (HCC1187 & HCC2218)/AppResults/HCC1187BL/Files/HCC1187BL_S1.vcf" (germline vcf)
"Projects/HiSeq 2500 RR: NRC Exome (HCC1187 & HCC2218)/AppResults/HCC2218C/Files/HCC2218C_S1.bam" (somatic bam)
"Projects/HiSeq 2500 RR: NRC Exome (HCC1187 & HCC2218)/AppResults/HCC2218C/Files/HCC2218C_S1.bam.bai"
"Projects/HiSeq 2500 RR: NRC Exome (HCC1187 & HCC2218)/AppResults/HCC2218BL/Files/HCC2218BL_S1.bam" (normal bam)
"Projects/HiSeq 2500 RR: NRC Exome (HCC1187 & HCC2218)/AppResults/HCC2218BL/Files/HCC2218BL_S1.bam.bai"
"Projects/HiSeq 2500 RR: NRC Exome (HCC1187 & HCC2218)/AppSessions/Isaac Enrichment 11|24|2015 9:23:23/AppResults.28295376.HCC1187BL/Files/Additional Files/NexteraRapidCapture_Exome_TargetedRegions_v1.2Used.txt" (targeted regions)
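
These BaseSpace paths contain spaces, parentheses, ampersands, and pipe characters, so every path has to be quoted when copying. One way to avoid quoting mistakes is to put the paths in a text file, one per line, and copy them in a loop. The following is a bash sketch; the function name and the list-file convention are hypothetical.

```shell
# copy_demo_files SRC_ROOT DEST LIST_FILE
# Copy every file named in LIST_FILE (one path per line, relative to
# SRC_ROOT, no surrounding quotes) into DEST. Stops at the first failure.
copy_demo_files() {
    local src_root=$1 dest=$2 list=$3 rel
    mkdir -p "$dest" || return 1
    # IFS= and -r keep spaces and backslashes in each path intact.
    while IFS= read -r rel; do
        [ -n "$rel" ] || continue   # skip blank lines
        cp -- "$src_root/$rel" "$dest/" || return 1
    done < "$list"
}
```

With the list saved as, say, demo_files.txt, the call would look like `copy_demo_files /tmp/BaseSpace "$WORKDIR/testing/files" demo_files.txt`.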

Genome reference files

Download hg19 genome reference files from S3 (http://canvas-cnv-public.s3.amazonaws.com/) into WORKDIR/testing/hg19/.

Running demo

With all files copied and installed, we are now ready to run Canvas. This demo uses the Tumor-normal-enrichment workflow, which runs on Nextera exome data. Execute the command below.

dotnet Canvas.exe Tumor-normal-enrichment -b $WORKDIR/testing/files/HCC2218C_S1.bam --normal-bam=$WORKDIR/testing/files/HCC2218BL_S1.bam --reference=$WORKDIR/testing/hg19/kmer.fa --manifest=$WORKDIR/testing/files/NexteraRapidCapture_Exome_TargetedRegions_v1.2Used.txt -g $WORKDIR/testing/hg19/ -n HCC2218C -f $WORKDIR/testing/hg19/filter13.bed -o $WORKDIR/testing/HCC2218_v2 --b-allele-vcf=$WORKDIR/testing/files/HCC2218BL_S1.vcf --custom-parameters=CanvasBin,-m=TruncatedDynamicRange

CNV.vcf.gz files will be saved to the HCC2218_v2 output directory. Depending on the number of available CPUs, the demo will take from a few minutes to under an hour to complete.

Inspecting results

Now we can test Canvas performance using a set of previously curated HCC2218 copy number calls from whole-genome data (HCC2218Truth.vcf) and a set of repetitive or ambiguous regions (HCC2218.cnaqc.excluded_regions.bed), both of which are available in the TruthSets directory in S3 http://canvas-cnv-public.s3.amazonaws.com/. The evaluation is done with EvaluateCNV; the latest binary distribution of the tool can be found in releases: https://github.com/Illumina/canvas/releases.

EvaluateCNV usage info:

EvaluateCNV $TruthSetPath $CNV.vcf $ExcludedRegionsBed $OutputPath  [$RegionOfInterestBed]

In our case, with EvaluateCNV installed in $WORKDIR/tools/EvaluateCNV and the truth files in $WORKDIR/TruthSets, the command is:

mono $WORKDIR/tools/EvaluateCNV/EvaluateCNV.exe $WORKDIR/TruthSets/HCC2218Truth.vcf $WORKDIR/testing/HCC2218/CNV.vcf.gz \
$WORKDIR/TruthSets/HCC2218.cnaqc.excluded_regions.bed $WORKDIR/testing/HCC2218/EvaluateCNV.txt

This will save the evaluation data into $WORKDIR/testing/HCC2218/EvaluateCNV.txt. Inspecting it suggests that Canvas performed quite well in calling somatic CNVs in HCC2218. Below is an extract from the file (results obtained using Canvas 1.3.4 with the command line shown above; other versions and main/custom parameters might alter the performance metrics):

Accuracy        92.0255
DirectionAccuracy       93.1368
Recall  88.0894
DirectionRecall 92.0237
Precision       81.3032
DirectionPrecision      84.9345
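
To compare runs, individual metrics can be pulled out of the EvaluateCNV output with awk. This is a sketch under assumptions: it expects one whitespace-separated "name value" pair per line, as in the extract above, and the function names are hypothetical. The F1 score is not reported in the extract; it is the standard harmonic mean of precision and recall, computed here only as a convenience.

```shell
# get_metric NAME FILE
# Print the value of the first line whose first field is exactly NAME,
# so "Recall" does not match "DirectionRecall".
get_metric() {
    awk -v name="$1" '$1 == name { print $2; exit }' "$2"
}

# f1_score FILE
# Print 2*P*R/(P+R) to two decimals, or NA if either metric is missing
# or both are zero.
f1_score() {
    local p r
    p=$(get_metric Precision "$1")
    r=$(get_metric Recall "$1")
    awk -v p="$p" -v r="$r" 'BEGIN {
        if (p + r > 0) printf "%.2f\n", 2 * p * r / (p + r)
        else print "NA"
    }'
}
```

For example, `get_metric Recall "$WORKDIR/testing/HCC2218/EvaluateCNV.txt"` prints the recall value from the file written in the previous step.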
