Convert a VCF into a MAF, where each variant is annotated to only one of all possible gene isoforms

vcf2maf

To convert a VCF into a MAF, each variant must be mapped to only one of all possible gene transcripts/isoforms that it might affect. But even within a single isoform, a Missense_Mutation close enough to a Splice_Site can be labeled as either in MAF format, but not as both. This selection of a single effect per variant is often subjective, and that's what this project attempts to standardize. The vcf2maf and maf2maf scripts leave most of that responsibility to Ensembl's VEP, but allow you to override its "canonical" isoforms, or use a custom ExAC VCF for annotation. The most useful feature, though, is the extensive support for parsing a wide range of crappy MAF-like or VCF-like formats we've seen out in the wild.

Quick start

Find the latest stable release, download it, and view the detailed usage manuals for vcf2maf and maf2maf:

export VCF2MAF_URL=`curl -sL https://api.github.com/repos/mskcc/vcf2maf/releases | grep -m1 tarball_url | cut -d\" -f4`
curl -L -o mskcc-vcf2maf.tar.gz $VCF2MAF_URL; tar -zxf mskcc-vcf2maf.tar.gz; cd mskcc-vcf2maf-*
perl vcf2maf.pl --man
perl maf2maf.pl --man

If you don't have VEP installed, then follow this gist. Of the many annotators out there, VEP is preferred for its large team of active coders and its CLIA-compliant HGVS formats. After installing VEP, test out vcf2maf like this:

perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.maf

To fill columns 16 and 17 of the output MAF with tumor/normal sample IDs, and to parse out genotypes and allele counts from matched genotype columns in the VCF, use options --tumor-id and --normal-id. Skip option --normal-id if you don't have a matched normal:

perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.maf --tumor-id WD1309 --normal-id NB1308
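
To confirm that the sample IDs landed where expected, a small helper along these lines can print the distinct values in columns 16 and 17 of the output. This is only a sketch: the function name maf_sample_ids is made up for illustration, and it assumes a tab-separated MAF whose first non-comment line is the column header.

```shell
# Print the distinct tumor/normal ID pairs found in columns 16 and 17 of a MAF.
# Assumption: lines starting with "#" are comments, and the first remaining
# line is the column header, which is skipped.
maf_sample_ids() {
    grep -v '^#' "$1" | tail -n +2 | cut -f16,17 | sort -u
}

# Example, using the output path from the command above:
# maf_sample_ids tests/test.vep.maf
```

For the command above, one would expect a single line pairing WD1309 with NB1308.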

VCFs from variant callers like VarScan use hardcoded sample IDs TUMOR/NORMAL to name genotype columns. To have vcf2maf correctly locate the columns to parse genotypes, while still printing proper sample IDs in the output MAF, also pass --vcf-tumor-id and --vcf-normal-id:

perl vcf2maf.pl --input-vcf tests/test_varscan.vcf --output-maf tests/test_varscan.vep.maf --tumor-id WD1309 --normal-id NB1308 --vcf-tumor-id TUMOR --vcf-normal-id NORMAL
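
If you are unsure what the genotype columns in a VCF are called, the names can be read from its #CHROM header line before choosing values for --vcf-tumor-id and --vcf-normal-id. The helper below is a minimal sketch for an uncompressed VCF. The function name vcf_sample_names is hypothetical and not part of vcf2maf.

```shell
# Print the sample (genotype) column names of a VCF, one per line.
# Columns 1-9 of the #CHROM line are the fixed VCF fields (CHROM through
# FORMAT), so sample names start at column 10.
vcf_sample_names() {
    grep -m1 '^#CHROM' "$1" | cut -f10- | tr '\t' '\n'
}

# Example:
# vcf_sample_names tests/test_varscan.vcf
```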

If VEP is installed under /opt/vep and the VEP cache is under /srv/vep, use options --vep-path and --vep-data to tell vcf2maf where to find them:

perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.maf --vep-path /opt/vep --vep-data /srv/vep

If you want to skip running VEP and need a minimalist MAF-like file listing data from the input VCF only, then use the --inhibit-vep option. If your input VCF contains VEP annotation, then vcf2maf will try to extract it. But be warned that the accuracy of your resulting MAF depends on how VEP was operated upstream. In standard operation, vcf2maf runs VEP with very specific parameters to make sure everyone produces comparable MAFs. So, it is strongly recommended to avoid --inhibit-vep unless you know what you're doing.
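
Before relying on --inhibit-vep, it can help to check whether the input VCF carries VEP annotation at all. The sketch below looks for the header line of the CSQ INFO field, which is the name VEP uses by default for its annotation in VCF output. It assumes an uncompressed VCF and that the field was not renamed upstream. It does not reproduce how vcf2maf itself detects annotation, and the function name has_vep_annotation is hypothetical.

```shell
# Succeed (exit status 0) if the VCF header declares a CSQ INFO field.
# Assumption: VEP was run with its default INFO field name, CSQ.
has_vep_annotation() {
    grep -q '^##INFO=<ID=CSQ,' "$1"
}

# Example:
# if has_vep_annotation input.vcf; then
#     echo "VEP annotation found"
# else
#     echo "no CSQ field declared; the MAF will only list data from the VCF"
# fi
```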

maf2maf

If you have a MAF or a MAF-like file that you want to reannotate, then use maf2maf, which simply runs maf2vcf followed by vcf2maf:

perl maf2maf.pl --input-maf tests/test.maf --output-maf tests/test.vep.maf

After tests on variant lists from many sources, maf2vcf and maf2maf are quite good at dealing with formatting errors or "MAF-like" files. They even support VCF-style alleles, as long as Start_Position == POS. It's OK if the input format is imperfect: any variants with a reference allele mismatch are kept aside in a separate file for debugging. The bare minimum columns that maf2maf expects as input are:

Chromosome	Start_Position	Reference_Allele	Tumor_Seq_Allele2	Tumor_Sample_Barcode
1	3599659	C	T	TCGA-A1-A0SF-01
1	6676836	A	AGC	TCGA-A1-A0SF-01
1	7886690	G	A	TCGA-A1-A0SI-01
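
The five column names above can be checked up front, before handing a file to maf2maf. The following is a rough sketch that assumes a tab-separated file whose first non-comment line is the header, and that matches names exactly as written above. The function name check_maf_columns is made up for illustration.

```shell
# Report which of the minimum columns are missing from a tab-separated
# MAF-like file. Returns non-zero if any are missing.
check_maf_columns() {
    required="Chromosome Start_Position Reference_Allele Tumor_Seq_Allele2 Tumor_Sample_Barcode"
    # Assumption: the first line not starting with "#" is the column header
    header=$(grep -v '^#' "$1" | head -n 1)
    missing=0
    for col in $required; do
        # Split the header into one name per line and look for an exact match
        if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -qx "$col"; then
            echo "missing column: $col" >&2
            missing=1
        fi
    done
    return $missing
}

# Example:
# check_maf_columns tests/test.maf && perl maf2maf.pl --input-maf tests/test.maf --output-maf tests/test.vep.maf
```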

See data/minimalist_test_maf.tsv for a sampler. If Tumor_Seq_Allele1 is included, it will be used to determine zygosity. Otherwise, maf2maf will try to determine zygosity from variant allele fractions, assuming that arguments --tum-vad-col and --tum-depth-col are set correctly to the names of columns containing those read counts. Specifying the Matched_Norm_Sample_Barcode, along with its respective columns containing read counts, is also strongly recommended. Columns containing normal allele read counts can be specified using arguments --nrm-vad-col and --nrm-depth-col.
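
To sanity-check that the read-count columns you plan to name really hold usable values, the variant allele fraction can be computed per row from those two columns. This is only an illustrative sketch: it assumes a tab-separated file with Chromosome and Start_Position as the first two columns (as in the minimal layout above), and the column names t_alt_count and t_depth in the example are assumptions about your file. It stops at the fraction and makes no zygosity call, since the cutoff maf2maf applies is not stated here.

```shell
# Print Chromosome, Start_Position and variant allele fraction per row.
# $1 = MAF-like TSV, $2 = name of the variant read count column,
# $3 = name of the read depth column.
maf_tumor_vaf() {
    awk -F'\t' -v vad="$2" -v dp="$3" '
        /^#/ { next }                         # skip comment lines
        !seen_header {                        # first remaining line is the header
            for (i = 1; i <= NF; i++) {
                if ($i == vad) v = i
                if ($i == dp)  d = i
            }
            seen_header = 1
            if (!v || !d) { print "column not found" | "cat 1>&2"; exit 1 }
            next
        }
        # Rows with zero or non-numeric depth are skipped
        ($d + 0) > 0 { printf "%s\t%s\t%.3f\n", $1, $2, $v / $d }
    ' "$1"
}

# Example, with assumed column names:
# maf_tumor_vaf tests/test.maf t_alt_count t_depth
```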

License

Apache-2.0 | Apache License, Version 2.0 | https://www.apache.org/licenses/LICENSE-2.0

Citation

Cyriac Kandoth. mskcc/vcf2maf: vcf2maf v1.6.19. (2020). doi:10.5281/zenodo.593251
