Language: Common Workflow Language
License: MIT License

Open workflow definitions for genomic analysis from MGI at WUSM.

analysis-workflows

Overview

The McDonnell Genome Institute (MGI) and contributing staff, faculty, labs and departments of Washington University School of Medicine (WUSM) share Common Workflow Language (CWL) workflow definitions focused on reusable, reproducible analysis pipelines for genomics data.

Structure

The main structure of this repo is described in the following table:

| Path | Description |
| --- | --- |
| definitions | parent directory containing all CWL tool and workflow definitions |
| definitions/pipelines | all workflows which rely on subworkflows and tools to produce final outputs |
| definitions/subworkflows | workflows that combine multiple tools to produce intermediate pipeline outputs (used as inputs to other subworkflows) |
| definitions/tools | CWL that wraps command line interfaces or scripts connecting multiple tools |
| definitions/types | custom CWL data types for inputs to tools and workflows |
| example_data | example input data, input YAML files, and expected output files for testing |

Documentation

All documentation of CWL pipelines, subworkflows, and tools, as well as additional information regarding test data, continuous integration, and configuration, can be found on the GitHub wiki: https://github.com/genome/analysis-workflows/wiki

Quick Start

Workflows

Download our repository with git clone https://github.com/genome/analysis-workflows.git

The official CWL user guide covers the basics of reading and writing CWL files, constructing input files, and running workflows.
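As a rough sketch of what running a workflow from the cloned repository might look like, the shell function below wraps a single cwltool run. It assumes cwltool is installed and on the PATH. The function name run_workflow and the DRY_RUN variable are illustrative and not part of this repository.

```shell
# Run one CWL workflow with cwltool, or only print the command when DRY_RUN is set.
# run_workflow and DRY_RUN are illustrative names, not part of analysis-workflows.
run_workflow() {
    workflow="${1:-}"   # path to a .cwl file, e.g. one under definitions/pipelines
    inputs="${2:-}"     # path to the matching input YAML file
    if [ -z "$workflow" ] || [ -z "$inputs" ]; then
        echo "usage: run_workflow <workflow.cwl> <inputs.yml>" >&2
        return 2
    fi
    # Build the command as positional parameters so paths with spaces stay intact.
    set -- cwltool "$workflow" "$inputs"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

From the root of the clone, a call would take the form run_workflow definitions/pipelines/<pipeline>.cwl <inputs>.yml, where both file names are placeholders for a real pipeline and its input file. Setting DRY_RUN=1 prints the command without executing it.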

Workflow Execution Service

These workflow definitions are built for interoperability with any Workflow Execution Service (WES) schema compatible implementation that supports CWL.

Each CWL file is validated using cwltool. Additional workflow definition testing is performed with Cromwell, although there are currently no automated workflow tests using Cromwell.
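The per-file validation described above can be approximated locally. The sketch below walks a directory tree and runs cwltool --validate on every .cwl file. It is not the project's actual CI script; the function name validate_cwl_tree and the CWL_VALIDATE_CMD override are assumptions made for illustration.

```shell
# Validate every .cwl file under a directory (default: definitions).
# CWL_VALIDATE_CMD is a hypothetical override; it defaults to "cwltool --validate".
validate_cwl_tree() {
    root="${1:-definitions}"
    failures=0
    # A here-document (not a pipe) keeps the loop in the current shell,
    # so the failure counter survives after the loop ends.
    while IFS= read -r file; do
        [ -n "$file" ] || continue   # skip the empty line produced when nothing matches
        if ! ${CWL_VALIDATE_CMD:-cwltool --validate} "$file" >/dev/null 2>&1; then
            echo "FAILED: $file"
            failures=$((failures + 1))
        fi
    done <<EOF
$(find "$root" -type f -name '*.cwl' | sort)
EOF
    echo "$failures file(s) failed validation"
    [ "$failures" -eq 0 ]
}
```

Run from the repository root, validate_cwl_tree definitions lists each file that fails and returns a non-zero status if any did, which makes it usable as a pre-commit check.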

Docker

To provide a portable environment, each tool in our workflows has a designated Docker container. Docker can be downloaded from the Docker website.

All MGI supported Docker images used in the tool workflow definitions are available on mgibio DockerHub.

Many tools rely on third-party Docker images publicly available from sources such as Docker Hub and BioContainers.

Data

Full reference data is documented and available for download on the wiki (coming soon).

Example data, packaged together with fully populated YAML files corresponding to the top-level workflows in this repo's definitions/pipelines directory, can be found on our public GCP bucket. To download this package, use our helper Docker container: docker run -v <desired_absolute_path>:/staging mgibio/data_downloader:0.1.0 gsutil -m cp -r gs://analysis-workflows-example-data /staging
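Docker requires the host side of a -v mount to be an absolute path, which is an easy detail to miss. The sketch below wraps the download command above and rejects relative paths before calling Docker. The function name download_example_data and the DRY_RUN variable are illustrative assumptions; the image name and bucket are taken from the command above.

```shell
# Stage the example data under an absolute host directory using the helper container.
# download_example_data and DRY_RUN are illustrative names, not part of analysis-workflows.
download_example_data() {
    staging_dir="${1:-}"
    case "$staging_dir" in
        /*) ;;   # absolute path: fine
        *)
            echo "staging directory must be an absolute path: '$staging_dir'" >&2
            return 1
            ;;
    esac
    set -- docker run -v "$staging_dir:/staging" mgibio/data_downloader:0.1.0 \
        gsutil -m cp -r gs://analysis-workflows-example-data /staging
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"   # show the command without running it
    else
        "$@"
    fi
}
```

For example, DRY_RUN=1 download_example_data "$HOME/aw-example-data" prints the full docker command for inspection, and dropping DRY_RUN runs it.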

Note: We are currently migrating and updating our example data. Files within the example_data directory of this repository are no longer fully supported, and some are out of date. Moving forward, all data will be hosted in GCP. The instructions above currently download the full, uncompressed example data set (~800 MB). More granular, compressed downloads are upcoming. Advanced users may explore the bucket structure and download individual files using wget https://storage.googleapis.com/analysis-workflows-example-data/[path_to_file] (omitting path_to_file will download a manifest describing the directory structure).
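For individual downloads, the URL pattern in the note above can be wrapped in two small helpers. This is a sketch that assumes wget is installed; the function and variable names are illustrative, and any path passed in must be a real object path taken from the bucket manifest.

```shell
# Base URL of the public example-data bucket, as given in the note above.
EXAMPLE_DATA_BASE_URL="https://storage.googleapis.com/analysis-workflows-example-data"

# Print the download URL for a path inside the bucket.
# With no argument this is the bucket root, which serves the manifest.
example_data_url() {
    path="${1:-}"
    path="${path#/}"   # tolerate a leading slash
    echo "$EXAMPLE_DATA_BASE_URL/$path"
}

# Download one file (or the manifest when called without arguments).
fetch_example_file() {
    wget "$(example_data_url "${1:-}")"
}
```

Calling fetch_example_file with no argument retrieves the manifest, which can then be searched for the paths to pass on later calls.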

Contributions

A big thanks to all of the developers, bioinformaticians and scientists who built this resource. For a complete list of software contributions, i.e. commits, to this repository, please see the GitHub Contributors section.

Collaborators

The following WUSM collaborators have provided significant contributions in terms of workflow design, scientific direction, and validation of analysis-workflows output.

Departments, Institutes, and Labs

Partners
