• Stars
    star
    201
  • Rank 194,491 (Top 4 %)
  • Language
    Python
  • License
    MIT License
  • Created over 5 years ago
  • Updated 2 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Cell type specific enhancer-gene predictions using ABC model (Fulco, Nasser et al, Nature Genetics 2019)

Activity by Contact Model of Enhancer-Gene Specificity

The Activity-by-Contact (ABC) model predicts which enhancers regulate which genes on a cell type specific basis. This repository contains the code needed to run the ABC model as well as small sample data files, example commands, and some general tips and suggestions. We provide a brief description of the model below, see Fulco, Nasser et al [1] for a full description.

v0.2 is the recommended version for the majority of users. There are some minor methodological differences between v0.2 and the model as described in [1]. These differences are related to Hi-C data processing and were implemented to improve the speed and scalability of the codebase. As such ABC scores computing using v0.2 will not exactly match those published in [1], although they will be very close. The codebase used to generate the results in [1] is available in the NG2019 branch of this repository. The NG2019 branch is no longer maintained.

If you use the ABC model in published research, please cite:

[1] Fulco CP, Nasser J, Jones TR, Munson G, Bergman DT, Subramanian V, Grossman SR, Anyoha R, Doughty BR, Patwardhan TA, Nguyen TH, Kane M, Perez EM, Durand NC, Lareau CA, Stamenova EK, Aiden EL, Lander ES & Engreitz JM. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019). https://www.nature.com/articles/s41588-019-0538-0

[2] Nasser J, Bergman DT, Fulco CP, Guckelberger P, Doughty BR, Patwardhan TA, Jones TR, Nguyen TH, Ulirsch JC, Lekschas F, Mualim K, Natri HM, Weeks EM, Munson G, Kane M, Kang HY, Cui A, Ray JP, Eisenhaure TM, Collins RL, Dey K, Pfister H, Price AL, Epstein CB, Kundaje A, Xavier RJ, Daly MJ, Huang H, Finucane HK, Hacohen N, Lander ES, Engreitz JM. Genome-wide enhancer maps link risk variants to disease genes. Nature. 2021 May;593(7858):238-243. doi: 10.1038/s41586-021-03446-x

Requirements

For each cell-type, the inputs to the ABC model are:

  • Required Inputs
    • bam file for Dnase-Seq or ATAC-Seq (indexed and sorted)
    • bam file for H3K27ac ChIP-Seq (indexed and sorted)
  • Optional Inputs

In addition the following (non-cell-type specific) genome annotation files are required

  • bed file containing gene annotations (may change across cell types if using cell-type specific TSS's)
  • bed file containing chromosome annotations

Dependencies

The codebase relies on the following dependancies (tested version provided in parentheses):

Python (3.6.4)
samtools (0.1.19)
bedtools (2.26.0)
Tabix (0.2.5) - Partial dependancy
MACS2 (2.1.1.20160309) - Partial dependancy
Java (1.8) - Partial dependancy
Juicer Tools (1.7.5) - Partial dependancy

Python packages:
pyranges (0.0.55)
numpy (1.15.2)
pandas (0.23.4)
scipy (0.18.1)
pysam (0.15.1)
pyBigWig (0.3.2) - Partial dependancy

Description of the ABC Model

The Activity by Contact (ABC) model is designed to represent a mechanistic model in which enhancers activate gene transcription upon enhancer-promoter contact. In a simple conception of such a model, the quantitative effect of an enhancer depends on the frequency with which it contacts a promoter multiplied by the strength of the enhancer (i.e., the ability of the enhancer to activate transcription upon contacting a promoter). Moreover, the contribution of a specific enhancer to a gene’s expression should depend on the surrounding context (ie, the strength and contact frequency of other enhancers for the gene).

To convert this conceptual framework into a practical score (which can be applied genome-wide), we formulated the ABC score:

ABC score for effect of element E on gene G = Activity of E × Contact frequency between E and G / Sum of (Activity × Contact Frequency) over all candidate elements within 5 Mb.

Operationally, Activity (A) is defined as the geometric mean of the read counts of DNase-seq and H3K27ac ChIP-seq at an element E, and Contact (C) as the KR normalized Hi-C contact frequency between E and the promoter of gene G. Elements are defined as ~500bp regions centered on DHS peaks.

Note that the ABC model only considers candidate elements and genes on the same chromosome. It does not make interchromosomal predictions.

Running the ABC Model

Running the ABC model consists of the following steps:

  1. Define candidate enhancer regions
  2. Quantify enhancer activity
  3. Compute ABC Scores

Step 1. Define candidate elemets

'Candidate elements' are the set of putative enhancer elements for which ABC Scores will be computed. A typical way to define candidate elements is by calling peaks on a DNase-Seq or ATAC-Seq bam file. In this implementation we first call peaks using MACS2 and then process these peaks using makeCandidateRegions.py.

makeCandidateRegions.py will take as input the narrowPeak file produced by MACS2 and then perform the following processing steps:

  1. Count DNase-seq reads in each peak and retain the top N peaks with the most read counts
  2. Resize each of these N peaks to be a fixed number of base pairs centered on the peak summit
  3. Remove any regions listed in the 'blocklist' and include any regions listed in the 'includelist'
  4. Merge any overlapping regions

See 'Defining Candidate Enhancers' section below for more details.

Activate macs conda environment via:

conda env create -f macs.yml 

Sample commands:

#Call peaks

macs2 callpeak \
-t example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
-n wgEncodeUwDnaseK562AlnRep1.chr22.macs2 \
-f BAM \
-g hs \
-p .1 \
--call-summits \
--outdir example_chr22/ABC_output/Peaks/ 

#Sort narrowPeak file
bedtools sort -faidx example_chr22/reference/chr22 -i example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak > example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted

Activate abc conda environment via:

conda env create -f abcenv.yml

#Call candidate regions

python src/makeCandidateRegions.py \
--narrowPeak example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
--bam example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
--outDir example_chr22/ABC_output/Peaks/ \
--chrom_sizes example_chr22/reference/chr22 \
--regions_blocklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \
--regions_includelist example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed \
--peakExtendFromSummit 250 \
--nStrongestPeaks 3000 

We recommend using --nStrongestPeaks 150000 when making genome-wide peak calls. 3000 is just used for the small example on chr22.

We have found that on some systems, MACS2 and Python3 are incompatible. It may be necessary to change virtual environments, or installed packages/dotkits after running MACS2.

Main output files:

  • *macs2_peaks.narrowPeak: MACS2 narrowPeak file
  • *macs2_peaks.narrowPeak.candidateRegions.bed: filtered, extended and merged peak calls from MACS2. These are the candidate regions used in downstream scripts.

Step 2. Quantifying Enhancer Activity:

run.neighborhoods.py will count DNase-seq (or ATAC-seq) and H3K27ac ChIP-seq reads in candidate enhancer regions. It also makes GeneList.txt, which counts reads in gene bodies and promoter regions.

Replicate epigenetic experiments should be included as comma delimited list of files. Read counts in replicate experiments will be averaged when computing enhancer Activity.

Sample Command:

python src/run.neighborhoods.py \
--candidate_enhancer_regions example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.candidateRegions.bed \
--genes example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.chr22.bed \
--H3K27ac example_chr22/input_data/Chromatin/ENCFF384ZZM.chr22.bam \
--DHS example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam,example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep2.chr22.bam \
--expression_table example_chr22/input_data/Expression/K562.ENCFF934YBO.TPM.txt \
--chrom_sizes example_chr22/reference/chr22 \
--ubiquitously_expressed_genes reference/UbiquitouslyExpressedGenesHG19.txt \
--cellType K562 \
--outdir example_chr22/ABC_output/Neighborhoods/ 

Main output files:

  • EnhancerList.txt: Candidate enhancer regions with Dnase-seq and H3K27ac ChIP-seq read counts
  • GeneList.txt: Dnase-seq and H3K27ac ChIP-seq read counts on gene bodies and gene promoter regions

Step 3. Computing the ABC Score

Compute ABC scores by combining Activity (as calculated by run.neighborhoods.py) and Hi-C.

Sample Command:

python src/predict.py \
--enhancers example_chr22/ABC_output/Neighborhoods/EnhancerList.txt \
--genes example_chr22/ABC_output/Neighborhoods/GeneList.txt \
--HiCdir example_chr22/input_data/HiC/raw/ \
--chrom_sizes example_chr22/reference/chr22 \
--hic_resolution 5000 \
--scale_hic_using_powerlaw \
--threshold .02 \
--cellType K562 \
--outdir example_chr22/ABC_output/Predictions/ \
--make_all_putative

Step 4. Get Prediction Files for Variant Overlap

Perform filtering strategies to prepare prediction files for downstream variant overlap analysis (including a more lenient filter on the ABC score of 0.015, and shrinking the regions for overlapping with variants, see Nasser et al. [2]):

  • this step is already being performed in src/predict.py

Sample Command:

python src/getVariantOverlap.py \
--all_putative EnhancerPredictionsAllPutative.txt.gz \
--chrom_sizes example_chr22/reference/chr22 \
--outdir . 			

The main output files are:

  • EnhancerPredictions.txt: all element-gene pairs with scores above the provided threshold. Only includes expressed genes and does not include promoter elements. This file defines the set of 'positive' predictions of the ABC model.
  • EnhancerPredictionsFull.txt: same as above but includes more columns. See https://docs.google.com/spreadsheets/d/1UfoVXoCxUpMNPfGypvIum1-RvS07928grsieiaPX67I/edit?usp=sharing for column definitions
  • EnhancerPredictions.bedpe: Same as above in .bedpe format. Can be loaded into IGV.
  • EnhancerPredictionsAllPutative.txt.gz: ABC scores for all element-gene pairs. Includes promoter elements and pairs with scores below the threshold. Only includes expressed genes. This file includes both the 'positive' and 'negative' predictions of the model. (use --make_all_putative to generate this file).
  • EnhancerPredictionsAllPutativeNonExpressedGenes.txt.gz: Same as above for non-expressed genes. This file is provided for completeness but we generally do not recommend using these predictions.

The default threshold of 0.02 corresponds to approximately 70% recall and 60% precision [1].

Defining Candidate Enhancers

'Candidate elements' are the set of putative enhancers; ABC scores will be computed for all 'Candidate elements' within 5Mb of each gene. In computing the ABC score, the product of DNase-seq (or ATAC-seq) and H3K27ac ChIP-seq reads will be counted in each candidate element. Thus the candidate elements should be regions of open (nucleasome depleted) chromatin of sufficient length to capture H3K27ac marks on flanking nucleosomes. In [1], we defined candidate regions to be 500bp (150bp of the DHS peak extended 175bp in each direction).

Given that the ABC score uses absolute counts of Dnase-seq reads in each region, makeCandidateRegions.py selects the strongest peaks as measured by absolute read counts (not by pvalue). In order to do this, we first call peaks using a lenient significance threshold (.1 in the above example) and then consider the peaks with the most read counts. This procedure implicitly assumes that the active karyotype of the cell type is constant.

We recommend removing elements overlapping regions of the genome that have been observed to accumulate anomalous number of reads in epigenetic sequencing experiments (‘block-listed regions’). For convenience, we provide the list of block-listed regions available from https://sites.google.com/site/anshulkundaje/projects/blacklists.

We also force the candidate enhancer regions to include gene promoters, even if the promoter is not among the candidate elements with the strongest signals genome-wide in a cell type, by specifying --region_includelist.

Contact and Hi-C

Given that cell-type specific Hi-C data is more difficult to generate than ATAC-seq or ChIP-seq, we have explored alternatives to using cell-type specific Hi-C data. It has been shown that Hi-C contact frequencies generally follow a powerlaw relationship (with respect to genomic distance) and that many TADs, loops and other structural features of the 3D genome are not cell-type specific (Sanborn et al 2015, Rao et al 2014).

We have found that, for most genes, using an average Hi-C profile in the ABC model gives approximately equally good performance as using a cell-type specific Hi-C profile. To facilitate making ABC predictions in a large panel of cell types, including those without cell type-specific Hi-C data, we have provided an average Hi-C matrix (averaged across 10 cell lines, at 5kb resolution).

Format of Hi-C data

The ABC model supports two Hi-C data formats.

  • Juicer format: three column 'sparse matrix' format representation of a Hi-C matrix.
  • bedpe format: More general format which can support variable and arbitrary bin sizes.

Average Hi-C

The celltypes used for averaging are: GM12878, NHEK, HMEC, RPE1, THP1, IMR90, HUVEC, HCT116, K562, KBM7.

Average Hi-C data can be downloaded from: ftp://ftp.broadinstitute.org/outgoing/lincRNA/average_hic/average_hic.v2.191020.tar.gz (20 GB)

Pipeline to Download Hi-C data

The below pipeline will download a Hi-C matrix from Juicebox (in .hic format) and fit a powerlaw distribution.

#Download hic matrix file from juicebox
python src/juicebox_dump.py \
--hic_file https://hicfiles.s3.amazonaws.com/hiseq/k562/in-situ/combined_30.hic \
--juicebox "java -jar juicer_tools.jar" \
--outdir example_chr22/input_data/HiC/raw/ \
--chromosomes 22
#Fit HiC data to powerlaw model and extract parameters
python src/compute_powerlaw_fit_from_hic.py \
--hicDir example_chr22/input_data/HiC/raw/ \
--outDir example_chr22/input_data/HiC/raw/powerlaw/ \
--maxWindow 1000000 \
--minWindow 5000 \
--resolution 5000 \
--chr 'chr22'

Contact data in bedpe formats

Contact data can also be loaded in a generic bedpe format. This supports promoter-capture Hi-C and mixed-resolution Hi-C matrices. Use --hic_type bedpe in predict.py to enable this. Note that if contact data is provided in bedpe format, some of the default Hi-C normalizations are not applied. (For example, the code will not adjust the diagonal of the contact matrix)

The bedpe file should be a tab delimited file containing 8 columns (chr1,start1,end1,chr2,start2,end2,name,score) where score denotes the contact frequency. An example is given here: input_data/HiC/raw/chr22/chr22.bedpe.gz

python src/predict.py \
--enhancers example_chr22/ABC_output/Neighborhoods/EnhancerList.txt \
--genes example_chr22/ABC_output/Neighborhoods/GeneList.txt \
--HiCdir example_chr22/input_data/HiC/raw/ \
--hic_type bedpe \
--hic_resolution 5000 \
--scale_hic_using_powerlaw \
--threshold .02 \
--cellType K562 \
--outdir example_chr22/ABC_output/Predictions/ \
--make_all_putative

ABC model without experimental contact data

If experimentally derived contact data is not available, one can run the ABC model using the powerlaw estimate only. In this case the --HiCdir argument should be excluded from predict.py and the --score_column powerlaw.Score argument should be included in predict.py. In this case the ABC.Score column of the predictions file will be set to NaN. The powerlaw.Score column of the output prediction files will be the relevant Score column to use.

Gene Expression in ABC

The ABC model is designed to predict the effect of enhancers on expressed genes. If a gene is not expressed in a given cell type (or cell state) then we assume it does not have any activating enhancers (enhancers for which inhibition of the enhancer would lead to decrease in gene expression). Thus we typically only report enhancer-gene connections for expressed genes.

In the absence of expression data, DNase-seq and H3K27ac ChIP-seq at the gene promoter can be used as a proxy for expression. We suggest only considering enhancer-gene connections for genes with sufficiently active promoters (for instance in the top 60% of gene promoters in the cell type)

Quantile Normalization

The ABC Score uses the quantitative signal of Hi-C, ATAC-Seq and H3K27ac ChIP-Seq. As such it is sensitive to differences in these datasets due to experimental or technical procedures. For example, ABC scores computed on an epigenetic dataset with low signal to noise will be lower than ABC scores computed on an epigenetic dataset with high signal to noise.

In an effort to make ABC scores comparable across cell types, the ABC model code supports quantile normalizing the epigenetic signal in candidate enhancer regions to some reference. The reference we provide in this repository is computed on ~160,000 DHS peaks in K562.

Empirically, we have found that applying quantile normalization makes ABC predictions more comparable across cell types (particularly there is substantial variability in the signal to noise ratio of the epigenetic datasets across cell types). However, care should be taken as quantile normalization may not be applicable to all circumstances.

Additionally, the threshold value on the ABC score of .02 is calculated based on the K562 epigenetic data.

Quantile normalization can be applied using --qnorm src/EnhancersQNormRef.K562.txt in run.neighborhoods.py

Tips and Comments

  • Accurate transcription start site annotations are critical. The ABC model uses the TSS of the gene in order to assign enhancer-promoter contact frequency. If the TSS annotation is inaccurate (off by >5kb) it will lead to inaccurate predictions.
  • We have found that ubiquitously expressed genes appear insensitive to the effects of distal enhancers. For completeness, this code calculates the ABC score for all genes and flags ubiquitously expressed genes.
  • The size of candidate enhancer elements is important. For example, if two candidate regions are merged, then the ABC score of the merged region will be approximately the sum of the ABC scores for each individual region.
  • In our testing the ABC model typically predicts on average ~3 distal enhancers per expressed gene. If you run the model on a cell type and find a large deviation from this number (say <2 or >5) this may mean the ABC model is not well calibrated in the cell type. Typical remedies are to use quantile normalization, scale Hi-C or to lower/raise the cutoff on the ABC score.

Contact

Please submit a github issue with any questions or if you experience any issues/bugs.

More Repositories

1

gatk

Official code repository for GATK versions 4 and up
Java
1,691
star
2

cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
Scala
990
star
3

picard

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Java
965
star
4

infercnv

Inferring CNV from Single-Cell RNA-Seq
R
558
star
5

keras-rcnn

Keras package for region-based convolutional neural networks (RCNNs)
Python
554
star
6

gtex-pipeline

GTEx & TOPMed data production and analysis pipelines
Python
334
star
7

pilon

Pilon is an automated genome assembly improvement and variant detection tool
Scala
306
star
8

keras-resnet

Keras package for deep residual networks
Python
294
star
9

CellBender

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
Python
293
star
10

Tangram

Spatial alignment of single cell transcriptomic data.
Jupyter Notebook
249
star
11

ssGSEA2.0

Single sample Gene Set Enrichment analysis (ssGSEA) and PTM Enrichment Analysis (PTM-SEA)
R
237
star
12

warp

WDL Analysis Research Pipelines
WDL
200
star
13

viral-ngs

Viral genomics analysis pipelines
Python
180
star
14

seqr

web-based analysis tool for rare disease genomics
Python
176
star
15

gatk-sv

A structural variation pipeline for short-read sequencing
Python
170
star
16

tensorqtl

Ultrafast GPU-enabled QTL mapper
Python
159
star
17

ichorCNA

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
R
159
star
18

long-read-pipelines

Long read production pipelines
Jupyter Notebook
140
star
19

wot

A software package for analyzing snapshots of developmental processes
Jupyter Notebook
136
star
20

ml4h

Jupyter Notebook
122
star
21

depmap_omics

What you need to process the Quarterly DepMap-Omics releases from Terra
HTML
110
star
22

xtermcolor

Python library for terminal color support (including 256-color support)
Python
104
star
23

Drop-seq

Java tools for analyzing Drop-seq data
Java
100
star
24

mutect

MuTect -- Accurate and sensitive cancer mutation detection
Java
92
star
25

genomics-in-the-cloud

Source code and related materials for the O'Reilly book
Jupyter Notebook
91
star
26

gnomad_methods

Hail helper functions for the gnomAD project and Translational Genomics Group
Python
89
star
27

pyro-cov

Pyro models of SARS-CoV-2 variants
Jupyter Notebook
77
star
28

catch

A package for designing compact and comprehensive capture probe sets.
Python
74
star
29

gatk-docs

Documentation archive for GATK tools and workflows
HTML
71
star
30

oncotator

Python
67
star
31

gnomad-browser

Explore gnomAD datasets on the web
TypeScript
66
star
32

gtex-viz

GTEx Visualizations
JavaScript
63
star
33

single_cell_portal_core

Rails/Docker application for the Broad Institute's single cell RNA-seq data portal
Ruby
62
star
34

PhylogicNDT

HTML
57
star
35

docker-terraform

Docker container for running the Terraform application
Shell
56
star
36

2020_scWorkshop

Code and data repository for the 2020 physalia course on single cell RNA sequencing.
Shell
56
star
37

cromshell

CLI for interacting with Cromwell servers
Python
53
star
38

cellpainting-gallery

Cell Painting Gallery
52
star
39

viral-pipelines

viral-ngs: complete pipelines
WDL
51
star
40

gnomad_qc

Jupyter Notebook
48
star
41

single_cell_portal

Tutorials, workflows, and convenience scripts for Single Cell Portal
HTML
47
star
42

gistic2

Genomic Identification of Significant Targets in Cancer (GISTIC), version 2
MATLAB
44
star
43

sam

workbench identity and access management
Scala
42
star
44

dsde-deep-learning

DSDE Deep Learning Club
Python
40
star
45

gamgee

A C++14 library for NGS data formats
C++
40
star
46

wdl-ide

Rich IDE support for Workflow Description Language
Python
39
star
47

gtex-v8

Notebooks and scripts for reproducing analyses and figures from the V8 GTEx Consortium paper
Jupyter Notebook
39
star
48

pyqtl

Collection of analysis tools for quantitative trait loci
Python
38
star
49

SignatureAnalyzer-GPU

GPU implementation of ARD NMF
Python
37
star
50

poasta

Fast and exact gap-affine partial order alignment
Rust
37
star
51

Celligner_ms

Code related to the Celligner manuscript
R
36
star
52

cell-health

Predicting Cell Health with Morphological Profiles
HTML
35
star
53

PANOPLY

Repository for the Broad Institute Proteogenomic Data Analysis Center (PGDAC) established by the NIH Clinical Proteomics Tumor Analysis Consortium (CPTAC)
R
33
star
54

gatk-protected

Obsolete/Legacy GATK repository -- go to https://github.com/broadinstitute/gatk instead
Java
33
star
55

StrainGE

strain-level analysis tools
Python
33
star
56

firecloud-orchestration

Scala
31
star
57

gdctools

Python and UNIX CLI utilities to simplify interaction with the NIH/NCI Genomics Data Commons
Python
31
star
58

python-cert_manager

Python interface to the Sectigo Certificate Manager REST API
Python
31
star
59

str-analysis

Scripts and utilities related to analyzing short tandem repeats (STRs).
Python
29
star
60

adapt

A package for designing activity-informed nucleic acid diagnostics for viruses.
Python
29
star
61

2019_scWorkshop

Repo for Physalia course Analysis of Single Cell RNA-Seq data
TeX
29
star
62

chronos

Modeling of time series data for CRISPR KO experiments
Python
28
star
63

fiss

FireCloud Service Selector (FISS) -- Python bindings and CLI for FireCloud execution engine
Python
28
star
64

pyfrost

Python bindings for Bifrost with a NetworkX compatible API
Python
27
star
65

single_cell_analysis

Documents used for workshops on single cell analysis
HTML
26
star
66

deepometry

Image classification for imaging flow cytometry.
Python
26
star
67

lincs-cell-painting

Processed Cell Painting Data for the LINCS Drug Repurposing Project
Jupyter Notebook
25
star
68

rawls

Rawls service for DSDE
Scala
25
star
69

delphy

Fast, scalable, accurate and accessible Bayesian phylogenetics
C++
25
star
70

firepony

Efficient base quality score recalibrator for NGS data
Cuda
24
star
71

protigy

Proteomics Toolset for Integrative Data Analysis
R
22
star
72

GATK-for-Microbes

WDL
22
star
73

seqr-loading-pipelines

hail-based pipelines for annotating variant callsets and exporting them to elasticsearch
Python
22
star
74

BipolarCell2016

R
21
star
75

cromwell-tools

A collection of Python clients and accessory scripts for interacting with the Cromwell
Python
21
star
76

covid19-testing

COVID-19 Diagnostic Processing Dashboard
HTML
20
star
77

single_cell_classification

Methods to use SNPs or gene expression to classify single cell RNAseq to reference profiles
R
20
star
78

VariantBam

Filtering and profiling of next-generational sequencing data using region-specific rules
Makefile
20
star
79

longbow

Annotation and segmentation of MAS-seq data
Python
20
star
80

gtex-single-nucleus-reference

Code repository for the snRNA-seq cross-tissue atlas project
Jupyter Notebook
20
star
81

flipbook

A tool that lets you quickly flip through images in a local directory and record notes or answer questions about each one.
Python
19
star
82

AwesomeGenomics

Cancer Data Science's go to place for excellent genomics tools and packages
19
star
83

firecloud-ui

FireCloud user interface for web browsers.
Clojure
19
star
84

BARD

BioAssay Research Database
Groovy
19
star
85

wdltool

Scala
18
star
86

vim-wdl

Vim syntax highlighting for WDL
Vim Script
18
star
87

SpliceAI-lookup

Website for checking SpliceAI and Pangolin scores:
Python
17
star
88

palantir-workflows

Utility workflows for the DSP hydro.gen team (formerly palantir)
WDL
17
star
89

epi-SHARE-seq-pipeline

Epigenomics Program pipeline to analyze SHARE-seq data.
WDL
16
star
90

wordpress-crowd-plugin

Crowd Authentication Plugin for Wordpress
PHP
16
star
91

mix_seq_ms

Code associated with MIX-seq manuscript
R
14
star
92

imaging-platform-pipelines

Cell Painting and other pipelines from the Imaging Platform
13
star
93

wdl-runner

Easily run WDL workflows on GCP
Python
13
star
94

widdler

A command-line tool for executing, managing, and querying WDL workflows on Cromwell servers.
Python
13
star
95

cms

Composite of Multiple Signals: tests for selection in meiotically recombinant populations
Python
13
star
96

regional_missense_constraint

Code to calculate regional missense constraint
Python
12
star
97

scRNA-Seq

Python
12
star
98

scalable_analytics

Public collaboration of Scalable Single Cell Analytics
Python
12
star
99

ml4ht_data_source

Multimodal data loader compatible with pytorch and tensorflow
Python
12
star
100

gene-hints

Discoverability for gene search 🧬 🔍
Python
12
star