
A collaboratively written review paper on deep learning, genomics, and precision medicine

The Deep Review


Manuscript description

This repository is home to the Deep Review, a review article on deep learning in precision medicine. The Deep Review is collaboratively written on GitHub using a tool called Manubot (see below). The project operates on an open contribution model and welcomes contributions from anyone (see CONTRIBUTING.md or an existing example for more information). To see what's incoming, check the open pull requests. For project discussion and planning, see the issues.

The original version of the Deep Review was published in 2018 and should be cited as:

Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow P-M, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, and Greene CS. 2018. Opportunities and obstacles for deep learning in biology and medicine. Journal of The Royal Society Interface 15(141):20170387. doi:10.1098/rsif.2017.0387

Current stage: planning Deep Review version 2.0

At the time of writing, we are aiming to publish an update of the Deep Review. We will continue to make project preprints available on bioRxiv or another preprint service, and we aim to publish the finished reviews in a peer-reviewed venue as well. As with the initial release, we are planning an open and collaborative effort. New contributors are welcome and will be listed as version 2.0 authors. Please see issue #810 to join the discussion of future plans and help decide how best to continue this project.

Manubot updates: We recently updated this repository to use the latest Manubot version. Multiple citations must now be separated by semicolons, as in [@doi:10.1002/minf.201501008; @doi:10.1002/jcc.24764]; previously, they were separated only by whitespace. Citation tags are now required when an identifier contains forbidden characters. In addition, we are switching from wrapping text at a character cutoff to "one sentence per line", as described in USAGE.md.

Please base your pull requests on the latest version of the greenelab:master branch. Keep your fork synced by adding greenelab as its upstream remote (git remote add upstream https://github.com/greenelab/deep-review.git) and running:

# If your branch only has commits from greenelab:master but is outdated
git pull --ff-only upstream master

# If your branch is outdated and has diverged from greenelab:master
git pull --rebase upstream master
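Branches started before the Manubot update may still contain whitespace-separated citations. The shell function below is a minimal sketch of a migration helper, not part of Manubot or this repository; the name semicolonize_citations is hypothetical. It assumes each citation cluster begins with [@ and uses a sed loop that inserts one "; " separator per pass until none remain, so text that is already converted passes through unchanged.

```shell
# Hypothetical migration helper (not part of Manubot): inside [@...] clusters,
# replace the whitespace between citations with the "; " separator.
# Reads the files given as arguments, or standard input, and writes to stdout.
semicolonize_citations() {
  # :a ... ta is a sed loop: each pass converts one separator, and the
  # t command jumps back to :a as long as a substitution was made.
  # [^]]* keeps the match inside one cluster; [^];[:space:]] skips
  # separators that already end in a semicolon.
  sed -e ':a' \
      -e 's/\(\[@[^]]*[^];[:space:]]\)[[:space:]][[:space:]]*@/\1; @/' \
      -e 'ta' "$@"
}
```

For example, semicolonize_citations content/old-section.md > converted.md writes a converted copy (the file name is illustrative). Review the diff before committing: clusters that start with a text prefix, such as [see @doi:...], are deliberately left untouched.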

Headline review format

The initial manuscript was a headline review for the Journal of the Royal Society Interface on a topic overlapping the computer and life sciences in the area of systems pharmacology. The headline review solicitation states:

A Headline Review is one in a short, targeted series of high-level reviews within a particular topic of a burgeoning research area. We encourage authors to write in a style that opens the door to a broad range of readers working at the physical sciences - life sciences interface. We intend the reviews to address critical developments in an area of cross-disciplinary research and, when possible, to place such research in a broader context. This is not a place for comprehensive literature surveys.

We do encourage you to speculate in an informed way, and to be topical and provocative about the subject without worrying unduly about space, (the provisional target length is 8-12,000 words). Please think of this as an article which will be a landmark in your area, and will come to be considered as a classic paper of the literature.

Inspiration

On August 2, 2016, project maintainer Casey Greene introduced the project and its motivations:

I was recently inspired by Harold Pimentel's crowd-sourced collection of deep learning papers. Instead of having one individual write this, I thought that this invitation provided a wonderful opportunity to take advantage of the wisdom of crowds to bring a team together around this topic.

This repository provides a home for the paper. We'll operate on a pull request model. Anyone whose contributions meet the ICMJE standards of authorship will be included as an author on the manuscript. I can't guarantee that it will be accepted, but I look forward to trying this approach out.

On August 5, 2016, the Deep Review was announced in a tweet.

Manubot

Manubot is a system for writing scholarly manuscripts via GitHub. Manubot automates citations and references, versions manuscripts using git, and enables collaborative writing via GitHub. An overview manuscript presents the benefits of collaborative writing with Manubot and its unique features. The rootstock repository is a general-purpose template for creating new Manubot instances. See USAGE.md for documentation on how to write a manuscript.

Please open an issue for questions related to Manubot usage, bug reports, or general inquiries.

Repository directories & files

The directories are as follows:

  • content contains the manuscript source, which includes markdown files as well as inputs for citations and references. See USAGE.md for more information.
  • output contains the outputs (generated files) from Manubot including the resulting manuscripts. You should not edit these files manually, because they will get overwritten.
  • webpage is a directory meant to be rendered as a static webpage for viewing the HTML manuscript.
  • build contains commands and tools for building the manuscript.
  • ci contains files necessary for deployment via continuous integration.

Local execution

The easiest way to run Manubot is to use continuous integration to rebuild the manuscript when the content changes. If you want to build a Manubot manuscript locally, install the conda environment as described in the build directory. Then you can build the manuscript on POSIX systems by running the following commands from the repository's root directory.

# Activate the manubot conda environment (assumes conda version >= 4.4)
conda activate manubot

# Build the manuscript, saving outputs to the output directory
bash build/build.sh

# At this point, the HTML & PDF outputs will have been created. The remaining
# commands are for serving the webpage to view the HTML manuscript locally.
# This is required to view local images in the HTML output.

# Configure the webpage directory
manubot webpage

# You can now open the manuscript webpage/index.html in a web browser.
# Alternatively, open a local webserver at http://localhost:8000/ with the
# following commands.
cd webpage
python -m http.server

Sometimes it's helpful to monitor the content directory and automatically rebuild the manuscript when a change is detected. While running, the following command triggers both the build.sh script and the manubot webpage command whenever the content changes:

bash build/autobuild.sh
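If build/autobuild.sh does not run on your system, a polling loop is a rough substitute. The sketch below is hypothetical: the functions content_fingerprint and watch_and_rebuild and the POLL_SECONDS variable are not part of the repository. It checksums every file under content every few seconds and, when the fingerprint changes, reruns the same two steps (bash build/build.sh, then manubot webpage). It assumes the build writes only to output and webpage, so a rebuild does not retrigger itself.

```shell
# Hypothetical polling alternative to build/autobuild.sh (not the repository's
# script): rebuild whenever any file under the content directory changes.
content_fingerprint() {
  # cksum prints "checksum size path" for each file; sorting makes the order
  # stable, and the final cksum condenses the listing into one fingerprint
  find "${1:-content}" -type f -exec cksum {} + | sort | cksum
}

watch_and_rebuild() {
  local last="" current
  while true; do
    current=$(content_fingerprint content)
    if [ "$current" != "$last" ]; then
      last=$current
      # Build the manuscript, then refresh the webpage directory
      bash build/build.sh && manubot webpage || echo "build failed" >&2
    fi
    sleep "${POLL_SECONDS:-2}"  # hypothetical knob for the polling interval
  done
}
```

Start watch_and_rebuild from the repository root with the manubot environment active, and stop it with Ctrl+C. Polling adds a short delay but needs nothing beyond the POSIX find, sort, and cksum utilities.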

Continuous Integration

Whenever a pull request is opened, CI (continuous integration) will test whether the changes break the build process to generate a formatted manuscript. The build process aims to detect common errors, such as invalid citations. If your pull request build fails, see the CI logs for the cause of failure and revise your pull request accordingly.

When a commit to the master branch occurs (for example, when a pull request is merged), CI builds the manuscript and writes the results to the gh-pages and output branches. The gh-pages branch uses GitHub Pages to host the rendered manuscript.

For continuous integration configuration details, see .github/workflows/manubot.yaml if using GitHub Actions or .travis.yml if using Travis CI.

License


Except when noted otherwise, the entirety of this repository is licensed under a CC BY 4.0 License (LICENSE.md), which allows reuse with attribution. Please attribute by linking to https://github.com/greenelab/deep-review.

Since CC BY is not ideal for code and data, certain repository components are also released under the CC0 1.0 public domain dedication (LICENSE-CC0.md). All files matched by the following glob patterns are dual licensed under CC BY 4.0 and CC0 1.0:

  • *.sh
  • *.py
  • *.yml / *.yaml
  • *.json
  • *.bib
  • *.tsv
  • .gitignore

All other files are only available under CC BY 4.0, including:

  • *.md
  • *.html
  • *.pdf
  • *.docx
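Because the dual licensing is defined by file-name patterns, a shell case statement, which uses the same glob syntax, can report which terms apply to a given path. This is a hypothetical helper for illustration only (license_for is not part of the repository). It matches against the file name alone, ignoring directories, and is no substitute for reading LICENSE.md and LICENSE-CC0.md.

```shell
# Hypothetical helper: print the license(s) that apply to a file path,
# based on the glob patterns listed above
license_for() {
  local name=${1##*/}  # strip leading directories, keep the file name
  case "$name" in
    *.sh | *.py | *.yml | *.yaml | *.json | *.bib | *.tsv | .gitignore)
      echo "CC BY 4.0 and CC0 1.0" ;;
    *)
      echo "CC BY 4.0" ;;
  esac
}
```

For example, license_for build/build.sh prints "CC BY 4.0 and CC0 1.0", while license_for LICENSE.md prints "CC BY 4.0".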

Please open an issue for any question related to licensing.
