  • Language
    Python
  • License
    Apache License 2.0

Repository Details

Official code repository for EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation

EquiFold

This is the official open source repository for EquiFold developed by Prescient Design, a Genentech accelerator.

Notes

  • This lightweight research version of the code was used to produce the figures reported in the manuscript (to be updated soon). We plan to release a higher-quality version of the code with additional user-level features in the future.
  • There are known issues occasionally seen in predicted structures including nonphysical bond geometry and clashes. We are currently developing approaches to minimize these issues for future releases.

Setup and Usage

Environment

We used the following GPU-enabled setup with conda (originally run in an HPC environment with NVIDIA A100 GPUs).

$ conda create -n ef python=3.9 -y
$ conda activate ef
$ conda install pytorch=1.11 cudatoolkit=11.3 -c pytorch -y
$ pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.11.0+cu113.html 
$ pip install e3nn pytorch-lightning biopython pandas tqdm einops

Alternatively, for use without GPUs:

$ conda create -n ef python=3.9 -y
$ conda activate ef
$ conda install pytorch=1.12 -c pytorch -y
$ conda install pyg -c pyg
$ pip install e3nn pytorch-lightning biopython pandas tqdm einops
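
Before running inference, it can help to confirm that the packages installed above are importable. The following is a minimal sketch and is not part of the repository: the helper name check_environment is hypothetical, and the import names are the usual ones for these distributions (for example, Bio for biopython and pytorch_lightning for pytorch-lightning).

```python
from importlib.util import find_spec

# Import names for the packages installed above (the distribution name
# and the import name differ for biopython and pytorch-lightning).
REQUIRED_MODULES = [
    "torch",
    "torch_geometric",
    "e3nn",
    "pytorch_lightning",
    "Bio",
    "pandas",
    "tqdm",
    "einops",
]


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Print one status line per required module.
    for name, found in check_environment().items():
        print(f"{name:20s} {'ok' if found else 'MISSING'}")
```

Any module reported as MISSING points back to the corresponding conda or pip command above.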

Model weights

PyTorch model weights and hyper-parameter configs for the models trained on the mini-protein and antibody datasets, as described in the manuscript, are stored in the models directory.

Run model predictions

To make predictions with a trained model, run one of the following commands, providing the input sequences as a CSV table:

# For antibodies
$ python run_inference.py --model ab --model_dir models --seqs tests/data/inference_ab_input.csv --ncpu 1 --out_dir out_tests

# For mini-proteins
$ python run_inference.py --model science --model_dir models --seqs tests/data/inference_science_input.csv --ncpu 1 --out_dir out_tests
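
To script several runs, the command line above can be assembled in Python. This is a minimal sketch that uses only the flags shown in the commands above; the helper names build_inference_cmd and run_inference are hypothetical and are not part of the repository, and the sketch assumes that ab and science are the only model names, since those are the two shown here.

```python
import subprocess
import sys


def build_inference_cmd(model, seqs, out_dir, model_dir="models", ncpu=1):
    """Build the argument list for run_inference.py from the flags shown above."""
    # Assumption: only the two model names used in this README are valid.
    if model not in ("ab", "science"):
        raise ValueError(f"unknown model {model!r}; expected 'ab' or 'science'")
    return [
        sys.executable, "run_inference.py",
        "--model", model,
        "--model_dir", str(model_dir),
        "--seqs", str(seqs),
        "--ncpu", str(ncpu),
        "--out_dir", str(out_dir),
    ]


def run_inference(model, seqs, out_dir, **kwargs):
    """Run inference from the repository root; raises CalledProcessError on failure."""
    subprocess.run(build_inference_cmd(model, seqs, out_dir, **kwargs), check=True)
```

For example, run_inference("ab", "tests/data/inference_ab_input.csv", "out_tests") mirrors the antibody command above when called from the repository root.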

Contributing

We welcome contributions. If you would like to submit pull requests, please make sure you base your pull requests off the latest version of the main branch. Keep your fork synced by setting its upstream remote to Genentech/equifold and running:

# If your branch only has commits from main but is outdated:

$ git pull --ff-only upstream main


# If your branch is outdated and has diverged from the main branch:

$ git pull --rebase upstream main

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Citations

If you use the code and/or model, please cite:

@article {Lee2022.10.07.511322,
    author = {Lee, Jae Hyeon and Yadollahpour, Payman and Watkins, Andrew and Frey, Nathan C. and Leaver-Fay, Andrew and Ra, Stephen and Cho, Kyunghyun and Gligorijevi{\'c}, Vladimir and Regev, Aviv and Bonneau, Richard},
    title = {EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation},
    elocation-id = {2022.10.07.511322},
    year = {2023},
    doi = {10.1101/2022.10.07.511322},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2023/01/02/2022.10.07.511322},
    eprint = {https://www.biorxiv.org/content/early/2023/01/02/2022.10.07.511322.full.pdf},
    journal = {bioRxiv}
}
