  • Stars: 209
  • Rank: 188,280 (Top 4 %)
  • Language: Python
  • License: MIT License
  • Created: over 1 year ago
  • Updated: 10 months ago


Repository Details

Joint sequence and structure generation with RoseTTAFold sequence space diffusion

ProteinGenerator: Generate sequence-structure pairs with RoseTTAFold

Getting Started

The easiest way to get started is with PROTEIN GENERATOR, a HuggingFace space where you can play around with the model!

Also check out the preprint, now live on bioRxiv.

Before running inference you will need to set up a custom conda environment.

Start by creating a new conda environment using the environment.yml file provided in the repository:

conda env create -f environment.yml

Then activate it:

source activate proteingenerator

Please make sure to modify the CUDA and dgl versions in environment.yml to match your system. Refer to the dgl website for more information.

Once everything has been installed you can download checkpoints:

The easiest way to get started is to open the protein_generator.ipynb notebook and run the sampler class interactively. When you are ready to submit a production run, use the output args.json file to launch:

python ./inference.py -input_json ./examples/out/design_000000_args.json

* Note: to get the notebook running you will need to add the custom conda environment as a Jupyter kernel; see how to do this here.

Check out the templates in the examples folder to see how you can set up jobs for the various design strategies.

Adding new sequence based potentials

To add a custom potential that guides the sequence diffusion process toward your desired space, add it to utils/potentials.py. At the top of the file, a template class lists the functions you must implement for your potential. It can be helpful to look through the other potentials in this file for examples of how to implement one. At the bottom of the file is a dictionary mapping the name used in the potentials argument to the class name in the file.
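The pattern described above (a template class, a concrete potential, and a name-to-class dictionary) could look roughly like the sketch below. Every name here is hypothetical: PotentialTemplate, get_gradients, ResidueBias, the POTENTIALS dictionary, and the amino acid ordering are illustrative assumptions, not the actual contents of utils/potentials.py. Check the template class in that file for the real required functions.

```python
# Hypothetical sketch only: class names, method names and signatures are
# assumptions for illustration, not the repository's actual API.

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # assumed column ordering of the L x 20 space


class PotentialTemplate:
    """Assumed interface: return a gradient with the same L x 20 shape as seq."""

    def get_gradients(self, seq):
        raise NotImplementedError("implement get_gradients in your potential")


class ResidueBias(PotentialTemplate):
    """Pushes every position toward a chosen set of residue types."""

    def __init__(self, favored, scale=1.0):
        self.favored = set(favored)
        self.scale = scale

    def get_gradients(self, seq):
        # One row of per-residue nudges, repeated for each of the L positions.
        row = [self.scale if aa in self.favored else 0.0 for aa in AMINO_ACIDS]
        return [list(row) for _ in seq]


# Assumed registry: maps the name given in the potentials argument to a class.
POTENTIALS = {"residue_bias": ResidueBias}


def apply_potential(seq, potential):
    """Add the potential's gradient to the current L x 20 sequence estimate."""
    grads = potential.get_gradients(seq)
    return [[x + g for x, g in zip(x_row, g_row)] for x_row, g_row in zip(seq, grads)]


seq = [[0.0] * 20 for _ in range(3)]  # toy L=3 sequence in continuous space
guided = apply_potential(seq, POTENTIALS["residue_bias"]("AV", scale=0.5))
```

In this sketch the lookup through the dictionary is what lets a job refer to a potential by name, so registering a new class is the only change needed outside the class itself.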


About the model

ProteinGenerator is trained on the same dataset and uses the same architecture as RoseTTAFold. To train the model, a ground truth sequence is transformed into an Lx20 continuous space and Gaussian noise is added to diffuse the sequence to the sampled timestep. To condition on structure and sequence, the structure for a motif is given and the corresponding sequence is denoised in the input. The rest of the structure is blackhole initialized. For each example the model is trained to predict X0, and losses are applied on the structure and sequence respectively. During training, T is set to 1000 steps and a square root schedule is used to add noise.
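The forward noising step described above can be illustrated with a minimal sketch. Several details are assumptions rather than facts from the repository: the +1/-1 encoding of the Lx20 space, the amino acid ordering, the exact square root schedule (written here in the form used by Diffusion-LM, alpha_bar(t) = 1 - sqrt(t/T + s)), and the function names encode, alpha_bar and diffuse. Only T = 1000, the Lx20 shape, the Gaussian noise and the un-noised motif sequence come from the text.

```python
import math
import random

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # assumed column ordering
T = 1000  # number of diffusion steps stated in the text


def encode(sequence):
    """Map a length-L sequence to L x 20: +1 at the true residue, -1 elsewhere (assumed scaling)."""
    return [[1.0 if aa == ref else -1.0 for ref in AMINO_ACIDS] for aa in sequence]


def alpha_bar(t, total=T, s=1e-4):
    """Assumed square root schedule: fraction of signal retained at step t, clamped to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - math.sqrt(t / total + s)))


def diffuse(x0, t, rng, motif=()):
    """Noise x0 to timestep t; positions listed in motif keep their clean sequence."""
    a = alpha_bar(t)
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    fixed = set(motif)
    return [
        list(row) if i in fixed
        else [signal * v + noise * rng.gauss(0.0, 1.0) for v in row]
        for i, row in enumerate(x0)
    ]


rng = random.Random(0)
x0 = encode("MKTAYIAK")                      # toy sequence, L = 8
xt = diffuse(x0, t=500, rng=rng, motif=[2, 3])  # positions 2 and 3 act as the motif
```

With this schedule most of the signal is lost early in the trajectory, and at t = T the non-motif positions are pure Gaussian noise, which is the starting point a sampler would denoise from.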

Looking ahead

We are excited for the community to get involved writing new potentials and building out the codebase further!

Acknowledgements

We would like to thank Frank DiMaio and Minkyung Baek, who developed RoseTTAFold, which allowed us to build out this platform. For other acknowledgements for code and development, please see the preprint.
