  • Language
    Python
  • License
    Apache License 2.0
AlphaFold - PyTorch

This project provides a PyTorch implementation of DeepMind's AlphaFold for research, and also includes the converted model weights and inputs. Note that this code also works with the original .ckpt format model weights and .tfrec format inputs.

DeepMind's original implementation is based on TensorFlow. The related publication is Senior, A.W., Evans, R., Jumper, J. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020). [code] [paper]

Comparison

I calculated the difference in the final output distogram probs between the PyTorch version and the original TensorFlow version. The distogram probs are the full set of distance distribution predictions, constructed by combining predictions that together cover the entire distance map; their size is LxLx64. Taking target T1019s2 as an example, the error in the distogram probs (88x88x64) between the two results is 0.467 per channel. The picture above shows the result for T1019s2.
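
The text does not spell out how the per-channel error is computed. As a minimal sketch, assuming both versions' distogram probs are available as NumPy arrays of shape (L, L, 64), one plausible reading is the total absolute difference divided by the number of channels. The function name and the metric itself are assumptions, not the code used to produce the number above.

```python
import numpy as np


def distogram_error_per_channel(probs_a, probs_b):
    """Total absolute difference between two distograms, divided by the bin count.

    Assumption: both inputs have shape (L, L, n_bins). This metric is only one
    possible reading of "error per channel".
    """
    probs_a = np.asarray(probs_a, dtype=np.float64)
    probs_b = np.asarray(probs_b, dtype=np.float64)
    if probs_a.shape != probs_b.shape:
        raise ValueError(f"shape mismatch: {probs_a.shape} vs {probs_b.shape}")
    n_bins = probs_a.shape[-1]
    return float(np.abs(probs_a - probs_b).sum() / n_bins)


if __name__ == "__main__":
    # Synthetic stand-ins for the two outputs (88 residues, 64 distance bins).
    rng = np.random.default_rng(0)
    a = rng.random((88, 88, 64))
    a /= a.sum(axis=-1, keepdims=True)            # each residue pair sums to 1
    b = a + rng.normal(scale=1e-4, size=a.shape)  # small synthetic perturbation
    print(distogram_error_per_channel(a, b))
```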

Although it is around 7 times slower than the TensorFlow version, I still recommend this project because I refactored the entire code to provide the simplest way to understand AlphaFold. In addition, in contrast to TensorFlow, PyTorch is imperative, so you can simply drop a pdb breakpoint anywhere in your model.
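
To illustrate the debugging point, here is a minimal sketch of dropping into pdb inside a module's forward pass. TinyBlock and its debug flag are hypothetical and not part of this project.

```python
import torch
import torch.nn as nn


class TinyBlock(nn.Module):
    """Hypothetical toy module, only here to show where a breakpoint can go."""

    def __init__(self, channels=4, debug=False):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.debug = debug

    def forward(self, x):
        h = self.conv(x)
        if self.debug:
            # Execution pauses here with real tensors in scope, so values such
            # as h.shape or h.mean() can be inspected interactively.
            import pdb
            pdb.set_trace()
        return x + torch.relu(h)  # residual connection


if __name__ == "__main__":
    block = TinyBlock(debug=False)  # set debug=True to enter the debugger
    out = block(torch.randn(1, 4, 8, 8))
    print(out.shape)
```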

Usage

Dependencies

  • Python 3.6+
  • PyTorch 1.3+
  • TensorFlow 2.0+ (optional; only needed to load the original .ckpt format model weights and .tfrec format inputs)

Run example prediction

You can use the alphafold.sh script to run the entire Distogram prediction system.

./alphafold.sh

You can modify variables such as TARGET and TARGET_FILE in this script to run predictions for other targets.

Detailed script usage

> python alphafold.py -h
usage: alphafold.py [-h] -i INPUT [-o OUT] [-m MODEL] [-r REPLICA]
                    [-t {D,B,T}] [-e]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        target protein, support both .pkl or .tfrec format
  -o OUT, --out OUT     output dir
  -m MODEL, --model MODEL
                        model dir
  -r REPLICA, --replica REPLICA
                        model replica
  -t {D,B,T}, --type {D,B,T}
                        model type: D - Distogram, B - Background, T - Torsion
  -e, --ensemble        ensembling all replica outputs

For example:

python alphafold.py -i test_data/T1019s2.pkl -o T1019s2_out -t D -r 0

This uses replica 0 of the Distogram model to predict the distogram probs of the input data.

It also supports TensorFlow input data and models:

python alphafold.py -i test_data/T1019s2.tfrec -o T1019s2_out -m tf_model_path/
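
The help text shown earlier looks like output from Python's argparse. The sketch below is reconstructed from that help text only and is not the project's actual source; the types and all default values are assumptions.

```python
import argparse


def build_parser():
    """Rebuild a parser matching the documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(prog="alphafold.py")
    parser.add_argument("-i", "--input", required=True,
                        help="target protein, support both .pkl or .tfrec format")
    parser.add_argument("-o", "--out", default=None,      # assumed default
                        help="output dir")
    parser.add_argument("-m", "--model", default="model",  # assumed default
                        help="model dir")
    parser.add_argument("-r", "--replica", default=None,   # assumed default
                        help="model replica")
    parser.add_argument("-t", "--type", choices=["D", "B", "T"], default="D",
                        help="model type: D - Distogram, B - Background, T - Torsion")
    parser.add_argument("-e", "--ensemble", action="store_true",
                        help="ensembling all replica outputs")
    return parser


if __name__ == "__main__":
    # Same arguments as the first example command above.
    args = build_parser().parse_args(
        ["-i", "test_data/T1019s2.pkl", "-o", "T1019s2_out", "-t", "D", "-r", "0"])
    print(args)
```

Only -i is marked required, which matches the usage line where every other flag appears in square brackets.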

Data

Model weights

All converted model weights can be downloaded from http://bit.ly/alphafold-model. The weights are packaged in a zip file of about 210 MB; its model folder contains:

  • A directory 873731. This contains the weights for the distogram model.
  • A directory 916425. This contains the weights for the background distogram model.
  • A directory 941521. This contains the weights for the torsion model.

Each directory with model weights contains a number of different model configurations. Each model has a config file and associated weights. There is only one torsion model. Each model directory also contains a stats file that is used for feature normalization specific to that model.

The original TensorFlow model checkpoints can be downloaded from http://bit.ly/alphafold-casp13-weights.

Input data

For now, the input data is a .pkl, .npy or .tfrec file that contains the required features. Details on how these features are generated can be found in the README of DeepMind's AlphaFold project.
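
As a rough sketch of how the three formats could be told apart, the hypothetical helper below dispatches on the file extension. It assumes the .pkl and .npy files hold pickled Python objects or plain arrays, and it leaves .tfrec unimplemented because that path needs TensorFlow. The function name and behavior are illustrative, not the project's API.

```python
import pickle
from pathlib import Path

import numpy as np


def load_features(path):
    """Load a feature file based on its extension (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".pkl":
        with path.open("rb") as fh:
            return pickle.load(fh)
    if suffix == ".npy":
        # allow_pickle=True is needed when the file stores a dict rather than
        # a plain array. Only use it on files you trust.
        data = np.load(path, allow_pickle=True)
        # A saved dict comes back as a 0-d object array; unwrap it.
        return data.item() if data.shape == () else data
    if suffix == ".tfrec":
        raise NotImplementedError("reading .tfrec requires TensorFlow")
    raise ValueError(f"unsupported input format: {suffix}")
```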

For convenience, I provide a shell script, feature.sh, to generate the required feature data from a given target sequence (.seq file, FASTA format). Before running this script, there are a few setup steps to complete:

  1. Set up PSI-BLAST from NCBI BLAST.
  2. Set up HHBlits from HH-suite3.
    # Install HH-suite
    git clone https://github.com/soedinglab/hh-suite.git
    mkdir -p hh-suite/build && cd hh-suite/build
    cmake -DCMAKE_INSTALL_PREFIX=. ..
    make -j 4 && make install
    export PATH="$(pwd)/bin:$(pwd)/scripts:$PATH"
    # Download Databases
    cd ..; mkdir databases; cd databases
    wget http://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/uniclust30_2018_08_hhsuite.tar.gz
    tar xzvf uniclust30_2018_08_hhsuite.tar.gz
  3. Set up plmDCA, which needs MATLAB or Octave to run. I provide a modified plmDCA.m file for Octave that saves the intermediate data AlphaFold needs; I haven't tested it in MATLAB.
    git clone https://github.com/magnusekeberg/plmDCA.git
    mv plmDCA/plmDCA_asymmetric_v2 plmDCA/plmDCA
    cp plmDCA.m plmDCA/plmDCA_asymmetric_v2/
    # Compile the .c files with mex; if you use MATLAB, do this in the MATLAB console
    cd plmDCA/plmDCA_asymmetric_v2/functions/; for i in *.c; do octave --eval "mex $i";done
    cd ../3rd_party_code/minFunc/; for i in *.c; do octave --eval "mex $i"; done
  4. In feature.sh set the following:
    • TARGET to the name of the target.
    • TARGET_DIR to the path to the directory with the target input data.
    • TARGET_SEQ to the path of the target input seq file.
    • PLMDCA_DIR to the path of plmDCA folder.

The input data for the example target T1019s2, the output results from the two AlphaFold versions for comparison, and the generated features can be downloaded from http://bit.ly/alphafold-T1019s2 (about 210 MB).

The dataset to reproduce AlphaFold's CASP13 results can be downloaded from http://bit.ly/alphafold-casp13-data (about 43.5 GB).

Note that I couldn't figure out how to generate two of the features, profile_with_prior and profile_with_prior_without_gaps, so they are set to all zeros for now. Please let me know if you have any ideas.
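
As an illustration of this workaround, zero-filled placeholders for the two features might look like the sketch below. The shapes (L, 22) and (L, 21) are an assumption that should be checked against DeepMind's feature description, and the function name is hypothetical.

```python
import numpy as np


def placeholder_profiles(seq_len):
    """Return all-zero stand-ins for the two features that are not generated yet.

    Assumed shapes: (L, 22) with gaps and (L, 21) without gaps; verify before use.
    """
    return {
        "profile_with_prior": np.zeros((seq_len, 22), dtype=np.float32),
        "profile_with_prior_without_gaps": np.zeros((seq_len, 21), dtype=np.float32),
    }


if __name__ == "__main__":
    feats = placeholder_profiles(88)  # T1019s2 has 88 residues
    for name, value in feats.items():
        print(name, value.shape)
```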
