
MAPE-PPI

MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding (Spotlight)

Lirong Wu, Yijun Tian, Yufei Huang, Siyuan Li, Haitao Lin, Nitesh V Chawla, Stan Z. Li. In ICLR, 2024.

Dependencies

conda env create -f environment.yml
conda activate MAPE-PPI

The default PyTorch version is 2.0.0 and the cudatoolkit version is 11.7; both can be changed in environment.yml.
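
After activating the environment, a quick sanity check (a generic snippet, not part of this repository) confirms the resolved versions:

import torch

print(torch.__version__)          # expected: 2.0.0
print(torch.version.cuda)         # expected: 11.7
print(torch.cuda.is_available())  # True if the GPU driver matches the toolkit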

Dataset

Raw data for the three datasets (SHS27k, SHS148k, and STRING) can be downloaded from Google Drive:

  • protein.STRING.sequences.dictionary.tsv: protein sequences of STRING
  • protein.actions.STRING.txt: PPI network of STRING
  • STRING_AF2DB: PDB files of protein structures predicted by AlphaFold2

Pre-process raw data to generate feature and adjacency matrices (also applicable to any new dataset):

python ./raw_data/data_process.py --dataset data_name

where data_name is one of the three datasets (SHS27k, SHS148k, and STRING).
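
For example, to pre-process SHS27k:

python ./raw_data/data_process.py --dataset SHS27k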

For ease of use, we have pre-processed these three datasets and placed the processed data in Google Drive.

To use the processed data, please put them in `./data/processed_data/`.

Usage

Pre-training and Inference on SHS27k/SHS148k/STRING

python -B train.py --dataset STRING --split_mode bfs

The hyperparameters customized for each dataset and data partition are available in ./configs/param_config.json.
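
To inspect these settings quickly (the key layout inside param_config.json is not assumed here, so check the file itself for details):

import json

with open("./configs/param_config.json") as f:
    params = json.load(f)
print(json.dumps(params, indent=2))  # dump the dataset/partition-specific hyperparameters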

Pre-training on additional data, Inference on SHS27k/SHS148k/STRING

To pre-train with customized data (e.g., CATH or AlphaFoldDB datasets), please refer to the following steps:

(1) Download additional pre-training data (including their PDB files) from the official website.

(2) Pre-process the pre-training PDB files as done in ./raw_data/data_process.py and transform them into three files:

  • protein.nodes.pretrain_data.pt
  • protein.rball.edges.pretrain_data.npy
  • protein.knn.edges.pretrain_data.npy

where pretrain_data is the name of the additional pre-training dataset.
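
As a rough illustration of the expected output format, here is a minimal sketch; the tensor shapes, edge-index layout, and feature semantics below are placeholder assumptions, so follow ./raw_data/data_process.py for the authoritative logic:

import numpy as np
import torch

name = "pretrain_data"  # name of the additional pre-training dataset

# Placeholder data standing in for the real pre-processing outputs:
# per-residue node features plus two edge sets (radius-ball and k-NN graphs).
node_features = torch.randn(128, 256)                   # assumed: 128 residues x 256-dim features
rball_edges = np.random.randint(0, 128, size=(2, 512))  # assumed: 2 x num_edges index array
knn_edges = np.random.randint(0, 128, size=(2, 512))    # assumed: 2 x num_edges index array

torch.save(node_features, f"protein.nodes.{name}.pt")
np.save(f"protein.rball.edges.{name}.npy", rball_edges)
np.save(f"protein.knn.edges.{name}.npy", knn_edges)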

(3) Load the pre-processed data and perform pre-training on it by running

python -B train.py --dataset STRING --split_mode bfs --pre_train pretrain_data

Loading the pre-trained model and Inference on SHS27k/SHS148k/STRING

We provide a pre-trained model in ./trained_model/ for PPI prediction on STRING. To use it, please run

python -B train.py --dataset STRING --split_mode bfs --ckpt_path ../trained_model/vae_model.ckpt
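
To verify that the checkpoint loads before launching training (assuming it is a standard PyTorch checkpoint; the entries printed depend on how it was saved):

import torch

ckpt = torch.load("./trained_model/vae_model.ckpt", map_location="cpu")
keys = ckpt.keys() if isinstance(ckpt, dict) else dir(ckpt)
print(list(keys)[:10])  # peek at the first few entries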

Citation

If you find our repository and paper useful, please cite the following:

@article{wu2024mape,
  title={MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding},
  author={Wu, Lirong and Tian, Yijun and Huang, Yufei and Li, Siyuan and Lin, Haitao and Chawla, Nitesh V and Li, Stan Z},
  journal={arXiv preprint arXiv:2402.14391},
  year={2024}
}

@inproceedings{wu2024mapeppi,
  title={MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding},
  author={Wu, Lirong and Tian, Yijun and Huang, Yufei and Li, Siyuan and Lin, Haitao and Chawla, Nitesh V and Li, Stan Z},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=itGkF993gz}
}

Feedback

If you have any issues with this work, please feel free to contact me by email:
