• Stars: 125
• Rank: 286,335 (Top 6%)
• Language: Python
• License: Other
• Created: almost 4 years ago
• Updated: 3 months ago


Repository Details

MolRep: A Deep Representation Learning Library for Molecular Property Prediction

Summary

MolRep is a Python package for fairly measuring algorithmic progress on chemical property prediction datasets. It currently provides a complete re-evaluation of 16 state-of-the-art deep representation models over 16 benchmark property datasets.

(Figure: MolRep architecture overview)

If you find this package useful, please cite our papers, MolRep and Mol-XAI:

@article{rao2021molrep,
  title={MolRep: A Deep Representation Learning Library for Molecular Property Prediction},
  author={Rao, Jiahua and Zheng, Shuangjia and Song, Ying and Chen, Jianwen and Li, Chengtao and Xie, Jiancong and Yang, Hui and Chen, Hongming and Yang, Yuedong},
  journal={bioRxiv},
  year={2021},
  publisher={Cold Spring Harbor Laboratory}
}

@article{rao2021quantitative,
  title={Quantitative Evaluation of Explainable Graph Neural Networks for Molecular Property Prediction},
  author={Rao, Jiahua and Zheng, Shuangjia and Yang, Yuedong},
  journal={arXiv preprint arXiv:2107.04119},
  year={2021}
}

Install & Usage

We provide a script to install the environment. You will need the conda package manager, which can be installed from here.

To install the required packages, follow these instructions (tested in a Linux terminal):

  1. clone the repository

    git clone https://github.com/biomed-AI/MolRep

  2. cd into the cloned directory

    cd MolRep

  3. run the install script

    source install.sh <your_conda_path> <CUDA_VERSION>

Here <your_conda_path> is your conda path, and <CUDA_VERSION> is an optional argument that can be one of cpu, cu92, cu100, cu101, or cu110. If you do not provide a CUDA version, the script defaults to cu110. The script creates a virtual environment named MolRep with all the packages needed to run our code. Important: do NOT run this command with bash instead of source!
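
A quick way to confirm the environment was set up correctly is to import the core dependencies and parse a molecule. The snippet below is a minimal sketch, assuming the MolRep environment provides PyTorch and RDKit (as used by the models and featurizers described below):

    # Minimal environment check (assumes the MolRep conda env provides torch and rdkit).
    import torch
    from rdkit import Chem

    print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

    # Parse a small molecule to confirm RDKit works end to end.
    mol = Chem.MolFromSmiles("CCO")  # ethanol
    print("Parsed atoms:", mol.GetNumAtoms())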

Data

Data (including the explainability datasets) can be downloaded from Google Drive.

[!NEWS] The human experiments for the explainability task (molecules and results) are available here.

Current Datasets

Dataset | #Tasks | Task type | #Molecules | Split | Metric | Reference
--- | --- | --- | --- | --- | --- | ---
QM7 | 1 | Regression | 7160 | Stratified | MAE | Wu et al.
QM8 | 12 | Regression | 21786 | Random | MAE | Wu et al.
QM9 | 12 | Regression | 133885 | Random | MAE | Wu et al.
ESOL | 1 | Regression | 1128 | Random | RMSE | Wu et al.
FreeSolv | 1 | Regression | 642 | Random | RMSE | Wu et al.
Lipophilicity | 1 | Regression | 4200 | Random | RMSE | Wu et al.
BBBP | 1 | Classification | 2039 | Scaffold | ROC-AUC | Wu et al.
Tox21 | 12 | Classification | 7831 | Random | ROC-AUC | Wu et al.
SIDER | 27 | Classification | 1427 | Random | ROC-AUC | Wu et al.
ClinTox | 2 | Classification | 1478 | Random | ROC-AUC | Wu et al.
Liver injury | 1 | Classification | 2788 | Random | ROC-AUC | Xu et al.
Mutagenesis | 1 | Classification | 6511 | Random | ROC-AUC | Hansen et al.
hERG | 1 | Classification | 4813 | Random | ROC-AUC | Li et al.
MUV | 17 | Classification | 93087 | Random | PRC-AUC | Wu et al.
HIV | 1 | Classification | 41127 | Random | ROC-AUC | Wu et al.
BACE | 1 | Classification | 1513 | Random | ROC-AUC | Wu et al.
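
Several of the classification sets above use a scaffold split, which groups molecules by their Bemis-Murcko scaffold so that training and test sets do not share core ring systems. The sketch below illustrates the idea with RDKit; it is a simplified stand-in, not MolRep's own splitter, and the largest-group-first heuristic is an assumption:

    # Illustrative Bemis-Murcko scaffold split with RDKit (not MolRep's internal splitter).
    from collections import defaultdict
    from rdkit.Chem.Scaffolds import MurckoScaffold

    def scaffold_split(smiles_list, test_frac=0.2):
        groups = defaultdict(list)
        for idx, smi in enumerate(smiles_list):
            scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=smi, includeChirality=False)
            groups[scaffold].append(idx)

        # Fill the training set with the largest scaffold groups first, so the rarer
        # scaffolds left over form the held-out test set.
        n_train = int((1.0 - test_frac) * len(smiles_list))
        train_idx, test_idx = [], []
        for group in sorted(groups.values(), key=len, reverse=True):
            if len(train_idx) < n_train:
                train_idx.extend(group)
            else:
                test_idx.extend(group)
        return train_idx, test_idx

    smiles = ["CCO", "c1ccccc1O", "c1ccccc1N", "CC(=O)Oc1ccccc1C(=O)O"]
    print(scaffold_split(smiles))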

Methods

Current Methods

Self-/unsupervised Models

Method | Description | Reference
--- | --- | ---
Mol2Vec | Mol2Vec is an unsupervised approach that learns vector representations of molecular substructures, such that chemically related substructures point in similar directions. | Jaeger et al.
N-Gram graph | N-gram graph is a simple unsupervised molecular representation that first embeds the vertices of the molecular graph and then assembles the vertex embeddings along short walks into a compact graph representation. | Liu et al.
FP2Vec | FP2Vec is a molecular featurizer that represents a chemical compound as a set of trainable embedding vectors combined with a CNN model. | Jeon et al.
VAE | VAE is a framework for training two neural networks (an encoder and a decoder) to learn a mapping from a high-dimensional molecular representation into a lower-dimensional latent space. | Kingma et al.
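
To make the VAE entry concrete: the encoder maps a fixed-size molecular representation to a latent mean and variance, a sample is drawn with the reparameterization trick, and the decoder reconstructs the input. The sketch below is a generic PyTorch illustration; the layer sizes and the fingerprint-style input are assumptions, not MolRep's implementation:

    # Generic VAE sketch over a fixed-size molecular feature vector (illustrative only).
    import torch
    import torch.nn as nn

    class MolVAE(nn.Module):
        def __init__(self, input_dim=2048, latent_dim=64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(input_dim, 512), nn.ReLU())
            self.to_mu = nn.Linear(512, latent_dim)      # latent mean
            self.to_logvar = nn.Linear(512, latent_dim)  # latent log-variance
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                         nn.Linear(512, input_dim), nn.Sigmoid())

        def forward(self, x):
            h = self.encoder(x)
            mu, logvar = self.to_mu(h), self.to_logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
            return self.decoder(z), mu, logvar

    x = torch.rand(4, 2048)  # e.g. a batch of binary fingerprints
    recon, mu, logvar = MolVAE()(x)
    print(recon.shape, mu.shape)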

Sequence Models

Method | Description | Reference
--- | --- | ---
BiLSTM | BiLSTM is a bidirectional recurrent neural network (RNN) architecture for encoding compound SMILES strings. | Hochreiter et al.
SALSTM | SALSTM is a self-attention mechanism combined with an improved BiLSTM for molecular representation. | Zheng et al.
Transformer | Transformer is a network based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, used here to encode compound SMILES strings. | Vaswani et al.
MAT | MAT is a Molecule Attention Transformer that uses inter-atomic distances and the molecular graph structure to augment the attention mechanism. | Maziarka et al.
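
As a concrete reference for the sequence models, the sketch below shows a generic bidirectional LSTM encoder over character-tokenized SMILES. The tokenization, layer sizes, and mean-pooling readout are illustrative assumptions and do not mirror MolRep's exact configuration:

    # Generic BiLSTM encoder over character-tokenized SMILES (illustrative configuration).
    import torch
    import torch.nn as nn

    class SmilesBiLSTM(nn.Module):
        def __init__(self, vocab_size=64, emb_dim=64, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
            self.readout = nn.Linear(2 * hidden_dim, 1)  # one prediction target

        def forward(self, token_ids):
            h, _ = self.lstm(self.embed(token_ids))  # (batch, seq_len, 2 * hidden_dim)
            pooled = h.mean(dim=1)                   # simple mean pooling over the sequence
            return self.readout(pooled)

    # Toy usage: map each SMILES character to an integer id and pad to equal length.
    charset = {c: i + 1 for i, c in enumerate("C()=O1cn")}
    smiles = ["CCO", "c1ccccc1"]
    max_len = max(len(s) for s in smiles)
    ids = torch.tensor([[charset[c] for c in s] + [0] * (max_len - len(s)) for s in smiles])
    print(SmilesBiLSTM()(ids).shape)  # -> torch.Size([2, 1])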

Graph Models

Method | Description | Reference
--- | --- | ---
DGCNN | DGCNN is a deep graph convolutional neural network with a SortPooling layer that sorts graph vertices in a consistent order to learn the embedding of the molecular graph. | Zhang et al.
GraphSAGE | GraphSAGE is a framework for inductive representation learning on molecular graphs that generates low-dimensional atom representations and uses sum, mean, or max-pooling neighborhood aggregation to update the atom and molecule representations. | Hamilton et al.
GIN | GIN is the Graph Isomorphism Network, which addresses the limitations of GraphSAGE and captures different graph structures with the discriminative power of the Weisfeiler-Lehman graph isomorphism test. | Xu et al.
ECC | ECC is an Edge-Conditioned Convolution network that learns a separate parameter for each edge label (bond type) in the molecular graph, weighting neighbor aggregation by the corresponding edge parameters. | Simonovsky et al.
DiffPool | DiffPool combines a differentiable graph encoder with an adaptive pooling mechanism that collapses nodes on the basis of a supervised criterion to learn the representation of molecular graphs. | Ying et al.
MPNN | MPNN is a message-passing graph neural network that learns representations of compound molecular graphs, focusing mainly on obtaining effective vertex (atom) embeddings. | Gilmer et al.
D-MPNN | D-MPNN is a message-passing graph neural network that passes messages along directed edges (bonds) rather than vertices, allowing it to make use of bond attributes. | Yang et al.
CMPNN | CMPNN is a graph neural network that improves the molecular graph embedding by strengthening the message interactions between edges (bonds) and nodes (atoms). | Song et al.
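
All of the graph models above follow the same message-passing template: each atom aggregates messages from its bonded neighbors and updates its embedding, and a readout pools atom embeddings into a molecule-level vector. The sketch below is a bare-bones PyTorch illustration of that template (dense adjacency-matrix aggregation, a GRU update, and sum readout are assumptions; it is not any specific model from the table):

    # Bare-bones message-passing layer with a sum readout (illustrative template only).
    import torch
    import torch.nn as nn

    class SimpleMPLayer(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.message = nn.Linear(dim, dim)  # transform neighbor features into messages
            self.update = nn.GRUCell(dim, dim)  # update each atom state with its aggregated message

        def forward(self, atom_feats, adj):
            agg = adj @ self.message(atom_feats)  # sum messages over bonded neighbors
            return self.update(agg, atom_feats)

    dim, n_atoms = 32, 5
    atom_feats = torch.randn(n_atoms, dim)
    adj = torch.tensor([[0, 1, 0, 0, 0],
                        [1, 0, 1, 0, 0],
                        [0, 1, 0, 1, 1],
                        [0, 0, 1, 0, 0],
                        [0, 0, 1, 0, 0]], dtype=torch.float)

    layer = SimpleMPLayer(dim)
    for _ in range(3):                     # a few rounds of message passing
        atom_feats = layer(atom_feats, adj)
    mol_embedding = atom_feats.sum(dim=0)  # sum readout over atoms
    print(mol_embedding.shape)             # -> torch.Size([32])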

Training

To train a model with k-fold cross-validation, run 5-fold-training_example.ipynb.
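
The notebook wraps MolRep's own configuration and training code; the snippet below only sketches the surrounding k-fold protocol with scikit-learn. The train_and_evaluate function is a hypothetical placeholder for whatever model and dataset the notebook configures:

    # Outline of a 5-fold evaluation loop; train_and_evaluate is a hypothetical placeholder.
    import numpy as np
    from sklearn.model_selection import KFold

    def train_and_evaluate(train_idx, test_idx):
        # Placeholder: train on train_idx, return a test-set metric (e.g. ROC-AUC or RMSE).
        return np.random.rand()

    n_molecules = 1000
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = [train_and_evaluate(tr, te) for tr, te in kf.split(np.arange(n_molecules))]
    print(f"5-fold score: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")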

Testing

To test a pretrained model, run testing-example.ipynb.

Explainable

To explain a trained GNN model, run Explainer_Experiments.py.
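
For reference, one common way to attribute a GNN prediction to individual atoms (in the spirit of the explainability evaluation in the Mol-XAI paper) is gradient-based saliency: back-propagate the prediction to the input atom features and score each atom by its gradient magnitude. The sketch below illustrates this generic technique with a stand-in predictor; it is not the explainer implemented in Explainer_Experiments.py:

    # Gradient-based atom attribution sketch (generic saliency, not the project's explainer).
    import torch
    import torch.nn as nn

    n_atoms, dim = 6, 16
    atom_feats = torch.randn(n_atoms, dim, requires_grad=True)

    # Stand-in predictor: score the molecule from pooled atom features (replace with a trained GNN).
    predictor = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))
    prediction = predictor(atom_feats).mean()

    # Back-propagate the prediction to the atom features; per-atom gradient norm = importance.
    prediction.backward()
    atom_importance = atom_feats.grad.norm(dim=1)
    print(atom_importance)  # one saliency score per atom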

More results will be updated soon.

More Repositories

1. GraphSite: protein-DNA binding site prediction using graph transformer and predicted protein structures (Python, 56 stars)
2. PROTAC-RL (Python, 56 stars)
3. GraphPPIS (Python, 49 stars)
4. SPROF-GO: Fast and accurate protein function prediction from sequence through pretrained language model and homology-based label diffusion (Python, 36 stars)
5. Hist2ST (Jupyter Notebook, 32 stars)
6. DiffDec (Python, 28 stars)
7. STAMP-DPI (Python, 27 stars)
8. MUSE (Python, 24 stars)
9. DRlinker (Python, 21 stars)
10. LMetalSite: alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning (Python, 18 stars)
11. GraphSCI: Imputing single-cell RNA-seq data by combining graph convolution and autoencoder neural networks (Jupyter Notebook, 18 stars)
12. GraphEC (Python, 17 stars)
13. GPSite: Geometry-aware protein binding site predictor (Python, 16 stars)
14. GraphCS (Jupyter Notebook, 14 stars)
15. nucleic-acid-binding (Python, 12 stars)
16. CoSMIG: Communicative subgraph representation learning for multi-relational inductive drug-gene interaction prediction (Python, 11 stars)
17. GraphBepi (Python, 10 stars)
18. GraphSCC: single-cell RNA-seq clustering (Python, 10 stars)
19. GraphEBM (Python, 9 stars)
20. ConGI (Jupyter Notebook, 9 stars)
21. PharmKG (Python, 8 stars)
22. SANGO: The official implementation of "SANGO" (Jupyter Notebook, 8 stars)
23. scAdapt (Python, 8 stars)
24. TransEPI: Enhancer-promoter interaction model (Python, 8 stars)
25. LMDisorder (Python, 7 stars)
26. Meta-MO (Python, 7 stars)
27. CellFM (Jupyter Notebook, 7 stars)
28. ADClust: A parameter-free clustering method for single-cell data (Python, 5 stars)
29. DeepMutSol (Python, 4 stars)
30. DeepBayesianCox (Python, 3 stars)
31. MTDsite: Predicting binding sites through multiple-task deep neural networks (2 stars)
32. CMPRY (Python, 2 stars)
33. GP-nano: a geometric graph network for nanobody polyreactivity prediction (Python, 2 stars)
34. SCHAP (R, 1 star)
35. MSASC: some of the code is adapted from DARN (github.com/junfengwen/DARN) (Python, 1 star)
36. MeHi-SCC: Hierarchical scRNA-seq clustering (Python, 1 star)
37. scFPN (Python, 1 star)
38. ML-GVP: The CAGI6-Sherloc model (Python, 1 star)
39. DrugVNN (Python, 1 star)
40. ProTact (Python, 1 star)
41. DeepEPI: Capturing large genomic contexts for accurately predicting enhancer-promoter interactions (Python, 1 star)