• This repository has been archived on 16/Mar/2023
  • Stars
    star
    132
  • Rank 272,637 (Top 6 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created almost 4 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Molecular optimization by capturing chemist’s intuition using the Seq2Seq with attention and the Transformer

Please note: this repository is no longer being maintained.

Maturity level-0

Molecular Optimization by Capturing Chemist's Intuition Using Deep Neural Networks

Description

Implementation of the Seq2Seq with attention and the Transformer used in Molecular Optimization by Capturing Chemist's Intuition Using Deep Neural Networks. Given a molecule and desirable property changes, the goal is to generate molecules with desirable property changes. This problem can be viewed as a machine translation problem in natural language processing. Property changes are incorporated into input together with SMILES.

Alt text

Usage

Create environment

conda env create -f environment.yml
source activate molopt

1. Preprocess data

Encode property change, build vocabulary, and split data into train, validation and test. Outputs are saved in the same directory with input data path.

python preprocess.py --input-data-path data/chembl_02/mmp_prop.csv

2. Train model

Train the model and save results and logs to experiments/save_directory/; The model from each epoch is saved in experiments/save_directory/checkpoint/; The training loss, validation loss and validation accuracy are saved in experiments/save_directory/tensorboard/.

python train.py --data-path data/chembl_02 --save-directory train_transformer --model-choice transformer transformer

A pre-trained Transformer model can be found here.

3. Generate molecules

Use the model saved at a given epoch (e.g. 60) to generate molecules for the given test filename, and save the results to experiments/save_directory/test_file_name/evaluation_epoch/generated_molecules.csv. The three test sets used in our paper can be found in data/chembl_02/ as below,

  • Test-Original -> data/chembl_02/test.csv
  • Test-Molecule -> data/chembl_02/test_not_in_train.csv
  • Test-Property -> data/chembl_02/test_unseen_L-1_S01_C10_range.csv
python generate.py --model-choice transformer --data-path data/chembl_02 --test-file-name test --model-path experiments/train_transformer/checkpoint --save-directory evaluation_transformer --epoch 60
python generate.py --model-choice transformer --data-path data/chembl_02 --test-file-name test_not_in_train --model-path experiments/train_transformer/checkpoint --save-directory evaluation_transformer --epoch 60
python generate.py --model-choice transformer --data-path data/chembl_02 --test-file-name test_unseen_L-1_S01_C10_range --model-path experiments/train_transformer/checkpoint --save-directory evaluation_transformer --epoch 60

4. Compute properties for generated molecules

Since we build the property prediction model based on the in-house experimental data, we can't make it public. But the computed properties can be found in experiments/evaluation_transformer/test_file_name/evaluation_60/generated_molecules_prop.csv

5.Evaluate the generated molecules in term of satisfying the desirable properties and draw molecules

python evaluate.py --data-path experiments/evaluation_transformer/test/evaluation_60/generated_molecules_prop.csv
python evaluate.py --data-path experiments/evaluation_transformer/test_not_in_train/evaluation_60/generated_molecules_prop.csv
python evaluate.py --data-path experiments/evaluation_transformer/test_unseen_L-1_S01_C10_range/evaluation_60/generated_molecules_prop.csv --range-evaluation lower

6. Matched molecular pair analysis between starting molecules and generated molecules

  • Download mmpdb for matched molecular pair generation
  • Parse the downloaded mmpdb path (i.e. path/mmpdb/) to --mmpdb-path of mmp_analysis.py

Between starting molecules and all the generated molecules

python mmp_analysis.py --data-path experiments/evaluation_transformer/test/evaluation_60/generated_molecules_prop.csv --train-path data/chembl_02/train.csv --mmpdb-path path/mmpdb/
python mmp_analysis.py --data-path experiments/evaluation_transformer/test_not_in_train/evaluation_60/generated_molecules_prop.csv --train-path data/chembl_02/train.csv --mmpdb-path path/mmpdb/
python mmp_analysis.py --data-path experiments/evaluation_transformer/test_unseen_L-1_S01_C10_range/evaluation_60/generated_molecules_prop.csv --train-path data/chembl_02/train.csv --mmpdb-path path/mmpdb/

Between starting molecules and all the generated molecules with desirable properties

python mmp_analysis.py --data-path experiments/evaluation_transformer/test/evaluation_60/generated_molecules_prop_statistics.csv --train-path data/chembl_02/train.csv --mmpdb-path path/mmpdb/ --only-desirable
python mmp_analysis.py --data-path experiments/evaluation_transformer/test_not_in_train/evaluation_60/generated_molecules_prop_statistics.csv --train-path data/chembl_02/train.csv --mmpdb-path path/mmpdb/ --only-desirable
python mmp_analysis.py --data-path experiments/evaluation_transformer/test_unseen_L-1_S01_C10_range/evaluation_60/generated_molecules_prop_statistics.csv --train-path data/chembl_02/train.csv --mmpdb-path path/mmpdb/ --only-desirable

License

The code is copyright 2020 by Jiazhen He and distributed under the Apache-2.0 license. See LICENSE for details.

More Repositories

1

aizynthfinder

A tool for retrosynthetic planning
Python
565
star
2

GraphINVENT

Graph neural networks for molecular design.
Python
361
star
3

Reinvent

Python
335
star
4

Chemformer

Python
202
star
5

REINVENT4

AI molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design and molecule optimization.
Python
165
star
6

ReinventCommunity

Jupyter Notebook
148
star
7

DockStream

DockStream: A Docking Wrapper to Enhance De Novo Molecular Design
Python
91
star
8

QSARtuna

QSARtuna: QSAR model building with the optuna framework
Jupyter Notebook
77
star
9

PaRoutes

Home of the PaRoutes framework for benchmarking multi-step retrosynthesis predictions.
Python
62
star
10

reaction_utils

Utilities for working with datasets of chemical reactions, reaction templates and template extraction.
Python
62
star
11

pysmilesutils

Utilities for working with SMILES based encodings of molecules for deep learning (PyTorch oriented)
Python
56
star
12

Icolos

Icolos: A workflow manager for structure based post-processing of de novo generated small molecules
Python
53
star
13

Lib-INVENT

Jupyter Notebook
49
star
14

MolBART

Pretrained SMILES transformation model for finetuning for diverse molecular tasks.
Python
43
star
15

maize

A graph-based workflow manager for computational chemistry pipelines
Python
31
star
16

DockStreamCommunity

Jupyter Notebook
24
star
17

aizynthtrain

Tools to train synthesis prediction models
Python
21
star
18

reinvent-hitl

Code for paper "Human-in-the-Loop Assisted de Novo Molecular Design".
Python
21
star
19

route-distances

Tools and routines to calculate distances between synthesis routes and to cluster them.
Python
20
star
20

Deep-Drug-Coder

Python
17
star
21

Lib-INVENT-dataset

Python
15
star
22

SMILES-RL

Python
12
star
23

reinvent-scoring

Python
10
star
24

NonadditivityAnalysis

Notebook for standardization of actvity data, nonadditivity analysis and its evaluation.
Jupyter Notebook
9
star
25

Levenshtein

Levenshtein SMILES augmentation for reaction datasets
Python
8
star
26

Siamese-RNN-Self-Attention

Contains code for Siamese Recurrent Neural Network with Self-Attention for Bioactivity Prediction
Python
7
star
27

IcolosCommunity

Repository contains jupyter notebooks illustrating the use of the Icolos workflow manager
Jupyter Notebook
6
star
28

reaction-graph-link-prediction

Python
6
star
29

maize-contrib

Contributed and additional nodes for maize
Python
5
star
30

MMP_project

Code for paper
Jupyter Notebook
5
star
31

reinvent-chemistry

Python
5
star
32

molwall

MolWall: "Wall of molecules" interface to see and rate molecules
Python
4
star
33

reinvent-models

Python
4
star
34

reinforcement-learning-active-learning

Python
4
star
35

IcolosData

Contains the data required for the example workflows and jupyter notebooks utilizing the Icolos workflow manager
Rich Text Format
1
star
36

reinvent-scoring-gpflow

Code for paper "Human-in-the-Loop Assisted de Novo Molecular Design".
Python
1
star