This repository was archived on 11 March 2023.

Graph neural networks for molecular design.

Please note: this repository is no longer being maintained.

GraphINVENT


Description

GraphINVENT is a platform for graph-based molecular generation using graph neural networks. GraphINVENT uses a tiered deep neural network architecture to probabilistically generate new molecules a single bond at a time. All models implemented in GraphINVENT can quickly learn to build molecules resembling training set molecules without any explicit programming of chemical rules. The models have been benchmarked using the MOSES distribution-based metrics, showing that the best GraphINVENT model compares well with state-of-the-art generative models.
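As a toy illustration of "one bond at a time" generation (not GraphINVENT's own code), a single generation step can be pictured as sampling one action from a probability distribution over "add a new atom", "connect to an existing atom", and "terminate". The logit groups `f_add`, `f_conn` and `f_term`, and the flat one-dimensional layout used here, are simplifying assumptions; a real model would produce these scores from the current graph.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_action(index, n_add, n_conn):
    """Map a flat action index to (action type, offset within that action block)."""
    if index < n_add:
        return ("add", index)
    if index < n_add + n_conn:
        return ("connect", index - n_add)
    return ("terminate", 0)


def sample_action(f_add, f_conn, f_term, rng):
    """Sample one graph-building action from concatenated (hypothetical) logit groups."""
    logits = list(f_add) + list(f_conn) + [f_term]
    probs = softmax(logits)
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return decode_action(index, len(f_add), len(f_conn))


if __name__ == "__main__":
    rng = random.Random(0)
    # Three possible "add" actions, two "connect" actions, one "terminate" action.
    print(sample_action(f_add=[0.1, 2.0, -1.0], f_conn=[0.5, 0.3], f_term=-2.0, rng=rng))
```

Repeating this step, and applying each sampled action to the graph until "terminate" is drawn, yields one generated molecule.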

Updates

The following versions of GraphINVENT exist in this repository:

  • v1.0 (and all commits up to it) is the tag for the "original" version, which corresponds to the publications below.
  • v2.0 is an outdated version, created March 10, 2021.
  • v3.0 is the latest version, created August 20, 2021.

20-08-2021:

Large update:

  • Added a reinforcement learning framework to allow for fine-tuning models. Fine-tuning jobs can now be run using the --job-type "fine-tune" flag.
  • An example submission script for fine-tuning jobs was added (submit-fine-tuning.py), and the old example submission script was renamed (submit.py --> submit-pre-training.py).
  • Note: the tutorials have not yet been updated to reflect these changes. This will be done soon, but for now be aware that there may be small discrepancies between the tutorials and the actual instructions. I will delete this bullet point once the tutorials have been updated.
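The sketch below shows how a command-line flag such as `--job-type "fine-tune"` can be declared and validated with Python's standard `argparse` module. It is a hypothetical launcher, not GraphINVENT's entry point: the parser description and the full list of job types are assumptions, and only the `"fine-tune"` value is confirmed by the notes above.

```python
import argparse

# Hypothetical set of job types; only "fine-tune" is confirmed by the release notes.
JOB_TYPES = ("preprocess", "train", "generate", "test", "fine-tune")


def build_parser():
    """Build a toy parser exposing a --job-type flag like the one described above."""
    parser = argparse.ArgumentParser(description="Toy GraphINVENT-style launcher")
    parser.add_argument(
        "--job-type",
        choices=JOB_TYPES,
        required=True,
        help="which kind of job to run",
    )
    return parser


if __name__ == "__main__":
    # argparse exposes "--job-type" as the attribute "job_type".
    args = build_parser().parse_args(["--job-type", "fine-tune"])
    print(f"Running a {args.job_type} job")
```

Restricting the flag with `choices` makes a typo such as `finetune` fail immediately with a usage error instead of silently starting the wrong kind of job.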

26-03-2021:

Small update:

  • Pre-trained models created with GraphINVENT v1.0 can now be used with GraphINVENT v2.0.

10-03-2021:

The biggest changes in v2.0 from v1.0 are summarized below:

  • Data preprocessing was updated for readability (now done in DataProcesser.py).
  • Graph generation was updated for readability (now done in Generator.py), and some bugs related to how implicit Hs and chirality were handled on the GPU were fixed (these were previously not used during generation, despite being available for preprocessing/training).
  • Data analysis code was updated for readability (now done in Analyzer.py).
  • The learning rate decay scheme was changed from a custom learning rate scheduler to the OneCycle scheduler (so far it appears to work well enough, and it requires fewer parameters).
  • The code now runs on the latest version of PyTorch (1.8.0); the previous version used PyTorch 1.3. The environment has been updated accordingly (and renamed "GraphINVENT-env" -> "graphinvent").
  • Redundant hyperparameters were removed; additionally, hyperparameters that were not observed to improve results were removed from defaults.py, such as the optimizer weight decay (now fixed at 0.0) and the weight initialization scheme (now fixed to Xavier uniform).
  • Some old modules, such as models.py and loss.py, were consolidated into Workflow.py.
  • A validation loss calculation was added to keep track of model training.
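To make the OneCycle change above concrete, the pure-Python sketch below reproduces the shape of PyTorch's default two-phase cosine one-cycle schedule (`torch.optim.lr_scheduler.OneCycleLR`), which is presumably the scheduler meant here. The defaults `pct_start=0.3`, `div_factor=25` and `final_div_factor=1e4` are PyTorch's; the values GraphINVENT actually uses may differ.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` for a two-phase cosine one-cycle schedule.

    Phase 1 warms up from max_lr / div_factor to max_lr; phase 2 anneals
    from max_lr down to (max_lr / div_factor) / final_div_factor.
    """
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    peak_step = pct_start * total_steps - 1
    last_step = total_steps - 1
    if step <= peak_step:
        start, end, pct = initial_lr, max_lr, step / peak_step
    else:
        start, end, pct = max_lr, min_lr, (step - peak_step) / (last_step - peak_step)
    # Cosine interpolation from `start` (pct = 0) to `end` (pct = 1).
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


if __name__ == "__main__":
    schedule = [one_cycle_lr(s, total_steps=100, max_lr=1e-3) for s in range(100)]
    print(f"start={schedule[0]:.2e} peak={max(schedule):.2e} end={schedule[-1]:.2e}")
```

In a PyTorch training loop one would normally construct `OneCycleLR(optimizer, max_lr=..., total_steps=...)` and call `scheduler.step()` after each optimizer step, rather than computing the values by hand. Only `max_lr` and the run length must be chosen, which fits the note that the new scheme uses fewer parameters.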

Additionally, minor typos and bugs were corrected, and the docstrings and error messages updated. Examples of minor bugs/changes:

  • A bug in how the fraction of properly terminated graphs (and the fraction of valid graphs among those properly terminated) was calculated: the wrong function was used for the data type, which led to errors in rare instances.
  • Errors in how analysis histograms were written to TensorBoard; as these histograms were also of questionable utility, they have been removed.
  • Some values (like the "NLL diff") were removed, as they were also not found to be useful.
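For reference, the two fractions mentioned in the first bullet above can be computed from per-graph boolean flags as sketched below. The function name and its inputs are hypothetical, and this is not the code in Analyzer.py. Counting with plain Python integers and guarding against empty denominators avoids the kind of type-dependent edge case described above.

```python
def termination_metrics(terminated, valid):
    """Return (fraction properly terminated, fraction valid among properly terminated).

    terminated[i] is True if generated graph i was properly terminated;
    valid[i] is True if graph i corresponds to a valid molecule.
    """
    if len(terminated) != len(valid):
        raise ValueError("flag lists must have the same length")
    n_total = len(terminated)
    n_term = sum(1 for t in terminated if t)
    n_valid_term = sum(1 for t, v in zip(terminated, valid) if t and v)
    # Guard both denominators so empty or fully unterminated batches do not crash.
    frac_term = n_term / n_total if n_total else 0.0
    frac_valid_of_term = n_valid_term / n_term if n_term else 0.0
    return frac_term, frac_valid_of_term


if __name__ == "__main__":
    print(termination_metrics([True, True, False, True], [True, False, True, True]))
```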

If you spot any issues (big or small) since the update, please create an issue or a pull request (if you are able to fix it), and we will be happy to make changes.

Prerequisites

  • Anaconda or Miniconda with Python 3.6 or 3.8.
  • (for GPU-training only) CUDA-enabled GPU.
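A quick way to check these prerequisites from inside an activated environment is sketched below. The supported-version set simply mirrors the list above. The CUDA check assumes PyTorch is installed and returns `False` otherwise. Both helper names are hypothetical.

```python
import sys

# Python versions listed in the prerequisites above.
SUPPORTED = {(3, 6), (3, 8)}


def python_supported(version_info=sys.version_info):
    """True if the (major, minor) Python version is one of the listed ones."""
    return tuple(version_info[:2]) in SUPPORTED


def cuda_available():
    """True if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print("CUDA available (needed for GPU training only):", cuda_available())
```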

Instructions and tutorials

For detailed guides on how to use GraphINVENT, see the tutorials.

Examples

An example training set is available in ./data/gdb13_1K/. It is a small (1K) subset of GDB-13 and is already preprocessed.
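The following is a minimal sketch for inspecting a SMILES file such as those in the example directory. It assumes one SMILES string per line, optionally followed by a name column and optionally preceded by a header line. The file name `train.smi` is an assumption, so check the actual files in ./data/gdb13_1K/ before relying on it.

```python
from pathlib import Path


def read_smiles(path, has_header=False):
    """Read SMILES strings, taking the first whitespace-separated token of each non-empty line."""
    lines = Path(path).read_text().splitlines()
    if has_header:
        lines = lines[1:]
    smiles = []
    for line in lines:
        tokens = line.split()
        if tokens:
            smiles.append(tokens[0])
    return smiles


if __name__ == "__main__":
    # Hypothetical file name inside the example dataset directory.
    path = Path("data/gdb13_1K/train.smi")
    if path.exists():
        molecules = read_smiles(path, has_header=True)
        print(len(molecules), molecules[:3])
```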

Contributors

@rociomer

@rastemo

@edvardlindelof

@sararromeo

@JuanViguera

@psolsson

Contributions

Contributions are welcome in the form of issues or pull requests. To report a bug, please submit an issue. Thank you to everyone who has used the code and provided feedback thus far.

References

Relevant publications

If you use GraphINVENT in your research, please reference our publication.

Additional details related to the development of GraphINVENT are available in our technical note. You might find this note useful if you're interested in either exploring different hyperparameters or developing your own generative models.

The references in BibTeX format are available below:

@article{mercado2020graph,
  author = "Rocío Mercado and Tobias Rastemo and Edvard Lindelöf and Günter Klambauer and Ola Engkvist and Hongming Chen and Esben Jannik Bjerrum",
  title = "{Graph Networks for Molecular Design}",
  journal = {Machine Learning: Science and Technology},
  year = {2020},
  publisher = {IOP Publishing},
  doi = "10.1088/2632-2153/abcf91"
}

@article{mercado2020practical,
  author = "Rocío Mercado and Tobias Rastemo and Edvard Lindelöf and Günter Klambauer and Ola Engkvist and Hongming Chen and Esben Jannik Bjerrum",
  title = "{Practical Notes on Building Molecular Graph Generative Models}",
  journal = {Applied AI Letters},
  year = {2020},
  publisher = {Wiley Online Library},
  doi = "10.1002/ail2.18"
}

Related work

MPNNs

The MPNN implementations used in this work were pulled from Edvard Lindelöf's repo in October 2018, while he was a master's student in the MAI group. This work is available at

https://github.com/edvardlindelof/graph-neural-networks-for-drug-discovery.

His master's thesis, describing the EMN implementation, can be found at

https://odr.chalmers.se/handle/20.500.12380/256629.

MOSES

The MOSES repo is available at https://github.com/molecularsets/moses.

GDB-13

The example dataset provided is a subset of GDB-13. This was obtained by randomly sampling 1000 structures from the entire GDB-13 dataset. The full dataset is available for download at http://gdb.unibe.ch/downloads/.
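The exact script used to draw the subset is not described here. Because the full GDB-13 set contains close to a billion structures, one memory-friendly way to draw a uniform random sample of 1000 is reservoir sampling (Algorithm R), which reads the file as a stream and keeps only k items in memory. The sketch below is an illustration of that technique, not the authors' procedure.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Uniformly sample k items from an iterable of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    if len(reservoir) < k:
        raise ValueError("stream has fewer than k items")
    return reservoir


if __name__ == "__main__":
    # With a real file one could pass e.g. (line.split()[0] for line in open(path) if line.strip()).
    subset = reservoir_sample(range(10_000), k=1000, seed=42)
    print(len(subset), subset[:5])
```

Fixing `seed` makes the subset reproducible, which is useful if the sample needs to be regenerated later.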

RL-GraphINVENT

Version 3.0 incorporates Sara's work into the latest GraphINVENT framework: repo and paper. Her work was presented at the RL4RealLife workshop at ICML 2021.

Exploring graph traversal algorithms in GraphINVENT

In this pre-print, we look into the effect of different graph traversal algorithms on the types of structures generated by GraphINVENT. We find that breadth-first search (BFS) generally leads to better molecules than depth-first search (DFS), unless the model is overtrained, at which point both graph traversal algorithms lead to indistinguishable sets of structures.
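To see how the choice of traversal changes the order in which atoms are visited, and therefore the order in which a molecule would be built, the generic sketch below computes BFS and DFS node orders on a small heavy-atom graph given as an adjacency list. This is not GraphINVENT's preprocessing code. The example graph is the carbon skeleton of butane (2-0-1-3), traversed from an interior carbon.

```python
from collections import deque


def bfs_order(adjacency, start):
    """Visit nodes level by level: all neighbours of a node before their neighbours."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in adjacency[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order


def dfs_order(adjacency, start):
    """Follow each branch as deep as possible before backtracking (iterative DFS)."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbours in reverse so the first-listed neighbour is explored first.
        for nbr in reversed(adjacency[node]):
            if nbr not in seen:
                stack.append(nbr)
    return order


if __name__ == "__main__":
    butane = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
    print("BFS:", bfs_order(butane, 0))
    print("DFS:", dfs_order(butane, 0))
```

BFS visits both neighbours of the starting carbon before moving outward, whereas DFS follows one branch to its end first. On larger molecules this leads to noticeably different partial structures along the generation path.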

License

GraphINVENT is licensed under the MIT license and is free and provided as-is.

Link

https://github.com/MolecularAI/GraphINVENT/
