  • Language: Python
  • License: Apache License 2.0

Repository Details

AI molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design and molecule optimization.

REINVENT 4

Description

REINVENT is a molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design, molecule optimization, and other small-molecule design tasks. At its heart, REINVENT uses a Reinforcement Learning (RL) algorithm to generate optimized molecules that comply with a user-defined property profile expressed as a multi-component score. Transfer Learning (TL) can be used to create or pre-train a model that generates molecules closer to a set of input molecules.

A preprint with more details is available on ChemRxiv: REINVENT4: Modern AI-Driven Generative Molecule Design. See AUTHORS.md for references to previous papers.

Requirements

REINVENT is developed on and primarily for Linux, and supports both GPU and CPU. The Linux version is fully validated. REINVENT also runs on Windows with both GPU and CPU, but this platform is mostly untested. macOS is supported on the CPU only.

The code is written in Python 3 (>= 3.10). The list of dependencies can be found in the repository (see also Installation below).
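The version requirement can be checked before installing anything. The following is a minimal sketch; meets_requirement and MINIMUM are hypothetical names used for illustration and are not part of REINVENT.

```python
import sys

# Minimum interpreter version stated in the requirements above.
MINIMUM = (3, 10)


def meets_requirement(version=None):
    """Return True if `version` (default: the running interpreter) is >= 3.10."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version[:2]) >= MINIMUM


if __name__ == "__main__":
    if not meets_requirement():
        print("Python >= 3.10 is required, found", sys.version.split()[0])
```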

A GPU is not strictly necessary but is strongly recommended for performance reasons, especially for transfer learning/model training. Note that reinforcement learning (RL) requires the computation of scores. Most scoring components run on the CPU, so a GPU matters less for RL the more time is spent on the CPU.

If no GPU is installed in your computer, the code will run on the CPU automatically. REINVENT supports NVIDIA and also some AMD GPUs. For most design tasks, about 8 GiB each of CPU main memory and GPU memory is sufficient.
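The CPU fallback can be pictured with a small sketch. It assumes PyTorch as the backend (the Installation section refers to PyTorch builds) and is not REINVENT's actual device-selection code; select_device is a hypothetical helper. ROCm builds of PyTorch are expected to report AMD GPUs through the same torch.cuda interface.

```python
def select_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        # Imported lazily so the check also works where PyTorch is absent.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Computations would run on:", select_device())
```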

Installation

  1. Clone this Git repository.
  2. Install a compatible version of Python, for example with Conda (other virtual environments like Docker, pyenv, or the system package manager would work too).
    conda create --name reinvent4 python=3.10
    conda activate reinvent4
  3. Change directory into the repository and install the dependencies from the lockfile:
    pip install -r requirements-linux-64.lock
  4. Optional: if you want to use AMD GPUs on Linux you would need to install the ROCm PyTorch version manually after installation of the dependencies in point 3, e.g.
    pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/rocm5.2
  5. Install the tool. The dependencies were already installed in the previous step, so there is no need to install them again (flag --no-deps). If you want to install in editable mode (changes to the code are automatically picked up), add -e before the dot.
    pip install --no-deps .
  6. Test the tool. The installer has added a script reinvent to your PATH.
    reinvent --help
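If reinvent --help fails with a "command not found" error, the script may not be on the PATH of the active environment. A minimal sketch of that check using only the standard library follows; find_script is a hypothetical helper name.

```python
import shutil


def find_script(name="reinvent"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_script()
    if location is None:
        print("reinvent not found: is the reinvent4 environment activated?")
    else:
        print("reinvent is installed at", location)
```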

Basic Usage

REINVENT is a command line tool and is invoked as follows:

reinvent -l sampling.log sampling.toml

This writes logging information to the file sampling.log. If you wish to write this to the screen instead, leave out the -l sampling.log part. sampling.toml is the configuration file. The main user format is TOML as it tends to be more user friendly. JSON can be used too (add -f json), but a specialised editor is recommended as the format is very sensitive to minor changes.
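When REINVENT is driven from a script, the command line above can be assembled programmatically. The sketch below uses only the -l and -f options described here; build_command is a hypothetical helper, and placing the options before the configuration file is an assumption based on the example invocation.

```python
import subprocess


def build_command(config_file, log_file=None, config_format=None):
    """Assemble the argument list for a reinvent run.

    log_file      -> adds "-l <file>"; omit it to log to the screen
    config_format -> adds "-f <format>", e.g. "json"; omit it for TOML
    """
    command = ["reinvent"]
    if log_file is not None:
        command += ["-l", log_file]
    if config_format is not None:
        command += ["-f", config_format]
    command.append(config_file)
    return command


if __name__ == "__main__":
    cmd = build_command("sampling.toml", log_file="sampling.log")
    print(" ".join(cmd))
    # To actually launch the run (requires an installed reinvent script):
    # subprocess.run(cmd, check=True)
```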

Sample configuration files for all run modes are located in config/toml of the repository; the file paths therein need to be adjusted to your local installation. In particular, ready-made prior models are located in priors; choose a model and the appropriate run mode depending on the research problem you are trying to address. Additional information on how to configure the TOML file is available in several *.md files in config/toml.
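Forgetting to adjust a file path in a sample configuration is an easy mistake. The sketch below walks an already parsed configuration (for example the dictionary returned by a TOML parser such as tomllib in Python 3.11+) and reports entries that point to missing files. The convention that path-valued keys end in _file, and key names such as model_file, are assumptions made for illustration; check the sample configurations for the actual names.

```python
from pathlib import Path


def find_missing_files(config, suffix="_file", base_dir="."):
    """Return (key, path) pairs whose key ends in `suffix` and whose
    path does not exist. `config` is a nested structure of dicts and lists."""
    missing = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if isinstance(value, str) and key.endswith(suffix):
                    # Relative paths are resolved against base_dir.
                    if not (Path(base_dir) / value).exists():
                        missing.append((key, value))
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(config)
    return missing


if __name__ == "__main__":
    # Hypothetical configuration fragment, not copied from the repository.
    example = {"parameters": {"model_file": "priors/some_model.prior"}}
    for key, path in find_missing_files(example):
        print(f"{key}: {path} does not exist, adjust the path")
```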

Tutorials / Jupyter notebooks

NOTE: these will be updated at a later time!

Updating dependencies

Update the lock files with pip-tools (please, do not edit the files manually):

pip-compile --extra-index-url=https://download.pytorch.org/whl/cu113 --extra-index-url=https://pypi.anaconda.org/OpenEye/simple --resolver=backtracking pyproject.toml

To update a single package, use pip-compile --upgrade-package somepackage (see the documentation for pip-tools).

Scoring Plugins

The scoring subsystem uses a simple plugin mechanism (Python native namespace packages). If you wish to write your own plugin, follow the instructions below. The public repository contains a contrib directory with some useful examples.

  1. Create /top/dir/somewhere/reinvent_plugins/components where /top/dir/somewhere is a convenient location for you.
  2. Do not place a __init__.py in either reinvent_plugins or components as this would break the mechanism. It is fine to create normal packages within components as long as you import those correctly.
  3. Place a file whose name starts with comp_ into reinvent_plugins/components. Files with different names will be ignored, i.e. not imported. The directory is searched recursively, so structure your code as needed, but directory/package names must be unique.
  4. Tag the scoring component class(es) in that file with the @add_tag decorator. More than one component class can be added to the same comp_ file. See existing code.
  5. Optionally, tag at most one dataclass in the same file as the parameter class; see existing code.
  6. There is no need to touch any of the REINVENT code.
  7. Set or add /top/dir/somewhere to the PYTHONPATH environment variable or use any other mechanism to extend sys.path.
  8. The scoring component should now be automatically picked up by REINVENT.
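The layout rules in steps 1 to 3 can be verified with a short script before starting REINVENT. This is a minimal sketch that only inspects the directory tree; it does not import REINVENT or the plugin code, it assumes the component files carry a .py extension, and check_plugin_layout is a hypothetical helper name.

```python
from pathlib import Path


def check_plugin_layout(top_dir):
    """Inspect <top_dir>/reinvent_plugins/components.

    Returns (component_files, problems): the comp_*.py files that would be
    candidates for import, and a list of human-readable layout problems.
    """
    components = Path(top_dir) / "reinvent_plugins" / "components"
    problems = []
    if not components.is_dir():
        return [], [f"missing directory: {components}"]

    # Step 2: neither namespace directory may contain an __init__.py.
    for namespace_dir in (components.parent, components):
        init_file = namespace_dir / "__init__.py"
        if init_file.exists():
            problems.append(f"remove {init_file}")

    # Step 3: only files starting with comp_ count; the search is recursive.
    component_files = sorted(components.rglob("comp_*.py"))
    if not component_files:
        problems.append("no comp_*.py files found")
    return component_files, problems


if __name__ == "__main__":
    files, problems = check_plugin_layout("/top/dir/somewhere")
    for path in files:
        print("component file:", path)
    for problem in problems:
        print("problem:", problem)
```

Remember that /top/dir/somewhere itself, not the reinvent_plugins directory, is what goes on PYTHONPATH (step 7).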

Unit and Integration Tests

This is primarily for developers and admins/users who wish to ensure that the installation works in principle. The information here is not relevant to the practical use of REINVENT. Please refer to Basic Usage for instructions on how to use the reinvent command.

The REINVENT project uses the pytest framework for its tests. Before you run them you first have to create a configuration file which the tests will use.

In the project directory, create a config.json file in the configs/ directory. You can use the example config example.config.json as a base. Make sure that you set MAIN_TEST_PATH to a non-existent directory. That is where temporary files will be written during the tests. If it is set to an existing directory, that directory will be removed once the tests have finished.
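Because an existing MAIN_TEST_PATH directory is deleted after the tests, it is worth guarding against pointing it at real data. The sketch below copies the example configuration and refuses to continue if the target directory already exists. It assumes that MAIN_TEST_PATH is a top-level key of the JSON file; write_test_config is a hypothetical helper, not part of the test suite.

```python
import json
from pathlib import Path


def write_test_config(example_path, config_path, main_test_path):
    """Create the test configuration from the example file.

    Raises FileExistsError if main_test_path already exists, since the
    tests would remove that directory when they finish.
    """
    test_dir = Path(main_test_path)
    if test_dir.exists():
        raise FileExistsError(f"{test_dir} exists and would be removed by the tests")

    config = json.loads(Path(example_path).read_text())
    # Assumption: MAIN_TEST_PATH sits at the top level of the JSON document.
    config["MAIN_TEST_PATH"] = str(test_dir)
    Path(config_path).write_text(json.dumps(config, indent=4))
    return config


if __name__ == "__main__":
    # File names follow the text above; the temporary directory name is arbitrary.
    write_test_config(
        "configs/example.config.json",
        "configs/config.json",
        "/tmp/reinvent_test_tmp",
    )
```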

Some tests require a proprietary OpenEye license. You have to set up a few things to make the tests read your license. The simple way is to just set the OE_LICENSE environment variable to the path of the file containing the license.

Once you have a configuration and your license can be read, you can run the tests.

$ pytest tests
