  • Stars: 482
  • Rank: 91,198 (Top 2%)
  • Language: Python
  • License: Apache License 2.0
  • Created: almost 7 years ago
  • Updated: over 1 year ago

Repository Details

Code for 10.1021/acscentsci.7b00572, now running on Keras 2.0 and Tensorflow

Chemical VAE

This repository contains the framework and code for constructing a variational autoencoder (VAE) for use with molecular SMILES, as described in doi:10.1021/acscentsci.7b00572 (preprint: https://arxiv.org/pdf/1610.02415.pdf).

In short, molecular SMILES strings are encoded into a continuous latent code vector and can be decoded from that code back into molecular SMILES. The autoencoder can also be trained jointly with a property-prediction network to help shape the latent space. The resulting latent space can then be optimized over to find molecules with the best values of the properties of interest.
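
For orientation, the sketch below shows the general shape of such a model in Keras 2 functional-API terms. It is a minimal conceptual illustration only: the layer types and sizes are placeholders, not the architecture in models.py (which uses a convolutional encoder and a recurrent decoder), and the KL-divergence term that a real VAE adds to the loss is omitted.

from keras import backend as K
from keras.layers import Activation, Dense, Flatten, Input, Lambda, Reshape
from keras.models import Model

max_len, n_chars, latent_dim = 120, 35, 196   # placeholder sizes, not the repo's settings

x_in = Input(shape=(max_len, n_chars))        # one-hot encoded SMILES
h = Dense(256, activation='relu')(Flatten()(x_in))
z_mean = Dense(latent_dim)(h)                 # latent code used downstream
z_log_var = Dense(latent_dim)(h)

def sample_z(args):
    mu, log_var = args                        # reparameterization trick
    eps = K.random_normal(shape=K.shape(mu))
    return mu + K.exp(0.5 * log_var) * eps

z = Lambda(sample_z)([z_mean, z_log_var])

h_dec = Dense(256, activation='relu')(z)      # decode back to a SMILES-shaped tensor
x_out = Activation('softmax')(Reshape((max_len, n_chars))(Dense(max_len * n_chars)(h_dec)))
prop_out = Dense(3)(z_mean)                   # joint regression head (e.g. logP, QED, SAS)

vae = Model(x_in, [x_out, prop_out])
vae.compile(optimizer='adam', loss=['categorical_crossentropy', 'mse'])
# The real model also adds an annealed KL-divergence term (see Weight_Annealer below).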

In our example, we perform encoding/decoding with the ZINC dataset and shape the latent space by jointly predicting the logP, QED, and SAS properties.
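
For reference, logP and QED can be computed directly with RDKit, while SAS (synthetic accessibility) comes from the sascorer contrib script distributed with RDKit rather than the core API. A small sketch:

from rdkit import Chem
from rdkit.Chem import Descriptors, QED

mol = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')   # aspirin, as an example molecule
print('logP:', Descriptors.MolLogP(mol))            # Crippen octanol-water partition coefficient
print('QED: ', QED.qed(mol))                        # quantitative estimate of drug-likeness
# SAS is computed with RDKit's Contrib/SA_Score/sascorer.py, which has to be
# added to the Python path before it can be imported.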

Upcoming updates:

  • Updated Docker environment
  • Improved tutorial

Questions, problems?

Open a GitHub issue 😄. Please be as clear and descriptive as possible.

How to install

Requirements:

An Anaconda Python environment is recommended. Check the environment.yml file; the main requirements are:

  • Python >= 3.5
  • Keras >= 2.0.0, <= 2.0.7
  • TensorFlow == 1.1
  • RDKit
  • Numpy

Jupyter Notebook is required to run the .ipynb examples. Make sure that the Keras backend is set to use TensorFlow.
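
One way to check (and, if needed, select) the backend from Python; note that the KERAS_BACKEND environment variable only takes effect if it is set before Keras is first imported, otherwise edit the "backend" field in ~/.keras/keras.json:

import os
os.environ.setdefault('KERAS_BACKEND', 'tensorflow')  # must be set before the first Keras import

from keras import backend as K
print(K.backend())   # should print 'tensorflow'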

via Anaconda (recommended way)

Create a conda environment:

conda env create -f environment.yml
source activate chemvae
python setup.py install

via pip

Assuming you have all the requirements:

pip install git+https://github.com/aspuru-guzik-group/chemical_vae.git

Example: ZINC dataset

This repository contains an example of how to run the autoencoder on the ZINC dataset.

First, take a look at the zinc directory. Parameters are set in the following JSON files:

  • exp.json - Sets parameters for the location of the data, global experiment settings such as the number of epochs to run, the properties to predict, etc.

For a full description of all the parameters, see hyperparameters.py. Parameters set in exp.json overwrite those in hyperparameters.py, and parameters set in params.json overwrite those in both exp.json and hyperparameters.py.
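
As an illustration of that override order, a directory-local exp.json could be generated as follows. The key names below are hypothetical placeholders; the authoritative names and defaults live in hyperparameters.py:

import json

# Hypothetical keys for illustration only; consult hyperparameters.py for the
# real parameter names. Anything written here overrides the defaults in
# hyperparameters.py, and params.json would in turn override both.
experiment_params = {
    "train_data_file": "zinc_smiles.csv",   # assumed filename, not the repo's
    "epochs": 100,
    "do_prop_pred": True,
}

with open("exp.json", "w") as f:
    json.dump(experiment_params, f, indent=2)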

Once you have set the parameters, run the autoencoder with the following command from the directory containing exp.json:

python -m chemvae.train_vae

(Make sure you copy the example directories first so that you do not overwrite the trained weights (*.h5).)

Components

train_vae.py - Main script for training the variational autoencoder. Accepts a -d argument pointing to the directory containing exp.json (see the ZINC example above).

  • models.py - Library of models; contains the encoder, decoder, and property-prediction models.
  • tgru_k2_gpu.py - Custom Keras layer implementing the custom teacher-forcing/sampling GRU.
  • sampled_rnn_tf.py - Custom RNN function for tgru_k2_gpu.py, written against the TensorFlow backend.
  • hyperparameters.py - Default parameter settings for the autoencoder.
  • mol_utils.py - Library for parsing SMILES into one-hot encodings and vice versa.
  • mol_callbacks.py - Library of callbacks used by train_vae.py.
    • Includes the Weight_Annealer callback, which updates the weight of the KL loss component during training.
  • vae_utils.py - Utility functions for working with a trained autoencoder during post-processing (see the usage sketch after this list).
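
A rough post-training usage sketch, assuming a trained ZINC model directory; the VAEUtils method names below follow the bundled example notebook and may differ between versions, so treat this as illustrative rather than a fixed API:

from chemvae.vae_utils import VAEUtils

vae = VAEUtils(directory='models/zinc_properties')    # path to a trained model directory (illustrative)

smiles = 'CC(=O)Oc1ccccc1C(=O)O'                      # example input molecule
x = vae.smiles_to_hot(smiles, canonize_smiles=True)   # SMILES -> one-hot tensor
z = vae.encode(x)                                     # one-hot tensor -> latent code
x_rec = vae.decode(z)                                 # latent code -> one-hot tensor
print(vae.hot_to_smiles(x_rec, strip=True))           # one-hot tensor -> SMILES string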

Authors:

This software was written by Jennifer Wei, Benjamin Sanchez-Lengeling, Dennis Sheberla, Rafael GΓ³mez-Bombarelli, and AlΓ‘n Aspuru-Guzik ([email protected]). It is based on the work published in https://arxiv.org/pdf/1610.02415.pdf.

Feel free to reach out to us with any questions!

Funding acknowledgements

"This work was supported by the Computational Chemical Sciences Program funded by the U.S.Department of Energy, Office of Science, Basic Energy Sciences, under Award #DE- FG02-17ER16362"

More Repositories

  1. selfies - Robust representation of semantically constrained graphs, in particular for molecules in chemistry (Python, 655 stars)
  2. ORGANIC - Code repo for optimizing distributions of molecules. (Jupyter Notebook, 130 stars)
  3. stoned-selfies - This repository contains code for the paper: Beyond Generative Models: Superfast Traversal, Optimization, Novelty, Exploration and Discovery (STONED) Algorithm for Molecules using SELFIES (Jupyter Notebook, 119 stars)
  4. GA - Code for the paper: Augmenting genetic algorithms with deep neural networks for exploring the chemical space (Python, 94 stars)
  5. phoenics - Phoenics: Bayesian optimization for efficient experiment planning (Python, 88 stars)
  6. olympus - Olympus: a benchmarking framework for noisy optimization and experiment planning (Jupyter Notebook, 81 stars)
  7. JANUS - Code for the paper "JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design" (Python, 75 stars)
  8. Tartarus - A Benchmarking Platform for Realistic And Practical Inverse Molecular Design (Python, 68 stars)
  9. ChemOS (Python, 60 stars)
  10. gryffin (Python, 51 stars)
  11. group-selfies (Jupyter Notebook, 50 stars)
  12. qtorch - qTorch (Quantum Tensor Contraction Handler) https://arxiv.org/abs/1709.03636 -> for quantum simulation using tensor networks (C, 48 stars)
  13. DiffiQult - A fully autodifferentiable and variational HF (Python, 41 stars)
  14. funsies - funsies is a lightweight workflow engine 🔧 (Python, 40 stars)
  15. gpHSP - Code to build a probabilistic predictive model for HSP (Jupyter Notebook, 35 stars)
  16. atlas - A brain for self-driving laboratories (Python, 25 stars)
  17. Theseus - Conceptual understanding through efficient inverse-design of quantum optical experiments (25 stars)
  18. SCILLA - Automated discovery of superconducting circuits (Python, 25 stars)
  19. Pasithea - Deep Molecular Dreaming (Python, 23 stars)
  20. Computer-vision-for-the-chemistry-lab - Use convolutional neural net to detect segment and classify material phases and vessels in chemistry lab and other setting involving materials in mostly transparent vessels (Python, 23 stars)
  21. assessing_mol_prediction_confidence - https://arxiv.org/abs/2102.11439 (20 stars)
  22. QNODE - Quantum dynamics latent neural ode (Python, 19 stars)
  23. xtb-gaussian - A wrapper to run xtb inside Gaussian. (Perl, 19 stars)
  24. dionysus - For analysis of calibration, performance, and generalizability of probabilistic models on small molecular datasets. Paper on RSC Digital Discovery: https://pubs.rsc.org/en/content/articlehtml/2023/dd/d2dd00146b (Python, 18 stars)
  25. golem - Golem: an algorithm for robust experiment and process optimization (Jupyter Notebook, 16 stars)
  26. selfies_tutorial (Jupyter Notebook, 14 stars)
  27. Beyond-Molecular-Structure-ML-for-OPV-Materials-Devices (Python, 13 stars)
  28. kraken - Code to compute electronic and steric features to create a database of ligands and their properties (Python, 12 stars)
  29. Meta-VQE - Meta-VQE data and examples repository (Jupyter Notebook, 9 stars)
  30. gemini - scalable multi-fidelity machine learning (Python, 9 stars)
  31. molar - Molar is a database management to make it easy to store experiment whether computational or not (Python, 9 stars)
  32. curiosity (Python, 9 stars)
  33. da_for_polymers - Augmenting Polymer Datasets via Iterative Rearrangement (Python, 9 stars)
  34. gp_redox_rxn - Code repo for redox potentials with GPs (Jupyter Notebook, 8 stars)
  35. long-acting-injectables - Code and results for Machine Learning Models to Accelerate the Design of Polymeric Long-Acting Injectables (Jupyter Notebook, 8 stars)
  36. cheapocrest - Conformer generation on the cheap. (Perl, 8 stars)
  37. acdc_laser (Python, 8 stars)
  38. iacta - Code for the paper "Automatic Discovery of Chemical Reactions Using Imposed Activation" (Python, 7 stars)
  39. chimera - Chimera: hierarchy-based multi-objective optimization (Python, 6 stars)
  40. gryffin-known-constraints - Results for Bayesian optimization with known experimental and design constraints for chemistry applications (Jupyter Notebook, 6 stars)
  41. Organic-molcules-with-inverted-gaps - Code and data for organic molecules with inverted singlet-triplet gaps. (6 stars)
  42. atlas-unknown-constraints - Unknown constraints in Bayesian optimization benchmark with Atlas (Jupyter Notebook, 5 stars)
  43. routescore - For working on the RouteScore/subway maps project code. (Python, 5 stars)
  44. Artificial-Design-of-Organic-Emitters - Code and data for "Artificial Design of Organic Emitters via a Genetic Algorithm Enhanced by a Deep Neural Network". (Python, 4 stars)
  45. Semantic-segmentation-of-materials-and-vessels-in-chemistry-lab-using-FCN - Given an image find the region of vessels/container and the material inside it. Assign one or class per pixel using fully convolutional net (FCN) for semantic segmentation. (Python, 4 stars)
  46. quantum-generative-models (Python, 4 stars)
  47. kreed - Code for Reflection-Equivariant Diffusion for 3D Structure Determination from Isotopologue Rotational Spectra in Natural Abundance (Jupyter Notebook, 3 stars)
  48. Instance-segmentation-of-images-of-materials-in-transparent-vessels-using-GES-net- - Hierarchical instance aware segmentation of materials in vessels in chemistry lab setting using generator evaluator selector net (Python, 3 stars)
  49. MERMES - Multimodal Reaction Mining pipeline for ElectroSynthesis: extract reaction information from figures (Python, 3 stars)
  50. QIPA (Jupyter Notebook, 2 stars)
  51. mission_control - MissionControl: a workflow library. (Python, 2 stars)
  52. waveflow - Boundary-conditioned normalizing flows for electronic structures. (Python, 2 stars)
  53. jobman - A library for managing job submissions. (Python, 1 star)
  54. chemspyd (1 star)
  55. CompositeMS (Python, 1 star)
  56. Rational-design-of-organic-molecules-with-inverted-gaps - Code and data for "Rational Design of Organic Molecules with Inverted Gaps between First Excited Singlet and Triplet". (1 star)
  57. electrode-polishing (Python, 1 star)
  58. DELFI (Python, 1 star)