A project to enable optimization of molecules by transforming them to and from a continuous representation.

Molecular Autoencoder

This is the code used for the paper:

Automatic chemical design using a data-driven continuous representation of molecules

Abstract: We develop a molecular autoencoder, which converts discrete representations of molecules to and from a vector representation. This allows efficient gradient-based optimization through open-ended spaces of chemical compounds. Continuous representations also allow us to automatically generate novel chemical structures by performing simple operations in the latent space, such as interpolating between molecules.
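
The latent-space interpolation mentioned in the abstract can be sketched with NumPy. This is a minimal illustration, not code from this repository. The trained encoder and decoder are not shown: the two random vectors are hypothetical stand-ins for encoded molecules, and the latent size of 16 is an arbitrary assumption. The sketch only covers the latent-space step, using simple linear interpolation between two points.

```python
import numpy as np

def interpolate_latent(z_start, z_end, n_steps):
    """Return n_steps points on the straight line from z_start to z_end (inclusive).

    In a full pipeline, each row would be passed to the trained decoder
    (not shown here) to obtain a SMILES string.
    """
    z_start = np.asarray(z_start, dtype=float)
    z_end = np.asarray(z_end, dtype=float)
    # Interpolation weights from 0 to 1, shaped (n_steps, 1) for broadcasting.
    alphas = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - alphas) * z_start + alphas * z_end

# Hypothetical stand-ins for the encodings of two molecules (assumed latent size 16).
rng = np.random.default_rng(0)
z_a = rng.normal(size=16)
z_b = rng.normal(size=16)

path = interpolate_latent(z_a, z_b, n_steps=5)
print(path.shape)  # (5, 16)
```

The first and last rows reproduce the two endpoints, and the intermediate rows are the latent points that a decoder would turn into the "in-between" molecules.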

By

bibtex file | slides

Notes

This code requires a fork of Keras that branched from the development version at approximately version 0.3.2, as well as Theano > 0.8.2. (For recent testing on OS X 10.12.2, we used Theano 0.9.0 dev4.)

We also want to point you to the work of Max Hodak, who re-implemented this tool based on the paper. If you are starting your own project, you may have more success building on his version: https://github.com/maxhodak/keras-molecules

To test the weights from the paper (limited to 5000 test SMILES):

    python sample_autoencoder.py \
        ../data/best_vae_model.json \
        ../data/best_vae_annealed_weights.h5 \
        ../data/250k_rndm_zinc_drugs_clean.smi \
        ../data/zinc_char_list.json \
        -l5000

This should produce output close to the following (exact values will vary, because the 5000 samples are drawn at random from the test file):

    Using Theano backend.
    ('Training set size is', 5000)
    Training set size is 5000, after filtering to max length of 120
    ('total chars:', 35)
    Loss: 0.834809958935, Accuracy: 0.948206666667
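
The output above reflects two preprocessing steps: a random subset of SMILES is drawn (the `-l` limit), and strings longer than 120 characters are filtered out. The plain-Python sketch below illustrates that idea under stated assumptions. It is not the repository's implementation: the helper name `load_smiles_subset` is hypothetical, the input is assumed to contain one SMILES per line, and applying the length filter before sampling is an assumed ordering.

```python
import random

def load_smiles_subset(lines, limit, max_len=120, seed=None):
    """Hypothetical helper: keep SMILES of at most max_len characters,
    then draw up to `limit` of them at random (step order is an assumption)."""
    # Strip newlines/whitespace and drop blank lines.
    smiles = [s.strip() for s in lines if s.strip()]
    # Discard molecules whose string exceeds the model's fixed input length.
    smiles = [s for s in smiles if len(s) <= max_len]
    if len(smiles) <= limit:
        return smiles
    rng = random.Random(seed)
    return rng.sample(smiles, limit)

# In practice, `lines` might come from open("../data/250k_rndm_zinc_drugs_clean.smi").
example = ["CCO\n", "c1ccccc1\n", "C" * 130 + "\n", "CC(=O)O\n"]
subset = load_smiles_subset(example, limit=2, seed=0)
print(len(subset))  # 2
```

Because the subset is random, each run evaluates the model on a different 5000 molecules, which is why the reported loss and accuracy vary slightly between runs.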

To train a new model (limited to 5000 training SMILES):

    python train_autoencoder.py \
        ../data/250k_rndm_zinc_drugs_clean.smi \
        ../data/zinc_char_list.json \
        -l5000
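
Both commands take `zinc_char_list.json` alongside the SMILES file, and the sample output reports 35 characters and a maximum length of 120. A common way to turn such inputs into a fixed-size model input is to pad each string and one-hot encode it against the character list. The sketch below assumes the JSON file holds a flat list of single characters and that strings are right-padded with a space character. Both are assumptions, not confirmed details of this repository, and `one_hot_encode` is a hypothetical name.

```python
import json
import numpy as np

def one_hot_encode(smiles_list, charset, max_len=120, pad_char=" "):
    """Hypothetical helper: map SMILES strings to a (n, max_len, len(charset)) array."""
    index = {c: i for i, c in enumerate(charset)}
    out = np.zeros((len(smiles_list), max_len, len(charset)), dtype=np.float32)
    for n, smiles in enumerate(smiles_list):
        # Right-pad to the fixed input length (longer strings are truncated).
        padded = smiles.ljust(max_len, pad_char)[:max_len]
        for t, char in enumerate(padded):
            out[n, t, index[char]] = 1.0  # raises KeyError if char is not in charset
    return out

# In practice: charset = json.load(open("../data/zinc_char_list.json"))
charset = json.loads('[" ", "C", "O", "c", "1", "(", ")", "="]')
x = one_hot_encode(["CCO", "c1ccccc1"], charset, max_len=10)
print(x.shape)  # (2, 10, 8)
```

Each position in the output holds exactly one 1.0, so every molecule becomes a `max_len × len(charset)` matrix that a sequence encoder can consume.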
