Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network
This repository contains the data and codes of the paper:
Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network (NIPS 2017) PDF
Updated Version
We have updated our reaction predictor model and improved the results (shown in the table 1). The new github repo link is here: https://github.com/connorcoley/rexgen_direct. We suggest you to use the updated repo because it is written in tensorflow version 1.3.0 (rather than 0.12.0).
Data
-
USPTO-15K/data.zip contains the train/dev/test split of USPTO-15K dataset used in our paper, borrowed from Coley et al. Each line in the file has two fields, separated by space:
-
Reaction smiles (products are not atom mapped)
-
Four types of reaction edits:
- Atoms lost Hydrogens
- Atoms obtained Hydrogens
- Deleted bonds
- Added bonds
The first two subfields contains a list of atom. The last two subfields includes a list of triples in the form of (atom1-atom2-bondtype). All atoms are indicated by their atom map numbers given in the reaction smiles. See the following table for all possible bond types:
Bond Type Bond Name 1.0 Single bond 2.0 Double bond 3.0 Triple bond 1.5 Aromatic bond -
-
USPTO/data.zip includes the train/dev/test split of USPTO dataset used in our paper. It has in total 480K fully atom mapped reactions. Each line in the file has two fields, separated by space:
- Reaction smiles (both reactants and products are atom mapped)
- Reaction center. That is, atom pairs whose bonds in between changed in the reaction. Atoms are represented by their atom map number given in the reaction smiles.
-
human/ directory contains the 80 reactions used in our human study.
Codes
Training and testing codes are in respective repositories:
- core-wln-global/ includes codes for reaction core identification (referred as global model in the paper).
- rank-wln/ includes codes for WLN + sum-pooling in the paper for candidate ranking
- rank-diff-wln/ includes codes for WL Difference Network (WLDN) for candidate ranking
Sample usage are provided in USPTO/README.md, which also applies to USPTO-15K. We used tensorflow-0.12.1 for development:
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
Contributors
Wengong Jin ([email protected])
Connor W. Coley ([email protected])