• Stars: 279
• Rank: 147,059 (Top 3 %)
• Language: Python
• License: MIT License
• Created over 4 years ago
• Updated 2 months ago

Repository Details

RXNMapper: Unsupervised attention-guided atom-mapping. Code complementing our Science Advances publication on "Extraction of organic chemistry grammar from unsupervised learning of chemical reactions" (https://advances.sciencemag.org/content/7/15/eabe4166).

Extraction of organic chemistry grammar from unsupervised learning of chemical reactions

Enables robust atom mapping on valid reaction SMILES. The atom-mapping information was learned by an ALBERT model trained in an unsupervised fashion on a large dataset of chemical reactions.

Installation

From pip

conda create -n rxnmapper python=3.6 -y
conda activate rxnmapper
pip install rxnmapper

From GitHub

You can install the package and set up the environment directly from GitHub using:

git clone https://github.com/rxn4chemistry/rxnmapper.git 
cd rxnmapper
conda create -n rxnmapper python=3.6 -y
conda activate rxnmapper
pip install -e .

RDKit

In both installation settings above, the RDKit dependency is not installed automatically unless you include the extra when installing: pip install "rxnmapper[rdkit]". It can also be installed via Conda or PyPI:

# Install RDKit from Conda
conda install -c conda-forge rdkit

# Install RDKit from PyPI
pip install rdkit
# for Python<3.7
# pip install rdkit-pypi
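
Because RDKit is an optional extra, it can be useful to confirm that it is importable before using the mapper. The following is a minimal sketch using only the Python standard library; the helper name has_module is hypothetical and is not part of rxnmapper.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be located."""
    # find_spec returns None when no such top-level module is installed.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("rdkit"):
        print("RDKit is available.")
    else:
        # Suggest the extra described in the installation notes above.
        print('RDKit not found; try: pip install "rxnmapper[rdkit]"')
```

The check only locates the module and does not import it, so it stays cheap even in environments where RDKit is installed.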

Usage

Basic usage

from rxnmapper import RXNMapper
rxn_mapper = RXNMapper()
rxns = ['CC(C)S.CN(C)C=O.Fc1cccnc1F.O=C([O-])[O-].[K+].[K+]>>CC(C)Sc1ncccc1F', 'C1COCCO1.CC(C)(C)OC(=O)CONC(=O)NCc1cccc2ccccc12.Cl>>O=C(O)CONC(=O)NCc1cccc2ccccc12']
results = rxn_mapper.get_attention_guided_atom_maps(rxns)

The results contain the mapped reactions and confidence scores:

[{'mapped_rxn': 'CN(C)C=O.F[c:5]1[n:6][cH:7][cH:8][cH:9][c:10]1[F:11].O=C([O-])[O-].[CH3:1][CH:2]([CH3:3])[SH:4].[K+].[K+]>>[CH3:1][CH:2]([CH3:3])[S:4][c:5]1[n:6][cH:7][cH:8][cH:9][c:10]1[F:11]',
  'confidence': 0.9565619900376546},
 {'mapped_rxn': 'C1COCCO1.CC(C)(C)[O:3][C:2](=[O:1])[CH2:4][O:5][NH:6][C:7](=[O:8])[NH:9][CH2:10][c:11]1[cH:12][cH:13][cH:14][c:15]2[cH:16][cH:17][cH:18][cH:19][c:20]12.Cl>>[O:1]=[C:2]([OH:3])[CH2:4][O:5][NH:6][C:7](=[O:8])[NH:9][CH2:10][c:11]1[cH:12][cH:13][cH:14][c:15]2[cH:16][cH:17][cH:18][cH:19][c:20]12',
  'confidence': 0.9704424331552834}]
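
To make the output format concrete, here is a small sketch that reads the atom-map numbers out of a mapped reaction string and checks that both sides use the same set of numbers. It relies only on the standard library and on the first result shown above; the helper names atom_map_numbers and check_mapping are hypothetical and are not part of the rxnmapper API.

```python
import re
from typing import List

# Matches the map number in bracket atoms such as "[CH3:1]" or "[c:10]".
ATOM_MAP_PATTERN = re.compile(r":(\d+)\]")


def atom_map_numbers(smiles: str) -> List[int]:
    """Return the atom-map numbers in the order they appear in `smiles`."""
    return [int(n) for n in ATOM_MAP_PATTERN.findall(smiles)]


def check_mapping(mapped_rxn: str) -> bool:
    """True if each map number occurs once per side and both sides agree."""
    reactants, products = mapped_rxn.split(">>")
    left = atom_map_numbers(reactants)
    right = atom_map_numbers(products)
    no_duplicates = len(left) == len(set(left)) and len(right) == len(set(right))
    return no_duplicates and set(left) == set(right)


# First mapped reaction from the results shown above.
MAPPED_RXN = (
    "CN(C)C=O.F[c:5]1[n:6][cH:7][cH:8][cH:9][c:10]1[F:11].O=C([O-])[O-]."
    "[CH3:1][CH:2]([CH3:3])[SH:4].[K+].[K+]>>"
    "[CH3:1][CH:2]([CH3:3])[S:4][c:5]1[n:6][cH:7][cH:8][cH:9][c:10]1[F:11]"
)

if __name__ == "__main__":
    print(atom_map_numbers(MAPPED_RXN.split(">>")[1]))
    print(check_mapping(MAPPED_RXN))
```

Unmapped atoms, such as the solvent and the base in this example, carry no map number and are therefore ignored by the check.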

To account for batching and error handling automatically, you can use BatchedMapper instead:

from rxnmapper import BatchedMapper
rxn_mapper = BatchedMapper(batch_size=32)
rxns = ['CC[O-]~[Na+].BrCC>>CCOCC', 'invalid>>reaction']

# The following calls work with input of arbitrary size. Also, they do not raise 
# any exceptions but will return ">>" or an empty dictionary for the second reaction.
results = list(rxn_mapper.map_reactions(rxns))  # results as strings directly
results = list(rxn_mapper.map_reactions_with_info(rxns))  # results as dictionaries (as above)

Testing

You can run the examples above with the test suite as well:

  1. In your Conda environment, install the development dependencies: pip install -e ".[dev]"
  2. Run pytest tests from the repository root.

Examples

To learn more, see the examples.

Data

Data can be found at: https://ibm.box.com/v/RXNMapperData

Citation

@article{schwaller2021extraction,
  title={Extraction of organic chemistry grammar from unsupervised learning of chemical reactions},
  author={Schwaller, Philippe and Hoover, Benjamin and Reymond, Jean-Louis and Strobelt, Hendrik and Laino, Teodoro},
  journal={Science Advances},
  volume={7},
  number={15},
  pages={eabe4166},
  year={2021},
  publisher={American Association for the Advancement of Science}
}

More Repositories

  1. rxn4chemistry (Python, 167 stars): Python wrapper for the IBM RXN for Chemistry API
  2. rxnfp (HTML, 153 stars): Reaction fingerprints, atlases and classification. Code complementing our Nature Machine Intelligence publication on "Mapping the space of chemical reactions using attention-based neural networks" (http://rdcu.be/cenmd).
  3. rxn_yields (Jupyter Notebook, 97 stars): Code complementing our manuscript on the prediction of chemical reaction yields (https://iopscience.iop.org/article/10.1088/2632-2153/abc81d) and data augmentation strategies (https://doi.org/10.26434/chemrxiv.13286741).
  4. biocatalysis-model (Python, 60 stars): RXN for biochemical reactions
  5. paragraph2actions (Python, 36 stars): Extraction of action sequences from experimental procedures
  6. rxnaamapper (Python, 29 stars): Reaction SMILES-AA mapping via language modelling
  7. disconnection_aware_retrosynthesis (Python, 28 stars)
  8. smiles2actions (Python, 25 stars): Action sequence prediction for arbitrary chemical equations
  9. rxn-chemutils (Python, 20 stars): Chemistry-related Python utilities used in the RXN universe
  10. rxn-ir-to-structure (Python, 13 stars): Predicting molecular structure from Infrared (IR) Spectra
  11. nmr-to-structure (Python, 11 stars): Predicting molecular structure from NMR spectra
  12. rxn-reaction-preprocessing (Python, 9 stars): Preprocessing of datasets of chemical reactions: standardization, filtering, augmentation, tokenization, etc.
  13. rxn-utilities (Python, 7 stars): General Python utilities commonly used in the RXN universe
  14. rxn-standardization (Python, 7 stars): Standardizing chemical compounds with language models
  15. rxn_cluster_token_prompt (Python, 5 stars): Code to train high diversity retrosynthesis models with cluster token prompt
  16. multimodal-spectroscopic-dataset (Python, 4 stars): Code for generation and benchmarks of the Multimodal Spectroscopic Dataset
  17. sac-action-extraction (Python, 3 stars): Extraction of single-atom catalyst synthesis actions with transformers.
  18. rxn-onmt-models (Python, 2 stars): Training of OpenNMT-based RXN models
  19. rxn-models (2 stars): Open-source RXN models page
  20. rxn-models-for-polymerization (1 star): RXN models for polymerization
  21. rxn-metrics (Python, 1 star): Metrics for RXN models