• Stars
    star
    306
  • Rank 136,456 (Top 3 %)
  • Language
    Jupyter Notebook
  • License
    MIT License
  • Created about 6 years ago
  • Updated almost 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Original implementation of the paper "SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery" by Shion Honda et al.

SMILES Transformer

SMILES Transformer extracts molecular fingerprints from string representations of chemical molecules.
The transformer learns latent representation that is useful for various downstream tasks through autoencoding task.

Requirement

This project requires the following libraries.

  • NumPy
  • Pandas
  • PyTorch > 1.2
  • tqdm
  • RDKit

Dataset

Canonical SMILES of 1.7 million molecules that have no more than 100 characters from Chembl24 dataset were used.
These canonical SMILES were transformed randomly every epoch with SMILES-enumeration by E. J. Bjerrum.

Pre-training

After preparing the SMILES corpus for pre-training, run:

$ python pretrain_trfm.py

Pre-trained model is here.

Downstream Tasks

See experiments/ for the example codes.

Cite

@article{honda2019smiles,
    title={SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery},
    author={Shion Honda and Shoi Shi and Hiroki R. Ueda},
    year={2019},
    eprint={1911.04738},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

More Repositories