  • Stars: 131
  • Rank: 275,867 (Top 6%)
  • Language: Python
  • License: MIT License
  • Created over 3 years ago
  • Updated over 1 year ago

Repository Details

MolGPT

In this work, we train a small custom GPT on the MOSES and GuacaMol datasets with a next-token prediction task. The trained model is then used for unconditional and conditional molecular generation. We compare our model with previous approaches on both datasets. Saliency maps for interpretability are obtained using the Ecco library.
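To make the next-token prediction setup concrete, here is a minimal sketch of how a SMILES string could be split into tokens and turned into an input/target pair. The regular expression is a widely used SMILES tokenization pattern and is an assumption here, not confirmed to be this repository's tokenizer; `tokenize` and `next_token_pairs` are hypothetical helper names.

```python
import re

# Common SMILES tokenization pattern (assumption: not taken from this repository).
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting characters the pattern does not cover."""
    tokens = SMILES_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Cannot tokenize SMILES string: {smiles!r}")
    return tokens


def next_token_pairs(tokens):
    """Build the (input, target) pair for next-token prediction.

    The target is the input shifted left by one position, so the model
    learns to predict token i+1 from tokens 0..i.
    """
    return tokens[:-1], tokens[1:]


if __name__ == "__main__":
    tokens = tokenize("CC(=O)Cl")
    inputs, targets = next_token_pairs(tokens)
    print(tokens)
    print(inputs)
    print(targets)
```

In practice the tokens would also be mapped to integer IDs and padded to a fixed length before being fed to the model; those steps are omitted here.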

  • The processed GuacaMol and MOSES datasets in CSV format can be downloaded from this link:

https://drive.google.com/drive/folders/1LrtGru7Srj_62WMR4Zcfs7xJ3GZr9N4E?usp=sharing

  • The original GuacaMol dataset can be found here:

https://github.com/BenevolentAI/guacamol

  • The original MOSES dataset can be found here:

https://github.com/molecularsets/moses

  • All trained weights can be found here:

https://www.kaggle.com/virajbagal/ligflow-final-weights

To train the model, make sure the datasets' CSV files are in the same directory as the code files.
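A quick pre-flight check can confirm the files are in place before launching a long training run. The sketch below is illustrative only: the file names `moses.csv` and `guacamol.csv` are hypothetical placeholders, so substitute the names of the files you actually downloaded.

```python
from pathlib import Path

# Hypothetical file names; replace with the names of your downloaded CSV files.
EXPECTED_FILES = ["moses.csv", "guacamol.csv"]


def missing_datasets(code_dir, filenames=EXPECTED_FILES):
    """Return the expected dataset files that are absent from code_dir."""
    root = Path(code_dir)
    return [name for name in filenames if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets(".")
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("All dataset files found.")
```

Running this from the directory that holds the training scripts lists any file that still needs to be copied over.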

Training

./train_moses.sh
./train_guacamol.sh

Generation

./generate_guacamol_prop.sh
./generate_moses_prop_scaf.sh
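For readers unfamiliar with how a model trained on next-token prediction produces molecules, the following is a minimal sketch of autoregressive sampling with a temperature parameter. It is not the code in the generation scripts: `sample_next`, `generate`, and `toy_logits` are hypothetical names, and `toy_logits` is a stand-in for a trained model. The assumption is that generation proceeds one token at a time until an end token or a length limit is reached.

```python
import math
import random


def sample_next(logits, temperature, rng):
    """Sample a token index from softmax(logits / temperature)."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate(logits_fn, start, end_id, max_len, temperature=1.0, seed=0):
    """Extend `start` one sampled token at a time.

    Stops when the end token is sampled (it is not appended) or when
    the sequence reaches max_len tokens.
    """
    rng = random.Random(seed)
    tokens = list(start)
    while len(tokens) < max_len:
        next_id = sample_next(logits_fn(tokens), temperature, rng)
        if next_id == end_id:
            break
        tokens.append(next_id)
    return tokens


def toy_logits(tokens):
    """Stand-in for a trained model: three-token vocabulary, where index 2
    (the end token) becomes more likely as the sequence grows."""
    return [1.0, 0.5, 0.1 * len(tokens)]


if __name__ == "__main__":
    print(generate(toy_logits, start=[0], end_id=2, max_len=10))
```

Lower temperatures concentrate sampling on the most likely tokens, while higher temperatures produce more varied sequences. In this sketch the `start` argument can hold more than one token, which is one way a prompt could be supplied.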

If you find this work useful, please cite:

Bagal, Viraj; Aggarwal, Rishal; Vinod, P. K.; Priyakumar, U. Deva (2021): MolGPT: Molecular Generation using a Transformer-Decoder Model. ChemRxiv. Preprint. https://doi.org/10.26434/chemrxiv.14561901.v1

More Repositories

1. DeepPocket (Python, 89 stars): Ligand Binding Site detection using Deep Learning
2. CIGIN (Jupyter Notebook, 35 stars): AAAI 2020: Chemically Interpretable Graph Interaction Network for Prediction of Pharmacokinetic Properties of Drug-like Molecules
3. MoleGuLAR (Jupyter Notebook, 22 stars): Repository for MoleGuLAR: Molecule generation using Reinforcement Learning and Alternating Rewards
4. BAND-NN (Python, 10 stars): J. Comp. Chem. 2020, 41, 790-799
5. SCONES (Python, 10 stars): Self-Consistent Neural Network for Protein Stability Prediction upon Mutations
6. ml4science_tut (Jupyter Notebook, 9 stars): ML for Science Tutorials
7. SpectraToStructure (Python, 8 stars)
8. MO-MEMES (Python, 7 stars): Implementation of MO-MEMES, an extension to the Machine learning framework for Enhanced MolEcular Screening (MEMES) framework for multi-objective Bayesian optimization.
9. BiRDS (Perl, 7 stars)
10. DING (Jupyter Notebook, 5 stars): Deep learning enabled for INorganic material Generator (https://doi.org/10.1039/D0CP03508D)
11. SwinFUSE (Jupyter Notebook, 4 stars): Official code for SwinFUSE to be presented in Self-supervised Modality-agnostic Pre-training Of Swin Transformers at ISBI'24
12. Apobind (Python, 4 stars): Apo structures for protein-ligand complexes in PDBbind V. 2019
13. Protein-Ligand-Dataset-Bias (Jupyter Notebook, 3 stars): Latent biases present in ML methods used for protein-ligand interaction prediction tasks
14. delNetFF (Python, 3 stars): Delta Net Force Field (https://doi.org/10.1021/acs.jpca.0c03926)
15. parkinsonsfromgait (Jupyter Notebook, 3 stars)
16. DeepSPInN (Jupyter Notebook, 3 stars): A framework that predicts the molecular structure when given Infrared and 13C Nuclear magnetic resonance spectra without referring to any pre-existing spectral databases or molecular fragment knowledge bases
17. TorRNA (Jupyter Notebook, 2 stars): Improved prediction of Torsion angles of RNA by leveraging large language models
18. DART (Jupyter Notebook, 1 star)
19. covid19-risk-stratification-india (Jupyter Notebook, 1 star)
20. rex_md_kinetic (Python, 1 star): J. Chem. Theory Comput. 2018, 14, 7, 3365-3380