  • Stars: 172
  • Rank 221,201 (Top 5 %)
  • Language: Python
  • License: MIT License
  • Created over 4 years ago
  • Updated 3 months ago


Repository Details

Merging, linking and placing compounds by stitching bound compounds together like a reanimated corpse

Fragmenstein



Name | Colab Link | PyRosetta | Description
Pipeline | colab demo | ✔ | Given a template and some hits, merge them and place the most similar purchasable analogues from Enamine REAL
Light | colab demo | ❌ | Generate molecules and see how they merge and how a placed compound fares


For manuscript data, see the manuscript data repository. For authors, see Authors.

Stitched molecules

Fragmenstein can perform two different tasks.

  • Combine hits
  • Place a given followup molecule (SMILES) based on a series of hits


Like Frankenstein's creation, it may violate the laws of chemistry: trigonal planar topologies may be tetrahedral, bonds unnaturally long, etc. This monstrosity is therefore energy-minimised with strong constraints within the protein.

Classes

There are four main classes, named after characters from the Frankenstein book and movies:

  • Monster makes the stitched-together molecules independently of the protein (documentation)
  • Igor uses PyRosetta to minimise the Fragmenstein monster followup within the protein (documentation)
  • Victor is a pipeline that calls the parts, with several features, such as warhead switching (documentation)
  • Laboratory does all the combinatorial operations with Victor (specific case)

NB. In the absence of PyRosetta (which requires an academic licence), all classes bar Igor work.
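Since everything except Igor works without PyRosetta, a script may want to check for it before attempting minimisation. The following is a minimal standard-library sketch; `is_installed` is a hypothetical helper, not part of Fragmenstein:

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package (e.g. "pyrosetta") can be imported."""
    # find_spec returns None for a missing top-level package without importing it
    return importlib.util.find_spec(package) is not None


if is_installed("pyrosetta"):
    print("PyRosetta found: Igor (and hence Victor's minimisation) can be used")
else:
    print("PyRosetta not found: all classes bar Igor are usable")
```

Only top-level names should be passed: for dotted names `find_spec` imports the parent package first.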

Additionally, there are a few minor classes.

One of these is mRMSD, a multiple RMSD variant which does not superpose/align and which chooses the atoms to use based on their coordinates (documentation).
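To make the idea concrete, here is a simplified, hypothetical sketch of a multiple RMSD that uses the coordinates as they are (no superposition) and picks atom pairs by proximity. It is not Fragmenstein's mRMSD code: the name `multi_rmsd`, the nearest-atom pairing and the 1 Å default cutoff are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def multi_rmsd(followup: Sequence[Coord],
               hits: Sequence[Sequence[Coord]],
               cutoff: float = 1.0) -> float:
    """RMSD of a followup against several hits at once, without superposition.

    Each hit atom is paired with its closest followup atom, and the pair is
    kept only if it lies within `cutoff` angstroms (assumed value).
    """
    squared: List[float] = []
    for hit in hits:
        for hit_atom in hit:
            # nearest followup atom; coordinates are compared as-is, no alignment
            d = min(math.dist(hit_atom, f) for f in followup)
            if d <= cutoff:
                squared.append(d * d)
    if not squared:
        return float("nan")  # no hit atom overlaps the followup
    return math.sqrt(sum(squared) / len(squared))
```

Hit atoms with no nearby followup atom are simply ignored, so the choice of atoms is driven entirely by position.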

The class Walton performs geometric manipulations of compounds, to set them up to demonstrate features of Fragmenstein (like Captain Walton, it does not partake in the plot, but is key to the narration).

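There are two modules hosted elsewhere:

  • the molecular rectifier package, which converts an RDKit Chem.Mol that cannot be sanitised into an analogue that can
  • the rdkit-to-params package, which converts an RDKit Chem.Mol into a PyRosetta residue type (a "params file")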

Combine

Fragmenstein can also merge and link fragment hits by itself and find the best-scoring mergers. For details about linking, see the linking notes. It uses the same overlapping-position clustering, but also has a decent amount of impossible/uncommon chemistry prevention.

Monster:

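from rdkit import Chem
from fragmenstein import Monster
# hit_a, hit_b: RDKit Chem.Mol fragment hits (placeholder file names)
hit_a = Chem.MolFromMolFile('hit_a.mol')
hit_b = Chem.MolFromMolFile('hit_b.mol')
monster = Monster(hits=[hit_a, hit_b])
monster.combine()
monster.positioned_mol  #: rdkit.Chem.Mol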

Victor:

from fragmenstein import Victor
import pyrosetta
pyrosetta.init(extra_options='-no_optH false -mute all -ex1 -ex2 -ignore_unrecognized_res false -load_PDB_components false -ignore_waters false')

# hit_a, hit_b: RDKit Chem.Mol hits, as in the Monster example
victor = Victor(hits=[hit_a, hit_b],
                pdb_filename='foo.pdb',  # or pdb_block='ATOM 1 MET ...'
                covalent_resi=1)  # if not covalent, just put the first residue or something.
victor.combine()
victor.minimized_mol

The PyRosetta init step can be done with the helper function:

from fragmenstein import Igor
Igor.init_pyrosetta()

The two seem similar, but Victor places with Monster and minimises with Igor. As a result, it has energy scores:

victor.ddG

Fragmenstein is not really a docking algorithm, as it does not find the pose with the lowest energy within a given volume. Rather, it is a method to determine how faithful a given followup is to the hits provided. Hence the minimised pose should be assessed by the RMSD metric or similar, and the ∆∆G score used solely as a cutoff (lower than zero).
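A sketch of that triage, assuming the ∆∆G and RMSD values have already been collected (e.g. from `victor.ddG`) into plain records. `Placement` and `triage` are hypothetical names, not Fragmenstein's API:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Placement:
    name: str
    ddG: float   # energy score, used only as a pass/fail cutoff
    rmsd: float  # faithfulness to the hits (angstrom)


def triage(placements: List[Placement], ddG_cutoff: float = 0.0) -> List[Placement]:
    """Keep placements with ddG strictly below the cutoff, ranked by RMSD (lowest first)."""
    passing = [p for p in placements if p.ddG < ddG_cutoff]
    return sorted(passing, key=lambda p: p.rmsd)
```

Here ∆∆G only decides pass/fail; among the survivors the ranking uses RMSD alone, so a slightly better energy never outranks a more faithful pose.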

For a large number of combinations:

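import pandas as pd
from fragmenstein import Laboratory

# pdbblock: PDB block (str) of the template; hits: list of RDKit Chem.Mol hits
lab = Laboratory(pdbblock=pdbblock, covalent_resi=None)
combinations: pd.DataFrame = lab.combine(hits, n_cores=28)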
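To get a feel for why the number of cores matters, the sketch below counts the unordered two-hit merges among a set of hits. It is an illustration only: `pairwise_jobs` and `n_pairwise_jobs` are hypothetical helpers, and the exact set of combinations Laboratory tries (for example, whether both orders or larger groups are attempted) is not stated here.

```python
from itertools import combinations
from math import comb
from typing import List, Tuple


def pairwise_jobs(hit_names: List[str]) -> List[Tuple[str, str]]:
    """All unordered pairs of distinct hits, i.e. the candidate two-hit merges."""
    return list(combinations(hit_names, 2))


def n_pairwise_jobs(n_hits: int) -> int:
    """Number of unordered pairs: n * (n - 1) / 2."""
    return comb(n_hits, 2)
```

With 50 hits this already gives 1,225 pairs, each requiring a merge and, with Victor, a minimisation.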

Place

Here is an interactive example of placed molecules.

It is rather tolerant of erroneous/excessive submissions (it automatically excludes them) and can energy-minimise strained conformations.

Three mapping approaches were tested, but the key is that hits are pairwise mapped to each other by means of one-to-one atom matching based upon position, as opposed to similarity, which is easily led astray. For example, note here that the benzene and the pyridine rings overlap, not the two pyridine rings.
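The positional idea can be illustrated with a greedy, hypothetical sketch: the closest atom pairs between two hits are accepted first, each atom is used at most once, and pairs beyond a cutoff (1 Å here, an assumption) stay unmapped. `positional_mapping` is not Fragmenstein's code and is not necessarily any of the three approaches it tested.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def positional_mapping(hit_a: Sequence[Coord],
                       hit_b: Sequence[Coord],
                       cutoff: float = 1.0) -> Dict[int, int]:
    """One-to-one atom mapping (index in hit_a -> index in hit_b) based purely on position."""
    # every cross-hit atom pair, closest first
    pairs = sorted(
        (math.dist(a, b), i, j)
        for i, a in enumerate(hit_a)
        for j, b in enumerate(hit_b)
    )
    mapping: Dict[int, int] = {}
    used_b = set()
    for d, i, j in pairs:
        if d > cutoff:
            break  # pairs are sorted, so all remaining pairs are too far apart
        if i not in mapping and j not in used_b:
            mapping[i] = j
            used_b.add(j)
    return mapping
```

Because only coordinates are compared, the mapping ignores atom order and element, so two different rings occupying the same place map onto each other.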

Examples

Monster:

from fragmenstein import Monster
# hit_a, hit_b: RDKit Chem.Mol hits
monster = Monster(hits=[hit_a, hit_b])
monster.place_smiles('CCO')
monster.positioned_mol

Victor:

from fragmenstein import Victor, Igor
Igor.init_pyrosetta()
# hit_a, hit_b: RDKit Chem.Mol hits
victor = Victor(hits=[hit_a, hit_b], pdb_filename='foo.pdb')
victor.place('CCO')
victor.minimized_mol

For a lengthier example, see the example notes or the documentation.

Demo data

Some demo data is provided in the demo submodule.

from fragmenstein.demo import MPro, Mac1

pdbblock: str = Mac1.get_template()
for hitname in Mac1.get_hit_list():
    hit = Mac1.get_hit(hitname)
    ...

To use SARS-CoV-2 MPro as a test bed, the following may be helpful:

  • fragmenstein.MProVictor, a derived class (of Victor), with various presets specific for MPro.
  • fragmenstein.get_mpro_template(), returns the PDB block (str) of MPro
  • fragmenstein.get_mpro_molblock(xnumber), returns the mol block (str) of an MPro hit from Fragalysis
  • fragmenstein.get_mpro_mol(xnumber), as above but returns a Chem.Mol instance.

Other features

Installation

Fragmenstein and dependencies

Python 3.6 or above. Install from PyPI:

python -m pip install fragmenstein

Requires PyRosetta

โš ๏ธ PyRosetta no longer runs on CentOS 7 due to old kernel headers (cf. blog post).

PyRosetta requires a password to be downloaded (academic licence), obtained from https://els2.comotion.uw.edu/product/pyrosetta. This is a different licence from the Rosetta one. The username for the Rosetta binaries is a formatted variant of "academic user", while the PyRosetta one is the name of a researcher whose name bears an important concept in protein folding, like boltzmann + constant (but is not that). PyRosetta can be downloaded via a browser from http://www.pyrosetta.org/dow. Alternatively, in the terminal:

curl -u 👾👾👾:👾👾👾 https://graylab.jhu.edu/download/PyRosetta4/archive/release/PyRosetta4.Release.python38.linux/PyRosetta4.Release.python38.linux.release-NNN.tar.bz2 -o a.tar.bz2
tar -xf a.tar.bz2
cd PyRosetta4.Release.python38.linux
sudo pip3 install .

or using conda

or using install_pyrosetta from the pyrosetta-help package.

pip install pyrosetta-help
PYROSETTA_USERNAME=👾👾👾 PYROSETTA_PASSWORD=👾👾👾 install_pyrosetta

PYROSETTA_USERNAME and PYROSETTA_PASSWORD are environment variables, which should not be shared publicly (i.e. store them as private environment variables in your target application).
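A minimal sketch of reading those credentials in Python without hard-coding them; `pyrosetta_credentials` is a hypothetical helper, and only the two variable names come from the instructions above:

```python
import os
from typing import Tuple


def pyrosetta_credentials() -> Tuple[str, str]:
    """Return (username, password) from PYROSETTA_USERNAME / PYROSETTA_PASSWORD."""
    try:
        return os.environ["PYROSETTA_USERNAME"], os.environ["PYROSETTA_PASSWORD"]
    except KeyError as missing:
        # report which variable is absent, never the values themselves
        raise RuntimeError(f"environment variable {missing} is not set") from None
```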

Origin

See Fragmenstein and COVID moonshot.

Fragmenstein was created to see how reasonable the molecules of fragment mergers submitted in the COVID Moonshot project are, because, after all, the underlying method is fragment-based screening. This dataset has some unique peculiarities that are potentially not encountered in other projects.

Command line interface

The strength of Fragmenstein is as a python module, but there is a command line interface.

fragmenstein monster combine -i hit1.mol hit2.mol >> combo.mol
fragmenstein monster place -i hit1.mol hit2.mol -s 'CCO' >> placed.mol
fragmenstein victor combine -i hit1.mol hit2.mol -t protein.pdb -o output >> combo.mol
fragmenstein victor place -i hit1.mol hit2.mol -s 'NCO' -n molname -t protein.pdb -o output >> placed.mol
fragmenstein laboratory combine -i hits.sdf -o output -d output.csv -s output.sdf -c 24

Authors

Author | Role | Homepage | Department | ORCID
Matteo Ferla | main developer | WCHG | Wellcome Centre for Human Genetics, University of Oxford | 0000-0002-5508-4673
Rubén Sánchez-Garcia | discussion/code | Stats | Department of Statistics, University of Oxford | 0000-0001-6156-3542
Rachael Skyner | discussion/editing/code | | |
Stefan Gahbauer | discussion | | |
Jenny Taylor | PI | WCHG | Wellcome Centre for Human Genetics, University of Oxford | 0000-0003-3602-5704
Brian Marsden | PI | CMD | CMD, Oxford | 0000-0002-1937-4091
Charlotte Deane | PI | | |
Frank von Delft | PI | CMD | Diamond Light Source / CMD, Oxford | 0000-0003-0378-0017

See Also

  • ChemRxiv preprint: TBA
  • Steph Wills's fragment network merges repo contains useful filtering algorithms
  • Fragmenstein is used in Schuller et al. 2021
  • Figures for the upcoming manuscript are in a separate repo
  • The conversion of an RDKit Chem.Mol that cannot be sanitised to an analogue that can is done by the molecular rectifier package
  • The conversion of an RDKit Chem.Mol to a PyRosetta residue type (a "params file") is done via the rdkit-to-params package
  • The pipeline demo colab notebook uses Brian Shoichet's SmallWorld webapp, interfaced via its API in Python
  • The playground demo colab notebook features a JSME widget (JSME is a popular JS-only molecular editor)

More Repositories

  • DnD-battler (Python, 76 stars): A 5e D&D encounter simulator written for my own amusement to test some hypotheses.
  • molecular_rectifier (Python, 36 stars): Given an RDKit molecule that does not sanitise, correct it until it does
  • Python_SmallWorld_API (Python, 32 stars): An (unofficial) Python3 module to query the SmallWorld chemical space search server (https://sw.docking.org/search.html)
  • MichelaNGLo-app (Mako, 25 stars): A web app to convert a PyMOL PSE file or PDB file to an NGL.js view that can be implemented easily on any site
  • rdkit_to_params (Python, 22 stars): Create or modify Rosetta params files (topology files) from scratch, RDKit mols or another params file
  • pyrosetta-help (Python, 17 stars): Some scripts that I keep using over and over.
  • Fragment-hit-follow-up-chemistry (Jupyter Notebook, 10 stars): A collection of notebooks and scripts for the prediction of follow-up compounds in
  • Display-of-preset-Rosetta-NCAAs (Python, 9 stars): What exactly are the non-canonical amino acids in the Rosetta database folder?
  • Snippets-for-ColabFold (Python, 8 stars): A collection of snippets for working with ColabFold
  • pyfurby (Python, 8 stars): Raspberry Pi Zero W controlled Furby
  • Christmas_tree_protein (6 stars): A Christmas tree protein
  • Pyrosetta-documentarian (Python, 6 stars): A class to help reverse engineer what a Pyrosetta class does...
  • JSME_notebook (Python, 4 stars): JSME molecular editor in a Jupyter or Colab notebook in a hacky way
  • DTC-compchem-practical (Jupyter Notebook, 4 stars): Compchem practical for the University of Oxford Doctoral Training Centre
  • PLIP-PyRosetta-hotspots-test (Python, 4 stars): A proof-of-principle of using PLIP and PyRosetta as a substitute for Hotspot API and CCDC Gold
  • DnD-encounter-simulator-site (Python, 4 stars): The site for the DnD encounter simulator
  • gist-import (Python, 3 stars): GitHub Gists are handy snippets, which are meant to be copy-pasted into one's code... but what if you could import them?
  • protein_fuser (Python, 3 stars): A py3 script to fuse structures together
  • DirEvo_tools (Mako, 3 stars): The new and improved server combining Pedel, Mutanalyst and much more
  • AtomicRenamer (Python, 3 stars): Given a molecule, label the atoms (names/labels) according to a reference ligand from the PDB
  • Michelanglo-and-Venus (Shell, 3 stars): Topmost repository for the Michelanglo webapp, including the Venus functionality.
  • validation_of_venus_ddG (Python, 3 stars): Tests to assess the accuracy of Venus ddG calculations
  • And-the-protein-ligand-went-boom (3 stars): A collection of various computational chemistry/biochemistry fails
  • mutant_calculator (JavaScript, 2 stars): Now live on www.mutanalyst.com
  • Replace-ligand (Python, 2 stars): Example of how to replace a ligand with a similar one in RDKit and PyRosetta
  • MichelaNGLo-api (Python, 2 stars): A Python module to interact with Michelaɴɢʟo programmatically
  • Wikipedian-compounds (2 stars): Parsing Wikipedia Chembox data for fun
  • MichelaNGLo-protein-analysis (Python, 2 stars): Protein module for Michelanglo and VENUS (handles the actual protein analysis)
  • notebook-error-reporter (Python, 2 stars): An idea on how to collect errors generated by other users using a notebook that was shared.
  • tangents (Python, 1 star)
  • mutational_scanning (MATLAB, 1 star): Analysis of variants generated by mutational scanning for Carlos G Acevedo-Rocha et al 2017
  • MED27_analysis (Python, 1 star): Analysis of variants of MED27 (no mutant data is stored here)
  • Various-scripts-rosetta-pymol (Python, 1 star): Scripts for Rosetta and PyMOL stuff
  • MichelaNGLo-human-protein-data (1 star): Pickled files containing protein data for humans
  • mutagenesis (Python, 1 star): Python DNA mutagenesis model and more
  • SLC38A3_analysis (Python, 1 star): SLC38A3 protein modelling
  • Epistasis_Calculator (Python, 1 star)
  • PrettyFastaJS (JavaScript, 1 star): Script to embed a fasta file in a pretty way in a website, such as a blog
  • EV-D68-3C-protease (Jupyter Notebook, 1 star): Follow-up suggestions for the EV-D68 3C protease fragment-based drug-discovery campaign of the ASAP consortium
  • PRKG2_analysis (1 star): In silico analysis of the cGMP-dependent protein kinase 2 (PRKG2 encoded)
  • MEF2C_analysis (Python, 1 star): Analysis of variants of MEF2C
  • ConsurfDB-client-API (Python, 1 star): A Python3 module to request information from the ConsurfDB server and perform several operations (e.g. reindexing) with grade files and PDB files 🐍↔️🏄
  • SARS-CoV-2_CL3_covalent_docking (Python, 1 star): This is a junkyard for code salvaging. Please see Fragmenstein instead
  • Arthorian-Quest (Python, 1 star): An experiment in using Arthor (arthor.docking.org) by NextMove and filtering the results with Fragmenstein
  • Poised-Fragment-library-design (Jupyter Notebook, 1 star): The design of a fragment library that is sociably poised for the XChem in-house synthesis robot
  • crysalin (Python, 1 star): Engineering crysalin lattice