ChEMBL database structure pipelines

License: MIT

ChEMBL Structure Pipeline

ChEMBL protocols used to standardise and salt-strip molecules. First used in ChEMBL 26.

Check the wiki and paper[1] for a detailed description of the different processes.

Installation

From source:

git clone https://github.com/chembl/ChEMBL_Structure_Pipeline.git
pip install ./ChEMBL_Structure_Pipeline

With pip:

pip install chembl_structure_pipeline

With conda:

conda install -c conda-forge chembl_structure_pipeline
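
After any of the installation routes above, a quick check from Python can confirm that the package is visible to the interpreter. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical and not part of the package, and it assumes the distribution is registered under the same name used in the `pip install` command.

```python
from importlib import metadata, util


def installed_version(package: str = "chembl_structure_pipeline"):
    """Return the installed version string, or None if the package is absent."""
    # find_spec only locates the package; it does not import it.
    if util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this name.
        return "unknown"


print(installed_version())
```

A result of `None` suggests the package was installed into a different environment than the one running the interpreter.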

Usage

Standardise a compound

from chembl_structure_pipeline import standardizer

o_molblock = """
  Mrv1810 07121910172D          

  4  3  0  0  0  0            999 V2000
   -2.5038    0.4060    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
   -2.5038    1.2310    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
   -3.2182   -0.0065    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -1.7893   -0.0065    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  4  0  0  0
M  CHG  2   2  -1   3   1
M  END
"""

# Returns the standardised structure as a molblock string.
std_molblock = standardizer.standardize_molblock(o_molblock)
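
When several structures need standardising, the single call above can be wrapped in a small loop that records failures instead of stopping at the first one. This is only a sketch: `standardise_many` is a hypothetical helper, not part of the package, and it assumes that a problematic molblock surfaces as a Python exception, which may not hold for every input. The standardiser is passed in as a callable so the helper can be tried out without the package installed.

```python
from typing import Callable, Dict, Optional, Tuple


def standardise_many(
    molblocks: Dict[str, str],
    standardise: Optional[Callable[[str], str]] = None,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Standardise named molblocks; return (results, failures)."""
    if standardise is None:
        # Imported lazily so the helper itself has no hard dependency.
        from chembl_structure_pipeline import standardizer

        standardise = standardizer.standardize_molblock

    results: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    for name, block in molblocks.items():
        try:
            results[name] = standardise(block)
        except Exception as exc:  # assumption: bad input raises
            failures[name] = f"{type(exc).__name__}: {exc}"
    return results, failures


# Example call with the molblock defined above:
# results, failures = standardise_many({"mol-1": o_molblock})
```

Keeping failures in a separate dictionary makes it easy to review rejected structures afterwards without losing the successful results.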

Get the parent compound

from chembl_structure_pipeline import standardizer

o_molblock = """
  Mrv1810 07121910262D          

  3  1  0  0  0  0            999 V2000
   -5.2331    1.1053    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5186    1.5178    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -2.8647    1.5789    0.0000 Cl  0  5  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  CHG  2   2   1   3  -1
M  END
"""

# The call returns a tuple; only the parent molblock is kept here.
parent_molblock, _ = standardizer.get_parent_molblock(o_molblock)

Check a compound

The checker assesses the quality of a structure and highlights specific features or issues that may need to be revised. Along with a description of each issue, the checker returns a penalty score (from 0 to 9) that reflects how serious the issue is: the higher the score, the more critical the issue.

from chembl_structure_pipeline import checker

o_molblock = """ 
  Mrv1810 02151908462D           
 
  4  3  0  0  0  0            999 V2000 
    2.2321    4.4196    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0 
    3.0023    4.7153    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0 
    1.4117    4.5059    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0 
    1.9568    3.6420    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0 
  1  2  1  1  0  0  0 
  1  3  1  0  0  0  0 
  1  4  1  0  0  0  0 
M  END 
"""

# Collect the issues the checker reports for this structure.
issues = checker.check_molblock(o_molblock)
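
Because each issue carries a penalty score, the checker output can be condensed into a worst score and a list of issues above a cut-off. The sketch below assumes that the result is an iterable of `(penalty, description)` pairs; check the wiki for the exact return type. The helper `summarise_issues`, the threshold of 5 and the sample data are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple

Issue = Tuple[int, str]  # assumed shape: (penalty score, description)


def summarise_issues(
    issues: Iterable[Issue], threshold: int = 5
) -> Tuple[int, List[str]]:
    """Return the highest penalty and the descriptions at or above threshold."""
    ordered = sorted(issues, key=lambda issue: issue[0], reverse=True)
    top_score = ordered[0][0] if ordered else 0
    flagged = [text for score, text in ordered if score >= threshold]
    return top_score, flagged


# Made-up checker output, not real results from the library.
example = ((2, "hypothetical minor issue"), (5, "hypothetical stereo issue"))
top_score, flagged = summarise_issues(example)
print(top_score, flagged)
```

An empty result yields a top score of 0 and no flagged issues, so the same code path covers clean structures.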

References

[1] Bento, A.P., Hersey, A., Félix, E. et al. An open source chemical structure curation pipeline using RDKit. J Cheminform 12, 51 (2020). https://doi.org/10.1186/s13321-020-00456-1
