ProteinLigandBenchmarks
Protein-Ligand Benchmark Dataset for testing Parameters and Methods of Free Energy Calculations.
Documentation
Documentation for the protein-ligand-benchmark package is hosted at readthedocs.
Related Publication
The LiveCoMS article "Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks" provides accompanying information on this benchmark dataset and on how to use it for alchemical free energy calculations. To suggest improvements, please raise an issue in its GitHub repository, protein-ligand-benchmark-livecoms.
Installation
The repository uses git-lfs (large file storage) to store all the data files. Ideally, git-lfs is installed before cloning the repository.
conda create -n plbenchmark python=3.7 git-lfs
conda activate plbenchmark
git lfs clone https://github.com/openforcefield/protein-ligand-benchmark.git
cd protein-ligand-benchmark
conda env update --file environment.yml
pip install -e .
Getting Started
Example notebooks can be found in the Documentation and in the examples directory.
Paper repository here.
Data file tree and file description
The data is organized as follows:
data
├── targets.yml                                  # list of all targets and their directories
├── <date>_<target_name_1>                       # directory for target 1
│   ├── 00_data                                  # metadata for target 1
│   │   ├── edges.yml                            # edges/perturbations
│   │   ├── ligands.yml                          # ligands and activities
│   │   └── target.yml                           # target
│   ├── 01_protein                               # protein data
│   │   ├── crd                                  # coordinates
│   │   │   ├── cofactors_crystalwater.pdb       # cofactors and crystal waters (might be empty if there are none)
│   │   │   └── protein.pdb                      # amino acid residues
│   │   └── top                                  # topology(s)
│   │       ├── amber99sb-star-ildn-mut.ff       # force field spec.
│   │       ├── cofactors_crystalwater.top       # Gromacs TOP file of cofactors and crystal water (might be empty if there are none)
│   │       ├── protein.top                      # Gromacs TOP file of amino acid residues
│   │       └── *.itp                            # Gromacs ITP file(s) to be included in TOP files
│   ├── 02_ligands                               # ligands
│   │   ├── lig_<name_1>                         # ligand 1
│   │   │   ├── crd                              # coordinates
│   │   │   │   └── lig_<name_1>.sdf             # SDF file
│   │   │   └── top                              # topology(s)
│   │   │       ├── openff-1.0.0.offxml          # force field spec.
│   │   │       ├── fflig_<name_1>.itp           # Gromacs ITP file : atom types
│   │   │       ├── lig_<name_1>.itp             # Gromacs ITP file
│   │   │       ├── lig_<name_1>.top             # Gromacs TOP file
│   │   │       └── posre_lig_<name_1>.itp       # Gromacs ITP file : position restraint file
│   │   ├── lig_<name_2>                         # ligand 2
│   │   …
│   └── 03_hybrid                                # edges (perturbations)
│       ├── edge_<name_1>_<name_2>               # edge between ligand 1 and ligand 2
│       │   └── water                            # edge in water
│       │       ├── crd                          # coordinates
│       │       │   ├── mergedA.pdb              # merged conf based on coords of ligand 1
│       │       │   ├── mergedB.pdb              # merged conf based on coords of ligand 2
│       │       │   ├── pairs.dat                # atom mapping
│       │       │   └── score.dat                # similarity score
│       │       └── top                          # topology(s)
│       │           ├── openff-1.0.0.offxml      # force field spec.
│       │           ├── ffmerged.itp             # Gromacs ITP file
│       │           ├── ffMOL.itp                # Gromacs ITP file
│       │           └── merged.itp               # Gromacs ITP file
│       …
├── <date>_<target_name_2>                       # directory for target 2
…
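As a rough illustration of how this layout can be navigated, the following sketch builds the expected paths of the metadata files, the protein structure, and the ligand SDF files for one target. The function name `target_paths` and the dictionary keys are hypothetical and not part of the plbenchmark package; the example target directory and ligand name are taken from the snippets in this README, and the sketch does not check that the files exist.

```python
from pathlib import Path


def target_paths(data_root, target_dir, ligand_names):
    """Map logical names to the paths where the file tree places them."""
    base = Path(data_root) / target_dir
    paths = {
        "target_yml": base / "00_data" / "target.yml",
        "ligands_yml": base / "00_data" / "ligands.yml",
        "edges_yml": base / "00_data" / "edges.yml",
        "protein_pdb": base / "01_protein" / "crd" / "protein.pdb",
    }
    # Each ligand has its own directory holding a crd/ subdirectory with the SDF file.
    for name in ligand_names:
        paths[f"{name}_sdf"] = base / "02_ligands" / name / "crd" / f"{name}.sdf"
    return paths


# Illustrative values only; adjust to the targets and ligands actually present.
paths = target_paths("data", "2020-08-26_mcl1_sample", ["lig_23"])
for key, path in paths.items():
    print(f"{key}: {path.as_posix()}")
```

Only locations that the tree states unambiguously are used here; topology files are left out of the sketch.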
Description of metadata YAML files
targets.yml
This file lists all the registered targets in the benchmark set. Each entry denotes one target and contains the following information:
mcl1_sample:
  name: mcl1_sample
  date: 2020-08-26
  dir: 2020-08-26_mcl1_sample
mcl1_sample is the entry name and each entry has three sub-entries:
- name is the target name, which is usually the same as the entry name of the target.
- date is the date when the target was initially added to the benchmark set.
- dir is the name of the directory where all the data for the target is found. Usually it is the date and the name field, connected by an underscore _.
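The usual relationship between date, name, and dir can be checked with a few lines of Python. This is a minimal sketch that works on an already parsed dictionary; it assumes the file has been loaded with a YAML parser, which may return the date either as a date object or as a string. The helper `expected_dir` is hypothetical and not part of the plbenchmark package.

```python
import datetime


def expected_dir(entry):
    """Build '<date>_<name>' from one targets.yml entry."""
    date = entry["date"]
    # A YAML parser may yield a date object or a plain string; handle both.
    date_text = date.isoformat() if isinstance(date, datetime.date) else str(date)
    return f"{date_text}_{entry['name']}"


# Stand-in for the parsed content of targets.yml, using the example entry.
targets = {
    "mcl1_sample": {
        "name": "mcl1_sample",
        "date": datetime.date(2020, 8, 26),
        "dir": "2020-08-26_mcl1_sample",
    }
}

# Collect entries whose dir deviates from the usual '<date>_<name>' pattern.
mismatches = {
    key: entry["dir"]
    for key, entry in targets.items()
    if entry["dir"] != expected_dir(entry)
}
print(mismatches)
```

Because the pattern is only the usual convention, a mismatch is worth a look but is not necessarily an error; the dir field remains the authoritative location.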
target.yml
This file is found in the metadata directory of each target: <date>_<target_name>/00_data/target.yml. It contains additional information about the target:
alternate:
  iridium_classifier: HT
  iridium_score: 0.3
  pdb: 6O6F
associated_sets:
- Schrodinger JACS
comments: hydrophobic interactions contributing to binding
date: 2019-12-13
dpi: 0.26
id: 9
iridium_classifier: HT
iridium_score: 0.41
name: mcl1
netcharge: 4 e
pdb: 4HW3
references:
  calculation:
  - 10.1021/ja512751q
  - 10.1021/acs.jcim.9b00105
  - 10.1039/C9SC03754C
  measurement:
  - 10.1021/jm301448p
Explanation of the entries:
- alternate: alternate X-ray structure which could be used
  - iridium_classifier: Iridium classifier of the alternate structure
  - iridium_score: Iridium score of the alternate structure
  - pdb: PDB ID of the alternate structure
- associated_sets: list of benchmark set tags that this target belongs to (e.g. "Schrodinger JACS")
- comments: free-text comments about the target (e.g. hydrophobic interactions contributing to binding)
- date: date when the target was initially added to the benchmark set
- dpi: diffraction precision index of the used structure (quality metric for the structure)
- id: a given ID
- iridium_classifier: Iridium classifier of the used structure
- iridium_score: Iridium score of the used structure
- name: name/identifier of the target
- netcharge: total charge of the prepared protein (this should be neutralized with counter ions during preparation of the simulation system)
- pdb: PDB ID of the used structure
- references: DOIs of references
  - calculation: list of references where this target was used in calculations
  - measurement: list of references of affinity measurements
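To illustrate how the netcharge entry might be used during system preparation, the sketch below parses a value such as "4 e" and reports how many counter ions would neutralize that charge. It assumes the value is always an integer followed by the unit e, and that monovalent counter ions are used; both function names are hypothetical. Charges of ligands and cofactors are not covered by this entry and would have to be accounted for separately.

```python
def parse_netcharge(text):
    """Parse a netcharge value such as '4 e' into an integer charge."""
    value, unit = str(text).split()
    if unit != "e":
        raise ValueError(f"unexpected charge unit: {unit!r}")
    return int(value)


def neutralizing_ions(charge):
    """Return (ion kind, count) of monovalent ions that bring the charge to zero."""
    if charge > 0:
        return ("anion", charge)
    if charge < 0:
        return ("cation", -charge)
    return (None, 0)


# Value taken from the example target.yml entry.
charge = parse_netcharge("4 e")
print(charge, neutralizing_ions(charge))
```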
ligands.yml
This file is found in the metadata directory of each target: <date>_<target_name>/00_data/ligands.yml. It contains information about the ligands of one target. One entry looks like this:
lig_23:
  measurement:
    comment: Table 2, entry 23
    doi: 10.1021/jm301448p
    error: 0.03
    type: ki
    unit: uM
    value: 0.37
  name: lig_23
  smiles: '[H]c1c(c(c2c(c1[H])c(c(c(c2OC([H])([H])C([H])([H])C([H])([H])C3=C(Sc4c3c(c(c(c4[H])[H])[H])[H])C(=O)[O-])[H])[H])[H])[H])[H]'
Explanation of the entries:
- measurement: affinity measurement entry
  - comment: comment about the measurement
  - doi: DOI (digital object identifier) pointing to the reference for this measurement
  - error: error of the measurement, null if not reported
  - type: type of measurement observable; accepted entries are ki (binding equilibrium constant), ic50 (IC50 value), pic50 (pIC50 value), or dg (free energy of binding)
  - unit: unit of the value and error entries
  - value: value of the measurement
- name: name of the ligand, which always starts with lig_, followed by a unique identifier
- smiles: SMILES string of the ligand, with charge state and chirality information
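Since free energy calculations are usually compared against binding free energies, a measurement of type ki can be converted with the standard relation ΔG = RT ln(Ki / 1 M). The sketch below shows this for a parsed measurement entry. It is not the conversion implemented in the plbenchmark package; the function name, the temperature of 298.15 K, and every unit prefix other than uM are assumptions made for illustration.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol K)

# Conversion factors to mol/L; only "uM" appears in the example entry,
# the other prefixes are assumed for illustration.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def ki_to_dg(measurement, temperature=298.15):
    """Convert a 'ki' measurement to a binding free energy in kcal/mol."""
    if measurement["type"] != "ki":
        raise ValueError("this sketch only handles measurements of type 'ki'")
    ki_molar = measurement["value"] * UNIT_TO_MOLAR[measurement["unit"]]
    # Delta G = R T ln(Ki / 1 M), with Ki relative to the 1 M standard state.
    return GAS_CONSTANT * temperature * math.log(ki_molar)


# Measurement block of the example entry lig_23.
measurement = {"error": 0.03, "type": "ki", "unit": "uM", "value": 0.37}
dg = ki_to_dg(measurement)
print(f"{dg:.2f} kcal/mol")
```

For the example entry this gives roughly -8.8 kcal/mol. Entries of type ic50 or pic50 need additional assumptions to be turned into free energies and are deliberately rejected here.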
edges.yml
This file is found in the metadata directory of each target: <date>_<target_name>/00_data/edges.yml. It contains information about the edges of one target. One entry looks like this:
edge_50_60:
  ligand_a: lig_50
  ligand_b: lig_60
Each entry simply names the two ligands, ligand_a and ligand_b, that the edge connects.
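Because edges refer to ligands by name, a simple consistency check is to confirm that both ends of every edge appear in ligands.yml. The sketch below does this on already parsed dictionaries and also builds a neighbor map of the perturbation network. The function names are hypothetical, and the second edge in the sample data is made up to show a failing case.

```python
def edges_with_unknown_ligands(edges, ligands):
    """Return the names of edges that reference ligands missing from `ligands`."""
    return sorted(
        name
        for name, edge in edges.items()
        if edge["ligand_a"] not in ligands or edge["ligand_b"] not in ligands
    )


def neighbors(edges):
    """Map each ligand to the set of ligands it shares an edge with."""
    result = {}
    for edge in edges.values():
        a, b = edge["ligand_a"], edge["ligand_b"]
        result.setdefault(a, set()).add(b)
        result.setdefault(b, set()).add(a)
    return result


# Stand-ins for parsed edges.yml and ligands.yml content.
edges = {
    "edge_50_60": {"ligand_a": "lig_50", "ligand_b": "lig_60"},
    "edge_50_23": {"ligand_a": "lig_50", "ligand_b": "lig_23"},  # hypothetical edge
}
ligands = {"lig_50": {}, "lig_60": {}}  # lig_23 deliberately left out

print(edges_with_unknown_ligands(edges, ligands))
print(neighbors(edges)["lig_50"])
```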
Summary
Summary of the contents of the Protein-Ligand Benchmark Dataset. It contains the available protein targets with corresponding PDB ID and number of ligands.
Target | PDB | N. Lig. |
---|---|---|
bace | 4DJW | 36 |
bace_hunt | 4JPC | 32 |
bace_p2 | 3IN4 | 12 |
cdk2 | 1H1Q | 16 |
cdk8 | 5HNB | 33 |
cmet | 4R1Y | 12 |
eg5 | 3L9H | 28 |
galectin | 5E89 | 8 |
hif2a | 5TBM | 42 |
jnk1 | 2GMX | 21 |
mcl1 | 4HW3 | 42 |
p38 | 3FLY | 34 |
pde10 | 4BBX | 35 |
pde2 | 6EZF | 21 |
pfkfb3 | 6HVI | 40 |
ptp1b | 2QBS | 23 |
shp2 | 5EHR | 26 |
syk | 4PV0 | 44 |
thrombin | 2ZFF | 11 |
tnks2 | 4UI5 | 27 |
tyk2 | 4GIH | 16 |
Release History
Releases follow the major.minor.micro scheme recommended by PEP 440, where
- major increments denote a change that may break API compatibility with previous major releases
- minor increments denote addition of new targets or addition and larger changes to the API
- micro increments denote bugfixes, addition of API features, changes of coordinates or topologies, and changes of metadata
Contributions
- Authors: David Hahn
- Data Contributors: The authors of the following publications, especially Vytautas Gapsys and Christina E. M. Schindler.
  - V. Gapsys et al., Large scale relative protein ligand binding affinities using non-equilibrium alchemy, Chem. Sci., 2020, 11, 1140-1152
  - Christina E. M. Schindler et al., Large-Scale Assessment of Binding Free Energy Calculations in Active Drug Discovery Projects, J. Chem. Inf. Model. 2020, 60, 11, 5457–5474
  - Laura Perez Benito et al., Predicting Activity Cliffs with Free-Energy Perturbation, J. Chem. Theory Comput. 2019, 15, 3, 1884–1895
- Discussions and Suggestions: Christopher I. Bayly, Marko Breznik, Hannah E. Bruce Macdonald, John D. Chodera, Katharina Meier, Antonia S. J. S. Mey, David L. Mobley, Laura Perez Benito, Gary Tresadern, Gregory L. Warren and all members of the Open Force Field Initiative
- Code review and discussions: Matt Thompson, Jeffrey Wagner
License
MIT. See the License File for more information.
CC-BY-4.0 for data (content of the directory data). See the License File for more information.
Copyright
Copyright (c) 2021, Open Force Field Consortium, David F. Hahn
Acknowledgements
Project based on the Computational Molecular Science Python Cookiecutter version 1.1.