Benchmarking Cross-Docking Strategies in Kinase Drug Discovery

In recent years, machine learning has transformed many aspects of the drug discovery process, including small molecule design, for which the prediction of bioactivity is an integral part. Leveraging structural information about the interactions between a small molecule and its protein target has gre...

Bibliographic Details
Main Author: David A. Schaller (20288030) (author)
Other Authors: Clara D. Christ (2632918) (author), John D. Chodera (1323594) (author), Andrea Volkamer (1444000) (author)
Published: 2024
_version_ 1852025118256005120
dc.creator.none.fl_str_mv David A. Schaller (20288030)
Clara D. Christ (2632918)
John D. Chodera (1323594)
Andrea Volkamer (1444000)
dc.date.none.fl_str_mv 2024-11-19T07:09:47Z
dc.identifier.none.fl_str_mv 10.1021/acs.jcim.4c00905.s002
dc.relation.none.fl_str_mv https://figshare.com/articles/dataset/Benchmarking_Cross-Docking_Strategies_in_Kinase_Drug_Discovery/27852113
dc.rights.none.fl_str_mv CC BY-NC 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biophysics
Biochemistry
Genetics
Pharmacology
Biotechnology
Immunology
Biological Sciences not elsewhere classified
Chemical Sciences not elsewhere classified
Information Systems not elsewhere classified
transformed many aspects
reproduce binding poses
recovering binding poses
maximum common substructure
leveraging structural information
inhibitor complex geometries
finding practical approaches
drug discovery process
utilizing shape overlap
kinase drug discovery
based docking alone
pose selection strategies
docking methods biased
generating useful kinase
docking utilizing
docking strategies
docking pose
three methods
studied docking
docking scenario
success rate
standard physics
square deviation
small molecule
recent years
realistic cross
protein target
protein kinases
protein families
openeye toolkits
machine learning
low root
kinoml framework
integral part
included systems
great potential
general findings
fundamentally limited
efficient way
different classes
competitive ligands
cocrystallized ligand
benchmarking cross
automated fashion
although focused
allowing automated
423 atp
dc.title.none.fl_str_mv Benchmarking Cross-Docking Strategies in Kinase Drug Discovery
dc.type.none.fl_str_mv Dataset
info:eu-repo/semantics/publishedVersion
dataset
description In recent years, machine learning has transformed many aspects of the drug discovery process, including small molecule design, for which the prediction of bioactivity is an integral part. Leveraging structural information about the interactions between a small molecule and its protein target has great potential for downstream machine learning scoring approaches but is fundamentally limited by the accuracy with which protein–ligand complex structures can be predicted in a reliable and automated fashion. With the goal of finding practical approaches to generating useful kinase-inhibitor complex geometries for such scoring approaches, we present a kinase-centric docking benchmark that assesses how well different classes of docking and pose selection strategies recapitulate experimentally observed binding modes in a realistic cross-docking scenario. The assembled benchmark data set focuses on the well-studied protein kinase family and comprises a subset of 589 protein structures cocrystallized with 423 ATP-competitive ligands. We find that docking methods biased by the cocrystallized ligand, utilizing shape overlap with or without maximum common substructure matching, are more successful in recovering binding poses than standard physics-based docking alone. In addition, docking into multiple structures significantly increases the chance of generating a low root-mean-square deviation (RMSD) docking pose. Docking with Posit, an approach that combines all three methods, into the structures whose cocrystallized ligands are most similar according to the maximum common substructure (MCS) proved to be the most efficient way to reproduce binding poses, achieving a success rate of 70.4% across all included systems.
The studied docking and pose selection strategies, which utilize the OpenEye Toolkits, were implemented as pipelines in the KinoML framework, allowing automated and reliable protein–ligand complex generation for future downstream machine learning tasks. Although focused on protein kinases, we believe that the general findings can also be transferred to other protein families.
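The description reports results in terms of RMSD and a success rate over all systems. The following is a minimal sketch of how such metrics could be computed, not the authors' implementation and not the KinoML or OpenEye API. The function names are hypothetical, the 2.0 Å cutoff is an assumed convention (the record does not state the threshold used), and the coordinates are assumed to share one reference frame with identical atom ordering; symmetry-equivalent atoms are ignored.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two poses given as equally ordered lists of (x, y, z) tuples."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared_sum = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


def success_rate(rmsds, threshold=2.0):
    """Fraction of systems whose pose RMSD is at or below the threshold (assumed 2.0 Å)."""
    if not rmsds:
        raise ValueError("need at least one system")
    return sum(1 for value in rmsds if value <= threshold) / len(rmsds)


# Hypothetical RMSD values (in Å) for two ligands, each docked into two structures.
docked = {"ligand_1": [3.1, 1.2], "ligand_2": [4.0, 2.5]}

# Single-structure docking: only the first structure is used per ligand.
single = success_rate([values[0] for values in docked.values()])

# Multi-structure docking, best case: keep the lowest-RMSD pose per ligand.
# This is an oracle selection; a real pipeline needs a pose selection strategy.
multi = success_rate([min(values) for values in docked.values()])
```

Taking the minimum over structures gives only an upper bound on what multi-structure docking can achieve, which is why the benchmark's comparison of pose selection strategies matters in practice.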
eu_rights_str_mv openAccess
id Manara_594faad9fbd7f1a3c4e2bb2fb44b7df5
identifier_str_mv 10.1021/acs.jcim.4c00905.s002
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/27852113
publishDate 2024
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY-NC 4.0
status_str publishedVersion
title Benchmarking Cross-Docking Strategies in Kinase Drug Discovery