Developing a Machine Learning Model for Hydrogen Bond Acceptance Based on Natural Bond Orbital Descriptors

Bibliographic Details
Main Author: Diego Ulysses Melo (17676683) (author)
Other Authors: Leonardo Martins Carneiro (21668282) (author), Mauricio Domingues Coutinho-Neto (17271623) (author), Paula Homem-de-Mello (1803232) (author), Fernando Heering Bartoloni (2015380) (author)
Published: 2025
Description
Abstract: This study employs machine learning (ML) to assess the predictive power of electronic descriptors derived from natural bond orbital (NBO) analysis for hydrogen bond acceptance. Using a data set of 979 hydrogen bond complexes, each formed by a hydrogen bond acceptor and 4-fluorophenol as the donor, we optimized geometries with GFN2-xTB and then ran DFT single-point calculations. From these, NBO analysis was used to extract intramolecular donor–acceptor interactions, particularly the orbital stabilization energies, E(2), which reflect electron delocalization and relate to canonical resonance structures. The E(2) values served as features to train seven ML models based on different techniques: KNN, Decision Tree, SVM, RF, MLP, XGBoost, and CatBoost. To our knowledge, this is the first work that uses E(2) as a standalone ML descriptor for hydrogen bond acceptance. Even with a small set of descriptors, we achieved high predictive performance, with errors below 0.4 kcal mol⁻¹, surpassing previous studies that used heterogeneous descriptors, including quantum-chemical data. Our results highlight the utility of NBO-based features in building accurate, physically meaningful, and generalizable ML models for pKBHX prediction.
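To make the modeling step in the abstract concrete, the sketch below shows one of the listed techniques, k-nearest-neighbors (KNN) regression, applied to a small descriptor matrix and scored by mean absolute error on a held-out split. It is only an illustration under stated assumptions: the data are synthetic placeholders rather than the study's E(2) values, the function names (`make_synthetic_data`, `evaluate_knn`) are hypothetical, and the choice of k, the distance metric, and the split are not taken from the record.

```python
import math
import random


def knn_predict(train_X, train_y, query, k=3):
    """Predict a target as the mean over the k nearest training rows (Euclidean distance)."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def make_synthetic_data(n_samples=200, n_features=4, seed=0):
    """ASSUMPTION: placeholder data standing in for E(2)-style descriptors.

    Each row holds random non-negative 'stabilization energies'; the target is
    a noisy linear combination of them. None of this comes from the study.
    """
    rng = random.Random(seed)
    weights = [0.05 * (j + 1) for j in range(n_features)]
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.uniform(0.0, 20.0) for _ in range(n_features)]
        target = sum(w * v for w, v in zip(weights, row)) + rng.gauss(0.0, 0.1)
        X.append(row)
        y.append(target)
    return X, y


def evaluate_knn(k=3, test_fraction=0.2, seed=0):
    """Hold out the first rows as a test set and return the KNN test MAE."""
    X, y = make_synthetic_data(seed=seed)
    n_test = int(len(X) * test_fraction)
    X_test, y_test = X[:n_test], y[:n_test]
    X_train, y_train = X[n_test:], y[n_test:]
    preds = [knn_predict(X_train, y_train, q, k) for q in X_test]
    return mean_absolute_error(y_test, preds)


if __name__ == "__main__":
    print(f"Held-out MAE on synthetic data: {evaluate_knn(k=3):.3f}")
```

With real descriptors, the same comparison would be repeated for each model family named in the abstract, ideally with cross-validation rather than a single split.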