Supporting data for High-Accuracy Dataset Generation and Machine Learning Enhancement of Density Functional Theory

Bibliographic Details
Main Author: Yiling Zhu (13190991) (author)
Published: 2025
Subjects:
Description
Summary: Computational chemistry combines theoretical chemistry with computer simulation to model and calculate molecular properties and chemical reactions. Because of their complex structures, organic molecules are relatively difficult to treat computationally. Density functional theory (DFT), an efficient quantum mechanical method for solving the Schrödinger equation, describes a system through its electron density rather than a complex many-electron wave function, which gives it advantages in computational efficiency and range of application. However, the core challenge of DFT lies in the uncertainty of the exchange-correlation (XC) functional: the exact XC potential and its corresponding energy are not yet fully determined. Widely used approaches such as the B3LYP functional and the CCSD method also have their own limitations and drawbacks.

To address these issues, this work investigates machine learning-enhanced DFT through the systematic construction of a large-scale quantum chemical dataset and neural network-based correction methods. Inspired by the Holographic Electron Density Theorem, the dataset encompasses 593 diverse molecular systems from established benchmarks (G2, W4-11, GMTKN55), systematically expanded through geometric perturbations including bond stretching, angular bending, and conformational sampling. Electronic structures were calculated with PySCF at multiple levels of theory using the cc-pVDZ and cc-pVTZ basis sets.

Local electronic environments were standardized as 9×9×9 density cubes centered at atomic positions, with principal-axis alignment ensuring rotational invariance and systematic re-gridding producing uniform 5×5×5 representations. Three machine learning approaches were developed and tested: an MLP-Mixer for exchange-correlation potential prediction, a fully connected network for energy corrections, and an ML-PBE method for systematic error reduction. The results show accuracy improvements of multiple orders of magnitude over conventional functionals. This approach advances the accurate calculation of exchange-correlation potentials from electron density distributions and supports DFT development through systematic dataset construction and machine learning.
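The bond-stretching perturbation named in the summary can be illustrated with a minimal sketch. The record does not describe how the perturbations were actually generated, so everything here is an assumption made for illustration: the helper name `stretch_bond`, the choice to move only one atom along the bond axis, the H2 geometry (0.74 Å), and the scale factors.

```python
import math


def stretch_bond(coords, i, j, factor):
    """Return a copy of `coords` with the i-j bond length scaled by `factor`.

    Hypothetical helper: only atom j is moved along the i->j direction.
    A real protocol might instead displace whole fragments.
    """
    anchor, moved = coords[i], coords[j]
    # Vector pointing from the fixed atom to the atom being displaced.
    bond = [m - a for a, m in zip(anchor, moved)]
    stretched = [list(atom) for atom in coords]  # copy, input stays unchanged
    stretched[j] = [a + factor * d for a, d in zip(anchor, bond)]
    return stretched


# Illustrative input (assumed, not from the dataset): H2 along z, in angstrom.
h2 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]]
scale_factors = [0.9, 1.0, 1.1]  # assumed compression/stretch factors

perturbed = [stretch_bond(h2, 0, 1, f) for f in scale_factors]
bond_lengths = [math.dist(geom[0], geom[1]) for geom in perturbed]
```

Each entry of `perturbed` is a new geometry that could then be passed to an electronic structure code; angular bending and conformational sampling would need analogous but separate helpers.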