Towards Generalizable <i>In Silico</i> Predictions of Differential Ion Mobility Using Machine Learning and Customized Fingerprint Engineering

Bibliographic Details
Main Author: Cailum M. K. Stienstra (16470055) (author)
Other Authors: Christopher R. M. Ryan (16007811) (author), Daniel Demczuk (21033595) (author), Justine R. Bissonnette (12024936) (author), Anish Arjuna (21033598) (author), J. Larry Campbell (1461124) (author), W. Scott Hopkins (1282116) (author)
Published: 2025
Description
Summary: Differential mobility spectrometry (DMS), a tool for separating chemically similar species (including isomers), is readily coupled to mass spectrometry to improve selectivity in analytical workflows. DMS dispersion curves, which describe the dynamic mobility experienced by an ion in a gaseous environment, show the maximum ion transmission for an analyte through the DMS instrument as a function of the separation voltage (SV) and compensation voltage (CV) conditions. To date, there exists no fast, general prediction tool for the dispersion behavior of ions. Here, we demonstrate a machine learning (ML) model that achieves generalized dispersion prediction using an <i>in silico</i> feature addition pipeline. We employ a data set containing 1141 dispersion curve measurements of anions and cations recorded in pure N<sub>2</sub> environments and in N<sub>2</sub> environments doped with 1.5% methanol (MeOH). Our feature addition pipeline can compute 1591 RDKit and Mordred descriptors using only SMILES codes, which are then normalized to sampled molecular distributions (<i>n</i> = 100 000) using cumulative density functions (CDFs). This tool can be thought of as a “learned” feature fingerprint generation pipeline, which could be applied to almost any molecular (bio)cheminformatics task. Our best performing model, which for the first time considers solvent-modified environments, has a mean absolute error (MAE) of 2.1 ± 0.2 V for dispersion curve prediction, a significant improvement over the previous state-of-the-art work. We use explainability techniques (<i>e.g.</i>, SHAP analysis) to show that this feature addition pipeline is a semideterministic process for feature sets, and we discuss “best practices” to understand feature sets and maximize model performance.
We expect that this tool could be used for prescreening to accelerate or even automate the use of DMS in complex analytical workflows (<i>e.g.</i>, 2D LC×DMS separation), to automate the identification of transmission windows, and to increase the “self-driving” potential of the instrument. We make our models available as a free and accessible tool at https://github.com/HopkinsLaboratory/DispersionCurveGUI.
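
The CDF normalization step described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an empirical CDF fitted to a reference sample, uses two hypothetical descriptor names (`mol_weight`, `logp`), and draws a synthetic reference sample in place of the sampled molecular distributions. In practice the raw values would come from RDKit or Mordred descriptor calculations, which are not called here.

```python
from bisect import bisect_right
import random


def fit_ecdf(reference_values):
    """Return a function mapping a raw value to its empirical CDF value in [0, 1]."""
    ordered = sorted(reference_values)
    n = len(ordered)
    if n == 0:
        raise ValueError("reference sample must not be empty")

    def ecdf(x):
        # Fraction of reference values that are less than or equal to x.
        return bisect_right(ordered, x) / n

    return ecdf


def normalize_descriptors(raw, ecdfs):
    """Map each raw descriptor value onto [0, 1] using its fitted empirical CDF."""
    return {name: ecdfs[name](value) for name, value in raw.items()}


# Synthetic reference sample (assumption): stands in for descriptor values
# computed over a large sampled set of molecules.
rng = random.Random(0)
reference = {
    "mol_weight": [rng.gauss(350.0, 90.0) for _ in range(10_000)],
    "logp": [rng.gauss(2.5, 1.5) for _ in range(10_000)],
}

# Fit one empirical CDF per descriptor, then normalize a hypothetical molecule.
ecdfs = {name: fit_ecdf(values) for name, values in reference.items()}
fingerprint = normalize_descriptors({"mol_weight": 350.0, "logp": 6.0}, ecdfs)
print(fingerprint)
```

Each normalized value is the fraction of reference molecules at or below the raw value, so descriptors with very different units and ranges end up on a common [0, 1] scale before they are passed to a model.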