High-Throughput Mass Spectral Library Searching of Small Molecules in R with NIST MSPepSearch

High-level programming languages such as Python and R are widely used in mass spectrometry data processing, where library searching is a standard step. Despite the availability of numerous library search algorithms, those developed by NIST and implemented in MS Search remain predominant, partly beca...

Bibliographic details
Main author: Andrey Samokhin (20282728) (author)
Other authors: Mikhail Khrisanfov (22683809) (author)
Published: 2025
_version_ 1849927625425813504
author Andrey Samokhin (20282728)
author2 Mikhail Khrisanfov (22683809)
author2_role author
author_facet Andrey Samokhin (20282728)
Mikhail Khrisanfov (22683809)
author_role author
dc.creator.none.fl_str_mv Andrey Samokhin (20282728)
Mikhail Khrisanfov (22683809)
dc.date.none.fl_str_mv 2025-11-25T19:13:50Z
dc.identifier.none.fl_str_mv 10.1021/jasms.5c00322.s001
dc.relation.none.fl_str_mv https://figshare.com/articles/dataset/High-Throughput_Mass_Spectral_Library_Searching_of_Small_Molecules_in_R_with_NIST_MSPepSearch/30715313
dc.rights.none.fl_str_mv CC BY-NC 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biophysics
Biochemistry
Genetics
Cancer
Inorganic Chemistry
Biological Sciences not elsewhere classified
Chemical Sciences not elsewhere classified
Information Systems not elsewhere classified
use requires calling
untargeted gas chromatography
proprietary formats inaccessible
mass spectrometry analysis
level programming languages
running multiple instances
g ., nist
another nist tool
nist mspepsearch high
threaded tool
multiple flags
widely used
standard step
small molecules
retrieve results
provides access
multistep workflows
line interface
like compounds
library searching
library searches
greater flexibility
custom code
commercial databases
biological samples
achieved externally
dc.title.none.fl_str_mv High-Throughput Mass Spectral Library Searching of Small Molecules in R with NIST MSPepSearch
dc.type.none.fl_str_mv Dataset
info:eu-repo/semantics/publishedVersion
dataset
description High-level programming languages such as Python and R are widely used in mass spectrometry data processing, where library searching is a standard step. Despite the availability of numerous library search algorithms, those developed by NIST and implemented in MS Search remain predominant, partly because commercial databases (e.g., NIST, Wiley) are distributed in proprietary formats inaccessible to custom code. MSPepSearch, another NIST tool, provides access to the same algorithms with greater flexibility for automation. However, its use requires calling a command-line interface with multiple flags and parsing output text files to retrieve results, which can be cumbersome. To address this, we developed mspepsearchr, an R package that streamlines the integration of library searches against NIST-format mass spectral databases into complex, multistep workflows. MSPepSearch is a single-threaded tool; therefore, parallelization was achieved externally by running multiple instances from within R. We describe the package, evaluate its performance, and illustrate its utility through the recognition of steroid-like compounds in untargeted gas chromatography-mass spectrometry analysis of biological samples.
eu_rights_str_mv openAccess
id Manara_2276f1b963fba0e28676b01a5e7c5169
identifier_str_mv 10.1021/jasms.5c00322.s001
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/30715313
publishDate 2025
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY-NC 4.0
status_str publishedVersion
title High-Throughput Mass Spectral Library Searching of Small Molecules in R with NIST MSPepSearch