High-Throughput Mass Spectral Library Searching of Small Molecules in R with NIST MSPepSearch
High-level programming languages such as Python and R are widely used in mass spectrometry data processing, where library searching is a standard step. Despite the availability of numerous library search algorithms, those developed by NIST and implemented in MS Search remain predominant, partly beca...

| Main author: | Andrey Samokhin |
|---|---|
| Other authors: | Mikhail Khrisanfov |
| Published: | 2025 |
| _version_ | 1849927625425813504 |
|---|---|
| author | Andrey Samokhin (20282728) |
| author2 | Mikhail Khrisanfov (22683809) |
| author2_role | author |
| author_facet | Andrey Samokhin (20282728) Mikhail Khrisanfov (22683809) |
| author_role | author |
| dc.creator.none.fl_str_mv | Andrey Samokhin (20282728) Mikhail Khrisanfov (22683809) |
| dc.date.none.fl_str_mv | 2025-11-25T19:13:50Z |
| dc.identifier.none.fl_str_mv | 10.1021/jasms.5c00322.s001 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/dataset/High-Throughput_Mass_Spectral_Library_Searching_of_Small_Molecules_in_R_with_NIST_MSPepSearch/30715313 |
| dc.rights.none.fl_str_mv | CC BY-NC 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Biophysics; Biochemistry; Genetics; Cancer; Inorganic Chemistry; Biological Sciences not elsewhere classified; Chemical Sciences not elsewhere classified; Information Systems not elsewhere classified; use requires calling; untargeted gas chromatography; proprietary formats inaccessible; mass spectrometry analysis; level programming languages; running multiple instances; g ., nist; another nist tool; nist mspepsearch high; threaded tool; multiple flags; widely used; standard step; small molecules; retrieve results; provides access; multistep workflows; line interface; like compounds; library searching; library searches; greater flexibility; custom code; commercial databases; biological samples; achieved externally |
| dc.title.none.fl_str_mv | High-Throughput Mass Spectral Library Searching of Small Molecules in R with NIST MSPepSearch |
| dc.type.none.fl_str_mv | Dataset info:eu-repo/semantics/publishedVersion dataset |
| description | High-level programming languages such as Python and R are widely used in mass spectrometry data processing, where library searching is a standard step. Despite the availability of numerous library search algorithms, those developed by NIST and implemented in MS Search remain predominant, partly because commercial databases (e.g., NIST, Wiley) are distributed in proprietary formats inaccessible to custom code. MSPepSearch, another NIST tool, provides access to the same algorithms with greater flexibility for automation. However, its use requires calling a command-line interface with multiple flags and parsing output text files to retrieve results, which can be cumbersome. To address this, we developed mspepsearchr, an R package that streamlines the integration of library searches against NIST-format mass spectral databases into complex, multistep workflows. MSPepSearch is a single-threaded tool; therefore, parallelization was achieved externally by running multiple instances from within R. We describe the package, evaluate its performance, and illustrate its utility through the recognition of steroid-like compounds in untargeted gas chromatography-mass spectrometry analysis of biological samples. |
| eu_rights_str_mv | openAccess |
| id | Manara_2276f1b963fba0e28676b01a5e7c5169 |
| identifier_str_mv | 10.1021/jasms.5c00322.s001 |
| network_acronym_str | Manara |
| network_name_str | ManaraRepo |
| oai_identifier_str | oai:figshare.com:article/30715313 |
| publishDate | 2025 |
| rights_invalid_str_mv | CC BY-NC 4.0 |
| status_str | publishedVersion |
| title | High-Throughput Mass Spectral Library Searching of Small Molecules in R with NIST MSPepSearch |
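
The description above states that MSPepSearch is single-threaded and that parallelization was achieved externally by running multiple instances from within R. The following is a minimal sketch of that general pattern only, written in Python rather than R; it is not the mspepsearchr implementation. The function names (`split_into_chunks`, `run_instance`, `parallel_search`) and `STAND_IN_COMMAND` are hypothetical, and the stand-in command is a small Python one-liner that takes the place of the real tool, so no actual MSPepSearch flags or output formats are shown or implied.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# ASSUMPTION: a stand-in for a single-threaded command-line tool. It reads one
# query per line on stdin and writes one result line per query to stdout.
# A real search tool would need its own flags and its own output parser.
STAND_IN_COMMAND = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    print(line.strip().upper())",
]


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal chunks."""
    if not items:
        return []
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_instance(command, chunk):
    """Run one instance of the tool on one chunk and return its output lines."""
    completed = subprocess.run(
        command,
        input="\n".join(chunk),
        capture_output=True,
        text=True,
        check=True,  # raise if the instance exits with a non-zero status
    )
    return completed.stdout.splitlines()


def parallel_search(command, queries, n_workers=4):
    """Run several instances concurrently and merge results in input order."""
    chunks = split_into_chunks(queries, n_workers)
    if not chunks:
        return []
    # Threads are sufficient here: the work happens in the child processes,
    # and each thread only waits for its own instance to finish.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_chunk = list(pool.map(lambda c: run_instance(command, c), chunks))
    return [line for lines in per_chunk for line in lines]


if __name__ == "__main__":
    print(parallel_search(STAND_IN_COMMAND, ["q1", "q2", "q3", "q4", "q5"], 2))
```

Because the chunks are contiguous and `pool.map` returns results in submission order, the merged output keeps the order of the input queries, which makes it straightforward to join results back to the original spectra.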