Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing

Alternative splicing-mediated protein N-terminal sequence variation is closely associated with diseases, but its identification by mass spectrometry faces technical bottlenecks. Traditional proteogenomic methods cannot identify novel N-terminal proteins undetected in transcriptome data, while ...

Bibliographic details
Main author: Cuitong He (5446760) (author)
Other authors: Ke Su (602528) (author), Huanju Liu (18445044) (author), Lihao Jin (21802403) (author), Tingting Xu (307597) (author), Chenyang Mu (21802406) (author), Fu Yang (101882) (author)
Published: 2025
Subjects:
_version_ 1852018138960363520
author Cuitong He (5446760)
author2 Ke Su (602528)
Huanju Liu (18445044)
Lihao Jin (21802403)
Tingting Xu (307597)
Chenyang Mu (21802406)
Fu Yang (101882)
author2_role author
author
author
author
author
author
author_facet Cuitong He (5446760)
Ke Su (602528)
Huanju Liu (18445044)
Lihao Jin (21802403)
Tingting Xu (307597)
Chenyang Mu (21802406)
Fu Yang (101882)
author_role author
dc.creator.none.fl_str_mv Cuitong He (5446760)
Ke Su (602528)
Huanju Liu (18445044)
Lihao Jin (21802403)
Tingting Xu (307597)
Chenyang Mu (21802406)
Fu Yang (101882)
dc.date.none.fl_str_mv 2025-07-29T03:13:47Z
dc.identifier.none.fl_str_mv 10.1021/acs.analchem.5c02498.s002
dc.relation.none.fl_str_mv https://figshare.com/articles/dataset/Precise_Discovery_of_Novel_N_Terminal_Proteoforms_beyond_the_Limitations_of_Proteogenomics_and_i_De_Novo_i_Sequencing/29662211
dc.rights.none.fl_str_mv CC BY-NC 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biophysics
Biochemistry
Genetics
Molecular Biology
Biotechnology
Developmental Biology
Infectious Diseases
Plant Biology
Virology
Biological Sciences not elsewhere classified
Chemical Sciences not elsewhere classified
recently reported aag
peptide encoding mechanisms
parse signature peptides
first dedicated algorithm
enables tracing back
de novo
mediated protein n
terminal sequence variation
terminal extension peptides
mass spectrometry data
human genes calm2
validated novel n
initiated novel n
terminal proteins undetected
sequencing alternative splicing
scale proteogenomics failed
novel n
terminal proteins
terminal proteoforms
transcriptome data
human atp9a
translational frameshifting
specifically designed
sequencing algorithms
rigorous validation
precise discovery
essential complement
conventional approaches
closely associated
actually generated
dc.title.none.fl_str_mv Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing
dc.type.none.fl_str_mv Dataset
info:eu-repo/semantics/publishedVersion
dataset
description Alternative splicing-mediated protein N-terminal sequence variation is closely associated with diseases, but its identification by mass spectrometry faces technical bottlenecks. Traditional proteogenomic methods cannot identify novel N-terminal proteins undetected in transcriptome data, while <i>de novo</i> sequencing has limitations in accuracy and traceability. To address this, we developed the first dedicated algorithm, NovelNSeq, which is specifically designed to parse signature peptides (novel N-terminal extension peptides) of novel N-terminal proteins from mass spectrometry data without relying on transcriptome data or <i>de novo</i> sequencing. NovelNSeq fully exploits peptide encoding rules, demonstrating significantly higher accuracy than <i>de novo</i> sequencing algorithms such as PEAKS, pNovo3, SpliceNovo, Casanovo, and InstaNovo, and enables tracing back the peptide encoding mechanisms. Using NovelNSeq, we identified and validated novel N-terminal proteoforms from human genes CALM2, CAPNS1, and CPNE7 in mass spectrometry data where large-scale proteogenomics failed to detect them, which establishes NovelNSeq as an essential complement to conventional approaches. Furthermore, we revealed that a recently reported AAG-initiated novel N-terminus in human ATP9A is actually generated through translational frameshifting from the canonical ATG start codon, highlighting the need for rigorous validation of noncanonical start codon annotations in novel N-terminal proteoforms.
eu_rights_str_mv openAccess
id Manara_108123d89fcdaa70c0f0c021a874e7da
identifier_str_mv 10.1021/acs.analchem.5c02498.s002
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/29662211
publishDate 2025
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY-NC 4.0
spellingShingle Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing
Cuitong He (5446760)
Biophysics
Biochemistry
Genetics
Molecular Biology
Biotechnology
Developmental Biology
Infectious Diseases
Plant Biology
Virology
Biological Sciences not elsewhere classified
Chemical Sciences not elsewhere classified
recently reported aag
peptide encoding mechanisms
parse signature peptides
first dedicated algorithm
enables tracing back
de novo
mediated protein n
terminal sequence variation
terminal extension peptides
mass spectrometry data
human genes calm2
validated novel n
initiated novel n
terminal proteins undetected
sequencing alternative splicing
scale proteogenomics failed
novel n
terminal proteins
terminal proteoforms
transcriptome data
human atp9a
translational frameshifting
specifically designed
sequencing algorithms
rigorous validation
precise discovery
essential complement
conventional approaches
closely associated
actually generated
status_str publishedVersion
title Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing
title_full Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing
title_fullStr Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing
title_full_unstemmed Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing
title_short Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing
title_sort Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing
topic Biophysics
Biochemistry
Genetics
Molecular Biology
Biotechnology
Developmental Biology
Infectious Diseases
Plant Biology
Virology
Biological Sciences not elsewhere classified
Chemical Sciences not elsewhere classified
recently reported aag
peptide encoding mechanisms
parse signature peptides
first dedicated algorithm
enables tracing back
de novo
mediated protein n
terminal sequence variation
terminal extension peptides
mass spectrometry data
human genes calm2
validated novel n
initiated novel n
terminal proteins undetected
sequencing alternative splicing
scale proteogenomics failed
novel n
terminal proteins
terminal proteoforms
transcriptome data
human atp9a
translational frameshifting
specifically designed
sequencing algorithms
rigorous validation
precise discovery
essential complement
conventional approaches
closely associated
actually generated