Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing
Alternative splicing-mediated protein N-terminal sequence variation is closely associated with diseases, but its identification by mass spectrometry faces technical bottlenecks. Traditional proteogenomic methods cannot identify novel N-terminal proteins undetected in transcriptome data, while <i>de novo</i>...
| Main author: | Cuitong He |
|---|---|
| Other authors: | Ke Su, Huanju Liu, Lihao Jin, Tingting Xu, Chenyang Mu, Fu Yang |
| Published: | 2025 |
| _version_ | 1852018138960363520 |
|---|---|
| author | Cuitong He (5446760) |
| author2 | Ke Su (602528); Huanju Liu (18445044); Lihao Jin (21802403); Tingting Xu (307597); Chenyang Mu (21802406); Fu Yang (101882) |
| author2_role | author author author author author author |
| author_facet | Cuitong He (5446760); Ke Su (602528); Huanju Liu (18445044); Lihao Jin (21802403); Tingting Xu (307597); Chenyang Mu (21802406); Fu Yang (101882) |
| author_role | author |
| dc.creator.none.fl_str_mv | Cuitong He (5446760); Ke Su (602528); Huanju Liu (18445044); Lihao Jin (21802403); Tingting Xu (307597); Chenyang Mu (21802406); Fu Yang (101882) |
| dc.date.none.fl_str_mv | 2025-07-29T03:13:47Z |
| dc.identifier.none.fl_str_mv | 10.1021/acs.analchem.5c02498.s002 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/dataset/Precise_Discovery_of_Novel_N_Terminal_Proteoforms_beyond_the_Limitations_of_Proteogenomics_and_i_De_Novo_i_Sequencing/29662211 |
| dc.rights.none.fl_str_mv | CC BY-NC 4.0; info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Biophysics; Biochemistry; Genetics; Molecular Biology; Biotechnology; Developmental Biology; Infectious Diseases; Plant Biology; Virology; Biological Sciences not elsewhere classified; Chemical Sciences not elsewhere classified; recently reported aag; peptide encoding mechanisms; parse signature peptides; first dedicated algorithm; enables tracing back; de novo; mediated protein n; terminal sequence variation; terminal extension peptides; mass spectrometry data; human genes calm2; validated novel n; initiated novel n; terminal proteins undetected; sequencing alternative splicing; scale proteogenomics failed; novel n; terminal proteins; terminal proteoforms; transcriptome data; human atp9a; translational frameshifting; specifically designed; sequencing algorithms; rigorous validation; precise discovery; essential complement; conventional approaches; closely associated; actually generated |
| dc.title.none.fl_str_mv | Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing |
| dc.type.none.fl_str_mv | Dataset; info:eu-repo/semantics/publishedVersion; dataset |
| description | Alternative splicing-mediated protein N-terminal sequence variation is closely associated with diseases, but its identification by mass spectrometry faces technical bottlenecks. Traditional proteogenomic methods cannot identify novel N-terminal proteins undetected in transcriptome data, while <i>de novo</i> sequencing has limitations in accuracy and traceability. To address this, we developed the first dedicated algorithm, NovelNSeq, which is specifically designed to parse signature peptides (novel N-terminal extension peptides) of novel N-terminal proteins from mass spectrometry data without relying on transcriptome data or <i>de novo</i> sequencing. NovelNSeq fully exploits peptide encoding rules, demonstrating significantly higher accuracy than <i>de novo</i> sequencing algorithms such as PEAKS, pNovo3, SpliceNovo, Casanovo, and InstaNovo, and enables tracing back the peptide encoding mechanisms. Using NovelNSeq, we identified and validated novel N-terminal proteoforms from human genes CALM2, CAPNS1, and CPNE7 in mass spectrometry data where large-scale proteogenomics failed to detect them, which establishes NovelNSeq as an essential complement to conventional approaches. Furthermore, we revealed that a recently reported AAG-initiated novel N-terminus in human ATP9A is actually generated through translational frameshifting from the canonical ATG start codon, highlighting the need for rigorous validation of noncanonical start codon annotations in novel N-terminal proteoforms. |
| eu_rights_str_mv | openAccess |
| id | Manara_108123d89fcdaa70c0f0c021a874e7da |
| identifier_str_mv | 10.1021/acs.analchem.5c02498.s002 |
| network_acronym_str | Manara |
| network_name_str | ManaraRepo |
| oai_identifier_str | oai:figshare.com:article/29662211 |
| publishDate | 2025 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY-NC 4.0 |
| status_str | publishedVersion |
| title | Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing |
| title_full | Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing |
| title_fullStr | Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing |
| title_full_unstemmed | Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing |
| title_short | Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing |
| title_sort | Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing |
| topic | Biophysics; Biochemistry; Genetics; Molecular Biology; Biotechnology; Developmental Biology; Infectious Diseases; Plant Biology; Virology; Biological Sciences not elsewhere classified; Chemical Sciences not elsewhere classified; recently reported aag; peptide encoding mechanisms; parse signature peptides; first dedicated algorithm; enables tracing back; de novo; mediated protein n; terminal sequence variation; terminal extension peptides; mass spectrometry data; human genes calm2; validated novel n; initiated novel n; terminal proteins undetected; sequencing alternative splicing; scale proteogenomics failed; novel n; terminal proteins; terminal proteoforms; transcriptome data; human atp9a; translational frameshifting; specifically designed; sequencing algorithms; rigorous validation; precise discovery; essential complement; conventional approaches; closely associated; actually generated |