Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing
| Published: | 2025 |
|---|---|
| Abstract: | Alternative splicing-mediated variation in protein N-terminal sequences is closely associated with disease, but identifying it by mass spectrometry faces technical bottlenecks. Traditional proteogenomic methods cannot identify novel N-terminal proteins that are absent from transcriptome data, while <i>de novo</i> sequencing is limited in accuracy and traceability. To address this, we developed NovelNSeq, the first algorithm dedicated to parsing the signature peptides (novel N-terminal extension peptides) of novel N-terminal proteins from mass spectrometry data without relying on transcriptome data or <i>de novo</i> sequencing. By fully exploiting peptide encoding rules, NovelNSeq achieves significantly higher accuracy than <i>de novo</i> sequencing algorithms such as PEAKS, pNovo3, SpliceNovo, Casanovo, and InstaNovo, and it can trace identified peptides back to their encoding mechanisms. Using NovelNSeq, we identified and validated novel N-terminal proteoforms of the human genes CALM2, CAPNS1, and CPNE7 in mass spectrometry data in which large-scale proteogenomics had failed to detect them, establishing NovelNSeq as an essential complement to conventional approaches. Furthermore, we showed that a recently reported AAG-initiated novel N-terminus in human ATP9A is actually generated by translational frameshifting from the canonical ATG start codon, highlighting the need for rigorous validation of noncanonical start-codon annotations in novel N-terminal proteoforms. |
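To make "peptide encoding rules" more concrete, here is a toy Python sketch of one such rule: an N-terminal extension encoded by an in-frame start codon upstream of the annotated ATG. It is an illustration under stated assumptions, not NovelNSeq's method. The helper names (`translate`, `upstream_extensions`), the toy sequence `TOY_MRNA`, and the choice of ATG and AAG as candidate starts are all hypothetical. The sketch assumes the standard genetic code, uppercase DNA letters, and that the initiator tRNA inserts methionine even at a non-AUG start. An in-frame match is only a candidate explanation: as the ATP9A case above shows, an apparently noncanonical N-terminus can arise by another route, such as frameshifting, which this sketch does not model.

```python
# Standard genetic code (NCBI table 1), built from the canonical TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical toy mRNA (assumption, not real data), split into codons:
# TGA | AAG | CTG | TCC | ATG (annotated start, index 12) | GAA | TAA
TOY_MRNA = "TGAAAGCTGTCCATGGAATAA"


def translate(seq: str) -> str:
    """Translate uppercase DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def upstream_extensions(
    mrna: str,
    cds_start: int,
    start_codons: tuple[str, ...] = ("ATG", "AAG"),
) -> list[tuple[int, str, str]]:
    """Hypothetical helper: list N-terminal extensions from in-frame upstream starts.

    Walks backwards from the annotated start in its reading frame and stops at
    the first in-frame stop codon, because no extension can read through it.
    Returns (position, start codon, extension peptide) tuples.
    """
    results = []
    pos = cds_start - 3
    while pos >= 0:
        codon = mrna[pos:pos + 3]
        if CODON_TABLE[codon] == "*":
            break
        if codon in start_codons:
            # Assumption: the initiator tRNA inserts Met even at a non-AUG start.
            extension = "M" + translate(mrna[pos + 3:cds_start])
            results.append((pos, codon, extension))
        pos -= 3
    return results
```

On the toy sequence, `upstream_extensions(TOY_MRNA, 12)` returns `[(3, 'AAG', 'MLS')]`. The AAG three codons upstream would prepend `MLS` to the canonical `ME`, giving the extended proteoform `MLSME`. The scan walks backwards and stops at the first in-frame stop codon, so it only reports extensions that the reading frame can actually encode.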