Precise Discovery of Novel N‑Terminal Proteoforms beyond the Limitations of Proteogenomics and <i>De Novo</i> Sequencing

Bibliographic details
Main author: Cuitong He (5446760) (author)
Other authors: Ke Su (602528) (author), Huanju Liu (18445044) (author), Lihao Jin (21802403) (author), Tingting Xu (307597) (author), Chenyang Mu (21802406) (author), Fu Yang (101882) (author)
Published: 2025
Description
Summary: Alternative splicing-mediated protein N-terminal sequence variation is closely associated with diseases, but its identification by mass spectrometry faces technical bottlenecks. Traditional proteogenomic methods cannot identify novel N-terminal proteins undetected in transcriptome data, while <i>de novo</i> sequencing has limitations in accuracy and traceability. To address this, we developed the first dedicated algorithm, NovelNSeq, which is specifically designed to parse signature peptides (novel N-terminal extension peptides) of novel N-terminal proteins from mass spectrometry data without relying on transcriptome data or <i>de novo</i> sequencing. NovelNSeq fully exploits peptide encoding rules, demonstrating significantly higher accuracy than <i>de novo</i> sequencing algorithms such as PEAKS, pNovo3, SpliceNovo, Casanovo, and InstaNovo, and enables tracing back the peptide encoding mechanisms. Using NovelNSeq, we identified and validated novel N-terminal proteoforms from human genes CALM2, CAPNS1, and CPNE7 in mass spectrometry data where large-scale proteogenomics failed to detect them, which establishes NovelNSeq as an essential complement to conventional approaches. Furthermore, we revealed that a recently reported AAG-initiated novel N-terminus in human ATP9A is actually generated through translational frameshifting from the canonical ATG start codon, highlighting the need for rigorous validation of noncanonical start codon annotations in novel N-terminal proteoforms.
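
Illustrative note: the summary's last point turns on which reading frame, relative to the canonical ATG, encodes an observed N-terminal peptide. The sketch below is not NovelNSeq and does not come from the record; it is a minimal, assumed illustration that translates a nucleotide sequence in each of the three forward frames using the standard genetic code and reports the frames in which a peptide occurs. The function names `translate` and `find_frames` and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate a DNA string starting at offset `frame` (0, 1, or 2).

    Stop codons are rendered as '*'; codons with unknown bases as 'X'.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def find_frames(dna, peptide):
    """Return the forward frames (relative to position 0) whose translation contains `peptide`."""
    return [f for f in range(3) if peptide in translate(dna, f)]


# Toy sequence (invented for illustration): position 0 is the A of an ATG.
toy = "ATGGCCAAGTTT"
in_frame = translate(toy, 0)        # translation in the ATG frame
shifted = find_frames(toy, "WPS")   # frames in which the peptide "WPS" appears
```

A peptide found only in frame 1 or 2 of a sequence that begins at the ATG is consistent with a shifted reading frame rather than in-frame translation; real validation would of course need the actual transcript sequence and spectral evidence.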