MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction
Understanding the pathogenicity of missense mutations (MM) is essential for shedding light on genetic diseases, gene functions, and individual variations. In this study, we propose a novel computational approach, called MMPatho, for enhancing missense mutation pathogenic pred...
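| Main Author: | Fang Ge |
|---|---|
| Other Authors: | Muhammad Arif, Zihao Yan, Hanin Alahmadi, Apilak Worachartcheewan, Dong-Jun Yu, Watshara Shoombuatong |
| Published: | 2023 |
| Subjects: | Biological sciences; Genetics; Information and computing sciences; Machine learning; Missense Mutation (MM); Pathogenicity; Computational Approach; MMPatho; Amino Acid-Level Features; Genome-Level Features; Protein Sequences |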
| _version_ | 1864513507200860160 |
|---|---|
| author | Fang Ge (1533166) |
| author2 | Muhammad Arif (769250) Zihao Yan (5047112) Hanin Alahmadi (17372413) Apilak Worachartcheewan (422620) Dong-Jun Yu (630659) Watshara Shoombuatong (453384) |
| author2_role | author author author author author author |
| author_facet | Fang Ge (1533166) Muhammad Arif (769250) Zihao Yan (5047112) Hanin Alahmadi (17372413) Apilak Worachartcheewan (422620) Dong-Jun Yu (630659) Watshara Shoombuatong (453384) |
| author_role | author |
| dc.creator.none.fl_str_mv | Fang Ge (1533166) Muhammad Arif (769250) Zihao Yan (5047112) Hanin Alahmadi (17372413) Apilak Worachartcheewan (422620) Dong-Jun Yu (630659) Watshara Shoombuatong (453384) |
| dc.date.none.fl_str_mv | 2023-11-10T09:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1021/acs.jcim.3c00950 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/MMPatho_Leveraging_Multilevel_Consensus_and_Evolutionary_Information_for_Enhanced_Missense_Mutation_Pathogenic_Prediction/26808652 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Biological sciences Genetics Information and computing sciences Machine learning Missense Mutation (MM) Pathogenicity Computational Approach MMPatho Amino Acid-Level Features Genome-Level Features Protein Sequences |
| dc.title.none.fl_str_mv | MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction |
| dc.type.none.fl_str_mv | Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal |
| description | <p dir="ltr">Understanding the pathogenicity of missense mutations (MM) is essential for shedding light on genetic diseases, gene functions, and individual variations. In this study, we propose a novel computational approach, called MMPatho, for enhancing missense mutation pathogenic prediction. First, we established a large-scale nonredundant MM benchmark data set based on the entire Ensembl database, complemented by a focused blind test set specifically for pathogenic GOF/LOF MM. Based on this data set, for each mutation, we utilized Ensembl VEP v104 and dbNSFP v4.1a to extract variant-level, amino acid-level, individuals’ outputs, and genome-level features. Additionally, protein sequences were generated using ENSP identifiers with the Ensembl API, and then encoded. The mutant sites’ ESM-1b and ProtTrans-T5 embeddings were subsequently extracted. Then, our model group (MMPatho), comprising ConsMM and EvoIndMM, was developed by building on these efforts. Specifically, ConsMM employs individuals’ outputs and XGBoost with SHAP explanation analysis, while EvoIndMM investigates the potential enhancement of predictive capability by incorporating evolutionary information from embeddings of the large protein language models ESM-1b and ProtT5-XL-U50. Through rigorous comparative experiments, ConsMM and EvoIndMM achieved AUROC values of 0.9836 and 0.9854 and AUPR values of 0.9852 and 0.9902, respectively, on the blind test set devoid of overlapping variations and proteins from the training data, thus highlighting the superiority of our computational approach in the prediction of MM pathogenicity.</p><h2>Other Information</h2><p dir="ltr">Published in: Journal of Chemical Information and Modeling<br>License: <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1021/acs.jcim.3c00950" target="_blank">https://dx.doi.org/10.1021/acs.jcim.3c00950</a></p> |
| eu_rights_str_mv | openAccess |
| id | Manara2_fd3d174694f8201647c63da044825ffa |
| identifier_str_mv | 10.1021/acs.jcim.3c00950 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/26808652 |
| publishDate | 2023 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| spellingShingle | MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction Fang Ge (1533166) Biological sciences Genetics Information and computing sciences Machine learning Missense Mutation (MM) Pathogenicity Computational Approach MMPatho Amino Acid-Level Features Genome-Level Features Protein Sequences |
| status_str | publishedVersion |
| title | MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction |
| title_full | MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction |
| title_fullStr | MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction |
| title_full_unstemmed | MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction |
| title_short | MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction |
| title_sort | MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction |
| topic | Biological sciences Genetics Information and computing sciences Machine learning Missense Mutation (MM) Pathogenicity Computational Approach MMPatho Amino Acid-Level Features Genome-Level Features Protein Sequences |