MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction

Understanding the pathogenicity of missense mutations (MMs) is essential for shedding light on genetic diseases, gene functions, and individual variations. In this study, we propose a novel computational approach, called MMPatho, for enhancing missense mutation pathogenic pred...

Bibliographic Details
Main Author: Fang Ge (1533166) (author)
Other Authors: Muhammad Arif (769250) (author), Zihao Yan (5047112) (author), Hanin Alahmadi (17372413) (author), Apilak Worachartcheewan (422620) (author), Dong-Jun Yu (630659) (author), Watshara Shoombuatong (453384) (author)
Published: 2023
_version_ 1864513507200860160
author Fang Ge (1533166)
author2 Muhammad Arif (769250)
Zihao Yan (5047112)
Hanin Alahmadi (17372413)
Apilak Worachartcheewan (422620)
Dong-Jun Yu (630659)
Watshara Shoombuatong (453384)
author2_role author
author
author
author
author
author
author_facet Fang Ge (1533166)
Muhammad Arif (769250)
Zihao Yan (5047112)
Hanin Alahmadi (17372413)
Apilak Worachartcheewan (422620)
Dong-Jun Yu (630659)
Watshara Shoombuatong (453384)
author_role author
dc.creator.none.fl_str_mv Fang Ge (1533166)
Muhammad Arif (769250)
Zihao Yan (5047112)
Hanin Alahmadi (17372413)
Apilak Worachartcheewan (422620)
Dong-Jun Yu (630659)
Watshara Shoombuatong (453384)
dc.date.none.fl_str_mv 2023-11-10T09:00:00Z
dc.identifier.none.fl_str_mv 10.1021/acs.jcim.3c00950
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/MMPatho_Leveraging_Multilevel_Consensus_and_Evolutionary_Information_for_Enhanced_Missense_Mutation_Pathogenic_Prediction/26808652
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biological sciences
Genetics
Information and computing sciences
Machine learning
Missense Mutation (MM)
Pathogenicity
Computational Approach
MMPatho
Amino Acid-Level Features
Genome-Level Features
Protein Sequences
dc.title.none.fl_str_mv MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p dir="ltr">Understanding the pathogenicity of missense mutations (MMs) is essential for shedding light on genetic diseases, gene functions, and individual variations. In this study, we propose a novel computational approach, called MMPatho, for enhancing missense mutation pathogenic prediction. First, we established a large-scale nonredundant MM benchmark data set based on the entire Ensembl database, complemented by a focused blind test set specifically for pathogenic GOF/LOF MMs. Based on this data set, for each mutation, we used Ensembl VEP v104 and dbNSFP v4.1a to extract variant-level, amino acid-level, individuals’ outputs, and genome-level features. Additionally, protein sequences were generated from ENSP identifiers with the Ensembl API and then encoded, and the ESM-1b and ProtTrans-T5 embeddings of the mutant sites were extracted. Building on these efforts, we developed our model group (MMPatho), which comprises ConsMM and EvoIndMM. Specifically, ConsMM employs individuals’ outputs and XGBoost with SHAP explanation analysis, while EvoIndMM investigates whether predictive capability can be enhanced by incorporating evolutionary information from the embeddings of two large protein language models, ESM-1b and ProtT5-XL-U50.
In comparative experiments, ConsMM and EvoIndMM achieved AUROC values of 0.9836 and 0.9854 and AUPR values of 0.9852 and 0.9902, respectively, on the blind test set, which shares no variations or proteins with the training data, thus highlighting the superiority of our computational approach in predicting MM pathogenicity.</p><h2>Other Information</h2><p dir="ltr">Published in: Journal of Chemical Information and Modeling<br>License: <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1021/acs.jcim.3c00950" target="_blank">https://dx.doi.org/10.1021/acs.jcim.3c00950</a></p>
eu_rights_str_mv openAccess
id Manara2_fd3d174694f8201647c63da044825ffa
identifier_str_mv 10.1021/acs.jcim.3c00950
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/26808652
publishDate 2023
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
spellingShingle MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction
Fang Ge (1533166)
Biological sciences
Genetics
Information and computing sciences
Machine learning
Missense Mutation (MM)
Pathogenicity
Computational Approach
MMPatho
Amino Acid-Level Features
Genome-Level Features
Protein Sequences
status_str publishedVersion
title MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction
topic Biological sciences
Genetics
Information and computing sciences
Machine learning
Missense Mutation (MM)
Pathogenicity
Computational Approach
MMPatho
Amino Acid-Level Features
Genome-Level Features
Protein Sequences
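
Note on the reported metrics: the description above reports AUROC and AUPR on a blind test set. The following is a minimal sketch of how these two quantities are commonly computed from binary labels (1 = pathogenic, 0 = benign) and predicted scores. It is not the authors' evaluation code; the function names, the toy labels, and the toy scores are assumptions made for illustration, and AUPR is approximated here by step-wise average precision, which may differ from the interpolation used in the paper.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count pairwise wins of positives over negatives; a tie is half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Assumes distinct scores; tied scores are ordered by input position.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank predictions from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data, not taken from the MMPatho benchmark.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(round(auroc(toy_labels, toy_scores), 4))
print(round(average_precision(toy_labels, toy_scores), 4))
```

Both metrics are threshold-free, so they compare score rankings without fixing a pathogenicity cutoff. The pairwise AUROC loop is quadratic in the number of variants, which is fine for a sketch but would normally be replaced by a rank-based computation on a large benchmark.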