A Novel Bayesian Outlier Score Based on the Negative Binomial Distribution for Detecting Aberrantly Expressed Genes in RNA-Seq Gene Expression Count Data

<p>The Negative Binomial distribution (NBD) is used for modeling many types of count data, including gene expression counts obtained by RNA sequencing technologies (RNA-Seq). Finding outliers in this type of data has been shown in recent research to help in identifying rare genetic disorders i...

Bibliographic Details
Main Author: Edin Salkovic (16891479) (author)
Other Authors: Halima Bensmail (10400) (author)
Published: 2021
_version_ 1864513560296554496
author Edin Salkovic (16891479)
author2 Halima Bensmail (10400)
author2_role author
author_facet Edin Salkovic (16891479)
Halima Bensmail (10400)
author_role author
dc.creator.none.fl_str_mv Edin Salkovic (16891479)
Halima Bensmail (10400)
dc.date.none.fl_str_mv 2021-05-20T00:00:00Z
dc.identifier.none.fl_str_mv 10.1109/access.2021.3082311
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/A_Novel_Bayesian_Outlier_Score_Based_on_the_Negative_Binomial_Distribution_for_Detecting_Aberrantly_Expressed_Genes_in_RNA-Seq_Gene_Expression_Count_Data/24042444
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biological sciences
Bioinformatics and computational biology
Genetics
Information and computing sciences
Data management and data science
Mathematical sciences
Statistics
Data models
Gene expression
Bayes methods
Computational modeling
Anomaly detection
Dispersion
Sequential analysis
Bayesian model
Bayesian outlier score
Gibbs sampling
Mendelian disorder
Negative binomial distribution
Outlier detection
Rare disease
RNA-Seq
dc.title.none.fl_str_mv A Novel Bayesian Outlier Score Based on the Negative Binomial Distribution for Detecting Aberrantly Expressed Genes in RNA-Seq Gene Expression Count Data
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p>The Negative Binomial distribution (NBD) is used for modeling many types of count data, including gene expression counts obtained by RNA sequencing technologies (RNA-Seq). Finding outliers in this type of data has been shown in recent research to help in identifying rare genetic disorders in humans. Existing Bayesian approaches to detecting outliers in data following the NBD are either computationally inefficient or too general, and hence do not leverage the NBD's specificities in an optimal way. We present a novel Bayesian outlier score for data following the NBD, relying on recent advances in the inference of its dispersion parameter through a special method of Gibbs sampling. The novel Bayesian model on which our score is based, OutPyRX (Outlier detection in Python for RNA-Seq, eXtended version), improves the model of its predecessor OutPyR by introducing novel parameters that are derived from OutPyR's. These novel parameters allow more than 6 times faster convergence of the novel outlier score compared to OutPyR's, while having a negligible computational impact on the Gibbs sampling procedure. We show that, in terms of area under the precision-recall curve (AUC) values, the novel score outcompetes existing scores on 21 out of 24 datasets that we derived from 4 real datasets by injecting artificial outliers. However, OutPyRX does not perform confounder control, which is required for some datasets containing biological outliers. The model is general and can be applied to other similar count data.
The code for our model is available at <a href="https://github.com/esalkovic/outpyrx" rel="nofollow">https://github.com/esalkovic/outpyrx</a>.</p><h2>Other Information</h2><p>Published in: IEEE Access<br>License: <a href="https://creativecommons.org/licenses/by/4.0/legalcode" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/access.2021.3082311" target="_blank">https://dx.doi.org/10.1109/access.2021.3082311</a></p>
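To make the idea of an NBD-based outlier score concrete, the following is a minimal sketch and not the OutPyRX method: it swaps the Bayesian model and Gibbs sampling for a plain method-of-moments fit and a two-sided Negative Binomial tail probability, then checks the ranking with average precision, a common approximation of the area under the precision-recall curve. The function names, the toy counts, and the single injected outlier are all assumptions made for illustration.

```python
import math
import statistics


def nb_logpmf(k, r, p):
    """Log-PMF of a Negative Binomial with size r > 0 and success probability p."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_cdf(k, r, p):
    """P(K <= k) by direct summation; adequate for modest counts."""
    if k < 0:
        return 0.0
    return min(1.0, sum(math.exp(nb_logpmf(j, r, p)) for j in range(k + 1)))


def moment_estimates(counts):
    """Method-of-moments mean and size; assumes a positive mean (hypothetical helper)."""
    m = statistics.mean(counts)
    v = statistics.variance(counts)
    # If the data are not overdispersed, fall back to a near-Poisson size.
    r = m * m / (v - m) if v > m else 1e6
    return m, r


def outlier_scores(counts):
    """Score each count as -log10 of its two-sided NB tail probability."""
    m, r = moment_estimates(counts)
    p = r / (r + m)
    scores = []
    for k in counts:
        lower = nb_cdf(k, r, p)             # P(K <= k)
        upper = 1.0 - nb_cdf(k - 1, r, p)   # P(K >= k)
        p_two_sided = min(1.0, 2.0 * min(lower, upper))
        scores.append(-math.log10(max(p_two_sided, 1e-300)))
    return scores


def average_precision(scores, labels):
    """Mean of precision values at the rank of each true outlier."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy expression counts for one gene across 12 samples; the last one is injected.
counts = [48, 52, 61, 39, 55, 47, 50, 44, 58, 53, 49, 310]
labels = [0] * 11 + [1]

scores = outlier_scores(counts)
print("top-scoring sample index:", max(range(len(counts)), key=lambda i: scores[i]))
print("average precision:", average_precision(scores, labels))
```

Because the injected value inflates the moment estimates used to score it, this naive fit is prone to masking; that sensitivity is one reason a model-based estimate of the dispersion parameter, as described in the abstract, may be preferable.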
eu_rights_str_mv openAccess
id Manara2_a71ce69ccb88c94d6d1bfa0c4189d8dc
identifier_str_mv 10.1109/access.2021.3082311
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/24042444
publishDate 2021
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title A Novel Bayesian Outlier Score Based on the Negative Binomial Distribution for Detecting Aberrantly Expressed Genes in RNA-Seq Gene Expression Count Data