OutSingle: a novel method of detecting and injecting outliers in RNA-Seq count data using the optimal hard threshold for singular values

Motivation: Finding outliers in RNA-sequencing (RNA-Seq) gene expression (GE) can help in identifying genes that are aberrant and cause Mendelian disorders. Recently developed models for this task rely on modeling RNA-Seq GE data using the negative bin...

Bibliographic Details
Main Author: Edin Salkovic (16891479) (author)
Other Authors: Mohammad Amin Sadeghi (8321631) (author), Abdelkader Baggag (16864140) (author), Ahmed Gamal Rashed Salem (17945072) (author), Halima Bensmail (10400) (author)
Published: 2023
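Subjects: Biological sciences; Biochemistry and cell biology; Mathematical sciences; Statistics; RNA-Seq count data; RNA-sequencing (RNA-Seq); gene expression (GE); negative binomial distribution (NBD)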
_version_ 1864513527945887744
author Edin Salkovic (16891479)
author2 Mohammad Amin Sadeghi (8321631)
Abdelkader Baggag (16864140)
Ahmed Gamal Rashed Salem (17945072)
Halima Bensmail (10400)
author2_role author
author
author
author
author_facet Edin Salkovic (16891479)
Mohammad Amin Sadeghi (8321631)
Abdelkader Baggag (16864140)
Ahmed Gamal Rashed Salem (17945072)
Halima Bensmail (10400)
author_role author
dc.creator.none.fl_str_mv Edin Salkovic (16891479)
Mohammad Amin Sadeghi (8321631)
Abdelkader Baggag (16864140)
Ahmed Gamal Rashed Salem (17945072)
Halima Bensmail (10400)
dc.date.none.fl_str_mv 2023-03-22T03:00:00Z
dc.identifier.none.fl_str_mv 10.1093/bioinformatics/btad142
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/OutSingle_a_novel_method_of_detecting_and_injecting_outliers_in_RNA-Seq_count_data_using_the_optimal_hard_threshold_for_singular_values/25202120
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biological sciences
Biochemistry and cell biology
Mathematical sciences
Statistics
RNA-Seq count data
RNA-sequencing (RNA-Seq)
gene expression (GE)
negative binomial distribution (NBD)
dc.title.none.fl_str_mv OutSingle: a novel method of detecting and injecting outliers in RNA-Seq count data using the optimal hard threshold for singular values
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description Motivation: Finding outliers in RNA-sequencing (RNA-Seq) gene expression (GE) can help in identifying genes that are aberrant and cause Mendelian disorders. Recently developed models for this task rely on modeling RNA-Seq GE data using the negative binomial distribution (NBD). However, some of those models either rely on procedures for inferring NBD’s parameters in a nonbiased way that are computationally demanding and thus make confounder control challenging, while others rely on less computationally demanding but biased procedures and convoluted confounder control approaches that hinder interpretability.
Results: In this article, we present OutSingle (Outlier detection using Singular Value Decomposition), an almost instantaneous way of detecting outliers in RNA-Seq GE data. It uses a simple log-normal approach for count modeling. For confounder control, it uses the recently discovered optimal hard threshold (OHT) method for noise detection, which itself is based on singular value decomposition (SVD). Due to its SVD/OHT utilization, OutSingle’s model is straightforward to understand and interpret. We then show that our novel method, when used on RNA-Seq GE data with real biological outliers masked by confounders, outcompetes the previous state-of-the-art model based on an ad hoc denoising autoencoder. Additionally, OutSingle can be used to inject artificial outliers masked by confounders, which is difficult to achieve with previous approaches. We describe a way of using OutSingle for outlier injection and proceed to show how OutSingle outperforms its competition on 16 out of 18 datasets that were generated from three real datasets using OutSingle’s injection procedure with different outlier types and magnitudes. Our methods are applicable to other types of similar problems involving finding outliers in matrices under the presence of confounders.
Availability and implementation: The code for OutSingle is available at https://github.com/esalkovic/outsingle.
Other Information:
Published in: Bioinformatics
License: https://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.1093/bioinformatics/btad142
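The confounder-control idea summarized in the description (log-normal z-scores, then SVD with an optimal hard threshold) can be illustrated with a minimal NumPy sketch. This is not the OutSingle code from the repository linked above. The function names `oht_rank` and `outlier_scores`, the `log2(count + 1)` transform, and the final re-standardization step are assumptions made for illustration. The threshold uses the commonly cited polynomial approximation for the unknown-noise case, tau = omega(beta) * median(singular values), where beta is the matrix aspect ratio.

```python
import numpy as np


def oht_rank(singular_values, n_rows, n_cols):
    """Count singular values above an approximate optimal hard threshold.

    Assumes the unknown-noise approximation
    omega(beta) ~ 0.56*beta^3 - 0.95*beta^2 + 1.82*beta + 1.43,
    with beta = short side / long side of the matrix.
    """
    beta = min(n_rows, n_cols) / max(n_rows, n_cols)
    omega = 0.56 * beta**3 - 0.95 * beta**2 + 1.82 * beta + 1.43
    tau = omega * np.median(singular_values)
    return int(np.sum(singular_values > tau))


def _standardize_rows(x):
    """Z-score each row; rows with zero spread are only centered."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, ddof=1, keepdims=True)
    std = np.where(std > 0, std, 1.0)  # guard against division by zero
    return (x - mean) / std


def outlier_scores(counts):
    """Hypothetical pipeline: genes in rows, samples in columns."""
    # Log-transform counts (pseudocount of 1 is an assumption) and z-score per gene.
    z = _standardize_rows(np.log2(np.asarray(counts, dtype=float) + 1.0))
    # Decompose the z-score matrix and keep the components above the threshold.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    k = oht_rank(s, *z.shape)
    # Treat the retained low-rank part as confounder signal and subtract it.
    confounders = (u[:, :k] * s[:k]) @ vt[:k, :]
    # Re-standardize the residual so large absolute values flag candidate outliers.
    return _standardize_rows(z - confounders), k
```

Calling `outlier_scores(counts)` on a genes-by-samples count matrix returns a score matrix of the same shape together with the number of components removed; entries with large absolute scores would be the candidate outliers in this sketch.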
eu_rights_str_mv openAccess
id Manara2_7a75bdcdd8732dcf9ae38fb56c5e1453
identifier_str_mv 10.1093/bioinformatics/btad142
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/25202120
publishDate 2023
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title OutSingle: a novel method of detecting and injecting outliers in RNA-Seq count data using the optimal hard threshold for singular values
topic Biological sciences
Biochemistry and cell biology
Mathematical sciences
Statistics
RNA-Seq count data
RNA-sequencing (RNA-Seq)
gene expression (GE)
negative binomial distribution (NBD)