Unsupervised outlier detection in multidimensional data

Bibliographic Details
Main Author: Atiq ur Rehman (14153391) (author)
Other Authors: Samir Brahim Belhaouari (9427347) (author)
Published: 2022
_version_ 1864513566556553216
author Atiq ur Rehman (14153391)
author2 Samir Brahim Belhaouari (9427347)
author2_role author
author_facet Atiq ur Rehman (14153391)
Samir Brahim Belhaouari (9427347)
author_role author
dc.creator.none.fl_str_mv Atiq ur Rehman (14153391)
Samir Brahim Belhaouari (9427347)
dc.date.none.fl_str_mv 2022-11-22T21:18:25Z
dc.identifier.none.fl_str_mv 10.1186/s40537-021-00469-z
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/Unsupervised_outlier_detection_in_multidimensional_data/21598509
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Commerce, management, tourism and services
Business systems in context
Information and computing sciences
Distributed computing and systems software
Information Systems and Management
Computer Networks and Communications
Hardware and Architecture
Information Systems
dc.title.none.fl_str_mv Unsupervised outlier detection in multidimensional data
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p>Detection and removal of outliers in a dataset is a fundamental preprocessing task without which the analysis of the data can be misleading. Furthermore, the existence of anomalies in the data can heavily degrade the performance of machine learning algorithms. In order to detect the anomalies in a dataset in an unsupervised manner, some novel statistical techniques are proposed in this paper. The proposed techniques are based on statistical methods considering data compactness and other properties. The newly proposed ideas are found efficient in terms of performance, ease of implementation, and computational complexity. Furthermore, two proposed techniques presented in this paper use transformation of data to a unidimensional distance space to detect the outliers, so irrespective of the data’s high dimensions, the techniques remain computationally inexpensive and feasible. Comprehensive performance analysis of the proposed anomaly detection schemes is presented in the paper, and the newly proposed schemes are found better than the state-of-the-art methods when tested on several benchmark datasets.</p><h2>Other Information</h2> <p> Published in: Journal of Big Data<br> License: <a href="https://creativecommons.org/licenses/by/4.0" target="_blank">https://creativecommons.org/licenses/by/4.0</a><br>See article on publisher's website: <a href="http://dx.doi.org/10.1186/s40537-021-00469-z" target="_blank">http://dx.doi.org/10.1186/s40537-021-00469-z</a></p>
eu_rights_str_mv openAccess
id Manara2_886f2bf473bcdc48aa78b4eb8b1e1757
identifier_str_mv 10.1186/s40537-021-00469-z
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/21598509
publishDate 2022
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Unsupervised outlier detection in multidimensional data