KNNOR: An oversampling technique for imbalanced datasets

Bibliographic Details
Main Author: Ashhadul Islam (16869981) (author)
Other Authors: Samir Brahim Belhaouari (9427347) (author), Atiq Ur Rehman (8843024) (author), Halima Bensmail (10400) (author)
Published: 2021
_version_ 1864513506648260608
author Ashhadul Islam (16869981)
author2 Samir Brahim Belhaouari (9427347)
Atiq Ur Rehman (8843024)
Halima Bensmail (10400)
author2_role author
author
author
author_facet Ashhadul Islam (16869981)
Samir Brahim Belhaouari (9427347)
Atiq Ur Rehman (8843024)
Halima Bensmail (10400)
author_role author
dc.creator.none.fl_str_mv Ashhadul Islam (16869981)
Samir Brahim Belhaouari (9427347)
Atiq Ur Rehman (8843024)
Halima Bensmail (10400)
dc.date.none.fl_str_mv 2021-12-20T18:00:00Z
dc.identifier.none.fl_str_mv 10.1016/j.asoc.2021.108288
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/KNNOR_An_oversampling_technique_for_imbalanced_datasets/26862124
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Information and computing sciences
Data management and data science
Machine learning
Data augmentation
Machine learning
Imbalanced data
Nearest neighbor
Support Vector Machines
dc.title.none.fl_str_mv KNNOR: An oversampling technique for imbalanced datasets
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p>The predictive performance of Machine Learning (ML) models relies on the quality of the data used to train them. However, if the training data is not balanced among the different classes, the performance of ML models deteriorates heavily. Several techniques have been proposed in the literature to add some semblance of balance to data sets by adding artificial data points. Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN) are among the commonly used techniques for dealing with class imbalance. However, these approaches are prone to ‘within class imbalance’ and the ‘small disjunct problem’. To overcome these problems, this article proposes an advanced algorithm that studies the compactness and location of the minority class relative to the other classes. The proposed technique, called the K-Nearest Neighbor OveRsampling approach (KNNOR), performs a three-step process to identify the critical and safe areas for augmentation and to generate synthetic data points of the minority class. The relative density of the entire population is considered while generating artificial points. This enables the proposed KNNOR approach to oversample the minority class more reliably while staying resilient against noise. The proposed method is compared with the ten top-performing contemporary oversamplers by testing the accuracy of classifiers trained on the augmented data provided by each oversampler. The experimental results on several common imbalanced datasets show that our method ranks first more consistently than the other state-of-the-art oversamplers. 
The proposed method is easy to use and has been made open source as a Python library.</p><h2>Other Information</h2> <p> Published in: Applied Soft Computing<br> License: <a href="http://creativecommons.org/licenses/by/4.0/" target="_blank">http://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1016/j.asoc.2021.108288" target="_blank">https://dx.doi.org/10.1016/j.asoc.2021.108288</a></p>
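The general idea in the abstract (interpolate new minority points, then check them against the whole population) can be illustrated with a minimal sketch. This is not the authors' KNNOR algorithm and not the API of their Python library: the function names `knn_indices` and `oversample_minority`, the parameters, and the simple majority-vote "safe area" check are all assumptions made for illustration, using only NumPy.

```python
import numpy as np


def knn_indices(points, query, k):
    """Return indices of the k points closest to `query` (Euclidean)."""
    dists = np.linalg.norm(points - query, axis=1)
    return np.argsort(dists)[:k]


def oversample_minority(X, y, minority_label, n_new, k=3, seed=0, max_tries=1000):
    """Generate up to n_new synthetic minority points by interpolation.

    Hypothetical helper: a candidate is kept only if most of its k nearest
    neighbours in the whole data set belong to the minority class, which is
    a crude stand-in for a "safe area" check.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority = X[y == minority_label]
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k_min = min(k, len(minority) - 1)

    synthetic = []
    tries = 0
    while len(synthetic) < n_new and tries < max_tries:
        tries += 1
        base = minority[rng.integers(len(minority))]
        # Drop the first index: it is the base point itself (distance zero).
        neighbours = knn_indices(minority, base, k_min + 1)[1:]
        partner = minority[rng.choice(neighbours)]
        # SMOTE-style step: pick a random point on the segment base -> partner.
        candidate = base + rng.uniform(0.0, 1.0) * (partner - base)
        # Validate against the entire population, not only the minority class.
        votes = y[knn_indices(X, candidate, k)]
        if np.mean(votes == minority_label) > 0.5:
            synthetic.append(candidate)
    return np.array(synthetic).reshape(-1, X.shape[1])


X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [10, 10], [10, 11], [11, 10], [11, 11], [12, 12], [10, 12]]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
new_points = oversample_minority(X, y, minority_label=1, n_new=5)
print(new_points.shape)
```

In this toy data the minority cluster is far from the majority cluster, so every interpolated candidate passes the check; with overlapping classes, candidates that land among majority points would be rejected.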
eu_rights_str_mv openAccess
id Manara2_db6493bdc7944ee5ffc45f9d9061ee62
identifier_str_mv 10.1016/j.asoc.2021.108288
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/26862124
publishDate 2021
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
spellingShingle KNNOR: An oversampling technique for imbalanced datasets
Ashhadul Islam (16869981)
Information and computing sciences
Data management and data science
Machine learning
Data augmentation
Machine learning
Imbalanced data
Nearest neighbor
Support Vector Machines
status_str publishedVersion
title KNNOR: An oversampling technique for imbalanced datasets
topic Information and computing sciences
Data management and data science
Machine learning
Data augmentation
Machine learning
Imbalanced data
Nearest neighbor
Support Vector Machines