Oversampling techniques for imbalanced data in regression

Our study addresses the challenge of imbalanced regression data in Machine Learning (ML) by introducing tailored methods for different data structures. We adapt K-Nearest Neighbor Oversampling-Regression (KNNOR-Reg), originally for imbalanced classification, to address imbalanced regression...

Bibliographic Details
Main Author: Samir Brahim Belhaouari (9427347) (author)
Other Authors: Ashhadul Islam (16869981) (author), Khelil Kassoul (18441114) (author), Ala Al-Fuqaha (4434340) (author), Abdesselam Bouzerdoum (17900021) (author)
Published: 2024
_version_ 1864513509810765824
author Samir Brahim Belhaouari (9427347)
author2 Ashhadul Islam (16869981)
Khelil Kassoul (18441114)
Ala Al-Fuqaha (4434340)
Abdesselam Bouzerdoum (17900021)
author2_role author
author
author
author
author_facet Samir Brahim Belhaouari (9427347)
Ashhadul Islam (16869981)
Khelil Kassoul (18441114)
Ala Al-Fuqaha (4434340)
Abdesselam Bouzerdoum (17900021)
author_role author
dc.creator.none.fl_str_mv Samir Brahim Belhaouari (9427347)
Ashhadul Islam (16869981)
Khelil Kassoul (18441114)
Ala Al-Fuqaha (4434340)
Abdesselam Bouzerdoum (17900021)
dc.date.none.fl_str_mv 2024-05-20T15:00:00Z
dc.identifier.none.fl_str_mv 10.1016/j.eswa.2024.124118
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/Oversampling_techniques_for_imbalanced_data_in_regression/26404000
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Information and computing sciences
Data management and data science
Machine learning
Data augmentation
AutoInflaters
Nearest neighbor
Imbalanced data
dc.title.none.fl_str_mv Oversampling techniques for imbalanced data in regression
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description Our study addresses the challenge of imbalanced regression data in Machine Learning (ML) by introducing methods tailored to different data structures. We adapt K-Nearest Neighbor Oversampling-Regression (KNNOR-Reg), originally developed for imbalanced classification, to address imbalanced regression in low-population datasets, and extend it to KNNOR-Deep Regression (KNNOR-DeepReg) for high-population datasets. For tabular data, we also present the Auto-Inflater neural network, which uses an exponential loss function for Autoencoders. For image datasets, we employ Multi-Level Autoencoders, consisting of Convolutional and Fully Connected Autoencoders. For such high-dimensional data, our approach outperforms the Synthetic Minority Oversampling Technique for Regression (SMOTER) algorithm on the IMDB-WIKI and AgeDB image datasets. For tabular data, we conducted a comprehensive experiment in which various models were trained on both augmented and non-augmented datasets and then compared on test data. The outcomes revealed a positive impact of data augmentation, with a success rate of 83.75% for Light Gradient Boosting Machine (LightGBM) and 71.57% for the 18 other regressors employed in the study. The success rate is the frequency of instances in which a model performed better with augmented data than without augmentation. The comparative code is available on GitHub.
Other Information
Published in: Expert Systems with Applications
License: http://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.1016/j.eswa.2024.124118
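The success rate mentioned in the description can be illustrated with a minimal sketch. It assumes that each "instance" is one paired comparison of the same model on the same test data, scored with an error metric where lower is better, and that ties count as no improvement. The record does not state the metric or the tie rule, so both are assumptions, and the function name and the error values below are hypothetical rather than taken from the study.

```python
def success_rate(baseline_errors, augmented_errors):
    """Percentage of paired comparisons in which the model trained on
    augmented data has a strictly lower error than its baseline.

    Assumption: lower error is better and ties are not counted as wins.
    """
    if len(baseline_errors) != len(augmented_errors):
        raise ValueError("Both lists must contain one score per comparison.")
    if not baseline_errors:
        raise ValueError("At least one comparison is required.")
    wins = sum(
        augmented < baseline
        for baseline, augmented in zip(baseline_errors, augmented_errors)
    )
    return 100.0 * wins / len(baseline_errors)


# Hypothetical test errors for four dataset/model pairs (illustration only).
baseline = [0.52, 0.40, 0.77, 0.31]
augmented = [0.48, 0.41, 0.70, 0.29]

rate = success_rate(baseline, augmented)
print(f"Success rate: {rate:.2f}%")
```

With these made-up numbers, augmentation wins three of the four comparisons. A figure such as 83.75% would come from the same kind of count taken over all dataset and configuration pairs evaluated for a given regressor.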
eu_rights_str_mv openAccess
id Manara2_17a4dc6d4a13b009086633bb36e6b54b
identifier_str_mv 10.1016/j.eswa.2024.124118
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/26404000
publishDate 2024
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
spellingShingle Oversampling techniques for imbalanced data in regression
Samir Brahim Belhaouari (9427347)
Information and computing sciences
Data management and data science
Machine learning
Data augmentation
AutoInflaters
Nearest neighbor
Imbalanced data
status_str publishedVersion
title Oversampling techniques for imbalanced data in regression
title_full Oversampling techniques for imbalanced data in regression
title_fullStr Oversampling techniques for imbalanced data in regression
title_full_unstemmed Oversampling techniques for imbalanced data in regression
title_short Oversampling techniques for imbalanced data in regression
title_sort Oversampling techniques for imbalanced data in regression
topic Information and computing sciences
Data management and data science
Machine learning
Data augmentation
AutoInflaters
Nearest neighbor
Imbalanced data