The automation of the development of classification models and improvement of model quality using feature engineering techniques

Recently, pipelines of machine learning-based classification models have become important to codify, orchestrate, and automate the workflow to produce an effective machine learning model. In this article, we propose a framework that combines feature engineering techniques such as data imputa...

Bibliographic Details
Main Author: Sjoerd Boeschoten (17347045) (author)
Other Authors: Cagatay Catal (6897842) (author), Bedir Tekinerdogan (6897839) (author), Arjen Lommen (471283) (author), Marco Blokland (12644072) (author)
Published: 2023
Subjects:
_version_ 1864513528381046784
author Sjoerd Boeschoten (17347045)
author2 Cagatay Catal (6897842)
Bedir Tekinerdogan (6897839)
Arjen Lommen (471283)
Marco Blokland (12644072)
author2_role author
author
author
author
author_role author
dc.creator.none.fl_str_mv Sjoerd Boeschoten (17347045)
Cagatay Catal (6897842)
Bedir Tekinerdogan (6897839)
Arjen Lommen (471283)
Marco Blokland (12644072)
dc.date.none.fl_str_mv 2023-03-01T00:00:00Z
dc.identifier.none.fl_str_mv 10.1016/j.eswa.2022.118912
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/The_automation_of_the_development_of_classification_models_and_improvement_of_model_quality_using_feature_engineering_techniques/25117565
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Information and computing sciences
Data management and data science
Machine learning
Machine learning pipeline
Feature engineering
Automation
Data imputation
Feature transformation
Data balancing
dc.title.none.fl_str_mv The automation of the development of classification models and improvement of model quality using feature engineering techniques
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description Pipelines for machine learning-based classification models have recently become important for codifying, orchestrating, and automating the workflow that produces an effective machine learning model. In this article, we propose a framework that combines feature engineering techniques such as data imputation, transformation, and class balancing to compare the performance of different prediction models and select the best final model based on predefined parameters. The proposed framework is extendable and configurable: users can add algorithms supported by the CARET package for the R programming language. The framework can generate different machine learning models whose results are comparable to those reported in other studies, and it allows practitioners and researchers to generate different classification models automatically. This research is the first in the literature to use High-Resolution Orbitrap-based Mass Spectrometer (HRMS) data to create automated prediction models. We demonstrated the applicability of feature engineering techniques such as data imputation, transformation (e.g., scaling and centering), and data balancing using several case studies and the proposed semi-automated framework. We showed how the initial prediction models can be improved using the proposed framework.
Other Information
Published in: Expert Systems with Applications
License: http://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.1016/j.eswa.2022.118912
eu_rights_str_mv openAccess
id Manara2_fbf8b9d25c599f64467c6c3c31ca7c7e
identifier_str_mv 10.1016/j.eswa.2022.118912
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/25117565
publishDate 2023
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title The automation of the development of classification models and improvement of model quality using feature engineering techniques