The automation of the development of classification models and improvement of model quality using feature engineering techniques
Recently, pipelines of machine learning-based classification models have become important to codify, orchestrate, and automate the workflow to produce an effective machine learning model. In this article, we propose a framework that combines feature engineering techniques such as data imputa...
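| Main Author: | Sjoerd Boeschoten |
|---|---|
| Other Authors: | Cagatay Catal, Bedir Tekinerdogan, Arjen Lommen, Marco Blokland |
| Published: | 2023 |
| Subjects: | Information and computing sciences; Data management and data science; Machine learning; Machine learning pipeline; Feature engineering; Automation; Data imputation; Feature transformation; Data balancing |

| _version_ | 1864513528381046784 |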
|---|---|
| author | Sjoerd Boeschoten (17347045) |
| author2 | Cagatay Catal (6897842) Bedir Tekinerdogan (6897839) Arjen Lommen (471283) Marco Blokland (12644072) |
| author2_role | author author author author |
| author_facet | Sjoerd Boeschoten (17347045) Cagatay Catal (6897842) Bedir Tekinerdogan (6897839) Arjen Lommen (471283) Marco Blokland (12644072) |
| author_role | author |
| dc.creator.none.fl_str_mv | Sjoerd Boeschoten (17347045) Cagatay Catal (6897842) Bedir Tekinerdogan (6897839) Arjen Lommen (471283) Marco Blokland (12644072) |
| dc.date.none.fl_str_mv | 2023-03-01T00:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1016/j.eswa.2022.118912 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/The_automation_of_the_development_of_classification_models_and_improvement_of_model_quality_using_feature_engineering_techniques/25117565 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Information and computing sciences Data management and data science Machine learning Machine learning pipeline Feature engineering Machine learning Automation Data imputation Feature transformation Data balancing |
| dc.title.none.fl_str_mv | The automation of the development of classification models and improvement of model quality using feature engineering techniques |
| dc.type.none.fl_str_mv | Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal |
| description | Recently, pipelines of machine learning-based classification models have become important for codifying, orchestrating, and automating the workflow that produces an effective machine learning model. In this article, we propose a framework that combines feature engineering techniques such as data imputation, transformation, and class balancing to compare the performance of different prediction models and select the best final model based on predefined parameters. The proposed framework is extendable and configurable by adding algorithms supported by the CARET package implemented in the R programming language. The framework can generate different machine learning models whose results are comparable to those of other studies, and it allows practitioners and researchers to generate different classification models automatically. This research is the first in the literature to use High-Resolution Orbitrap-based Mass Spectrometers (HRMS) data to create automated prediction models. We demonstrated the applicability of feature engineering techniques such as data imputation, transformation (e.g., scaling and centering), and data balancing using several case studies and the proposed semi-automated framework. We showed how the initial prediction models can be improved using the proposed framework. Other Information: Published in: Expert Systems with Applications. License: http://creativecommons.org/licenses/by/4.0/. See article on publisher's website: https://dx.doi.org/10.1016/j.eswa.2022.118912 |
| eu_rights_str_mv | openAccess |
| id | Manara2_fbf8b9d25c599f64467c6c3c31ca7c7e |
| identifier_str_mv | 10.1016/j.eswa.2022.118912 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/25117565 |
| publishDate | 2023 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | The automation of the development of classification models and improvement of model quality using feature engineering techniques |
| topic | Information and computing sciences Data management and data science Machine learning Machine learning pipeline Feature engineering Machine learning Automation Data imputation Feature transformation Data balancing |