Feature selection using the Boruta algorithm.

<div><p>Background</p><p>Coronary heart disease (CHD) and diabetes mellitus are highly prevalent in intensive care units (ICUs) and significantly contribute to high in-hospital mortality rates. Traditional risk stratification models often fail to capture the complex interacti...

Bibliographic Details
Main Author: Guang Tu (22054865) (author)
Other Authors: Zhonglan Cai (22054874) (author), Ling Wu (151826) (author), Hang Yu (278278) (author), Hongke Jiang (21596432) (author), Haijian Luo (22057247) (author)
Published: 2025
_version_ 1852017593694552064
dc.date.none.fl_str_mv 2025-08-14T20:54:47Z
dc.identifier.none.fl_str_mv 10.1371/journal.pone.0330381.g003
dc.relation.none.fl_str_mv https://figshare.com/articles/figure/Feature_selection_using_the_Boruta_algorithm_/29914643
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Medicine
Biotechnology
Infectious Diseases
Biological Sciences not elsewhere classified
Mathematical Sciences not elsewhere classified
Information Systems not elsewhere classified
top 13 variables
intensive care units
including logistic regression
extracted baseline characteristics
explain variable importance
demonstrated superior performance
coronary heart disease
blood urea nitrogen
analyze large datasets
identify intricate patterns
gradient boosting classifier
improve clinical outcomes
adult icu patients
accurately identify high
hospital mortality rates
gradient boosting
clinical outcomes
icu patients
identify high
study aims
hospital mortality
clinical implementation
risk patients
thereby providing
significantly contribute
risk stratification
random forest
promising alternative
primary diagnosis
practical tool
neural networks
machine learning
laboratory parameters
iv database
important predictors
highly prevalent
future work
feature selection
external validation
boruta algorithm
dc.title.none.fl_str_mv Feature selection using the Boruta algorithm.
dc.type.none.fl_str_mv Image
Figure
info:eu-repo/semantics/publishedVersion
image
description <div><p>Background</p><p>Coronary heart disease (CHD) and diabetes mellitus are highly prevalent in intensive care units (ICUs) and significantly contribute to high in-hospital mortality rates. Traditional risk stratification models often fail to capture the complex interactions among clinical variables, limiting their ability to accurately identify high-risk patients. Machine learning (ML) models, with their capacity to analyze large datasets and identify intricate patterns, provide a promising alternative for improving mortality prediction accuracy.</p><p>Objective</p><p>This study aims to develop and validate machine learning models for predicting in-hospital mortality in ICU patients with CHD and diabetes, and to enhance model interpretability using SHapley Additive exPlanation (SHAP) values, thereby providing a more accurate and practical tool for clinicians.</p><p>Methods</p><p>We conducted a retrospective cohort study using data from the MIMIC-IV database, focusing on adult ICU patients with a primary diagnosis of CHD and diabetes. We extracted baseline characteristics, laboratory parameters, and clinical outcomes. The Boruta algorithm was employed for feature selection to identify variables significantly associated with in-hospital mortality. Sixteen machine learning models, including logistic regression, random forest, gradient boosting, and neural networks, were developed and compared using receiver operating characteristic (ROC) curves and area under the curve (AUC) analysis. SHAP values were used to explain variable importance and enhance model interpretability.</p><p>Results</p><p>Our study included 2,213 patients, of whom 345 (15.6%) experienced in-hospital mortality. The Boruta algorithm identified 29 significant risk factors, and the top 13 variables were used for developing machine learning models. The gradient boosting classifier achieved the highest AUC of 0.8532, outperforming other models. SHAP analysis highlighted age, blood urea nitrogen, and pH as the most important predictors of mortality. SHAP waterfall plots provided detailed individualized risk assessments, demonstrating the model’s ability to identify high-risk subgroups effectively.</p><p>Conclusions</p><p>Machine learning models, especially the gradient boosting classifier, demonstrated superior performance in predicting in-hospital mortality in ICU patients with CHD and diabetes, outperforming traditional statistical methods. These models provide valuable insights for risk stratification and have the potential to improve clinical outcomes. Future work should focus on external validation and clinical implementation to further enhance their applicability and effectiveness in managing this high-risk population.</p></div>
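The shadow-feature comparison at the core of Boruta, as named in the Methods above, can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline: the record does not state which implementation or settings were used. Boruta proper scores features with random forest importance and decides with a statistical test over many runs; here absolute Pearson correlation stands in as the importance score and a fixed hit-rate cutoff stands in for the test. All names (`abs_corr`, `shadow_hits`, `select_features`, `min_hit_rate`) and the synthetic data are hypothetical.

```python
import random
from statistics import mean


def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score.

    Assumption: Boruta itself uses random forest importance, not correlation.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def shadow_hits(columns, target, n_rounds=50, seed=0):
    """Count how often each feature beats the best shuffled 'shadow' copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Shadow features: each real column shuffled, which destroys any
        # relationship with the target while keeping its distribution.
        shadow_scores = []
        for values in columns.values():
            shadow = values[:]
            rng.shuffle(shadow)
            shadow_scores.append(abs_corr(shadow, target))
        threshold = max(shadow_scores)
        for name, values in columns.items():
            if abs_corr(values, target) > threshold:
                hits[name] += 1
    return hits


def select_features(columns, target, n_rounds=50, min_hit_rate=0.9, seed=0):
    """Keep features whose hit rate reaches a fixed cutoff (a simplification)."""
    hits = shadow_hits(columns, target, n_rounds, seed)
    return [name for name, h in hits.items() if h / n_rounds >= min_hit_rate]


# Synthetic example (not study data): one informative column, one pure noise.
data_rng = random.Random(42)
n = 300
signal = [data_rng.gauss(0, 1) for _ in range(n)]
noise = [data_rng.gauss(0, 1) for _ in range(n)]
target = [1 if s + 0.3 * data_rng.gauss(0, 1) > 0 else 0 for s in signal]
columns = {"signal": signal, "noise": noise}

selected = select_features(columns, target)
print(selected)
```

In this toy setup the informative column beats every shadow in each round, while a pure-noise column would usually fail the cutoff. Applying the idea to clinical data such as the 29 confirmed risk factors reported above would need a tree-based importance measure and the proper statistical test.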
eu_rights_str_mv openAccess
id Manara_d92cab7f87ee6e98c4e23fc4d4929596
identifier_str_mv 10.1371/journal.pone.0330381.g003
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/29914643
publishDate 2025
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Feature selection using the Boruta algorithm.