Feature selection using the Boruta algorithm.
<div><p>Background</p><p>Coronary heart disease (CHD) and diabetes mellitus are highly prevalent in intensive care units (ICUs) and significantly contribute to high in-hospital mortality rates. Traditional risk stratification models often fail to capture the complex interacti...
| Main Author: | Guang Tu |
|---|---|
| Other Authors: | Zhonglan Cai, Ling Wu, Hang Yu, Hongke Jiang, Haijian Luo |
| Published: | 2025 |
| _version_ | 1852017593694552064 |
|---|---|
| author | Guang Tu (22054865) |
| author2 | Zhonglan Cai (22054874); Ling Wu (151826); Hang Yu (278278); Hongke Jiang (21596432); Haijian Luo (22057247) |
| author2_role | author; author; author; author; author |
| author_facet | Guang Tu (22054865); Zhonglan Cai (22054874); Ling Wu (151826); Hang Yu (278278); Hongke Jiang (21596432); Haijian Luo (22057247) |
| author_role | author |
| dc.creator.none.fl_str_mv | Guang Tu (22054865); Zhonglan Cai (22054874); Ling Wu (151826); Hang Yu (278278); Hongke Jiang (21596432); Haijian Luo (22057247) |
| dc.date.none.fl_str_mv | 2025-08-14T20:54:47Z |
| dc.identifier.none.fl_str_mv | 10.1371/journal.pone.0330381.g003 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/figure/Feature_selection_using_the_Boruta_algorithm_/29914643 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Medicine; Biotechnology; Infectious Diseases; Biological Sciences not elsewhere classified; Mathematical Sciences not elsewhere classified; Information Systems not elsewhere classified; top 13 variables; intensive care units; including logistic regression; extracted baseline characteristics; explain variable importance; demonstrated superior performance; coronary heart disease; blood urea nitrogen; analyze large datasets; 6 %) experienced; identify intricate patterns; study included 2; gradient boosting classifier; improve clinical outcomes; adult icu patients; accurately identify high; hospital mortality rates; gradient boosting; clinical outcomes; icu patients; identify high; study aims; hospital mortality; clinical implementation; risk patients; 213 patients; thereby providing; significantly contribute; risk stratification; random forest; promising alternative; primary diagnosis; practical tool; neural networks; machine learning; laboratory parameters; iv database; important predictors; highly prevalent; future work; feature selection; external validation; boruta algorithm |
| dc.title.none.fl_str_mv | Feature selection using the Boruta algorithm. |
| dc.type.none.fl_str_mv | Image Figure info:eu-repo/semantics/publishedVersion image |
| description | <div><p>Background</p><p>Coronary heart disease (CHD) and diabetes mellitus are highly prevalent in intensive care units (ICUs) and significantly contribute to high in-hospital mortality rates. Traditional risk stratification models often fail to capture the complex interactions among clinical variables, limiting their ability to accurately identify high-risk patients. Machine learning (ML) models, with their capacity to analyze large datasets and identify intricate patterns, provide a promising alternative for improving mortality prediction accuracy.</p><p>Objective</p><p>This study aims to develop and validate machine learning models for predicting in-hospital mortality in ICU patients with CHD and diabetes, and enhance model interpretability using SHapley Additive exPlanation (SHAP) values, thereby providing a more accurate and practical tool for clinicians.</p><p>Methods</p><p>We conducted a retrospective cohort study using data from the MIMIC-IV database, focusing on adult ICU patients with a primary diagnosis of CHD and diabetes. We extracted baseline characteristics, laboratory parameters, and clinical outcomes. The Boruta algorithm was employed for feature selection to identify variables significantly associated with in-hospital mortality, and 16 machine learning models, including logistic regression, random forest, gradient boosting, and neural networks, were developed and compared using receiver operating characteristic (ROC) curves and area under the curve (AUC) analysis. SHAP values were used to explain variable importance and enhance model interpretability.</p><p>Results</p><p>Our study included 2,213 patients, of whom 345 (15.6%) experienced in-hospital mortality. The Boruta algorithm identified 29 significant risk factors, and the top 13 variables were used for developing machine learning models. The gradient boosting classifier achieved the highest AUC of 0.8532, outperforming other models. SHAP analysis highlighted age, blood urea nitrogen, and pH as the most important predictors of mortality. SHAP waterfall plots provided detailed individualized risk assessments, demonstrating the model’s ability to identify high-risk subgroups effectively.</p><p>Conclusions</p><p>Machine learning models, especially the gradient boosting classifier, demonstrated superior performance in predicting in-hospital mortality in ICU patients with CHD and diabetes, outperforming traditional statistical methods. These models provide valuable insights for risk stratification and have the potential to improve clinical outcomes. Future work should focus on external validation and clinical implementation to further enhance their applicability and effectiveness in managing this high-risk population.</p></div> |
| eu_rights_str_mv | openAccess |
| id | Manara_d92cab7f87ee6e98c4e23fc4d4929596 |
| identifier_str_mv | 10.1371/journal.pone.0330381.g003 |
| network_acronym_str | Manara |
| network_name_str | ManaraRepo |
| oai_identifier_str | oai:figshare.com:article/29914643 |
| publishDate | 2025 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | Feature selection using the Boruta algorithm. |
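
The description above says the Boruta algorithm was used for feature selection but does not give implementation details. As an illustration only, the following is a minimal sketch of the shadow-feature idea behind Boruta: each real feature competes against shuffled copies ("shadows") whose link to the outcome has been destroyed, and a feature is kept only if it repeatedly beats the best shadow. This is not the authors' pipeline. It makes several simplifying assumptions: absolute Pearson correlation stands in for the random-forest importance that Boruta normally uses, a fixed hit-rate threshold replaces Boruta's statistical test and iterative pruning, and the data, column names (`age`, `bun`, `noise`) and function names are hypothetical.

```python
import random


def pearson_abs(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def shadow_feature_selection(columns, target, n_rounds=50, min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shuffled 'shadow' in most rounds.

    columns: dict mapping feature name -> list of floats (hypothetical layout).
    target:  list of floats (here 0.0 / 1.0 for a binary outcome).
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Build one shadow per real feature by shuffling its values,
        # which breaks any relationship with the target.
        shadow_scores = []
        for values in columns.values():
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_scores.append(pearson_abs(shuffled, target))
        threshold = max(shadow_scores)
        # A real feature scores a "hit" when it beats the best shadow.
        for name, values in columns.items():
            if pearson_abs(values, target) > threshold:
                hits[name] += 1
    return [name for name, h in hits.items() if h >= min_hit_rate * n_rounds]


# Synthetic demonstration data (assumption: not drawn from MIMIC-IV).
rng = random.Random(42)
n = 500
age = [rng.gauss(65, 10) for _ in range(n)]
bun = [rng.gauss(25, 8) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]

# The synthetic outcome depends on age and bun only; noise is irrelevant.
risk = [0.1 * a + 0.15 * b + rng.gauss(0, 1) for a, b in zip(age, bun)]
cutoff = sorted(risk)[int(0.85 * n)]
died = [1.0 if r > cutoff else 0.0 for r in risk]

selected = shadow_feature_selection({"age": age, "bun": bun, "noise": noise}, died)
print(selected)
```

In this toy setup the two informative columns should be retained, while the pure-noise column is usually, though not always, rejected. A production analysis would more likely use an established Boruta implementation with a tree-based importance measure rather than this correlation stand-in.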