Dataset size and splitting.

Traditional clinical assessments often lack individualization, relying on standardized procedures that may not accommodate the diverse needs of patients, especially in early stages where personalized diagnosis could offer significant benefits. We aim to provide a machine-learning...

Bibliographic details
Main author: Devin Setiawan (22155445) (author)
Other authors: Yumiko Wiranto (22311263) (author), Jeffrey M. Girard (5403536) (author), Amber Watts (2620564) (author), Arian Ashourvan (6685232) (author)
Published: 2025
Subjects:
_version_ 1852016271369961472
author Devin Setiawan (22155445)
author2 Yumiko Wiranto (22311263)
Jeffrey M. Girard (5403536)
Amber Watts (2620564)
Arian Ashourvan (6685232)
author2_role author
author
author
author
author_facet Devin Setiawan (22155445)
Yumiko Wiranto (22311263)
Jeffrey M. Girard (5403536)
Amber Watts (2620564)
Arian Ashourvan (6685232)
author_role author
dc.creator.none.fl_str_mv Devin Setiawan (22155445)
Yumiko Wiranto (22311263)
Jeffrey M. Girard (5403536)
Amber Watts (2620564)
Arian Ashourvan (6685232)
dc.date.none.fl_str_mv 2025-09-25T17:27:00Z
dc.identifier.none.fl_str_mv 10.1371/journal.pdig.0001022.t001
dc.relation.none.fl_str_mv https://figshare.com/articles/dataset/Dataset_size_and_splitting_/30210725
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biotechnology
Sociology
Cancer
Biological Sciences not elsewhere classified
Information Systems not elsewhere classified
tailor feature selection
shapley additive explanations
individual patient characteristics
heart failure dataset
heart disease dataset
feature selection methods
best additional features
enhances diagnostic accuracy
enhance diagnostic accuracy
early diabetes dataset
icare shows improvements
synthetic dataset 1
early diabetes
initial features
icare shows
including early
early stages
world datasets
value analysis
standardized procedures
significant advantage
roc curve
individualized approaches
icare achieved
global approaches
global approach
diverse needs
clinical assessments
dc.title.none.fl_str_mv Dataset size and splitting.
dc.type.none.fl_str_mv Dataset
info:eu-repo/semantics/publishedVersion
dataset
description Traditional clinical assessments often lack individualization, relying on standardized procedures that may not accommodate the diverse needs of patients, especially in early stages where personalized diagnosis could offer significant benefits. We aim to provide a machine-learning framework that addresses the individualized feature addition problem and enhances diagnostic accuracy for clinical assessments. The Individualized Clinical Assessment Recommendation System (iCARE) employs locally weighted logistic regression and Shapley Additive Explanations (SHAP) value analysis to tailor feature selection to individual patient characteristics. Evaluations were conducted on synthetic and real-world datasets, including early-stage diabetes risk prediction and heart failure clinical records from the UCI Machine Learning Repository. We compared iCARE with a Global approach for selecting the best additional features, using statistical analysis of accuracy and area under the ROC curve (AUC). The iCARE framework enhances predictive accuracy and AUC when additional features exhibit distinct predictive capabilities, as evidenced by synthetic datasets 1–3 and the early diabetes dataset. Specifically, in synthetic dataset 1, iCARE achieved an accuracy of 0.999 and an AUC of 1.000, outperforming the Global approach, which reached an accuracy of 0.689 and an AUC of 0.639. In the early diabetes and heart disease datasets, iCARE shows improvements of 6–12% in accuracy and AUC over other feature selection methods across different numbers of initial features. Conversely, in synthetic datasets 4–5 and the heart failure dataset, where features lack discernible predictive distinctions, iCARE shows no significant advantage over global approaches in accuracy or AUC.
iCARE provides personalized feature recommendations that enhance diagnostic accuracy in scenarios where individualized approaches are critical, improving the precision and effectiveness of medical diagnoses.
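The description names locally weighted logistic regression as one building block of iCARE. The following is only a minimal sketch of that general technique, not the authors' implementation: the Gaussian kernel, the bandwidth value, the plain gradient-descent fit, the toy one-feature dataset, and every function name are assumptions made for illustration, and the SHAP step is not shown. Setting the bandwidth to infinity gives every sample equal weight, which stands in here for a global model.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def kernel_weights(X, query, bandwidth):
    # Gaussian kernel (an assumed choice): samples near the query count more.
    weights = []
    for row in X:
        d2 = sum((a - b) ** 2 for a, b in zip(row, query))
        weights.append(math.exp(-d2 / (2.0 * bandwidth ** 2)))
    return weights


def fit_weighted_logistic(X, y, weights, lr=0.1, steps=500):
    # Gradient descent on the sample-weighted logistic loss.
    n_features = len(X[0])
    coef = [0.0] * n_features
    intercept = 0.0
    total = sum(weights)
    for _ in range(steps):
        grad_coef = [0.0] * n_features
        grad_intercept = 0.0
        for row, label, wt in zip(X, y, weights):
            p = sigmoid(intercept + sum(c * v for c, v in zip(coef, row)))
            err = wt * (p - label) / total
            grad_intercept += err
            for j, v in enumerate(row):
                grad_coef[j] += err * v
        intercept -= lr * grad_intercept
        coef = [c - lr * g for c, g in zip(coef, grad_coef)]
    return coef, intercept


def local_predict(X, y, query, bandwidth):
    # Fit a separate model around each query point, then score that point.
    weights = kernel_weights(X, query, bandwidth)
    coef, intercept = fit_weighted_logistic(X, y, weights)
    return sigmoid(intercept + sum(c * v for c, v in zip(coef, query)))


# Toy data (hypothetical): the label is 1 only near the origin, a pattern
# that a single global linear decision boundary cannot capture.
X = [[i * 0.5] for i in range(-6, 7)]
y = [1 if abs(row[0]) < 1 else 0 for row in X]

p_local_centre = local_predict(X, y, [0.0], bandwidth=0.5)
p_local_edge = local_predict(X, y, [2.5], bandwidth=0.5)
p_global_centre = local_predict(X, y, [0.0], bandwidth=float("inf"))

print("local, x=0.0:", round(p_local_centre, 3))
print("local, x=2.5:", round(p_local_edge, 3))
print("global, x=0.0:", round(p_global_centre, 3))
```

On this toy data the local fit assigns the query at the origin a probability above 0.5 and the query at 2.5 a probability below 0.5, whereas the equal-weight fit scores the origin below 0.5. That mirrors, in miniature, the kind of setting where the description reports an advantage for an individualized model over a global one.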
eu_rights_str_mv openAccess
id Manara_91c03483fa3dad5eff3255a85aee87ff
identifier_str_mv 10.1371/journal.pdig.0001022.t001
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/30210725
publishDate 2025
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Dataset size and splitting.
title_full Dataset size and splitting.
title_fullStr Dataset size and splitting.
title_full_unstemmed Dataset size and splitting.
title_short Dataset size and splitting.
title_sort Dataset size and splitting.
topic Biotechnology
Sociology
Cancer
Biological Sciences not elsewhere classified
Information Systems not elsewhere classified
tailor feature selection
shapley additive explanations
individual patient characteristics
heart failure dataset
heart disease dataset
feature selection methods
best additional features
enhances diagnostic accuracy
enhance diagnostic accuracy
early diabetes dataset
icare shows improvements
synthetic dataset 1
early diabetes
initial features
icare shows
including early
early stages
world datasets
value analysis
standardized procedures
significant advantage
roc curve
individualized approaches
icare achieved
global approaches
global approach
diverse needs
clinical assessments