Dataset size and splitting.
Traditional clinical assessments often lack individualization, relying on standardized procedures that may not accommodate the diverse needs of patients, especially in early stages where personalized diagnosis could offer significant benefits. We aim to provide a machine-learning...
| Main author: | Devin Setiawan |
|---|---|
| Other authors: | Yumiko Wiranto, Jeffrey M. Girard, Amber Watts, Arian Ashourvan |
| Published in: | 2025 |
| Subjects: | |
| _version_ | 1852016271369961472 |
|---|---|
| author | Devin Setiawan (22155445) |
| author2 | Yumiko Wiranto (22311263); Jeffrey M. Girard (5403536); Amber Watts (2620564); Arian Ashourvan (6685232) |
| author2_role | author; author; author; author |
| author_facet | Devin Setiawan (22155445); Yumiko Wiranto (22311263); Jeffrey M. Girard (5403536); Amber Watts (2620564); Arian Ashourvan (6685232) |
| author_role | author |
| dc.creator.none.fl_str_mv | Devin Setiawan (22155445); Yumiko Wiranto (22311263); Jeffrey M. Girard (5403536); Amber Watts (2620564); Arian Ashourvan (6685232) |
| dc.date.none.fl_str_mv | 2025-09-25T17:27:00Z |
| dc.identifier.none.fl_str_mv | 10.1371/journal.pdig.0001022.t001 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/dataset/Dataset_size_and_splitting_/30210725 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Biotechnology; Sociology; Cancer; Biological Sciences not elsewhere classified; Information Systems not elsewhere classified; tailor feature selection; shapley additive explanations; individual patient characteristics; heart failure dataset; heart disease dataset; feature selection methods; 6–; best additional features; enhances diagnostic accuracy; enhance diagnostic accuracy; early diabetes dataset; icare shows improvements; synthetic dataset 1; early diabetes; initial features; icare shows; including early; early stages; world datasets; value analysis; standardized procedures; significant advantage; roc curve; individualized approaches; icare achieved; global approaches; global approach; diverse needs; clinical assessments |
| dc.title.none.fl_str_mv | Dataset size and splitting. |
| dc.type.none.fl_str_mv | Dataset info:eu-repo/semantics/publishedVersion dataset |
| description | Traditional clinical assessments often lack individualization, relying on standardized procedures that may not accommodate the diverse needs of patients, especially in early stages where personalized diagnosis could offer significant benefits. We aim to provide a machine-learning framework that addresses the individualized feature addition problem and enhances diagnostic accuracy for clinical assessments. The Individualized Clinical Assessment Recommendation System (iCARE) employs locally weighted logistic regression and Shapley Additive Explanations (SHAP) value analysis to tailor feature selection to individual patient characteristics. Evaluations were conducted on synthetic and real-world datasets, including early-stage diabetes risk prediction and heart failure clinical records from the UCI Machine Learning Repository. We compared the performance of iCARE with a Global approach using statistical analysis on accuracy and area under the ROC curve (AUC) to select the best additional features. The iCARE framework enhances predictive accuracy and AUC metrics when additional features exhibit distinct predictive capabilities, as evidenced by synthetic datasets 1–3 and the early diabetes dataset. Specifically, in synthetic dataset 1, iCARE achieved an accuracy of 0.999 and an AUC of 1.000, outperforming the Global approach with an accuracy of 0.689 and an AUC of 0.639. In the early diabetes and heart disease dataset, iCARE shows improvements of 6–12% in accuracy and AUC across different numbers of initial features over other feature selection methods. Conversely, in synthetic datasets 4–5 and the heart failure dataset, where features lack discernible predictive distinctions, iCARE shows no significant advantage over global approaches on accuracy and AUC metrics. iCARE provides personalized feature recommendations that enhance diagnostic accuracy in scenarios where individualized approaches are critical, improving the precision and effectiveness of medical diagnoses. |
| eu_rights_str_mv | openAccess |
| id | Manara_91c03483fa3dad5eff3255a85aee87ff |
| identifier_str_mv | 10.1371/journal.pdig.0001022.t001 |
| network_acronym_str | Manara |
| network_name_str | ManaraRepo |
| oai_identifier_str | oai:figshare.com:article/30210725 |
| publishDate | 2025 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | Dataset size and splitting. |
| topic | Biotechnology; Sociology; Cancer; Biological Sciences not elsewhere classified; Information Systems not elsewhere classified; tailor feature selection; shapley additive explanations; individual patient characteristics; heart failure dataset; heart disease dataset; feature selection methods; 6–; best additional features; enhances diagnostic accuracy; enhance diagnostic accuracy; early diabetes dataset; icare shows improvements; synthetic dataset 1; early diabetes; initial features; icare shows; including early; early stages; world datasets; value analysis; standardized procedures; significant advantage; roc curve; individualized approaches; icare achieved; global approaches; global approach; diverse needs; clinical assessments |
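
The description names two ingredients that may be easier to picture in code: splitting a dataset into training and test portions, and fitting a locally weighted logistic regression around one patient. The sketch below is illustrative only and is not the authors' implementation. The split fraction, the Gaussian kernel, the bandwidth, the learning rate, and every function name (`split_dataset`, `kernel_weights`, `local_logistic_predict`) are assumptions that do not appear in this record, and the SHAP-based feature ranking is omitted.

```python
import math
import random


def split_dataset(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and return (train, test).

    The 80/20 default is an assumption, not a value from the record.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def kernel_weights(X, query, bandwidth):
    """Gaussian kernel weight of each training point relative to `query`."""
    weights = []
    for x in X:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    return weights


def local_logistic_predict(X, y, query, bandwidth=1.0, lr=0.5, steps=300):
    """Fit a kernel-weighted logistic regression centred on `query`
    by gradient descent and return the estimated P(y = 1 | query)."""
    w = kernel_weights(X, query, bandwidth)
    total = sum(w)
    coef = [0.0] * (len(query) + 1)  # bias followed by one weight per feature
    for _ in range(steps):
        grad = [0.0] * len(coef)
        for xi, yi, wi in zip(X, y, w):
            feats = [1.0] + list(xi)
            err = sigmoid(sum(c * f for c, f in zip(coef, feats))) - yi
            for j, f in enumerate(feats):
                grad[j] += wi * err * f  # nearby points contribute more
        coef = [c - lr * g / total for c, g in zip(coef, grad)]
    return sigmoid(sum(c * f for c, f in zip(coef, [1.0] + list(query))))
```

Because the weights depend on the query point, a separate model is fitted for each individual, which is what distinguishes this from a single global logistic regression trained once on all rows.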