The Search process of the genetic algorithm.
Diabetes, as an incurable lifelong chronic disease, has profound and far-reaching effects on patients. Given this, early intervention is particularly crucial, as it can not only significantly improve the prognosis of patients but also provide valuable reference information for cl...
Saved in:
| Main author: | Wenguang Li |
|---|---|
| Other authors: | Yan Peng, Ke Peng |
| Published in: | 2024 |
| Subjects: | |
| _version_ | 1852026283514396672 |
|---|---|
| author | Wenguang Li (6528113) |
| author2 | Yan Peng (104995); Ke Peng (2220973) |
| author2_role | author; author |
| author_facet | Wenguang Li (6528113); Yan Peng (104995); Ke Peng (2220973) |
| author_role | author |
| dc.creator.none.fl_str_mv | Wenguang Li (6528113); Yan Peng (104995); Ke Peng (2220973) |
| dc.date.none.fl_str_mv | 2024-09-30T17:32:02Z |
| dc.identifier.none.fl_str_mv | 10.1371/journal.pone.0311222.g009 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/figure/The_Search_process_of_the_genetic_algorithm_/27137111 |
| dc.rights.none.fl_str_mv | CC BY 4.0; info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Science Policy; Plant Biology; Biological Sciences not elsewhere classified; Mathematical Sciences not elsewhere classified; Information Systems not elsewhere classified; body mass index; stacking model based; performing lightgbm model; layer stacking model; data balance processing; random forest model; model integration strategies; diabetes; xgboost model optimized; model integration; xgboost model; random oversampling; data imbalance; significant impact; scientific basis; results show; research object; reaching effects; publicly available; powerful tool; particularly crucial; new idea; kaggle platform; genetic algorithm; early intervention; early diagnosis; also provides; also provided |
| dc.title.none.fl_str_mv | The Search process of the genetic algorithm. |
| dc.type.none.fl_str_mv | Image; Figure; info:eu-repo/semantics/publishedVersion; image |
| description | Diabetes is an incurable, lifelong chronic disease with profound and far-reaching effects on patients. Early intervention is therefore crucial: it can significantly improve patient prognosis and provide valuable reference information for clinical treatment. This study uses the BRFSS (Behavioral Risk Factor Surveillance System) dataset, which is publicly available on the Kaggle platform, and aims to provide a scientific basis for the early diagnosis and treatment of diabetes through advanced machine learning techniques. First, the dataset was balanced using several sampling methods; second, a Stacking model based on GA-XGBoost (an XGBoost model optimized by a genetic algorithm) was constructed to predict diabetes risk; finally, the interpretability of the model was analyzed using Shapley values. The results show: (1) Random oversampling, ADASYN, SMOTE, and SMOTEENN were used to balance the data, and among them SMOTEENN handled the data imbalance more efficiently and effectively. (2) The GA-XGBoost model tuned the hyperparameters of the XGBoost model with a genetic algorithm to improve predictive accuracy. It was combined with the better-performing LightGBM and random forest models to build a two-layer Stacking model, which outperforms single machine learning models in predictive performance and offers a new approach to model integration. (3) Shapley value analysis identified features with a significant impact on the prediction of diabetes, such as age and body mass index; this analysis makes the model more transparent and provides more precise treatment decision support for doctors and patients. In summary, by adopting advanced machine learning techniques and model integration strategies, this study improved the accuracy of diabetes risk prediction and provides a powerful tool for the early diagnosis and personalized treatment of diabetes. |
| eu_rights_str_mv | openAccess |
| id | Manara_74b0262e2e39fb6b809df36f6c2bcea0 |
| identifier_str_mv | 10.1371/journal.pone.0311222.g009 |
| network_acronym_str | Manara |
| network_name_str | ManaraRepo |
| oai_identifier_str | oai:figshare.com:article/27137111 |
| publishDate | 2024 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | The Search process of the genetic algorithm. |
| topic | Science Policy; Plant Biology; Biological Sciences not elsewhere classified; Mathematical Sciences not elsewhere classified; Information Systems not elsewhere classified; body mass index; stacking model based; performing lightgbm model; layer stacking model; data balance processing; random forest model; model integration strategies; diabetes; xgboost model optimized; model integration; xgboost model; random oversampling; data imbalance; significant impact; scientific basis; results show; research object; reaching effects; publicly available; powerful tool; particularly crucial; new idea; kaggle platform; genetic algorithm; early intervention; early diagnosis; also provides; also provided |
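
The record describes a genetic algorithm searching over XGBoost hyperparameters but does not give its settings. The following is only a minimal sketch of what such a search process might look like, under stated assumptions: the search space ranges, the tournament selection, uniform crossover, resampling mutation, elitism, and the `toy_fitness` function (a stand-in for cross-validated accuracy) are all hypothetical choices made for illustration, not the authors' configuration.

```python
import random

# Hypothetical search space: the names mirror common XGBoost hyperparameters,
# but the ranges are illustrative assumptions, not the study's settings.
SPACE = {
    "max_depth": (2, 10),          # integer range
    "learning_rate": (0.01, 0.3),  # float range
    "n_estimators": (50, 500),     # integer range
}
INT_PARAMS = {"max_depth", "n_estimators"}


def random_individual(rng):
    """Sample one candidate hyperparameter set uniformly from SPACE."""
    ind = {}
    for name, (lo, hi) in SPACE.items():
        ind[name] = rng.randint(lo, hi) if name in INT_PARAMS else rng.uniform(lo, hi)
    return ind


def toy_fitness(ind):
    """Stand-in for cross-validated accuracy; peaks at an arbitrary point."""
    return -(
        ((ind["max_depth"] - 6) / 8) ** 2
        + ((ind["learning_rate"] - 0.1) / 0.29) ** 2
        + ((ind["n_estimators"] - 300) / 450) ** 2
    )


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in SPACE}


def mutate(ind, rng, rate=0.2):
    """Resample each gene from its range with probability `rate`."""
    fresh = random_individual(rng)
    return {name: fresh[name] if rng.random() < rate else ind[name] for name in SPACE}


def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    """Run the GA; return the best individual and the best score per generation."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        best_idx = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best_idx])
        children = [pop[best_idx]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = children
    # Score the final population so the returned individual matches history[-1].
    scores = [fitness(ind) for ind in pop]
    best_idx = max(range(pop_size), key=lambda i: scores[i])
    history.append(scores[best_idx])
    return pop[best_idx], history
```

Calling `genetic_search(toy_fitness)` returns the best hyperparameter set found and a per-generation list of best scores; because the elite individual is carried over unchanged, that list never decreases, which is the kind of curve a search-process figure typically plots. In a real setting, `toy_fitness` would presumably be replaced by a function that trains a model with the candidate hyperparameters and returns a validation metric.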