Base learner parameters.
More than 90% of deaths due to breast cancer (BC) are due to metastasis-related complications, with invasive ductal carcinoma (IDC) of the breast being the most common pathologic type of breast cancer and highly susceptible to metastasis to distant organs. BC patients who develop...
| Main Author: | Jingru Dong |
|---|---|
| Other Authors: | Ruijiao Lei, Feiyang Ma, Lu Yu, Lanlan Wang, Shangzhi Xu, Yunhua Hu, Jialin Sun, Wenwen Zhang, Haixia Wang, Li Zhang |
| Published: | 2025 |

| _version_ | 1852022433249230848 |
|---|---|
| author | Jingru Dong (14076094) |
| author2 | Ruijiao Lei (20790315) Feiyang Ma (6183380) Lu Yu (74016) Lanlan Wang (160537) Shangzhi Xu (625782) Yunhua Hu (5215919) Jialin Sun (356541) Wenwen Zhang (331647) Haixia Wang (173360) Li Zhang (8200) |
| author2_role | author author author author author author author author author author |
| author_role | author |
| dc.creator.none.fl_str_mv | Jingru Dong (14076094) Ruijiao Lei (20790315) Feiyang Ma (6183380) Lu Yu (74016) Lanlan Wang (160537) Shangzhi Xu (625782) Yunhua Hu (5215919) Jialin Sun (356541) Wenwen Zhang (331647) Haixia Wang (173360) Li Zhang (8200) |
| dc.date.none.fl_str_mv | 2025-02-26T18:41:25Z |
| dc.identifier.none.fl_str_mv | 10.1371/journal.pone.0310410.t006 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/dataset/Base_learner_parameters_/28500010 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Biotechnology; Developmental Biology; Cancer; Plant Biology; Environmental Sciences not elsewhere classified; Biological Sciences not elsewhere classified; Mathematical Sciences not elsewhere classified; Information Systems not elsewhere classified; voting mechanism using; voting mechanism exhibits; valuable reference support; invasive ductal carcinoma; experimental results show; common pathologic type; best prediction performance; hybrid model based; detecting cancer metastasis; risk prediction model; develop cancer metastases; idc hematogenous metastasis; distant metastasis risk; breast; based prediction; breast cancer; distant organs; constructed based; work efficiency; used anaconda; text mining; roc curve; related complications; random forest; provide patients; poor quality; poor prognosis; machine learning; logistic regression; jupyter notebooks; increased chances; highly susceptible; four algorithms; following metrics; extremely important; data processing; also improves |
| dc.title.none.fl_str_mv | Base learner parameters. |
| dc.type.none.fl_str_mv | Dataset info:eu-repo/semantics/publishedVersion dataset |
| description | More than 90% of deaths due to breast cancer (BC) are due to metastasis-related complications, with invasive ductal carcinoma (IDC) of the breast being the most common pathologic type of breast cancer and highly susceptible to metastasis to distant organs. BC patients who develop cancer metastases are more likely to have a poor prognosis and poor quality of life, so it is extremely important to recognize and diagnose whether distant metastases have occurred in IDC as early as possible. In this study, we develop a non-invasive breast cancer classification system for detecting cancer metastasis. We used Anaconda-Jupyter notebooks to develop various Python programming modules for text mining, data processing, and machine learning (ML) methods. A risk prediction model was constructed based on four algorithms: Random Forest, XGBoost, Logistic Regression, and SVM. Additionally, we developed a hybrid model based on a voting mechanism using these four algorithms as the base models. The models were compared and evaluated by the following metrics: accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). The experimental results show that the hybrid model based on the voting mechanism exhibits the best prediction performance (accuracy: 0.867, precision: 0.929, recall: 0.805, F1-score: 0.856, AUC: 0.94). This stable risk prediction model provides valuable reference support for doctors in assessing and diagnosing the risk of IDC hematogenous metastasis. It also improves the work efficiency of doctors and strives to provide patients with increased chances of survival. |
| eu_rights_str_mv | openAccess |
| id | Manara_ce021875eead357bf064d8fde3bee2d8 |
| identifier_str_mv | 10.1371/journal.pone.0310410.t006 |
| network_acronym_str | Manara |
| network_name_str | ManaraRepo |
| oai_identifier_str | oai:figshare.com:article/28500010 |
| publishDate | 2025 |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | Base learner parameters. |
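
The description above mentions a hybrid model that combines four base models through a voting mechanism and scores it by accuracy, precision, recall, and F1-score. The record does not say whether the vote is hard (majority of labels) or soft (averaged probabilities), so the following is only a minimal sketch that assumes hard majority voting on binary labels. The function names, the tie-breaking rule, and the toy predictions are all hypothetical and are not taken from the dataset.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model binary label lists into one label per sample."""
    combined = []
    # zip(*...) regroups the data so each tuple holds one sample's votes.
    for sample_votes in zip(*predictions_per_model):
        counts = Counter(sample_votes)
        # Assumption: with four voters a 2-2 split can occur; this sketch
        # resolves ties toward the positive class (1).
        combined.append(1 if counts[1] >= counts[0] else 0)
    return combined


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for six samples (illustration only, not study data).
y_true = [1, 1, 1, 0, 0, 0]
base_predictions = [
    [1, 1, 0, 0, 0, 1],  # stand-in for Random Forest output
    [1, 0, 1, 0, 0, 0],  # stand-in for XGBoost output
    [1, 1, 0, 0, 1, 0],  # stand-in for Logistic Regression output
    [0, 1, 0, 0, 0, 0],  # stand-in for SVM output
]

ensemble = majority_vote(base_predictions)
print(ensemble)
print(binary_metrics(y_true, ensemble))
```

AUC is omitted from the sketch because it needs predicted scores rather than hard labels; a soft-voting variant that averages class probabilities would supply them.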