Base learner parameters.

Bibliographic Details
Main Author: Jingru Dong (14076094) (author)
Other Authors: Ruijiao Lei (20790315) (author), Feiyang Ma (6183380) (author), Lu Yu (74016) (author), Lanlan Wang (160537) (author), Shangzhi Xu (625782) (author), Yunhua Hu (5215919) (author), Jialin Sun (356541) (author), Wenwen Zhang (331647) (author), Haixia Wang (173360) (author), Li Zhang (8200) (author)
Published: 2025
dc.creator.none.fl_str_mv Jingru Dong (14076094)
Ruijiao Lei (20790315)
Feiyang Ma (6183380)
Lu Yu (74016)
Lanlan Wang (160537)
Shangzhi Xu (625782)
Yunhua Hu (5215919)
Jialin Sun (356541)
Wenwen Zhang (331647)
Haixia Wang (173360)
Li Zhang (8200)
dc.date.none.fl_str_mv 2025-02-26T18:41:25Z
dc.identifier.none.fl_str_mv 10.1371/journal.pone.0310410.t006
dc.relation.none.fl_str_mv https://figshare.com/articles/dataset/Base_learner_parameters_/28500010
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biotechnology
Developmental Biology
Cancer
Plant Biology
Environmental Sciences not elsewhere classified
Biological Sciences not elsewhere classified
Mathematical Sciences not elsewhere classified
Information Systems not elsewhere classified
voting mechanism using
voting mechanism exhibits
valuable reference support
invasive ductal carcinoma
experimental results show
common pathologic type
best prediction performance
hybrid model based
detecting cancer metastasis
risk prediction model
develop cancer metastases
idc hematogenous metastasis
distant metastasis risk
based prediction
breast cancer
distant organs
constructed based
work efficiency
used anaconda
text mining
roc curve
related complications
random forest
provide patients
poor quality
poor prognosis
machine learning
logistic regression
jupyter notebooks
increased chances
highly susceptible
four algorithms
following metrics
extremely important
data processing
also improves
dc.title.none.fl_str_mv Base learner parameters.
dc.type.none.fl_str_mv Dataset
info:eu-repo/semantics/publishedVersion
dataset
description More than 90% of deaths from breast cancer (BC) are due to metastasis-related complications. Invasive ductal carcinoma (IDC) of the breast is the most common pathologic type of breast cancer and is highly susceptible to metastasis to distant organs. BC patients who develop metastases are more likely to have a poor prognosis and poor quality of life, so it is extremely important to recognize and diagnose distant metastases in IDC as early as possible. In this study, we develop a non-invasive breast cancer classification system for detecting cancer metastasis. We used Anaconda-Jupyter notebooks to develop Python modules for text mining, data processing, and machine learning (ML). A risk prediction model was constructed with each of four algorithms: Random Forest, XGBoost, Logistic Regression, and SVM. Additionally, we developed a hybrid model based on a voting mechanism that uses these four algorithms as the base models. The models were compared and evaluated on the following metrics: accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). The experimental results show that the hybrid model based on the voting mechanism exhibits the best prediction performance (accuracy: 0.867, precision: 0.929, recall: 0.805, F1-score: 0.856, AUC: 0.94). This stable risk prediction model provides valuable reference support for doctors in assessing and diagnosing the risk of IDC hematogenous metastasis. It also improves doctors' work efficiency and aims to give patients an increased chance of survival.
eu_rights_str_mv openAccess
id Manara_ce021875eead357bf064d8fde3bee2d8
identifier_str_mv 10.1371/journal.pone.0310410.t006
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/28500010
publishDate 2025
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Base learner parameters.
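The description in this record mentions a hybrid model that combines four base learners through a voting mechanism and is scored on accuracy, precision, recall, and F1. The record does not say whether the vote is hard or soft, how ties are broken, or how the base learners were configured, so the following is only a minimal sketch of one possible reading: hard majority voting over binary labels, with a tie among the four voters resolved by a hypothetical `tie_value` argument. The function names and the toy labels are illustrative assumptions, not the authors' code or data.

```python
def hard_vote(model_preds, tie_value=1):
    """Combine binary predictions (0/1) from several models by majority vote.

    model_preds: list of equal-length label lists, one per base model.
    tie_value:   label used on an exact tie (an assumption; the record
                 does not state how ties are handled).
    """
    n_models = len(model_preds)
    voted = []
    for votes in zip(*model_preds):
        positives = sum(votes)
        if 2 * positives > n_models:
            voted.append(1)
        elif 2 * positives < n_models:
            voted.append(0)
        else:
            voted.append(tie_value)
    return voted


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels only (hypothetical); 1 = metastasis, 0 = no metastasis.
y_true = [1, 1, 1, 0, 0, 0]
rf_pred = [1, 1, 0, 0, 0, 1]   # stands in for Random Forest output
xgb_pred = [1, 0, 1, 0, 0, 0]  # stands in for XGBoost output
lr_pred = [1, 1, 0, 0, 1, 0]   # stands in for Logistic Regression output
svm_pred = [1, 1, 1, 0, 0, 0]  # stands in for SVM output

voted = hard_vote([rf_pred, xgb_pred, lr_pred, svm_pred])
scores = binary_metrics(y_true, voted)
print(voted, scores)
```

AUC is left out of the sketch because it needs a continuous score per sample rather than a hard label; a soft-voting variant that averages predicted probabilities would supply one.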