Hybrid model for precise hepatitis-C classification using improved random forest and SVM method

Bibliographic details
Main author: Umesh Kumar Lilhore (17727684) (author)
Other authors: Poongodi Manoharan (17727687) (author), Jasminder Kaur Sandhu (17727690) (author), Sarita Simaiya (17727693) (author), Surjeet Dalal (4906894) (author), Abdullah M. Baqasah (17542077) (author), Majed Alsafyani (17727696) (author), Roobaea Alroobaea (8698965) (author), Ismail Keshta (17727699) (author), Kaamran Raahemifar (707645) (author)
Published: 2023
_version_ 1864513533516972032
author Umesh Kumar Lilhore (17727684)
author2 Poongodi Manoharan (17727687)
Jasminder Kaur Sandhu (17727690)
Sarita Simaiya (17727693)
Surjeet Dalal (4906894)
Abdullah M. Baqasah (17542077)
Majed Alsafyani (17727696)
Roobaea Alroobaea (8698965)
Ismail Keshta (17727699)
Kaamran Raahemifar (707645)
author2_role author
author
author
author
author
author
author
author
author
author_role author
dc.creator.none.fl_str_mv Umesh Kumar Lilhore (17727684)
Poongodi Manoharan (17727687)
Jasminder Kaur Sandhu (17727690)
Sarita Simaiya (17727693)
Surjeet Dalal (4906894)
Abdullah M. Baqasah (17542077)
Majed Alsafyani (17727696)
Roobaea Alroobaea (8698965)
Ismail Keshta (17727699)
Kaamran Raahemifar (707645)
dc.date.none.fl_str_mv 2023-08-01T00:00:00Z
dc.identifier.none.fl_str_mv 10.1038/s41598-023-36605-3
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/Hybrid_model_for_precise_hepatitis-C_classification_using_improved_random_forest_and_SVM_method/24935979
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biomedical and clinical sciences
Medical biotechnology
Hybrid
hepatitis‑C
SVM method
Hepatitis C Virus (HCV)
dc.title.none.fl_str_mv Hybrid model for precise hepatitis-C classification using improved random forest and SVM method
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <div><p>Hepatitis C, caused by the hepatitis C virus (HCV), is a viral infection that causes liver inflammation. Annually, approximately 3.4 million cases of HCV are reported worldwide. Diagnosing HCV at an early stage helps save lives. In the HCV literature reviewed, existing studies use a single machine learning (ML) prediction model, which encounters several issues, namely poor accuracy, data imbalance, and overfitting. This research proposes a Hybrid Predictive Model (HPM) based on an improved random forest (IRF) and a support vector machine (SVM) to overcome these limitations. The existing random forest (RF) method is enhanced by adding a bootstrapping process, which helps eliminate the trees’ minor features iteratively to build a strong forest and improves the performance of the HPM. The HPM uses a ‘Ranker method’ to rank the dataset features and applies the IRF with SVM, selecting the higher-ranked features to build the prediction model. This research uses the online HCV dataset from UCI to measure the proposed model’s performance. The dataset is highly imbalanced; to deal with this issue, we used the synthetic minority over-sampling technique (SMOTE). Two experiments are performed. The first experiment compares data-splitting methods: K-fold cross-validation and training:testing splits. The proposed method achieved an accuracy of 95.89% for k = 5 and 96.29% for k = 10; for the training:testing split, it achieved 91.24% for 80:20 and 92.39% for 70:30, which is the best result compared with the existing SVM, MARS, RF, DT, and BGLM methods. In the second experiment, the analysis is performed using feature selection, with and without SMOTE. The proposed method achieves an accuracy of 41.541% without SMOTE and 96.82% with SMOTE-based feature selection, which is better than existing ML methods. The experimental results show the importance of feature selection for achieving higher accuracy in HCV research.</p></div><h2>Other Information</h2> <p> Published in: Scientific Reports<br> License: <a href="https://creativecommons.org/licenses/by/4.0" target="_blank">https://creativecommons.org/licenses/by/4.0</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1038/s41598-023-36605-3" target="_blank">https://dx.doi.org/10.1038/s41598-023-36605-3</a></p>
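The abstract names two generic techniques, SMOTE and K-fold cross-validation, without giving implementation details. The following is a minimal standard-library sketch of both ideas, not the authors' code: the function names `smote` and `k_fold_indices`, the neighbour count `k`, and the toy data are all assumptions made for illustration. SMOTE is shown in its basic form, where each synthetic point is a random interpolation between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (basic SMOTE interpolation).

    `minority` is a list of numeric feature vectors from the minority class.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of `base`, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
    for i, test in enumerate(folds):
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield train, test


if __name__ == "__main__":
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(len(smote(toy_minority, n_new=6, k=2)))
    for train, test in k_fold_indices(10, 5):
        print(sorted(test))
```

When oversampling is combined with cross-validation, a common precaution is to apply SMOTE only to the training indices of each fold, so that synthetic points derived from test samples do not leak into training. The record does not state how the authors combined the two steps.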
eu_rights_str_mv openAccess
id Manara2_a6f5ef1884a52864f13bc940a18d151f
identifier_str_mv 10.1038/s41598-023-36605-3
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/24935979
publishDate 2023
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Hybrid model for precise hepatitis-C classification using improved random forest and SVM method