Scalable Nonparametric Supervised Learning for Streaming and Massive Data: Applications in Healthcare Monitoring and Credit Risk
| Main author: | Mohamed Chaouch |
|---|---|
| Other authors: | Omama M. Al-Hamed |
| Published: | 2025 |
| _version_ | 1864513531473297408 |
|---|---|
| author | Mohamed Chaouch (17983846) |
| author2 | Omama M. Al-Hamed (18021667) |
| author2_role | author |
| author_facet | Mohamed Chaouch (17983846) Omama M. Al-Hamed (18021667) |
| author_role | author |
| dc.creator.none.fl_str_mv | Mohamed Chaouch (17983846) Omama M. Al-Hamed (18021667) |
| dc.date.none.fl_str_mv | 2025-07-30T12:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1109/access.2025.3591883 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/Scalable_Nonparametric_Supervised_Learning_for_Streaming_and_Massive_Data_Applications_in_Healthcare_Monitoring_and_Credit_Risk/30971302 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Biomedical and clinical sciences Reproductive medicine Health sciences Health services and systems Information and computing sciences Data management and data science Machine learning Big data applications classification algorithms dimensionality reduction kernel methods machine learning nonparametric statistics recursive estimation principal component analysis stochastic approximation algorithms supervised learning Vectors Principal component analysis Posterior probability Covariance matrices Accuracy Random forests Probability distribution |
| dc.title.none.fl_str_mv | Scalable Nonparametric Supervised Learning for Streaming and Massive Data: Applications in Healthcare Monitoring and Credit Risk |
| dc.type.none.fl_str_mv | Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal |
| description | <p dir="ltr">This paper introduces novel nonparametric supervised learning techniques for classifying massive datasets, addressing key limitations of existing methods in Big Data and streaming-data settings. We propose an offline kernel-based classifier that uses batch Principal Component Analysis (PCA) for dimensionality reduction to mitigate the “curse of dimensionality”. For streaming data, we also develop an online classifier that combines online PCA with a kernel-based recursive classifier updated by a stochastic approximation algorithm. In an application to fetal well-being monitoring, the online classifier achieves a competitive median misclassification rate (11.92%), comparable to those of the offline classifier (11.54%) and Random Forest (11.31%), while requiring only 1/15th of the offline classifier’s computation time. Receiver Operating Characteristic (ROC) analysis shows a higher Area Under the Curve (AUC) for the offline classifier, but at a significant computational cost. A second study, on a larger credit-scoring database, confirms these findings: the online classifier achieves an F1-score of 96.40% and an accuracy of 93.08%, closely matching neural networks (F1-score 96.46%, accuracy 93.22%) and boosting (F1-score 96.51%, accuracy 93.31%). Notably, the online classifier does so with a CPU time of only 0.87 seconds per classification, over 600 times faster than neural networks, which demonstrates its effectiveness for high-frequency, real-time financial decision-making.</p><h2 dir="ltr">Other Information</h2><p dir="ltr">Published in: IEEE Access<br>License: <a href="https://creativecommons.org/licenses/by/4.0/deed.en" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/access.2025.3591883" target="_blank">https://dx.doi.org/10.1109/access.2025.3591883</a></p> |
| eu_rights_str_mv | openAccess |
| id | Manara2_cc620cbb844fcb2473cd8b6406fe9721 |
| identifier_str_mv | 10.1109/access.2025.3591883 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/30971302 |
| publishDate | 2025 |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | Scalable Nonparametric Supervised Learning for Streaming and Massive Data: Applications in Healthcare Monitoring and Credit Risk |
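The abstract describes the offline and online pipelines only at a high level. The Python/NumPy sketch below illustrates the general techniques it names, under loudly stated assumptions: Sanger's Generalized Hebbian Algorithm stands in for the online PCA step, a Révész-type stochastic-approximation recursion estimates the class posteriors at a fixed set of query points, and a Nadaraya-Watson posterior estimate after batch PCA (via SVD) stands in for the offline classifier. The names `OnlinePCA`, `RecursiveKernelClassifier`, and `offline_kernel_classify`, the Gaussian kernel, and the step-size and bandwidth schedules are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


class OnlinePCA:
    """Streaming PCA: running mean plus Sanger's Generalized Hebbian Algorithm.

    Assumption: GHA is a standard stochastic-approximation scheme for online
    PCA, used here as a stand-in; the paper's own update may differ.
    """

    def __init__(self, dim, n_components, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.n = 0
        self.mean = np.zeros(dim)
        # Rows of W drift toward the leading eigenvectors of the covariance.
        self.W = rng.normal(scale=0.1, size=(n_components, dim))
        self.lr = lr

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n  # recursive sample mean
        xc = x - self.mean
        y = self.W @ xc
        # Sanger's rule: W <- W + lr * (y xc^T - tril(y y^T) W)
        self.W += self.lr * (np.outer(y, xc) - np.tril(np.outer(y, y)) @ self.W)

    def transform(self, X):
        return (np.atleast_2d(X) - self.mean) @ self.W.T


class RecursiveKernelClassifier:
    """Révész-type stochastic-approximation estimates of P(Y = k | X = z)
    at a fixed set of query points z, computed in the online-PCA space.

    Hypothetical choices: unnormalized Gaussian kernel, step size
    gamma_n = n**-0.75, bandwidth h_n = h0 * n**(-1 / (q + 4)).
    """

    def __init__(self, queries, n_classes, pca, h0=1.0):
        self.Z = np.atleast_2d(queries)  # query points kept in the original space
        self.P = np.full((len(self.Z), n_classes), 1.0 / n_classes)
        self.pca = pca
        self.h0 = h0
        self.n = 0

    def update(self, x, y):
        self.pca.update(x)  # refresh the PCA basis before projecting
        self.n += 1
        q = self.pca.W.shape[0]
        gamma = self.n ** -0.75
        h = self.h0 * self.n ** (-1.0 / (q + 4))
        d2 = ((self.pca.transform(self.Z) - self.pca.transform(x)) ** 2).sum(axis=1)
        w = np.exp(-0.5 * d2 / h**2)  # kernel weight in [0, 1]
        step = gamma * w  # in [0, 1], so each row of P stays a probability vector
        target = np.zeros(self.P.shape[1])
        target[y] = 1.0
        self.P += step[:, None] * (target - self.P)

    def predict(self):
        return self.P.argmax(axis=1)


def offline_kernel_classify(X_train, y_train, X_test, n_components, bandwidth, n_classes):
    """Batch PCA (via SVD) followed by a Nadaraya-Watson posterior estimate."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    V = Vt[:n_components].T  # top principal directions as columns
    Z_tr, Z_te = (X_train - mean) @ V, (X_test - mean) @ V
    d2 = ((Z_te[:, None, :] - Z_tr[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-0.5 * d2 / bandwidth**2)
    onehot = np.eye(n_classes)[y_train]
    posterior = (K @ onehot) / np.maximum(K.sum(axis=1, keepdims=True), 1e-300)
    return posterior.argmax(axis=1)
```

Because every online update can change the PCA basis, this sketch stores the query points in the original feature space and re-projects them with the current basis at each step. Capping each step `gamma * w` at 1 turns the update into a convex combination, so the posterior estimates remain valid probabilities. Restricting the recursive estimates to a fixed query set is a simplification made for this sketch only.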