Explainable phishing website detection for secure and sustainable cyber infrastructure

Bibliographic Details
Main Author:	Tanzila Kehkashan (20748842) (author)
Other Authors:	Maha Abdelhaq (735574) (author), Ahmad Sami Al-Shamayleh (17541495) (author), Nazish Huda (22682342) (author), Imran Ashraf Yaseen (22682345) (author), Abdelmuttlib Ibrahim Abdalla Ahmed (22682348) (author), Adnan Akhunzada (20151648) (author)
Published:	2025
Subjects:	Information and computing sciences Cybersecurity and privacy Data management and data science Machine learning Phishing website detection RF SHAP URL

author	Tanzila Kehkashan (20748842)
author2	Maha Abdelhaq (735574) Ahmad Sami Al-Shamayleh (17541495) Nazish Huda (22682342) Imran Ashraf Yaseen (22682345) Abdelmuttlib Ibrahim Abdalla Ahmed (22682348) Adnan Akhunzada (20151648)
author2_role	author author author author author author
author_facet	Tanzila Kehkashan (20748842) Maha Abdelhaq (735574) Ahmad Sami Al-Shamayleh (17541495) Nazish Huda (22682342) Imran Ashraf Yaseen (22682345) Abdelmuttlib Ibrahim Abdalla Ahmed (22682348) Adnan Akhunzada (20151648)
author_role	author
dc.creator.none.fl_str_mv	Tanzila Kehkashan (20748842) Maha Abdelhaq (735574) Ahmad Sami Al-Shamayleh (17541495) Nazish Huda (22682342) Imran Ashraf Yaseen (22682345) Abdelmuttlib Ibrahim Abdalla Ahmed (22682348) Adnan Akhunzada (20151648)
dc.date.none.fl_str_mv	2025-11-25T03:00:00Z
dc.identifier.none.fl_str_mv	10.1038/s41598-025-27984-w
dc.relation.none.fl_str_mv	https://figshare.com/articles/journal_contribution/Explainable_phishing_website_detection_for_secure_and_sustainable_cyber_infrastructure/31995087
dc.rights.none.fl_str_mv	CC BY 4.0 info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv	Information and computing sciences Cybersecurity and privacy Data management and data science Machine learning Machine learning Phishing website detection RF SHAP URL
dc.title.none.fl_str_mv	Explainable phishing website detection for secure and sustainable cyber infrastructure
dc.type.none.fl_str_mv	Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal
description	<p dir="ltr">Phishing is a social engineering attack and a form of cybercrime that is steadily and dangerously on the rise. Phishing attacks can affect many sectors, including government, social, and financial organizations as well as individual businesses. Traditional methods of identifying phishing websites, such as blacklist and heuristic approaches, often fail to provide sufficient protection. In addition, techniques that combine URL, webpage-content, and external features are time-consuming, require substantial computing power, and are unsuitable for resource-constrained devices. Previous research has also often overlooked which features matter most for detection and how they affect the outcome; traditional methods may not fully capture the significance of individual features. To address this gap, this research applies SHapley Additive exPlanations (SHAP) for feature selection, with each model relying primarily on URL-based features, to improve the detection process. The “Phishing Website Detection” dataset from the Kaggle repository, containing more than 11,000 URLs described by 30 features, was used. Support vector machine (SVM), random forest (RF), decision tree (DT), logistic regression (LR), and K-nearest neighbor (KNN) models were then trained and tested. SHAP was applied to each model to improve precision and interpretability by highlighting the most important features. The models were evaluated using accuracy, precision, recall, and F1 score. Of all the models tested, the random forest model performed best, achieving 97% accuracy.
The proposed system offers a comprehensive, interpretable solution for phishing detection that contributes to a safer digital environment.</p><h2 dir="ltr">Other Information</h2><p dir="ltr">Published in: Scientific Reports<br>License: <a href="https://creativecommons.org/licenses/by/4.0" target="_blank">https://creativecommons.org/licenses/by/4.0</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1038/s41598-025-27984-w" target="_blank">https://dx.doi.org/10.1038/s41598-025-27984-w</a></p>
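The abstract describes ranking URL features with SHAP and comparing classifiers on accuracy, precision, recall, and F1 score. The following stdlib-only Python sketch illustrates those two ideas under stated assumptions; it is not the authors' code. It computes exact Shapley values by enumerating feature coalitions, filling absent features from a single baseline vector (one of several possible value-function choices), ranks features by mean absolute Shapley value as a simple selection criterion, and computes the four metrics for the phishing class. The feature names, weights, rows, and labels are hypothetical, and the linear `toy_phishing_score` stands in for a trained RF, SVM, DT, LR, or KNN model. Exact enumeration needs on the order of n·2^(n-1) model calls, which is impractical for all 30 features of the dataset; libraries such as the `shap` Python package use faster model-specific or sampling-based estimators instead.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# A "model" here is any function mapping a feature vector to a score.
Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values attributing model(x) - model(baseline) to each feature.

    Features outside a coalition S take their baseline value (a single
    reference point, which is an assumption; other value functions exist).
    Costs O(n * 2^(n-1)) model calls, so it only suits a few features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_features(model: Model, rows: Sequence[Sequence[float]],
                  baseline: Sequence[float], names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank features by mean absolute Shapley value over the given rows."""
    totals = [0.0] * len(names)
    for row in rows:
        for j, phi_j in enumerate(shapley_values(model, row, baseline)):
            totals[j] += abs(phi_j)
    means = [t / len(rows) for t in totals]
    return sorted(zip(names, means), key=lambda pair: pair[1], reverse=True)


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating `positive` as the phishing class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical binary URL features and toy weights, standing in for a trained model.
FEATURE_NAMES = ["has_ip_address", "long_url", "uses_https", "prefix_suffix"]
WEIGHTS = [2.0, 0.5, -1.5, 1.0]
TOY_ROWS = [[1, 1, 0, 1], [0, 1, 1, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
BASELINE = [0, 0, 0, 0]


def toy_phishing_score(features: Sequence[float]) -> float:
    """Hypothetical linear scorer: higher means more phishing-like."""
    return sum(w * f for w, f in zip(WEIGHTS, features))


if __name__ == "__main__":
    ranking = rank_features(toy_phishing_score, TOY_ROWS, BASELINE, FEATURE_NAMES)
    selected = [name for name, _ in ranking[:2]]  # keep the top-k features (k = 2 here)
    print(ranking, selected)
    toy_labels = [1, 0, 1, 0]  # hypothetical ground truth for the toy rows
    preds = [1 if toy_phishing_score(r) > 1.0 else 0 for r in TOY_ROWS]
    print(classification_metrics(toy_labels, preds))
```

For a linear scorer with a zero baseline, each Shapley value reduces to weight × feature value, and the values sum to `model(x) - model(baseline)` (the efficiency property), which makes the toy easy to check by hand.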
eu_rights_str_mv	openAccess
id	Manara2_b674393dd9c0702135e2ef3b573cd2a4
identifier_str_mv	10.1038/s41598-025-27984-w
network_acronym_str	Manara2
network_name_str	Manara2
oai_identifier_str	oai:figshare.com:article/31995087
publishDate	2025
rights_invalid_str_mv	CC BY 4.0
status_str	publishedVersion
title	Explainable phishing website detection for secure and sustainable cyber infrastructure
topic	Information and computing sciences Cybersecurity and privacy Data management and data science Machine learning Machine learning Phishing website detection RF SHAP URL
