A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques

Every year, phishing results in losses of billions of dollars and is a major threat to the Internet economy. Phishing attacks are now most often carried out by email. To better comprehend the existing research trend of phishing email detection, several review studies have been performed. However, it...

Bibliographic Details
Main Author: SALLOUM, SAID (author)
Other Authors: GABER, TAREK (author), VADERA, SUNIL (author), SHAALAN, KHALED (author)
Published: 2022
Subjects:
Online Access: https://bspace.buid.ac.ae/handle/1234/3028
https://doi.org/10.1109/ACCESS.2022.3183083.
_version_ 1862980618634133504
author SALLOUM, SAID
author2 GABER, TAREK
VADERA, SUNIL
SHAALAN, KHALED
author2_role author
author
author
author_facet SALLOUM, SAID
GABER, TAREK
VADERA, SUNIL
SHAALAN, KHALED
author_role author
dc.creator.none.fl_str_mv SALLOUM, SAID
GABER, TAREK
VADERA, SUNIL
SHAALAN, KHALED
dc.date.none.fl_str_mv 2022
2025-05-14T10:46:40Z
2025-05-14T10:46:40Z
dc.identifier.none.fl_str_mv Salloum, S. et al. (2022) “A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques,” IEEE Access, 10.
2169-3536
https://bspace.buid.ac.ae/handle/1234/3028
https://doi.org/10.1109/ACCESS.2022.3183083.
dc.language.none.fl_str_mv en
dc.publisher.none.fl_str_mv IEEE
dc.relation.none.fl_str_mv IEEE Access v10 (2022): 65703-65727
dc.subject.none.fl_str_mv Phishing email detection, systematic literature review, natural language processing, machine learning
dc.title.none.fl_str_mv A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques
dc.type.none.fl_str_mv Article
description Every year, phishing results in losses of billions of dollars and is a major threat to the Internet economy. Phishing attacks are now most often carried out by email. Several review studies have been performed to better understand existing research trends in phishing email detection. However, it is important to assess this issue from different perspectives. None of these surveys has comprehensively studied the use of Natural Language Processing (NLP) techniques for phishing detection, except one that shed light on the use of NLP techniques for classification and training purposes while exploring a few alternatives. To bridge this gap, this study systematically reviews and synthesises research on the use of NLP for detecting phishing emails. Based on specific predefined criteria, a total of 100 research articles published between 2006 and 2022 were identified and analysed. We study the key research areas in phishing email detection using NLP, the machine learning algorithms used in phishing email detection, the text features of phishing emails, the datasets and resources that have been used for phishing emails, and the evaluation criteria. The findings show that the main research area in phishing detection studies is feature extraction and selection, followed by methods for classifying and optimising the detection of phishing emails. Amongst the range of classification algorithms, support vector machines (SVMs) are heavily utilised for detecting phishing emails. The most frequently used NLP techniques are found to be TF-IDF and word embeddings. Furthermore, the most commonly used dataset for benchmarking phishing email detection methods is the Nazario phishing corpus. Also, Python is the most commonly used programming language for phishing email detection. It is expected that the findings of this paper will be helpful for the scientific community, especially in the field of NLP applications to cybersecurity problems. This survey is also unique in that it relates works to their openly available tools and resources. The analysis of the presented works revealed that little work has been performed on Arabic-language phishing emails using NLP techniques. Therefore, many open issues remain in Arabic phishing email detection.
id budr_0900c33c53a41ddf0cf57e1b90cbfb96
identifier_str_mv Salloum, S. et al. (2022) “A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques,” IEEE Access, 10.
2169-3536
language_invalid_str_mv en
network_acronym_str budr
network_name_str The British University in Dubai repository
oai_identifier_str oai:bspace.buid.ac.ae:1234/3028
publishDate 2022
publisher.none.fl_str_mv IEEE
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
spellingShingle A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques
SALLOUM, SAID
Phishing email detection, systematic literature review, natural language processing, machine learning
title A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques
title_full A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques
title_fullStr A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques
title_full_unstemmed A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques
title_short A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques
title_sort A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques
topic Phishing email detection, systematic literature review, natural language processing, machine learning
url https://bspace.buid.ac.ae/handle/1234/3028
https://doi.org/10.1109/ACCESS.2022.3183083.