A New English/Arabic Parallel Corpus for Phishing Emails

Phishing involves malicious activity whereby phishers, in the disguise of legitimate entities, obtain illegitimate access to the victims’ personal and private information, usually through emails. Currently, phishing attacks and threats are being handled effectively through the use of the latest phi...

Bibliographic Details
Main Author: SALLOUM, SAID (author)
Other Authors: GABER, TAREK (author), VADERA, SUNIL (author), SHAALAN, KHALED (author)
Published: 2023
Subjects:
Online Access: https://bspace.buid.ac.ae/handle/1234/2794
https://doi.org/10.1145/3606031.
_version_ 1862980613891424256
author SALLOUM, SAID
author2 GABER, TAREK
VADERA, SUNIL
SHAALAN, KHALED
author2_role author
author
author
author_facet SALLOUM, SAID
GABER, TAREK
VADERA, SUNIL
SHAALAN, KHALED
author_role author
dc.creator.none.fl_str_mv SALLOUM, SAID
GABER, TAREK
VADERA, SUNIL
SHAALAN, KHALED
dc.date.none.fl_str_mv 2023
2025-02-11T04:46:49Z
2025-02-11T04:46:49Z
dc.identifier.none.fl_str_mv Salloum, S. et al. (2023) “A New English/Arabic Parallel Corpus for Phishing Emails,” ACM Transactions on Asian and Low-Resource Language Information Processing, 22(7), pp. 1–17.
2375-4699, 2375-4702
https://bspace.buid.ac.ae/handle/1234/2794
https://doi.org/10.1145/3606031.
dc.language.none.fl_str_mv en
dc.publisher.none.fl_str_mv ACM digital library
dc.relation.none.fl_str_mv ACM Transactions on Asian and Low-Resource Language Information Processing v22 n7 (20230725): 1-17
dc.subject.none.fl_str_mv CCS Concepts: • Computing methodologies → Language resources; Additional Key Words and Phrases: English–Arabic Parallel Corpus, phishing emails, Multilayer Perceptron, frequency–inverse document frequency
dc.title.none.fl_str_mv A New English/Arabic Parallel Corpus for Phishing Emails
dc.type.none.fl_str_mv Article
description Phishing involves malicious activity whereby phishers, in the disguise of legitimate entities, obtain illegitimate access to the victims’ personal and private information, usually through emails. Currently, phishing attacks and threats are being handled effectively through the use of the latest phishing email detection solutions. Most current phishing detection systems assume phishing attacks to be in English, though attacks in other languages are growing. In particular, Arabic is a widely used language and therefore represents a vulnerable target. However, there is a significant shortage of corpora that can be used to develop Arabic phishing detection systems. This article presents the development of a new English-Arabic parallel phishing email corpus that has been developed from the anti-phishing shared task text (IWSPA-AP 2018). The email content was to be translated, and the task had been allotted to 10 volunteers who had a university background and were English and Arabic language experts. To evaluate the effectiveness of the new corpus, we develop phishing email detection models using Term Frequency–Inverse Document Frequency and Multilayer Perceptron using 1,258 emails in Arabic and English that have equal ratios of legitimate and phishing emails. The experimental findings show that the accuracy reaches 96.82% for the Arabic dataset and 94.63% for the emails in English, providing some assurance of the potential value of the parallel corpus developed.
id budr_e92a3fe3fbd84aa2bc1081e06cc09144
identifier_str_mv Salloum, S. et al. (2023) “A New English/Arabic Parallel Corpus for Phishing Emails,” ACM Transactions on Asian and Low-Resource Language Information Processing, 22(7), pp. 1–17.
2375-4699, 2375-4702
language_invalid_str_mv en
network_acronym_str budr
network_name_str The British University in Dubai repository
oai_identifier_str oai:bspace.buid.ac.ae:1234/2794
publishDate 2023
publisher.none.fl_str_mv ACM digital library
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
spellingShingle A New English/Arabic Parallel Corpus for Phishing Emails
SALLOUM, SAID
CCS Concepts: • Computing methodologies → Language resources; Additional Key Words and Phrases: English–Arabic Parallel Corpus, phishing emails, Multilayer Perceptron, frequency–inverse document frequency
title A New English/Arabic Parallel Corpus for Phishing Emails
title_full A New English/Arabic Parallel Corpus for Phishing Emails
title_fullStr A New English/Arabic Parallel Corpus for Phishing Emails
title_full_unstemmed A New English/Arabic Parallel Corpus for Phishing Emails
title_short A New English/Arabic Parallel Corpus for Phishing Emails
title_sort A New English/Arabic Parallel Corpus for Phishing Emails
topic CCS Concepts: • Computing methodologies → Language resources; Additional Key Words and Phrases: English–Arabic Parallel Corpus, phishing emails, Multilayer Perceptron, frequency–inverse document frequency
url https://bspace.buid.ac.ae/handle/1234/2794
https://doi.org/10.1145/3606031.
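
Note on the description field: the abstract names Term Frequency–Inverse Document Frequency as the feature representation fed to the Multilayer Perceptron. The record does not say which TF-IDF variant, tokenizer, or library the authors used, so the following is only a minimal sketch of one common formulation (relative term frequency multiplied by log(N / df)). The function names `tokenize` and `tfidf` and the toy emails are hypothetical and are not taken from the corpus; the classifier stage is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization. A real pipeline would
    # need language-aware preprocessing, especially for Arabic text.
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Weight = relative term frequency * log(N / df).
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy emails, invented for illustration only.
emails = [
    "verify your account now",
    "meeting agenda for monday",
    "your account statement for monday",
]
vectors = tfidf(emails)
```

In this toy input, "verify" occurs in only one email and therefore receives a higher weight in the first vector than "your", which occurs in two. Vectors of this kind would then be passed to a classifier such as a multilayer perceptron.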