Cyberbullying Detection in Arabic Text using Deep Learning

In the new era of digital communications, cyberbullying is a significant concern for society. Cyberbullying involves the use of communication technology and data, including messages, photographs, and videos, to undertake aggressive negative actions to harm others. This practice has spread substantia...

Bibliographic details
Main author: ALBAYARI, REEM RAMADAN SA’ID (author)
Published: 2023
Online access: https://bspace.buid.ac.ae/handle/1234/2449
_version_ 1862980611712483328
author ALBAYARI, REEM RAMADAN SA’ID
author_facet ALBAYARI, REEM RAMADAN SA’ID
author_role author
dc.contributor.none.fl_str_mv Professor Sherief Abdallah
dc.creator.none.fl_str_mv ALBAYARI, REEM RAMADAN SA’ID
dc.date.none.fl_str_mv 2023-11-24T07:10:12Z
2023-11-24T07:10:12Z
2023-03
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv 20181860
https://bspace.buid.ac.ae/handle/1234/2449
dc.language.none.fl_str_mv en
dc.publisher.none.fl_str_mv The British University in Dubai (BUiD)
dc.title.none.fl_str_mv Cyberbullying Detection in Arabic Text using Deep Learning
dc.type.none.fl_str_mv Thesis
description In the new era of digital communications, cyberbullying is a significant concern for society. Cyberbullying involves the use of communication technology and data, including messages, photographs, and videos, to carry out aggressive actions intended to harm others. This practice has spread substantially with rapid technological development and has gained significant attention in several domains involving data exchange, such as e-commerce, digital marketing, and social media platforms. Cyberbullying can harm those affected, with effects ranging from psychological to pathological, such as self-isolation, depression, and anxiety, potentially leading to suicide. Hence, detecting acts of cyberbullying automatically can help stakeholders prevent harm to victims. Automated detection is faster than relying on human moderators, enabling early detection of cyberbullying before severe harm is caused. Data-driven approaches, such as machine learning (ML) and particularly deep learning (DL), have shown promising results, and DL approaches can provide highly accurate predictive models for detecting cyberbullying. The first contribution of this thesis is an in-depth meta-analysis of existing evaluation methods, classification techniques, and datasets related to ML for cyberbullying problems. The meta-analysis shows that ML approaches, particularly DL, have not been extensively studied for the classification of cyberbullying in Arabic text. A potential reason for this research gap is the lack of Arabic-language repositories focusing on cyberbullying, despite the large amount of Arabic text that can be extracted from Arabic social media platforms, e-commerce sites, and mobile applications.
Consequently, I have designed and built a new Arabic text repository, the largest available, that can serve me and others in investigating various classifiers for detecting cyberbullying. This repository contains 200,000 comments, 46,898 of which were annotated by three human annotators. First, the comments were classified as positive, negative, or neutral, and then the negative comments were further classified into two categories based on their level of negativity (toxic, bullying). The dialect of each comment was also recorded. This gives the dataset an advantage, since it can be used for other purposes such as sentiment analysis and dialect identification, not just for cyberbullying detection. For the dataset to be regarded as a benchmark, Fleiss’s Kappa was adopted to measure the inter-annotator agreement (IAA), and the results show an overall Fleiss’s Kappa coefficient of 0.869 with a p-value of 10^-3, indicating near-perfect agreement among the three annotators. The application of DL to cyberbullying detection in Arabic text can be considered a novel approach due to the complexity of the problem and the tedious process involved, as well as the scarcity of relevant research studies. Therefore, this study aims to evaluate several versions of Recurrent Neural Networks (RNNs) and Feedforward Neural Networks (FNNs) for detecting cyberbullying in the Arabic language. Although these algorithms are widely used in text classification and outperform classical classifiers, many have been extensively investigated mainly in other domains, such as sentiment analysis and dialect identification, as well as in cyberbullying detection in English text. Hence, a comprehensive study focusing on Arabic cyberbullying can fill this gap in research.
In this study, I conduct a performance evaluation and comparison of various DL algorithms (LSTM, GRU, LSTM-ATT, CNN-BLSTM, CNN-LSTM, CNN-BILSTM-LSTM, and LSTM-TCN) on different Arabic cyberbullying datasets to obtain more precise and dependable findings. Based on the evaluation of these models, a hybrid DL model is proposed that combines the best characteristics of the baseline models CNN, BLSTM, and GRU for identifying cyberbullying. The proposed hybrid model improves accuracy on all the studied datasets and can be integrated into different social media sites to automatically detect cyberbullying in Arabic social media posts. It has the potential to significantly reduce cyberbullying. Other results, related implications, and limitations, along with future research, are also clarified and discussed.
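The inter-annotator agreement measure named in the abstract, Fleiss's Kappa, can be illustrated with a minimal Python sketch. This is not the thesis's code: the function name `fleiss_kappa`, the toy count table, and the handling of the degenerate single-category case are assumptions made here for illustration. Each row is one comment, and each column holds the number of annotators (three in the thesis) who chose that label.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss's kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j (hypothetical helper)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")

    n_categories = len(counts[0])
    total = n_items * n_raters

    # Share of all assignments that fell into each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement per item: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance

    if p_e == 1.0:
        # Assumption: all ratings in one category is treated as full agreement.
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


# Toy data (invented): 4 comments, 3 annotators,
# columns = (positive, negative, neutral).
toy = [
    [3, 0, 0],
    [0, 3, 0],
    [0, 0, 3],
    [2, 1, 0],
]
print(round(fleiss_kappa(toy), 3))  # 0.745
```

Values above 0.8 are conventionally read as near-perfect agreement, which is how the abstract interprets its reported coefficient of 0.869.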
id budr_63ba72a19858b2fbb0e002afede4fc60
identifier_str_mv 20181860
language_invalid_str_mv en
network_acronym_str budr
network_name_str The British University in Dubai repository
oai_identifier_str oai:bspace.buid.ac.ae:1234/2449
publishDate 2023
publisher.none.fl_str_mv The British University in Dubai (BUiD)
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
title Cyberbullying Detection in Arabic Text using Deep Learning
url https://bspace.buid.ac.ae/handle/1234/2449