Cyberbullying Detection in Arabic Text using Deep Learning

In the new era of digital communications, cyberbullying is a significant concern for society. Cyberbullying involves the use of communication technology and data, including messages, photographs, and videos, to undertake aggressive negative actions to harm others. This practice has spread substantia...

Bibliographic Details
Main Author:	ALBAYARI, REEM RAMADAN SA’ID (author)
Published:	2023
Online Access:	https://bspace.buid.ac.ae/handle/1234/2449

_version_	1862980611712483328
author	ALBAYARI, REEM RAMADAN SA’ID
author_facet	ALBAYARI, REEM RAMADAN SA’ID
author_role	author
dc.contributor.none.fl_str_mv	Professor Sherief Abdallah
dc.creator.none.fl_str_mv	ALBAYARI, REEM RAMADAN SA’ID
dc.date.none.fl_str_mv	2023-11-24T07:10:12Z 2023-11-24T07:10:12Z 2023-03
dc.format.none.fl_str_mv	application/pdf
dc.identifier.none.fl_str_mv	20181860 https://bspace.buid.ac.ae/handle/1234/2449
dc.language.none.fl_str_mv	en
dc.publisher.none.fl_str_mv	The British University in Dubai (BUiD)
dc.title.none.fl_str_mv	Cyberbullying Detection in Arabic Text using Deep Learning
dc.type.none.fl_str_mv	Thesis
description	In the new era of digital communications, cyberbullying is a significant concern for society. Cyberbullying involves the use of communication technology and data, including messages, photographs, and videos, to undertake aggressive negative actions to harm others. This practice has spread substantially due to rapid technological development and has gained significant attention in several domains involving data exchange, such as e-commerce, digital marketing, and social media platforms. The negative impact of cyberbullying on stakeholders can range from psychological to pathological, including self-isolation, depression, and anxiety, potentially leading to suicide. Hence, detecting acts of cyberbullying in an automated manner will help stakeholders prevent unfortunate outcomes for victims. If conducted automatically, rather than relying on human moderators, the process will be faster, enabling the early detection of cyberbullying before severe harm is caused. Data-driven approaches, such as machine learning (ML) and particularly deep learning (DL), have shown promising results, and DL approaches provide highly accurate predictive models for detecting cyberbullying. The first contribution of this thesis is an in-depth meta-analysis of existing evaluation methods, classification techniques, and datasets related to ML for cyberbullying problems. The meta-analysis shows that ML approaches, particularly DL, have not been extensively studied for the classification of cyberbullying in Arabic text. A potential reason for this research gap is the lack of Arabic-language repositories focusing on cyberbullying, despite the large amount of Arabic text that can be extracted from Arabic social media platforms as well as e-commerce and mobile applications.
Consequently, I have designed and built a new Arabic text repository, the largest available, that can serve me and others in investigating various classifiers for detecting cyberbullying. This repository contains 200,000 comments, 46,898 of which were annotated by three human annotators. First, the comments were classified as positive, negative, or neutral, and then the negative comments were further classified into two categories based on their level of negativity: toxic or bullying. The dialect of each comment was also recorded. This gives the dataset an advantage, since it can be used for other purposes such as sentiment analysis and dialect identification, not just for cyberbullying detection. For the dataset to be regarded as a benchmark, Fleiss’s Kappa metric was adopted to measure the inter-annotator agreement (IAA), and the results show that the overall Fleiss’s Kappa coefficient is 0.869 with a p-value of 10^-3, indicating near-perfect agreement among the three annotators. The application of DL to cyberbullying detection within Arabic text classification can be considered a novel approach due to the complexity of the problem, the tedious process involved, and the scarcity of relevant research studies. Therefore, this study aims to evaluate several versions of Recurrent Neural Networks (RNNs) and Feedforward Neural Networks (FNNs) for detecting cyberbullying in the Arabic language. Although these algorithms are widely used in text classification and outperform classical classifiers, many have been extensively investigated only in other domains, such as sentiment analysis and dialect identification, and in cyberbullying detection in English text. Hence, a comprehensive study focusing on Arabic cyberbullying can fill this gap in research.
In this study, I conduct a performance evaluation and comparison of various DL algorithms (LSTM, GRU, LSTM-ATT, CNN-BLSTM, CNN-LSTM, CNN-BILSTM-LSTM, and LSTM-TCN) on different datasets of Arabic cyberbullying to obtain more precise and dependable findings. Based on the evaluation of these models, a hybrid DL model is proposed that combines the best characteristics of the baseline models CNN, BLSTM, and GRU for identifying cyberbullying. The proposed hybrid model improves accuracy on all the studied datasets and can be integrated into different social media sites to automatically detect cyberbullying in Arabic social media posts. It has the potential to significantly reduce cyberbullying. Other results, related implications, and limitations, along with future research, are also clarified and discussed.
id	budr_63ba72a19858b2fbb0e002afede4fc60
identifier_str_mv	20181860
language_invalid_str_mv	en
network_acronym_str	budr
network_name_str	The British University in Dubai repository
oai_identifier_str	oai:bspace.buid.ac.ae:1234/2449
publishDate	2023
publisher.none.fl_str_mv	The British University in Dubai (BUiD)
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
title	Cyberbullying Detection in Arabic Text using Deep Learning
url	https://bspace.buid.ac.ae/handle/1234/2449
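
The description above reports inter-annotator agreement as a Fleiss’s Kappa of 0.869 across three annotators. As a rough illustration of how such a coefficient is computed, here is a minimal Python sketch. It is not the author's code: the function name `fleiss_kappa`, the input layout (one list of labels per comment), and the toy labels are all assumptions made for this example, and the thesis data is not reproduced.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss's Kappa for items that were each labelled by the same number of raters.

    ratings    -- list of items; each item is a list of labels, one per rater
                  (hypothetical layout, chosen for this sketch)
    categories -- list of all possible labels
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters who assigned category j to item i
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])

    # Observed agreement: mean over items of the share of agreeing rater pairs
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions
    total = n_items * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Toy labels only; three hypothetical annotators per comment
    labels = ["positive", "negative", "neutral"]
    toy = [
        ["positive", "positive", "positive"],
        ["negative", "negative", "neutral"],
        ["neutral", "neutral", "neutral"],
        ["negative", "negative", "negative"],
    ]
    print(round(fleiss_kappa(toy, labels), 3))
```

Values close to 1 indicate that annotators agree far more often than chance would predict, which is the sense in which the description calls 0.869 near-perfect agreement. The sketch does not compute the p-value that the description also mentions.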