Enhancing Arabic Offensive Tweet Classification: An Ensemble Approach Integrating AraBERT, Neural Networks, and LSTM Models

This thesis addresses the crucial research problem of accurate detection and moderation of offensive language in Arabic text, considering the intricacies posed by the language's complex morphology, dialectal variations, orthographic ambiguity, orthographic noise, limited linguistic resources, a...

Bibliographic Details
Main Author:	WAHDAN, AHLAM MOHAMMAD (author)
Published:	2023
Subjects:	Arabic offensive classification, preprocessing techniques, AraBERT, AraBERT preprocessing, ensemble methodology, deep learning, neural networks, LSTM, natural language processing, emoji interpretation
Online Access:	https://bspace.buid.ac.ae/handle/1234/2537

_version_	1862980618068951040
author	WAHDAN, AHLAM MOHAMMAD
author_facet	WAHDAN, AHLAM MOHAMMAD
author_role	author
dc.contributor.none.fl_str_mv	Professor Khaled Shaalan Dr Mostafa AL-Emran
dc.creator.none.fl_str_mv	WAHDAN, AHLAM MOHAMMAD
dc.date.none.fl_str_mv	2023-10 2024-03-19T05:59:02Z 2024-03-19T05:59:02Z
dc.format.none.fl_str_mv	application/pdf
dc.identifier.none.fl_str_mv	20194461 https://bspace.buid.ac.ae/handle/1234/2537
dc.language.none.fl_str_mv	en
dc.publisher.none.fl_str_mv	The British University in Dubai (BUiD)
dc.subject.none.fl_str_mv	Arabic offensive classification, preprocessing techniques, AraBERT, AraBERT preprocessing, ensemble methodology, deep learning, neural networks, LSTM, natural language processing, emoji interpretation
dc.title.none.fl_str_mv	Enhancing Arabic Offensive Tweet Classification: An Ensemble Approach Integrating AraBERT, Neural Networks, and LSTM Models
dc.type.none.fl_str_mv	Dissertation
description	This thesis addresses the problem of accurately detecting and moderating offensive language in Arabic text, given the challenges posed by the language's complex morphology, dialectal variation, orthographic ambiguity, orthographic noise, limited linguistic resources, and the need for comprehensive coverage of offensive expressions. The research objectives are framed as four research questions. First, the study identifies existing research gaps in Arabic Text Classification (ATC) through a systematic literature review, analyzing and synthesizing the relevant academic literature to pinpoint areas of ATC research that remain underexplored or where existing knowledge is inadequate. Second, it investigates the effects of novel pre-processing methods on the performance of Arabic Text Classification. Third, it seeks the most effective model for improving the accuracy of Arabic offensive text classification by introducing a novel approach that combines the pre-trained AraBERT model with fully connected neural networks (NN) and long short-term memory (LSTM) networks. Finally, the study evaluates the proposed model's ability to classify Arabic offensive text effectively. The research methodology consists of two parts: the dataset description and the proposed framework. The dataset description covers the two datasets used, OSACT and SemEval. The framework presents the proposed model, which combines pre-trained models with neural networks to classify Arabic offensive text effectively.
The model's performance is assessed using several evaluation metrics, including accuracy and macro-averaged F1 score, and is compared against other classifier models. The findings show that the proposed model outperforms the baseline AraBERT model, achieving an accuracy of 0.870 against the baseline's 0.820 and an F1-score of 0.853 against the baseline's 0.800. The implications of this research extend to decision makers, developers, and policy makers. The study's findings can inform decisions about integrating Arabic text classification systems in operational settings: by understanding the proposed model's performance, decision makers can assess its potential impact on processes such as information retrieval, content filtering, and sentiment analysis in Arabic text. In conclusion, this thesis contributes to the literature by addressing the complexities of offensive language identification in Arabic text and introducing an approach that integrates pre-trained models with deep learning techniques and neural networks. The model's demonstrated performance indicates its potential for practical use in real-world scenarios.
id	budr_421e3b3ae7e06bd98db9bc762e746fb4
identifier_str_mv	20194461
language_invalid_str_mv	en
network_acronym_str	budr
network_name_str	The British University in Dubai repository
oai_identifier_str	oai:bspace.buid.ac.ae:1234/2537
publishDate	2023
publisher.none.fl_str_mv	The British University in Dubai (BUiD)
title	Enhancing Arabic Offensive Tweet Classification: An Ensemble Approach Integrating AraBERT, Neural Networks, and LSTM Models
topic	Arabic offensive classification, preprocessing techniques, AraBERT, AraBERT preprocessing, ensemble methodology, deep learning, neural networks, LSTM, natural language processing, emoji interpretation
url	https://bspace.buid.ac.ae/handle/1234/2537

Similar Items