Enhancing Arabic Offensive Tweet Classification: An Ensemble Approach Integrating AraBERT, Neural Networks, and LSTM Models
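| Main Author: | WAHDAN, AHLAM MOHAMMAD |
|---|---|
| Published: | 2023 |
| Subjects: | Arabic offensive classification, preprocessing techniques, AraBERT, AraBERT preprocessing, ensemble methodology, deep learning, neural networks, LSTM, natural language processing, emoji interpretation |
| Online Access: | https://bspace.buid.ac.ae/handle/1234/2537 |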

| Field | Value |
|---|---|
| _version_ | 1862980618068951040 |
| author | WAHDAN, AHLAM MOHAMMAD |
| author_role | author |
| dc.contributor.none.fl_str_mv | Professor Khaled Shaalan; Dr Mostafa AL-Emran |
| dc.creator.none.fl_str_mv | WAHDAN, AHLAM MOHAMMAD |
| dc.date.none.fl_str_mv | 2023-10; 2024-03-19T05:59:02Z; 2024-03-19T05:59:02Z |
| dc.format.none.fl_str_mv | application/pdf |
| dc.identifier.none.fl_str_mv | 20194461; https://bspace.buid.ac.ae/handle/1234/2537 |
| dc.language.none.fl_str_mv | en |
| dc.publisher.none.fl_str_mv | The British University in Dubai (BUiD) |
| dc.subject.none.fl_str_mv | Arabic offensive classification, preprocessing techniques, AraBERT, AraBERT preprocessing, ensemble methodology, deep learning, neural networks, LSTM, natural language processing, emoji interpretation |
| dc.title.none.fl_str_mv | Enhancing Arabic Offensive Tweet Classification: An Ensemble Approach Integrating AraBERT, Neural Networks, and LSTM Models |
| dc.type.none.fl_str_mv | Dissertation |
| description | This thesis addresses the problem of accurately detecting and moderating offensive language in Arabic text, a task complicated by the language's rich morphology, dialectal variation, orthographic ambiguity and noise, limited linguistic resources, and the need to cover the full range of offensive expressions. The research objectives are framed by four research questions. First, the study identifies existing research gaps in Arabic Text Classification (ATC) through a systematic literature review that analyzes and synthesizes the relevant academic literature to find areas of ATC research that remain underexplored or inadequately understood. Second, it investigates how novel preprocessing methods affect ATC performance. Third, it seeks the most effective model for improving the accuracy of Arabic offensive text classification, introducing a novel approach that combines the pre-trained AraBERT model with fully connected neural networks (NN) and long short-term memory (LSTM) networks. Finally, it evaluates how effectively the proposed model classifies Arabic offensive text. The methodology has two parts: a description of the two datasets used, OSACT and SemEval, and the proposed framework, which combines pre-trained models with neural networks to classify Arabic offensive text. The model is evaluated with accuracy and macro-averaged F1-score and compared against other classifier models. The proposed model outperforms the baseline AraBERT model, reaching an accuracy of 0.870 versus 0.820 and an F1-score of 0.853 versus 0.800 (absolute gains of 0.050 and 0.053, respectively), indicating a better ability to identify offensive content in Arabic text. The findings are relevant to decision makers, developers, and policy makers: knowing the proposed model's performance can inform decisions about integrating Arabic text classification systems into operational settings and help assess its potential benefit for tasks such as information retrieval, content filtering, and sentiment analysis of Arabic text. In conclusion, the thesis contributes to the literature by addressing the difficulties of identifying offensive language in Arabic text and by introducing an approach that integrates pre-trained models with deep learning techniques and neural networks; its results suggest that the model has potential for practical, real-world use in Arabic offensive text classification. |
| id | budr_421e3b3ae7e06bd98db9bc762e746fb4 |
| identifier_str_mv | 20194461 |
| language_invalid_str_mv | en |
| network_acronym_str | budr |
| network_name_str | The British University in Dubai repository |
| oai_identifier_str | oai:bspace.buid.ac.ae:1234/2537 |
| publishDate | 2023 |
| publisher.none.fl_str_mv | The British University in Dubai (BUiD) |
| title | Enhancing Arabic Offensive Tweet Classification: An Ensemble Approach Integrating AraBERT, Neural Networks, and LSTM Models |
| topic | Arabic offensive classification, preprocessing techniques, AraBERT, AraBERT preprocessing, ensemble methodology, deep learning, neural networks, LSTM, natural language processing, emoji interpretation |
| url | https://bspace.buid.ac.ae/handle/1234/2537 |
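
The record names the two evaluation metrics (accuracy and macro F1) and describes combining AraBERT with fully connected and LSTM networks, but it does not say how the models' outputs are merged. The following is a minimal, stdlib-only Python sketch of how those metrics are computed, using soft voting (averaging class probabilities) as a hypothetical combination rule. The binary label encoding, the function names `accuracy`, `macro_f1`, and `soft_vote`, and the toy probabilities are illustrative assumptions, not the thesis's method or data.

```python
# Minimal, stdlib-only sketch. Assumptions (not stated in the record):
# binary labels (0 = not offensive, 1 = offensive) and a soft-voting
# combination rule; both are hypothetical illustration choices.


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def soft_vote(model_probs):
    """Average class probabilities across models, then take the argmax per item.

    model_probs[m][i][k] is model m's probability that item i has class k.
    """
    n_models = len(model_probs)
    preds = []
    for i in range(len(model_probs[0])):
        n_classes = len(model_probs[0][i])
        avg = [sum(m[i][k] for m in model_probs) / n_models for k in range(n_classes)]
        preds.append(max(range(n_classes), key=lambda k: avg[k]))
    return preds


# Toy gold labels and per-model probabilities (made up, not thesis data).
y_true = [1, 0, 0, 1]
arabert_probs = [[0.3, 0.7], [0.6, 0.4], [0.4, 0.6], [0.55, 0.45]]
nn_probs = [[0.2, 0.8], [0.7, 0.3], [0.6, 0.4], [0.5, 0.5]]
lstm_probs = [[0.4, 0.6], [0.8, 0.2], [0.7, 0.3], [0.3, 0.7]]

single = soft_vote([arabert_probs])  # one model: plain argmax
ensemble = soft_vote([arabert_probs, nn_probs, lstm_probs])

print(single, accuracy(y_true, single), macro_f1(y_true, single))  # → [1, 0, 1, 0] 0.5 0.5
print(ensemble, accuracy(y_true, ensemble), macro_f1(y_true, ensemble))  # → [1, 0, 0, 1] 1.0 1.0
```

Macro averaging weights the offensive and non-offensive classes equally regardless of how many tweets each contains, so on class-imbalanced data it can diverge noticeably from accuracy. Soft voting is only one possible design; the record does not indicate whether the thesis instead uses hard voting, stacking, or feature concatenation.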