The Impact of Arabic Part of Speech Tagging on Sentiment Analysis: A New Corpus and Deep Learning Approach

Bibliographic Details
Main Author: Munem Nerabie, Abdul (author)
Other Authors: AlKhatib, Manar (author), Samuel Mathew, Sujith (author), El Barachi, May (author), Oroumchian, Farhad (author)
Published: 2021
Subjects:
Online Access: https://bspace.buid.ac.ae/handle/1234/3058
https://doi.org/10.1016/j.procs.2021.03.026
dc.date.none.fl_str_mv 2021-03-26
2025-05-15T10:27:45Z
2025-05-15T10:27:45Z
dc.identifier.none.fl_str_mv Nerabie, A.M. et al. (2021) “The Impact of Arabic Part of Speech Tagging on Sentiment Analysis: A New Corpus and Deep Learning Approach,” Procedia Computer Science, 184, pp. 148–155.
1877-0509
https://bspace.buid.ac.ae/handle/1234/3058
https://doi.org/10.1016/j.procs.2021.03.026
dc.language.none.fl_str_mv en_US
dc.publisher.none.fl_str_mv Elsevier
dc.relation.none.fl_str_mv Procedia Computer Science v184 (2021): 148-155
dc.subject.none.fl_str_mv Sentiment Analysis; Part of Speech Tagging; Arabic Language; Dialect Arabic; Neural Network.
dc.title.none.fl_str_mv The Impact of Arabic Part of Speech Tagging on Sentiment Analysis: A New Corpus and Deep Learning Approach
dc.type.none.fl_str_mv Article
description Sentiment analysis uses Natural Language Processing (NLP) techniques and is widely applied to social media content to determine people's opinions, attitudes, and emotions toward entities, individuals, issues, events, or topics. Its accuracy depends on automatic Part-of-Speech (PoS) tagging, which labels words according to their grammatical categories. Analyzing the Arabic language has attracted considerable research interest, and the challenge is now amplified by social media dialects. While numerous morphological analyzers and PoS taggers have been proposed for Modern Standard Arabic (MSA), there is growing interest in applying those techniques to the dialectal Arabic that is prominent in social media. Social media texts (e.g., posts, comments, and replies) differ significantly from MSA texts in vocabulary and grammatical structure, and these differences call for revisiting PoS tagging methods to adapt them to social media texts. Furthermore, the lack of sufficiently large and diverse social media text corpora is one of the reasons that automatic PoS tagging of social media content has rarely been studied. In this paper, we address those limitations by proposing a novel Arabic social media text corpus that is enriched with complete PoS information, including tags, lemmas, and synonyms. The proposed corpus constitutes the largest manually annotated Arabic corpus to date, with more than 5 million tokens, 238,600 MSA texts, and words from Arabic social media dialect, collected from 65,000 online users' accounts. We used the proposed corpus to train a custom Long Short-Term Memory (LSTM) deep learning model, which showed excellent performance in terms of sentiment classification accuracy and F1-score. The results demonstrate that a diverse corpus enriched with PoS information significantly enhances the performance of social media analysis techniques and opens the door to advanced features such as opinion mining and emotion intelligence.
network_acronym_str budr
network_name_str The British University in Dubai repository
oai_identifier_str oai:bspace.buid.ac.ae:1234/3058