Sentiment analysis for Arabizi in social media. (c2015)

With the vast increase of social media users over the past few years, millions of product reviews are discussed and posted in online forums and social media such as Facebook and Twitter. There are many applications for sentiment analysis and opinion mining in which governments or stock market observ...

Bibliographic Details
Main Author: Tobaili, Taha (author)
Format: masterThesis
Published: 2015
Subjects:
Online Access: http://hdl.handle.net/10725/2702
https://doi.org/10.26756/th.2015.27
_version_ 1864513459092193280
author Tobaili, Taha
author_facet Tobaili, Taha
author_role author
dc.creator.none.fl_str_mv Tobaili, Taha
dc.date.none.fl_str_mv 2015-11-27T11:10:31Z
2015-11-27T11:10:31Z
2016-02-02
8/26/2015
dc.identifier.none.fl_str_mv http://hdl.handle.net/10725/2702
https://doi.org/10.26756/th.2015.27
dc.language.none.fl_str_mv en
dc.publisher.none.fl_str_mv Lebanese American University
dc.rights.*.fl_str_mv info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Data mining -- Analysis
Public opinion -- Data processing
Natural language processing (Computer science)
Arabic language -- Lexicology -- Data processing
Web 2.0 -- Terminology
Dissertations, Academic
Lebanese American University -- Dissertations
dc.title.none.fl_str_mv Sentiment analysis for Arabizi in social media. (c2015)
dc.type.none.fl_str_mv Thesis
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
description With the rapid growth in social media users over the past few years, millions of product reviews are posted and discussed in online forums and on social media platforms such as Facebook and Twitter. Sentiment analysis and opinion mining have many applications: governments and stock market observers use social media data to study public opinion and predict election results or stock fluctuations, and companies use it to collect feedback on their product releases. Filling in rating surveys is no longer efficient when a free, growing database of public opinion is available. It is therefore natural to use social media text to build automated software that predicts public sentiment; the challenge, however, lies in analyzing informal language. Most sentiment analysis research is currently conducted on formal English, and one major challenge is applying sentiment analysis techniques to other languages. With approximately four million tweets posted daily in several dialects of Arabizi, an informal form of Arabic in which sentences are written using English letters and numerals (e.g. Yalla 7abibi), a data mining tool that can analyze the sentiment of Twitter users in the Arab world is very useful. We took the initiative to make use of this abundance of data by analyzing it and predicting sentiment. Applying the sentiment analysis techniques used for English to Arabic is not simple, because the two languages differ semantically and structurally and because Arabic is morphologically rich. Informal Arabic lacks standardization and has no formal grammar, so sentiment analysis in this area is considered a complex process. Sentiment analysis for Arabic has been studied for Modern Standard Arabic (MSA) but rarely for informal Arabic, and not at all for Arabizi, even though most young people in Lebanon text in Arabizi, claiming that it is easier than texting in Arabic. The prevalence of this expanding linguistic trend motivated us to address this NLP challenge. In this study, we created a novel lexicon of around 10,000 informal opinion words, using regular expressions to match over 50,000 words. We also created an algorithm that lemmatizes Arabizi words and classifies input sentences as positive, negative, or neutral. We collected around 400,000 lines of Arabizi data from WhatsApp, Facebook, and Twitter. We filtered them and tested our classifier on a small sample, achieving 80% classification accuracy. The dialect chosen for the lexicon is Lebanese, our native language.
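The abstract describes a lexicon whose entries use regular expressions to cover many spelling variants, plus a step that lemmatizes words and labels sentences as positive, negative, or neutral. The following is a minimal sketch of how such a pipeline might look. It is not the thesis's algorithm: the four patterns, the lemmas, the polarity values, and the names `LEXICON`, `lookup`, and `classify` are all illustrative assumptions, and the sum-of-polarities rule is a simple stand-in for whatever scoring the author actually used.

```python
import re

# Hypothetical mini-lexicon, NOT taken from the thesis: each entry pairs a
# regular expression covering several Arabizi spellings with an assumed
# lemma and polarity (+1 positive, -1 negative).
LEXICON = [
    (re.compile(r"7+a?b+i+b+i+"), "7abibi", +1),   # 7abibi, 7bibi, 7abibiii
    (re.compile(r"7+[ei]l+[ou]+"), "7elo", +1),    # 7elo, 7ilo, 7elooo
    (re.compile(r"b[ae]sh[ae]3+"), "bashe3", -1),  # bashe3, beshe3
    (re.compile(r"z[ie]f+[ie]?t+"), "zift", -1),   # zift, zefet
]


def lookup(token):
    """Return (lemma, polarity) for a token, or None if no pattern matches."""
    for pattern, lemma, polarity in LEXICON:
        if pattern.fullmatch(token):
            return lemma, polarity
    return None


def classify(sentence):
    """Label a sentence by summing the polarities of its matched tokens."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    score = sum(hit[1] for hit in map(lookup, tokens) if hit is not None)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Under these assumptions, `classify("Yalla 7abibi")` returns `"positive"` and `lookup("7elooo")` returns `("7elo", 1)`, which shows how one pattern can fold several spellings onto a single lexicon entry.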
eu_rights_str_mv openAccess
format masterThesis
id LAURepo_f1855fa4b8456ee4dc3c811e02fb9fc9
language_invalid_str_mv en
network_acronym_str LAURepo
network_name_str Lebanese American University repository
oai_identifier_str oai:laur.lau.edu.lb:10725/2702
publishDate 2015
publisher.none.fl_str_mv Lebanese American University
spelling 1 hard copy: x, 65 leaves; ill., col. map; 30 cm. Available at RNL. Bibliography: leaves 54-56. 2021-03-19T09:59:49Z
status_str publishedVersion
title Sentiment analysis for Arabizi in social media. (c2015)
url http://hdl.handle.net/10725/2702
https://doi.org/10.26756/th.2015.27