Sentiment analysis for Arabizi in social media. (c2015)
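| Main author: | Tobaili, Taha |
|---|---|
| Format: | masterThesis |
| Published: | 2015 |
| Subjects: | Data mining -- Analysis; Public opinion -- Data processing; Natural language processing (Computer science); Arabic language -- Lexicology -- Data processing; Web 2.0 -- Terminology; Dissertations, Academic; Lebanese American University -- Dissertations |
| Online access: | http://hdl.handle.net/10725/2702 https://doi.org/10.26756/th.2015.27 |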
| _version_ | 1864513459092193280 |
|---|---|
| author | Tobaili, Taha |
| author_facet | Tobaili, Taha |
| author_role | author |
| dc.creator.none.fl_str_mv | Tobaili, Taha |
| dc.date.none.fl_str_mv | 2015-11-27T11:10:31Z 2015-11-27T11:10:31Z 2016-02-02 8/26/2015 |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/10725/2702 https://doi.org/10.26756/th.2015.27 |
| dc.language.none.fl_str_mv | en |
| dc.publisher.none.fl_str_mv | Lebanese American University |
| dc.rights.*.fl_str_mv | info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Data mining -- Analysis Public opinion -- Data processing Natural language processing (Computer science) Arabic language -- Lexicology -- Data processing Web 2.0 -- Terminology Dissertations, Academic Lebanese American University -- Dissertations |
| dc.title.none.fl_str_mv | Sentiment analysis for Arabizi in social media. (c2015) |
| dc.type.none.fl_str_mv | Thesis info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/masterThesis |
| description | With the vast increase in social media users over the past few years, millions of product reviews are discussed and posted in online forums and on social media platforms such as Facebook and Twitter. Sentiment analysis and opinion mining have many applications: governments and stock market observers use social media data to study public opinion and to predict election results or stock fluctuations, and companies use it to collect feedback on their product releases. Filling out rating surveys is no longer efficient when a freely growing database of public opinion is available. It is therefore natural to use the textual data of social media to build automated software that predicts public sentiment; however, analyzing informal languages is a challenge. Most sentiment analysis research is currently conducted on formal English, and one major challenge is applying sentiment analysis techniques to other languages. With approximately four million tweets posted daily in several Arabizi dialects (informal Arabic written with English letters and numerals, e.g. Yalla 7abibi), a data mining tool that can analyze the sentiment of Twitter users in the Arab world is very useful. We took the initiative to make use of this abundance of data by analyzing it and predicting its sentiment. Applying the sentiment analysis techniques used for English to Arabic is not a simple task, because of the semantic and structural differences between the two languages and because Arabic is morphologically rich. Informal Arabic is not standardized and has no codified grammar, so sentiment analysis in this area is a complex process. Sentiment analysis for Arabic has been studied for MSA (Modern Standard Arabic), but rarely for informal Arabic and not at all for Arabizi, even though most young people in Lebanon text in Arabizi, claiming that it is easier than texting in Arabic. The prevalence of this expanding linguistic trend motivated us to target this NLP challenge. In this study, we created a novel lexicon of around 10,000 informal opinion words, using regular expressions to match over 50,000 words. We also created an algorithm that lemmatizes Arabizi words and classifies input sentences into positive, negative, or neutral categories. We collected around 400,000 lines of Arabizi data from WhatsApp, Facebook, and Twitter, filtered them, and tested a small sample on our classifier, achieving 80% classification accuracy. The dialect chosen for the lexicon is Lebanese, our native language. |
| eu_rights_str_mv | openAccess |
| format | masterThesis |
| id | LAURepo_f1855fa4b8456ee4dc3c811e02fb9fc9 |
| language_invalid_str_mv | en |
| network_acronym_str | LAURepo |
| network_name_str | Lebanese American University repository |
| oai_identifier_str | oai:laur.lau.edu.lb:10725/2702 |
| publishDate | 2015 |
| publisher.none.fl_str_mv | Lebanese American University |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| spelling | Sentiment analysis for Arabizi in social media. (c2015) Tobaili, Taha Data mining -- Analysis Public opinion -- Data processing Natural language processing (Computer science) Arabic language -- Lexicology -- Data processing Web 2.0 -- Terminology Dissertations, Academic Lebanese American University -- Dissertations N/A 1 hard copy: x, 65 leaves; ill., col. map; 30 cm. available at RNL. Bibliography: leaves 54-56. Lebanese American University 2015-11-27T11:10:31Z 2015-11-27T11:10:31Z 8/26/2015 2016-02-02 Thesis info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/masterThesis http://hdl.handle.net/10725/2702 https://doi.org/10.26756/th.2015.27 en info:eu-repo/semantics/openAccess oai:laur.lau.edu.lb:10725/2702 2021-03-19T09:59:49Z |
| status_str | publishedVersion |
| title | Sentiment analysis for Arabizi in social media. (c2015) |
| topic | Data mining -- Analysis Public opinion -- Data processing Natural language processing (Computer science) Arabic language -- Lexicology -- Data processing Web 2.0 -- Terminology Dissertations, Academic Lebanese American University -- Dissertations |
| url | http://hdl.handle.net/10725/2702 https://doi.org/10.26756/th.2015.27 |
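The abstract names three components (a regular-expression lexicon that maps spelling variants onto opinion words, an Arabizi lemmatizer, and a three-way sentence classifier) but does not describe how they fit together. The Python sketch below is one possible lexicon-based arrangement under stated assumptions; it is not the thesis's implementation. The four lexicon entries, their ±1 scores, the negation words, the prefix list, and the function names `normalize`, `polarity`, and `classify` are all hypothetical. Digits follow the common Arabizi convention (7 = ح, 3 = ع), and "lemmatization" is reduced here to squeezing letter elongation and stripping the conjunction `w` ("and") or the article `el`/`l` ("the").

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical mini-lexicon (not the thesis's): each regex covers several
# spelling variants of one Lebanese Arabizi opinion word; digits stand for
# Arabic letters, e.g. 7 = ح and 3 = ع.
LEXICON: list[tuple[re.Pattern[str], int]] = [
    (re.compile(r"(7|h)[ie]l(w[ea]|[ou])"), +1),  # 7elo, 7ilo, helo, 7elwe: "nice"
    (re.compile(r"mn[ie]+7"), +1),                 # mni7, mnee7: "good"
    (re.compile(r"b[sc]h[ie]3"), -1),              # bshi3, bche3: "ugly"
    (re.compile(r"zb[ae]l[ae]"), -1),              # zbele, zbala: "garbage"
]
NEGATORS = {"mesh", "mish", "msh", "ma"}  # assumed negation particles
PREFIXES = ("w", "el", "l")               # assumed clitics: "and", "the"


def normalize(token: str) -> str:
    """Lowercase and squeeze runs of 3+ repeated letters ('7elooo' -> '7elo')."""
    return re.sub(r"(.)\1{2,}", r"\1", token.lower())


def polarity(token: str) -> Optional[int]:
    """Return the lexicon score of a token, retrying with one prefix stripped."""
    candidates = [token] + [
        token[len(p):] for p in PREFIXES
        if token.startswith(p) and len(token) > len(p)
    ]
    for cand in candidates:
        for pattern, score in LEXICON:
            if pattern.fullmatch(cand):
                return score
    return None


def classify(sentence: str) -> str:
    """Label a sentence positive, negative or neutral by summing token scores."""
    total = 0
    negate = False
    for raw in re.findall(r"[a-z0-9]+", sentence.lower()):
        tok = normalize(raw)
        if tok in NEGATORS:
            negate = True  # flip the polarity of the next matched opinion word
            continue
        score = polarity(tok)
        if score is not None:
            total += -score if negate else score
            negate = False
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(classify("el film ktir 7elooo"))  # positive
print(classify("mesh ktir 7elo"))       # negative
```

In this sketch a negator such as `mesh` stays active until the next matched opinion word, so `mesh ktir 7elo` ("not very nice") scores negative even though the intensifier `ktir` sits in between. A lexicon of the size the abstract reports (around 10,000 entries matching over 50,000 surface forms) would likely need a faster lookup than this linear scan over patterns.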