Intelligent Hybrid Feature Selection for Textual Sentiment Classification

Bibliographic Details
Main Author: Jawad Khan (6422669) (author)
Other Authors: Aftab Alam (5158601) (author), Youngmoon Lee (19570210) (author)
Published: 2021
Subjects:
_version_ 1864513505725513728
author Jawad Khan (6422669)
author2 Aftab Alam (5158601)
Youngmoon Lee (19570210)
author2_role author
author
author_facet Jawad Khan (6422669)
Aftab Alam (5158601)
Youngmoon Lee (19570210)
author_role author
dc.creator.none.fl_str_mv Jawad Khan (6422669)
Aftab Alam (5158601)
Youngmoon Lee (19570210)
dc.date.none.fl_str_mv 2021-10-08T09:00:00Z
dc.identifier.none.fl_str_mv 10.1109/access.2021.3118982
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/Intelligent_Hybrid_Feature_Selection_for_Textual_Sentiment_Classification/26976598
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Information and computing sciences
Machine learning
Feature extraction
Support vector machines
Motion pictures
Entropy
Sentiment analysis
Semantics
Social networking (online)
Sentiment classification
hybrid feature selection
ensemble learning
linguistic semantic rules
wide coverage sentiment lexicons
natural language processing
dc.title.none.fl_str_mv Intelligent Hybrid Feature Selection for Textual Sentiment Classification
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p dir="ltr">Sentiment Analysis (SA) aims to extract useful information from online Unstructured User-Generated Contents (UUGC) and classify them into positive and negative classes. State-of-the-art SA techniques suffer from a high-dimensional feature space because of noisy and irrelevant features in the UUGC. Researchers have proposed feature extraction and selection techniques to reduce the high-dimensional feature space, but these fall short in extracting and selecting the most effective sentiment features for sentiment model learning. Effective feature extraction and selection are important for SA because they can boost the learning algorithm’s predictive performance while reducing the high-dimensional feature space. To address these concerns, we propose Intelligent Hybrid Feature Selection for Sentiment Analysis (IHFSSA), based on ensemble learning methods. IHFSSA first identifies sentiment features in the review text using the Penn Treebank part-of-speech tagset and integrated Wide Coverage Sentiment Lexicons (WCSL). A sentiment feature subset is then selected using a fast and simple rank-based ensemble of multiple filter feature selection methods. The selected sentiment features are further refined by applying a wrapper-based backward feature selection method. Finally, for textual sentiment classification, the well-known classification algorithms Support Vector Machine (SVM), Naive Bayes (NB), and Generalized Linear Model (GLM) are trained in an ensemble model on the refined sentiment feature set. An in-depth evaluation on heterogeneous domain benchmark datasets demonstrates that IHFSSA outperforms existing SA techniques.</p><h2>Other Information</h2><p dir="ltr">Published in: IEEE Access<br>License: <a href="https://creativecommons.org/licenses/by/4.0/" rel="noreferrer" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/access.2021.3118982" target="_blank">https://dx.doi.org/10.1109/access.2021.3118982</a></p>
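The rank-based ensemble of filters mentioned in the description can be illustrated with a minimal sketch. The record does not say which filters IHFSSA combines or how their ranks are aggregated, so everything here is an assumption made for illustration: the three filter names, the toy scores, the helper names `rank_features` and `ensemble_select`, and the use of the mean rank as the aggregation rule.

```python
from typing import Dict, List


def rank_features(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each feature to its rank under one filter (1 = highest score)."""
    # Ties are broken alphabetically so the ranking is deterministic.
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: pos for pos, feature in enumerate(ordered, start=1)}


def ensemble_select(filter_scores: List[Dict[str, float]], k: int) -> List[str]:
    """Average the per-filter ranks and keep the k features with the best mean rank."""
    if not filter_scores:
        return []
    rankings = [rank_features(scores) for scores in filter_scores]
    # Only features scored by every filter can be compared fairly.
    features = set(rankings[0])
    for ranking in rankings[1:]:
        features &= set(ranking)
    mean_rank = {f: sum(r[f] for r in rankings) / len(rankings) for f in features}
    ordered = sorted(mean_rank, key=lambda f: (mean_rank[f], f))
    return ordered[:k]


# Hypothetical scores from three hypothetical filters (not taken from the paper).
info_gain = {"excellent": 0.42, "boring": 0.38, "movie": 0.02, "plot": 0.05}
chi_square = {"excellent": 11.0, "boring": 12.5, "movie": 0.3, "plot": 1.1}
gini = {"excellent": 0.30, "boring": 0.21, "movie": 0.04, "plot": 0.01}

selected = ensemble_select([info_gain, chi_square, gini], k=2)
print(selected)
```

Averaging ranks rather than raw scores keeps filters with different score scales comparable. In the pipeline the description outlines, a subset chosen this way would then be passed to the wrapper-based backward selection step.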
eu_rights_str_mv openAccess
id Manara2_f1d875b2a495cd2b59f37ac7cbef30ff
identifier_str_mv 10.1109/access.2021.3118982
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/26976598
publishDate 2021
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Intelligent Hybrid Feature Selection for Textual Sentiment Classification