Supervised term-category feature weighting for improved text classification

Text classification is a central task in Natural Language Processing (NLP) that aims at categorizing text documents into predefined classes or categories. It requires appropriate features to describe the contents and meaning of text documents, and map them with their target categories. Existing text...

Bibliographic details
Main author: Attieh, Joseph (author)
Other authors: Tekli, Joe (author)
Format: article
Published: 2022
Online access: http://hdl.handle.net/10725/15996
https://doi.org/10.1016/j.knosys.2022.110215
http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php
https://www.sciencedirect.com/science/article/pii/S0950705122013119
_version_ 1864513471844974592
author Attieh, Joseph
author2 Tekli, Joe
author2_role author
author_facet Attieh, Joseph
Tekli, Joe
author_role author
dc.creator.none.fl_str_mv Attieh, Joseph
Tekli, Joe
dc.date.none.fl_str_mv 2022-12-28
2023
2024-08-20T10:08:25Z
2024-08-20T10:08:25Z
dc.identifier.none.fl_str_mv 0950-7051
http://hdl.handle.net/10725/15996
https://doi.org/10.1016/j.knosys.2022.110215
Attieh, J., & Tekli, J. (2023). Supervised term-category feature weighting for improved text classification. Knowledge-Based Systems, 261, 110215.
http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php
https://www.sciencedirect.com/science/article/pii/S0950705122013119
dc.language.none.fl_str_mv en
dc.relation.none.fl_str_mv Knowledge-Based Systems
dc.rights.*.fl_str_mv info:eu-repo/semantics/openAccess
dc.title.none.fl_str_mv Supervised term-category feature weighting for improved text classification
dc.type.none.fl_str_mv Article
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
description Text classification is a central task in Natural Language Processing (NLP) that aims at categorizing text documents into predefined classes or categories. It requires appropriate features to describe the contents and meaning of text documents, and map them with their target categories. Existing text feature representations rely on a weighted representation of the document terms. Hence, choosing a suitable method for term weighting is of major importance and can help increase the effectiveness of the classification task. In this study, we provide a novel text classification framework for Category-based Feature Engineering titled CFE. It consists of a supervised weighting scheme defined based on a variant of the TF-ICF (Term Frequency-Inverse Category Frequency) model, embedded into three new lean classification approaches: (i) IterativeAdditive (flat), (ii) GradientDescentANN (1-layered), and (iii) FeedForwardANN (2-layered). The IterativeAdditive approach augments each document representation with a set of synthetic features inferred from TF-ICF category representations. It builds a term-category TF-ICF matrix using an iterative and additive algorithm that produces category vector representations and updates until reaching convergence. GradientDescentANN replaces the iterative additive process mentioned previously by computing the term-category matrix using a gradient descent ANN model. Training the ANN using the gradient descent algorithm allows updating the term-category matrix until reaching convergence. FeedForwardANN uses a feed-forward ANN model to transform document representations into the category vector space. The transformed document vectors are then compared with the target category vectors, and are associated with the most similar categories. We have implemented CFE including its three classification approaches, and we have conducted a large battery of tests to evaluate their performance. 
Experimental results on five benchmark datasets show that our lean approaches mostly improve text classification accuracy while requiring significantly less computation time compared with their deep model alternatives.
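
As an illustration of the term-category weighting idea described above, the following is a minimal sketch of a textbook TF-ICF computation followed by nearest-category assignment. It is not the authors' CFE implementation: the record does not specify which TF-ICF variant the paper uses, so the smoothed logarithm, the cosine similarity, the function names (`tf_icf`, `cosine`, `classify`) and the toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def tf_icf(docs, labels):
    """Build one TF-ICF vector per category (textbook form, assumed here)."""
    # Term frequency of each term inside each category.
    cat_tf = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        cat_tf[label].update(tokens)
    n_cats = len(cat_tf)
    # Category frequency: the number of categories in which a term occurs.
    cf = Counter(term for counts in cat_tf.values() for term in counts)
    # Weight = tf(term, category) * log(1 + |C| / cf(term)); smoothing is an assumption.
    return {
        cat: {t: tf * math.log(1 + n_cats / cf[t]) for t, tf in counts.items()}
        for cat, counts in cat_tf.items()
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def classify(tokens, cat_vectors):
    """Assign a tokenized document to the most similar category vector."""
    doc_vec = Counter(tokens)  # raw term counts for the document
    return max(cat_vectors, key=lambda c: cosine(doc_vec, cat_vectors[c]))

# Hypothetical toy corpus, invented for this example only.
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["stock", "market", "bank"],
    ["bank", "loan", "market"],
]
labels = ["sports", "sports", "finance", "finance"]

vectors = tf_icf(docs, labels)
print(classify(["goal", "coach"], vectors))
print(classify(["loan", "market"], vectors))
```

A term that occurs in few categories receives a larger inverse-category factor, so it contributes more to the category vector than a term spread across all categories. The iterative, gradient-descent and feed-forward approaches named in the description refine or replace this matrix and are not reproduced here.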
eu_rights_str_mv openAccess
format article
id LAURepo_45f1235dbd4865c225e3a23b0fb0bd8a
identifier_str_mv 0950-7051
Attieh, J., & Tekli, J. (2023). Supervised term-category feature weighting for improved text classification. Knowledge-Based Systems, 261, 110215.
language_invalid_str_mv en
network_acronym_str LAURepo
network_name_str Lebanese American University repository
oai_identifier_str oai:laur.lau.edu.lb:10725/15996
publishDate 2022
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
spellingShingle Supervised term-category feature weighting for improved text classification
Attieh, Joseph
status_str publishedVersion
title Supervised term-category feature weighting for improved text classification
url http://hdl.handle.net/10725/15996
https://doi.org/10.1016/j.knosys.2022.110215
http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php
https://www.sciencedirect.com/science/article/pii/S0950705122013119