Optimizing Document Classification: Unleashing the Power of Genetic Algorithms

Many individuals, including researchers, professors, and students, encounter difficulties when searching for scholarly documents, papers, and journals within a specific domain. Consequently, scholars have begun to focus on the document classification problem, offering variou...

Bibliographic Details
Main Author: Ghulam Mustafa (458105) (author)
Other Authors: Abid Rauf (17541708) (author), Ahmad Sami Al-Shamayleh (17541495) (author), Muhammad Sulaiman (9106025) (author), Wagdi Alrawagfeh (17271664) (author), Muhammad Tanvir Afzal (4162504) (author), Adnan Akhunzada (3134064) (author)
Published: 2023
Subjects:
_version_ 1864513527493951488
author Ghulam Mustafa (458105)
author2 Abid Rauf (17541708)
Ahmad Sami Al-Shamayleh (17541495)
Muhammad Sulaiman (9106025)
Wagdi Alrawagfeh (17271664)
Muhammad Tanvir Afzal (4162504)
Adnan Akhunzada (3134064)
author2_role author
author
author
author
author
author
author_facet Ghulam Mustafa (458105)
Abid Rauf (17541708)
Ahmad Sami Al-Shamayleh (17541495)
Muhammad Sulaiman (9106025)
Wagdi Alrawagfeh (17271664)
Muhammad Tanvir Afzal (4162504)
Adnan Akhunzada (3134064)
author_role author
dc.creator.none.fl_str_mv Ghulam Mustafa (458105)
Abid Rauf (17541708)
Ahmad Sami Al-Shamayleh (17541495)
Muhammad Sulaiman (9106025)
Wagdi Alrawagfeh (17271664)
Muhammad Tanvir Afzal (4162504)
Adnan Akhunzada (3134064)
dc.date.none.fl_str_mv 2023-07-04T06:00:00Z
dc.identifier.none.fl_str_mv 10.1109/access.2023.3292248
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/Optimizing_Document_Classification_Unleashing_the_Power_of_Genetic_Algorithms/25205225
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Engineering
Electrical engineering
Electronics, sensors and digital hardware
Materials engineering
Metadata
Feature extraction
Bit error rate
Support vector machines
Genetic algorithms
Classification algorithms
Semantics
Document classification (DC)
Word2Vector (W2V)
bag of word (BOW)
term frequency (TF)
association for computing machinery (ACM)
machine learning (ML)
dc.title.none.fl_str_mv Optimizing Document Classification: Unleashing the Power of Genetic Algorithms
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p dir="ltr">Many individuals, including researchers, professors, and students, encounter difficulties when searching for scholarly documents, papers, and journals within a specific domain. Consequently, scholars have begun to focus on the document classification problem, offering various methods to address it. Researchers have drawn on diverse data sources in their approaches, such as citations, metadata, content, and hybrids of these. Among these sources, the metadata-based approach stands out for research paper classification because metadata is freely available. Various scholars have employed different metadata parameters of research articles, including the title, abstract, keywords, and general terms, for research paper classification. In this study, we chose four metadata-based features (title, keywords, abstract, and general terms) from the SANTOS dataset, which was prepared by ACM. To represent these features numerically, we employed a semantic-based model called BERT instead of the commonly used count-based models. BERT generates a 768-dimensional vector for each record, which introduces significant time complexity during computation. Additionally, our proposed model optimizes the features using a genetic algorithm. Optimal feature selection plays a crucial role in this domain, enhancing the overall accuracy of the document classification system while reducing the time complexity associated with selecting the most relevant features from this high-dimensional space. For classification purposes, we employed GNB and SVM classifiers. The outcomes of our study showed that the combination of title and keywords outperformed the other combinations.<br></p><h2>Other Information</h2><p dir="ltr">Published in: IEEE Access<br>License: <a href="http://creativecommons.org/licenses/by/4.0" target="_blank">http://creativecommons.org/licenses/by/4.0</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/access.2023.3292248" target="_blank">https://dx.doi.org/10.1109/access.2023.3292248</a></p>
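The pipeline summarized in the description (embed metadata, select features with a genetic algorithm, classify) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: synthetic 12-dimensional Gaussian data stands in for the 768-dimensional BERT vectors, a small hand-written Gaussian naive Bayes stands in for the GNB classifier, and the genetic operators (truncation selection, single-point crossover, bit-flip mutation) and all parameter values are hypothetical choices that the abstract does not specify. The names `make_data`, `fit_gnb`, `predict_gnb`, `fitness`, and `ga_select` are likewise invented for this example.

```python
import math
import random


def make_data(n_rows, n_feat=12, n_informative=3, seed=1):
    """Synthetic stand-in for embedding vectors: only the first
    n_informative columns depend on the class label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_rows):
        label = i % 2
        X.append([rng.gauss(3.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_feat)])
        y.append(label)
    return X, y


def fit_gnb(X, y, cols):
    """Per-class log-prior, means, and variances over the selected columns."""
    model = {}
    for label in set(y):
        rows = [[x[c] for c in cols] for x, t in zip(X, y) if t == label]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                     for col, m in zip(zip(*rows), means)]
        model[label] = (math.log(len(rows) / len(y)), means, variances)
    return model


def predict_gnb(model, x, cols):
    """Return the class with the highest Gaussian log-posterior."""
    def log_post(label):
        prior, means, variances = model[label]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x[c] - m) ** 2 / (2 * v)
            for c, m, v in zip(cols, means, variances))
    return max(model, key=log_post)


def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Validation accuracy of the classifier restricted to the masked columns."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    model = fit_gnb(X_tr, y_tr, cols)
    hits = sum(predict_gnb(model, x, cols) == t for x, t in zip(X_va, y_va))
    return hits / len(y_va)


def ga_select(X_tr, y_tr, X_va, y_va, pop_size=20, generations=15,
              p_mut=0.05, seed=0):
    """Evolve binary feature masks; returns (best_mask, best_accuracy)."""
    rng = random.Random(seed)
    n = len(X_tr[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(mask):
        return fitness(mask, X_tr, y_tr, X_va, y_va)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]        # truncation selection, parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)         # single-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation: each bit flips with probability p_mut
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = parents + children
    best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    X_tr, y_tr = make_data(80, seed=1)
    X_va, y_va = make_data(40, seed=2)
    mask, acc = ga_select(X_tr, y_tr, X_va, y_va)
    print("selected columns:", [i for i, b in enumerate(mask) if b])
    print("validation accuracy:", acc)
```

Each individual is a bit mask over the feature columns, and its fitness is the classifier's accuracy on held-out data when trained on only the masked columns. With real 768-dimensional embeddings the same loop applies, although each fitness evaluation then costs a full train-and-evaluate cycle, which is where the time-complexity concern raised in the description comes from.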
eu_rights_str_mv openAccess
id Manara2_4e0eba5cac140cf91b5c6affa2e4e3ab
identifier_str_mv 10.1109/access.2023.3292248
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/25205225
publishDate 2023
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Optimizing Document Classification: Unleashing the Power of Genetic Algorithms
title_full Optimizing Document Classification: Unleashing the Power of Genetic Algorithms
title_fullStr Optimizing Document Classification: Unleashing the Power of Genetic Algorithms
title_full_unstemmed Optimizing Document Classification: Unleashing the Power of Genetic Algorithms
title_short Optimizing Document Classification: Unleashing the Power of Genetic Algorithms
title_sort Optimizing Document Classification: Unleashing the Power of Genetic Algorithms
topic Engineering
Electrical engineering
Electronics, sensors and digital hardware
Materials engineering
Metadata
Feature extraction
Bit error rate
Support vector machines
Genetic algorithms
Classification algorithms
Semantics
Document classification (DC)
Word2Vector (W2V)
bag of word (BOW)
term frequency (TF)
association for computing machinery (ACM)
machine learning (ML)