An Innovative Automatic Indexing Method for Arabic Text

Bibliographic Details
Main Author: Masri, Nour (author)
Format: masterThesis
Published: 2020
Online Access: http://hdl.handle.net/10725/13983
https://doi.org/10.26756/th.2022.445
http://libraries.lau.edu.lb/research/laur/terms-of-use/thesis.php
Description
Summary: Automatic indexing and text retrieval methods have been studied for a long time. Compared with other languages, however, research on automated Arabic text categorization remains limited. In this work, we present an innovative method that improves the accuracy of automatic indexing of Arabic texts by introducing a thesaurus. Our model extracts new relevant words by consulting the thesaurus, which identifies word correlations. The thesaurus is built with the NLTK toolkit, whose WordNet interface lists the synonyms of a given word. Words that share a meaning and frequently appear together are grouped under one umbrella in a JSON dictionary, making it easier to identify a text's topic. Our results exhibit notable improvement in accuracy and efficiency compared to previous works.
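
The thesaurus step described in the summary can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`wordnet_synonyms`, `build_thesaurus`, `index_terms`, `to_json`) are hypothetical, the use of the Open Multilingual Wordnet language code `arb` for Arabic is an assumption, and the sketch groups words by synonymy only, leaving out the co-occurrence criterion the summary mentions.

```python
import json
from collections import Counter


def wordnet_synonyms(word, lang="arb"):
    """Return WordNet synonyms of `word` via NLTK.

    Assumption: the 'wordnet' and 'omw-1.4' corpora have been downloaded
    and Arabic is queried with the language code 'arb'.
    """
    from nltk.corpus import wordnet as wn  # lazy import: the rest runs without NLTK
    return {
        lemma
        for synset in wn.synsets(word, lang=lang)
        for lemma in synset.lemma_names(lang)
    }


def build_thesaurus(vocabulary, synonyms_of):
    """Group vocabulary words under one head word per synonym set."""
    vocab = set(vocabulary)
    thesaurus, assigned = {}, set()
    for word in sorted(vocab):
        if word in assigned:
            continue  # already placed under an earlier head word
        # Keep only synonyms that occur in the vocabulary and are still unassigned.
        group = ((set(synonyms_of(word)) & vocab) | {word}) - assigned
        thesaurus[word] = sorted(group)
        assigned |= group
    return thesaurus


def index_terms(tokens, thesaurus, top_n=3):
    """Count tokens under their head word and return the most frequent heads."""
    head_of = {m: head for head, members in thesaurus.items() for m in members}
    counts = Counter(head_of.get(tok, tok) for tok in tokens)
    return [term for term, _ in counts.most_common(top_n)]


def to_json(thesaurus):
    """Serialize the thesaurus; ensure_ascii=False keeps Arabic script readable."""
    return json.dumps(thesaurus, ensure_ascii=False, indent=2)
```

Calling `build_thesaurus(vocabulary, wordnet_synonyms)` would use WordNet as the synonym source, while any other callable that maps a word to a set of synonyms can be substituted. Counting tokens under a shared head word is one simple way a thesaurus can concentrate scattered synonym frequencies onto a single index term.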