Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images

Recognition of continuous sign language is challenging because the number of words in a sentence and their boundaries are unknown during the recognition stage. This work proposes a two-stage solution in which the number of words in a sign language sentence is predicted in the first stage. The sentence is...

Bibliographic Details
Main Author: Shanableh, Tamer (author)
Format: article
Published: 2023
Subjects: Sign language; Feature extraction; Video processing; Deep learning
Online Access: http://hdl.handle.net/11073/25399
_version_ 1864513436277276672
author Shanableh, Tamer
author_facet Shanableh, Tamer
author_role author
dc.creator.none.fl_str_mv Shanableh, Tamer
dc.date.none.fl_str_mv 2023-11-21T05:22:32Z
2023
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv T. Shanableh, "Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images," in IEEE Access, vol. 11, pp. 126823-126833, 2023, doi: 10.1109/ACCESS.2023.3332250.
2169-3536
http://hdl.handle.net/11073/25399
10.1109/ACCESS.2023.3332250
dc.language.none.fl_str_mv en_US
dc.publisher.none.fl_str_mv IEEE
dc.relation.none.fl_str_mv https://doi.org/10.1109/ACCESS.2023.3332250
dc.subject.none.fl_str_mv Sign language
Feature extraction
Video processing
Deep learning
dc.title.none.fl_str_mv Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images
dc.type.none.fl_str_mv Peer-Reviewed
Postprint
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
description Recognition of continuous sign language is challenging because the number of words in a sentence and their boundaries are unknown during the recognition stage. This work proposes a two-stage solution in which the number of words in a sign language sentence is predicted in the first stage. The sentence is then temporally segmented accordingly, and each segment is represented as a single image using a novel solution that entails summation of frame differences using motion estimation and compensation. This results in a single image representation per sign language word, referred to as a motion image. CNN transfer learning is used to convert each of these motion images into a feature vector, which is used for either model generation or sign language recognition. As such, two deep learning models are generated: one for predicting the number of words per sentence and the other for recognizing the meaning of the sign language sentences. The proposed solution of predicting the number of words per sentence and thereafter segmenting the sentence into equal segments worked well. This is because each motion image can contain traces of previous or successive words. This byproduct of the proposed solution is advantageous as it puts words into context, thus justifying the excellent sign language recognition rates reported. It is shown that bidirectional LSTM layers result in the most accurate models for both stages. In the experimental results section, we use an existing dataset that contains 40 sentences generated from 80 sign language words. The experiments revealed that the proposed solution resulted in word and sentence recognition rates of 97.3% and 92.6%, respectively. The percentage increases over the best results reported in the literature for the same dataset are 1.8% and 9.1% for word and sentence recognition, respectively.
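The segmentation and motion-image steps in the description can be illustrated with a minimal sketch. It assumes grayscale frames held in a NumPy array of shape (T, H, W) and a word count already supplied by the first stage. The function names `split_equal_segments` and `motion_image` are hypothetical, and plain absolute frame differencing stands in for the motion estimation and compensation that the abstract mentions, so this is not the author's implementation. The CNN feature extraction and bidirectional LSTM stages are not shown.

```python
import numpy as np


def split_equal_segments(num_frames: int, num_words: int) -> list[tuple[int, int]]:
    """Split a sentence of num_frames frames into num_words near-equal segments.

    Returns (start, end) frame indices with end exclusive.
    """
    if num_words < 1 or num_frames < num_words:
        raise ValueError("need at least one frame per predicted word")
    # Evenly spaced boundaries, rounded to whole frame indices.
    bounds = np.linspace(0, num_frames, num_words + 1).round().astype(int)
    return [(int(bounds[i]), int(bounds[i + 1])) for i in range(num_words)]


def motion_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a (T, H, W) grayscale segment into one (H, W) image.

    Simplification (assumption): sums absolute differences of consecutive
    frames, without the motion estimation and compensation step.
    """
    frames = frames.astype(np.float32)          # avoid uint8 wrap-around
    diffs = np.abs(np.diff(frames, axis=0))     # shape (T-1, H, W)
    acc = diffs.sum(axis=0)                     # accumulate motion per pixel
    peak = acc.max()
    if peak > 0:
        acc = acc * (255.0 / peak)              # scale to the 0..255 range
    return np.clip(np.rint(acc), 0, 255).astype(np.uint8)


def sentence_to_motion_images(frames: np.ndarray, num_words: int) -> list[np.ndarray]:
    """One motion image per predicted word, using equal-length segments."""
    segments = split_equal_segments(frames.shape[0], num_words)
    return [motion_image(frames[start:end]) for start, end in segments]
```

Each returned image would then be passed to a pretrained CNN to obtain the feature vector for that word. Because the segments are equal in length and not aligned to true word boundaries, a segment can include frames from neighbouring words, which is the context effect the description refers to.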
format article
id aus_0dba9536efe22b0dfb835b1590598cac
language_invalid_str_mv en_US
network_acronym_str aus
network_name_str aus
oai_identifier_str oai:repository.aus.edu:11073/25399
publishDate 2023
spelling American University of Sharjah 2024-08-22T12:07:31Z
status_str publishedVersion
title Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images
url http://hdl.handle.net/11073/25399