Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images

Recognition of continuous sign language is challenging as the number of words in a sentence and their boundaries are unknown during the recognition stage. This work proposes a two-stage solution in which the number of words in a sign language sentence is predicted in the first stage. The sentence is...


Bibliographic Details
Main Author:	Shanableh, Tamer (author)
Format:	article
Published:	2023
Subjects:	Sign language Feature extraction Video processing Deep learning
Online Access:	http://hdl.handle.net/11073/25399

_version_	1864513436277276672
author	Shanableh, Tamer
author_facet	Shanableh, Tamer
author_role	author
dc.creator.none.fl_str_mv	Shanableh, Tamer
dc.date.none.fl_str_mv	2023-11-21T05:22:32Z 2023-11-21T05:22:32Z 2023
dc.format.none.fl_str_mv	application/pdf
dc.identifier.none.fl_str_mv	T. Shanableh, "Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images," in IEEE Access, vol. 11, pp. 126823-126833, 2023, doi: 10.1109/ACCESS.2023.3332250. 2169-3536 http://hdl.handle.net/11073/25399 10.1109/ACCESS.2023.3332250
dc.language.none.fl_str_mv	en_US
dc.publisher.none.fl_str_mv	IEEE
dc.relation.none.fl_str_mv	https://doi.org/10.1109/ACCESS.2023.3332250
dc.subject.none.fl_str_mv	Sign language Feature extraction Video processing Deep learning
dc.title.none.fl_str_mv	Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images
dc.type.none.fl_str_mv	Peer-Reviewed Postprint info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/article
description	Recognition of continuous sign language is challenging as the number of words in a sentence and their boundaries are unknown during the recognition stage. This work proposes a two-stage solution in which the number of words in a sign language sentence is predicted in the first stage. The sentence is then temporally segmented accordingly, and each segment is represented in a single image using a novel solution that entails summation of frame differences using motion estimation and compensation. This results in a single image representation per sign language word, referred to as a motion image. CNN transfer learning is used to convert each of these motion images into a feature vector, which is used for either model generation or sign language recognition. As such, two deep learning models are generated: one for predicting the number of words per sentence and the other for recognizing the meaning of the sign language sentences. The proposed solution of predicting the number of words per sentence and thereafter segmenting the sentence into equal segments worked well. This is because each motion image can contain traces of previous or successive words. This byproduct of the proposed solution is advantageous as it puts words into context, which helps explain the high sign language recognition rates reported. It is shown that bidirectional LSTM layers result in the most accurate models for both stages. The experiments use an existing dataset that contains 40 sentences generated from 80 sign language words. The experiments revealed that the proposed solution achieved word and sentence recognition rates of 97.3% and 92.6%, respectively. The percentage increases over the best results reported in the literature for the same dataset are 1.8% and 9.1% for word and sentence recognition, respectively.
format	article
id	aus_0dba9536efe22b0dfb835b1590598cac
identifier_str_mv	T. Shanableh, "Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images," in IEEE Access, vol. 11, pp. 126823-126833, 2023, doi: 10.1109/ACCESS.2023.3332250. 2169-3536 10.1109/ACCESS.2023.3332250
language_invalid_str_mv	en_US
network_acronym_str	aus
network_name_str	aus
oai_identifier_str	oai:repository.aus.edu:11073/25399
publishDate	2023
publisher.none.fl_str_mv	IEEE
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
spelling	Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images; Shanableh, Tamer; Sign language; Feature extraction; Video processing; Deep learning; American University of Sharjah; IEEE; 2023
spellingShingle	Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images Shanableh, Tamer Sign language Feature extraction Video processing Deep learning
status_str	publishedVersion
title	Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images
title_full	Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images
title_fullStr	Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images
title_full_unstemmed	Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images
title_short	Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images
title_sort	Two-Stage Deep Learning Solution for Continuous Arabic Sign Language Recognition Using Word Count Prediction and Motion Images
topic	Sign language Feature extraction Video processing Deep learning
url	http://hdl.handle.net/11073/25399
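
The segmentation and motion-image steps described in the abstract can be illustrated with a minimal sketch. It assumes the word count from the first stage is already available, and it replaces the paper's motion estimation and compensation with a plain sum of absolute differences between consecutive frames, so it is a simplification and not the author's method. The names `equal_segments`, `motion_image`, and `sentence_to_motion_images` are hypothetical, and frames are assumed to be grayscale images stored as nested lists of integers.

```python
from typing import List, Tuple

# Assumption: a grayscale frame is a list of rows of pixel intensities (0..255).
Frame = List[List[int]]


def equal_segments(num_frames: int, num_words: int) -> List[Tuple[int, int]]:
    """Split frame indices [0, num_frames) into num_words near-equal (start, end) ranges."""
    if num_words < 1 or num_frames < num_words:
        raise ValueError("need at least one frame per predicted word")
    # Integer boundaries; every segment is non-empty when num_frames >= num_words.
    bounds = [i * num_frames // num_words for i in range(num_words + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_words)]


def motion_image(frames: List[Frame]) -> Frame:
    """Accumulate absolute differences of consecutive frames into one image.

    Simplification: no motion estimation or compensation is applied here.
    """
    height, width = len(frames[0]), len(frames[0][0])
    acc = [[0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                acc[y][x] += abs(curr[y][x] - prev[y][x])
    # Clip so the result is still a valid 8-bit image.
    return [[min(255, value) for value in row] for row in acc]


def sentence_to_motion_images(frames: List[Frame], num_words: int) -> List[Frame]:
    """Produce one motion image per predicted word using equal temporal segments."""
    return [
        motion_image(frames[start:end])
        for start, end in equal_segments(len(frames), num_words)
    ]
```

In the pipeline the abstract describes, each resulting motion image would then be passed through a pretrained CNN to obtain a feature vector, and the sequence of vectors would be fed to a bidirectional LSTM model; those stages are outside this sketch.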
