Diverse Pose Lip-Reading Framework

Lip-reading is a technique for understanding speech by observing a speaker’s lip movements. It has numerous applications; for example, it helps hearing-impaired people and supports speech understanding in noisy environments. Most of the previous works on lip-reading fo...

Bibliographic Details
Main author:	Naheed Akhter (17364436) (author)
Other authors:	Mushtaq Ali (3598514) (author), Lal Hussain (14100502) (author), Mohsin Shah (3144564) (author), Toqeer Mahmood (4994945) (author), Amjad Ali (51075) (author), Ala Al-Fuqaha (4434340) (author)
Published:	2022
Subjects:	Information and computing sciences; Artificial intelligence; Human-centred computing; Machine learning; lip reading; machine learning; face frontalization; generative adversarial network

_version_	1864513506550743040
author	Naheed Akhter (17364436)
author2	Mushtaq Ali (3598514) Lal Hussain (14100502) Mohsin Shah (3144564) Toqeer Mahmood (4994945) Amjad Ali (51075) Ala Al-Fuqaha (4434340)
author2_role	author author author author author author
author_facet	Naheed Akhter (17364436) Mushtaq Ali (3598514) Lal Hussain (14100502) Mohsin Shah (3144564) Toqeer Mahmood (4994945) Amjad Ali (51075) Ala Al-Fuqaha (4434340)
author_role	author
dc.creator.none.fl_str_mv	Naheed Akhter (17364436) Mushtaq Ali (3598514) Lal Hussain (14100502) Mohsin Shah (3144564) Toqeer Mahmood (4994945) Amjad Ali (51075) Ala Al-Fuqaha (4434340)
dc.date.none.fl_str_mv	2022-09-22T09:00:00Z
dc.identifier.none.fl_str_mv	10.3390/app12199532
dc.relation.none.fl_str_mv	https://figshare.com/articles/journal_contribution/Diverse_Pose_Lip-Reading_Framework/26889103
dc.rights.none.fl_str_mv	CC BY 4.0 info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv	Information and computing sciences Artificial intelligence Human-centred computing Machine learning lip reading machine learning face frontalization generative adversarial network
dc.title.none.fl_str_mv	Diverse Pose Lip-Reading Framework
dc.type.none.fl_str_mv	Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal
description	<p dir="ltr">Lip-reading is a technique for understanding speech by observing a speaker’s lip movements. It has numerous applications; for example, it helps hearing-impaired people and supports speech understanding in noisy environments. Most previous lip-reading work focused on frontal and near-frontal faces, and some of it targeted multiple poses in high-quality videos. However, the results are not satisfactory on low-quality videos containing multiple poses. In this research work, a lip-reading framework is proposed to improve the recognition rate in low-quality videos, and a Multiple Pose (MP) dataset of low-quality videos containing multiple extreme poses is built. The proposed framework decomposes the input video into frames and enhances them by applying the Contrast Limited Adaptive Histogram Equalization (CLAHE) method. Next, faces are detected in the enhanced frames, and the multiple poses are frontalized using the face frontalization Generative Adversarial Network (FF-GAN). After face frontalization, the mouth region is extracted. The mouth regions extracted across the whole video and their respective sentences are then provided to the ResNet during the training process. The proposed framework achieved a sentence prediction accuracy of 90% on a testing dataset containing 100 silent low-quality videos with multiple poses, which is better than state-of-the-art methods.</p><h2>Other Information</h2><p dir="ltr">Published in: Applied Sciences<br>License: <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.3390/app12199532" target="_blank">https://dx.doi.org/10.3390/app12199532</a></p>
eu_rights_str_mv	openAccess
id	Manara2_34adb94afe49459bd0a017a425940549
identifier_str_mv	10.3390/app12199532
network_acronym_str	Manara2
network_name_str	Manara2
oai_identifier_str	oai:figshare.com:article/26889103
publishDate	2022
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv	CC BY 4.0
spellingShingle	Diverse Pose Lip-Reading Framework Naheed Akhter (17364436) Information and computing sciences Artificial intelligence Human-centred computing Machine learning lip reading machine learning face frontalization generative adversarial network
status_str	publishedVersion
title	Diverse Pose Lip-Reading Framework
title_full	Diverse Pose Lip-Reading Framework
title_fullStr	Diverse Pose Lip-Reading Framework
title_full_unstemmed	Diverse Pose Lip-Reading Framework
title_short	Diverse Pose Lip-Reading Framework
title_sort	Diverse Pose Lip-Reading Framework
topic	Information and computing sciences Artificial intelligence Human-centred computing Machine learning lip reading machine learning face frontalization generative adversarial network
