EMO-MoviNet: Enhancing Action Recognition in Videos with EvoNorm, Mish Activation, and Optimal Frame Selection for Efficient Mobile Deployment


Bibliographic Details
Main Author: Tarique Hussain (5946023) (author)
Other Authors: Zulfiqar Ali Memon (17632191) (author), Rizwan Qureshi (15279193) (author), Tanvir Alam (638619) (author)
Published: 2023
_version_ 1864513532076228608
author Tarique Hussain (5946023)
author2 Zulfiqar Ali Memon (17632191)
Rizwan Qureshi (15279193)
Tanvir Alam (638619)
author2_role author
author
author
author_facet Tarique Hussain (5946023)
Zulfiqar Ali Memon (17632191)
Rizwan Qureshi (15279193)
Tanvir Alam (638619)
author_role author
dc.creator.none.fl_str_mv Tarique Hussain (5946023)
Zulfiqar Ali Memon (17632191)
Rizwan Qureshi (15279193)
Tanvir Alam (638619)
dc.date.none.fl_str_mv 2023-09-27T03:00:00Z
dc.identifier.none.fl_str_mv 10.3390/s23198106
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/EMO-MoviNet_Enhancing_Action_Recognition_in_Videos_with_EvoNorm_Mish_Activation_and_Optimal_Frame_Selection_for_Efficient_Mobile_Deployment/24805176
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Engineering
Communications engineering
Information and computing sciences
Computer vision and multimedia computation
Machine learning
mobile networks
video classification
action recognition
deep learning
dc.title.none.fl_str_mv EMO-MoviNet: Enhancing Action Recognition in Videos with EvoNorm, Mish Activation, and Optimal Frame Selection for Efficient Mobile Deployment
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p dir="ltr">The primary goal of this study is to develop a deep neural network for action recognition that improves accuracy while minimizing computational cost. To this end, we propose EMO-MoviNet-A2*, a modified architecture that integrates Evolving Normalization (EvoNorm), Mish activation, and optimal frame selection to improve the accuracy and efficiency of action recognition in videos. The asterisk indicates that the model also incorporates the stream buffer concept. The Mobile Video Network (MoviNet) belongs to a family of memory-efficient architectures discovered through Neural Architecture Search (NAS), which balance accuracy and efficiency by integrating spatial, temporal, and spatio-temporal operations. We apply a MoviNet model pre-trained on the Kinetics dataset to the UCF101 and HMDB51 datasets. On UCF101, we observed a generalization gap, with the model performing better on the training set than on the test set. To address this issue, we replaced batch normalization with EvoNorm, which unifies normalization and activation functions. Key-frame selection also required improvement, so we developed a novel technique, Optimal Frame Selection (OFS), that identifies key frames within videos more effectively than random or dense frame selection. Combining OFS with the Mish nonlinearity improved accuracy by 0.8–1% in our 20-class UCF101 experiment. The EMO-MoviNet-A2* model consumes 86% fewer FLOPs and approximately 90% fewer parameters on the UCF101 dataset, at a cost of 1–2% in accuracy. Additionally, it achieves 5–7% higher accuracy on the HMDB51 dataset while requiring seven times fewer FLOPs and ten times fewer parameters than the reference model, Motion-Augmented RGB Stream (MARS).</p><h2>Other Information</h2><p dir="ltr">Published in: Sensors<br>License: <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.3390/s23198106" target="_blank">https://dx.doi.org/10.3390/s23198106</a></p>
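The two building blocks named in the abstract, Mish and EvoNorm, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Mish is the standard definition x · tanh(softplus(x)). The abstract does not say which EvoNorm variant the model uses, so the S0 form (x · sigmoid(v·x) divided by the group standard deviation, then scaled and shifted) is assumed here purely for illustration; the function names, the scalar parameters `v`, `gamma`, `beta`, and the single flat group of activations are all hypothetical simplifications.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def evonorm_s0(group, v=1.0, gamma=1.0, beta=0.0, eps=1e-5):
    """EvoNorm-S0 over ONE group of activations (assumed variant).

    Computes x * sigmoid(v * x) / sqrt(var(group) + eps) * gamma + beta,
    so normalization and activation happen in a single expression.
    Real layers learn v, gamma, beta per channel; scalars are used here
    only to keep the sketch short.
    """
    n = len(group)
    mean = sum(group) / n
    var = sum((x - mean) ** 2 for x in group) / n
    std = math.sqrt(var + eps)
    return [x * sigmoid(v * x) / std * gamma + beta for x in group]


if __name__ == "__main__":
    print([round(mish(x), 4) for x in (-2.0, 0.0, 1.0)])
    print(evonorm_s0([1.0, -1.0, 2.0, 0.0]))
```

Because the S0 form depends only on per-sample group statistics, it needs no running batch averages, which is one commonly cited reason for swapping it in where batch normalization generalizes poorly.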
eu_rights_str_mv openAccess
id Manara2_54433e2a7a39fb091e8ab239c10d3e45
identifier_str_mv 10.3390/s23198106
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/24805176
publishDate 2023
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title EMO-MoviNet: Enhancing Action Recognition in Videos with EvoNorm, Mish Activation, and Optimal Frame Selection for Efficient Mobile Deployment
topic Engineering
Communications engineering
Information and computing sciences
Computer vision and multimedia computation
Machine learning
mobile networks
video classification
action recognition
deep learning