A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset

Depth video sequence-based deep models for recognizing human actions are scarce compared to RGB and skeleton video sequence-based models. This scarcity limits research progress on depth data, as training deep models with small-scale data is challenging. In this wor...

Bibliographic Details
Main Author:	Mohammad Farhad Bulbul (18278689) (author)
Other Authors:	Amin Ullah (12015113) (author), Hazrat Ali (421019) (author), Daijin Kim (18278692) (author)
Published:	2022
Subjects:	Chemical sciences; Analytical chemistry; Engineering; Electrical engineering; Electronics, sensors and digital hardware; Physical sciences; Atomic, molecular and optical physics; 3D action recognition; depth map sequence; CNN; transfer learning; bi-directional LSTM; RNN; attention

_version_	1864513519766994944
author	Mohammad Farhad Bulbul (18278689)
author2	Amin Ullah (12015113) Hazrat Ali (421019) Daijin Kim (18278692)
author2_role	author author author
author_facet	Mohammad Farhad Bulbul (18278689) Amin Ullah (12015113) Hazrat Ali (421019) Daijin Kim (18278692)
author_role	author
dc.creator.none.fl_str_mv	Mohammad Farhad Bulbul (18278689) Amin Ullah (12015113) Hazrat Ali (421019) Daijin Kim (18278692)
dc.date.none.fl_str_mv	2022-09-09T03:00:00Z
dc.identifier.none.fl_str_mv	10.3390/s22186841
dc.relation.none.fl_str_mv	https://figshare.com/articles/journal_contribution/A_Deep_Sequence_Learning_Framework_for_Action_Recognition_in_Small-Scale_Depth_Video_Dataset/25513873
dc.rights.none.fl_str_mv	CC BY 4.0 info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv	Chemical sciences Analytical chemistry Engineering Electrical engineering Electronics, sensors and digital hardware Physical sciences Atomic, molecular and optical physics 3D action recognition depth map sequence CNN transfer learning bi-directional LSTM RNN attention
dc.title.none.fl_str_mv	A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset
dc.type.none.fl_str_mv	Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal
description	<div><p>Depth video sequence-based deep models for recognizing human actions are scarce compared to RGB and skeleton video sequence-based models. This scarcity limits research progress on depth data, as training deep models with small-scale data is challenging. In this work, we propose a deep sequence classification model that uses depth video data for scenarios in which video data are limited. Unlike approaches that summarize the contents of each frame into a single class, our method directly classifies a depth video, i.e., a sequence of depth frames. First, the proposed system transforms an input depth video into three sequences of multi-view temporal motion frames. Together with these three temporal motion sequences, the input depth frame sequence yields a four-stream representation of the input depth action video. Next, the DenseNet121 architecture with ImageNet pre-trained weights is employed to extract discriminative frame-level action features from the depth and temporal motion frames. The four resulting sets of frame-level feature vectors, one per stream, are fed into four bi-directional LSTM (BLSTM) networks. The temporal features are further analyzed through multi-head self-attention (MHSA) to capture multi-view sequence correlations. Finally, the concatenation of their outputs is processed through dense layers to classify the input depth video.
The experimental results on two small-scale benchmark depth datasets, MSRAction3D and DHA, demonstrate that the proposed framework is effective even with insufficient training samples and outperforms the existing depth data-based action recognition methods.</p><p> </p></div><h2>Other Information</h2> <p> Published in: Sensors<br> License: <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.3390/s22186841" target="_blank">https://dx.doi.org/10.3390/s22186841</a></p>
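The four-stream design summarized in the description can be illustrated with a rough PyTorch sketch. This is not the authors' implementation. It assumes the DenseNet121 frame features have already been extracted (1024 values per frame, the size of DenseNet121's pooled output), that self-attention is applied within each stream, and that the attended sequence is averaged over time before concatenation. The abstract does not specify those last two choices. The class name `FourStreamClassifier`, the hidden size, the number of heads, and the 20-class output are hypothetical placeholders.

```python
import torch
import torch.nn as nn


class FourStreamClassifier(nn.Module):
    """Hypothetical sketch: 4 x (BLSTM -> self-attention) -> concat -> dense."""

    def __init__(self, feat_dim=1024, hidden=128, heads=4, num_classes=20):
        super().__init__()
        # One bi-directional LSTM per stream (depth + three motion views).
        self.blstms = nn.ModuleList(
            [nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
             for _ in range(4)]
        )
        # One multi-head self-attention block per stream (an assumption).
        self.attns = nn.ModuleList(
            [nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
             for _ in range(4)]
        )
        # Dense layers applied to the concatenated stream outputs.
        self.head = nn.Sequential(
            nn.Linear(4 * 2 * hidden, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, streams):
        # streams: list of 4 tensors, each (batch, frames, feat_dim).
        pooled = []
        for x, lstm, attn in zip(streams, self.blstms, self.attns):
            h, _ = lstm(x)          # (batch, frames, 2 * hidden)
            a, _ = attn(h, h, h)    # self-attention: query = key = value
            pooled.append(a.mean(dim=1))  # average over time (an assumption)
        return self.head(torch.cat(pooled, dim=1))  # (batch, num_classes)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = FourStreamClassifier()
    # Random stand-ins for per-frame features of 2 videos with 8 frames each.
    streams = [torch.randn(2, 8, 1024) for _ in range(4)]
    print(model(streams).shape)
```

In a full pipeline, the random tensors would be replaced by features from an ImageNet pre-trained DenseNet121 applied to each depth and motion frame.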
eu_rights_str_mv	openAccess
id	Manara2_078c5200a6ac613eb296bc440a275a2f
identifier_str_mv	10.3390/s22186841
network_acronym_str	Manara2
network_name_str	Manara2
oai_identifier_str	oai:figshare.com:article/25513873
publishDate	2022
rights_invalid_str_mv	CC BY 4.0
status_str	publishedVersion
title	A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset
title_full	A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset
title_fullStr	A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset
title_full_unstemmed	A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset
title_short	A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset
title_sort	A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset
topic	Chemical sciences Analytical chemistry Engineering Electrical engineering Electronics, sensors and digital hardware Physical sciences Atomic, molecular and optical physics 3D action recognition depth map sequence CNN transfer learning bi-directional LSTM RNN attention
