A temporal–spatial deep learning framework leveraging dynamic 3D attention maps for violence detection

Bibliographic Details
Main Author: Elizabeth B. Varghese (19198018) (author)
Other Authors: Almiqdad Elzein (13141038) (author), Yin Yang (35103) (author), Marwa Qaraqe (10135172) (author)
Published: 2025
_version_ 1864513524485586944
author Elizabeth B. Varghese (19198018)
author2 Almiqdad Elzein (13141038)
Yin Yang (35103)
Marwa Qaraqe (10135172)
author2_role author
author
author
author_facet Elizabeth B. Varghese (19198018)
Almiqdad Elzein (13141038)
Yin Yang (35103)
Marwa Qaraqe (10135172)
author_role author
dc.creator.none.fl_str_mv Elizabeth B. Varghese (19198018)
Almiqdad Elzein (13141038)
Yin Yang (35103)
Marwa Qaraqe (10135172)
dc.date.none.fl_str_mv 2025-09-16T09:00:00Z
dc.identifier.none.fl_str_mv 10.1007/s00521-025-11641-4
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/A_temporal_spatial_deep_learning_framework_leveraging_dynamic_3D_attention_maps_for_violence_detection/31167886
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Information and computing sciences
Artificial intelligence
Computer vision and multimedia computation
Cybersecurity and privacy
Machine learning
Video surveillance
Violence detection
Computer vision
3D spatiotemporal attention maps
Residual convolutional neural network
dc.title.none.fl_str_mv A temporal–spatial deep learning framework leveraging dynamic 3D attention maps for violence detection
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p dir="ltr">In intelligent systems for real-time security and safety monitoring, the proliferation of surveillance cameras has fueled a growing interest in using deep learning-based artificial intelligence (AI) models for violence detection. Most current approaches consider violence detection as a video classification task, overlooking the fact that violent activities occur within relatively small spatiotemporal regions. Moreover, these activities depend on relationships among multiple such regions, making a single region analysis inadequate, especially for larger-scale violence. This paper proposes a novel temporal–spatial attention framework inspired by human visual perception, which dynamically focuses on multiple informative regions across space and time. By learning where, when, and for how long to attend within a video, using dynamic three-dimensional attention prediction networks, the model captures complex patterns of violent behavior more effectively. Experiments on four public benchmark datasets and a real-world dataset created for this study demonstrate that the proposed approach outperforms existing methods in accuracy and interpretability.</p><h2 dir="ltr">Other Information</h2><p dir="ltr">Published in: Neural Computing and Applications<br>License: <a href="https://creativecommons.org/licenses/by/4.0" target="_blank">https://creativecommons.org/licenses/by/4.0</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1007/s00521-025-11641-4" target="_blank">https://dx.doi.org/10.1007/s00521-025-11641-4</a></p>
eu_rights_str_mv openAccess
id Manara2_169e839da7f4f1c826db2ba407e638da
identifier_str_mv 10.1007/s00521-025-11641-4
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/31167886
publishDate 2025
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title A temporal–spatial deep learning framework leveraging dynamic 3D attention maps for violence detection