A temporal–spatial deep learning framework leveraging dynamic 3D attention maps for violence detection
In intelligent systems for real-time security and safety monitoring, the proliferation of surveillance cameras has fueled a growing interest in using deep learning-based artificial intelligence (AI) models for violence detection. Most current approaches consider violence...
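
| Main author: | Elizabeth B. Varghese |
|---|---|
| Other authors: | Almiqdad Elzein, Yin Yang, Marwa Qaraqe |
| Published: | 2025 |
| Subjects: | Information and computing sciences, Artificial intelligence, Computer vision and multimedia computation, Cybersecurity and privacy, Machine learning, Video surveillance, Violence detection, Computer vision, 3D spatiotemporal attention maps, Residual convolutional neural network |
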
| _version_ | 1864513524485586944 |
|---|---|
| author | Elizabeth B. Varghese (19198018) |
| author2 | Almiqdad Elzein (13141038); Yin Yang (35103); Marwa Qaraqe (10135172) |
| author2_role | author; author; author |
| author_facet | Elizabeth B. Varghese (19198018); Almiqdad Elzein (13141038); Yin Yang (35103); Marwa Qaraqe (10135172) |
| author_role | author |
| dc.creator.none.fl_str_mv | Elizabeth B. Varghese (19198018); Almiqdad Elzein (13141038); Yin Yang (35103); Marwa Qaraqe (10135172) |
| dc.date.none.fl_str_mv | 2025-09-16T09:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1007/s00521-025-11641-4 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/A_temporal_spatial_deep_learning_framework_leveraging_dynamic_3D_attention_maps_for_violence_detection/31167886 |
| dc.rights.none.fl_str_mv | CC BY 4.0; info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Information and computing sciences; Artificial intelligence; Computer vision and multimedia computation; Cybersecurity and privacy; Machine learning; Video surveillance; Violence detection; Computer vision; 3D spatiotemporal attention maps; Residual convolutional neural network |
| dc.title.none.fl_str_mv | A temporal–spatial deep learning framework leveraging dynamic 3D attention maps for violence detection |
| dc.type.none.fl_str_mv | Text; Journal contribution; info:eu-repo/semantics/publishedVersion; text; contribution to journal |
| description | In intelligent systems for real-time security and safety monitoring, the proliferation of surveillance cameras has fueled a growing interest in using deep learning-based artificial intelligence (AI) models for violence detection. Most current approaches consider violence detection as a video classification task, overlooking the fact that violent activities occur within relatively small spatiotemporal regions. Moreover, these activities depend on relationships among multiple such regions, making a single region analysis inadequate, especially for larger-scale violence. This paper proposes a novel temporal–spatial attention framework inspired by human visual perception, which dynamically focuses on multiple informative regions across space and time. By learning where, when, and for how long to attend within a video, using dynamic three-dimensional attention prediction networks, the model captures complex patterns of violent behavior more effectively. Experiments on four public benchmark datasets and a real-world dataset created for this study demonstrate that the proposed approach outperforms existing methods in accuracy and interpretability. Other Information: Published in: Neural Computing and Applications; License: https://creativecommons.org/licenses/by/4.0 ; See article on publisher's website: https://dx.doi.org/10.1007/s00521-025-11641-4 |
| eu_rights_str_mv | openAccess |
| id | Manara2_169e839da7f4f1c826db2ba407e638da |
| identifier_str_mv | 10.1007/s00521-025-11641-4 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/31167886 |
| publishDate | 2025 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | A temporal–spatial deep learning framework leveraging dynamic 3D attention maps for violence detection |
| topic | Information and computing sciences; Artificial intelligence; Computer vision and multimedia computation; Cybersecurity and privacy; Machine learning; Video surveillance; Violence detection; Computer vision; 3D spatiotemporal attention maps; Residual convolutional neural network |