Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers

Despite the recent progress in emotion recognition, state-of-the-art systems are unable to achieve improved performance in cross-language settings. In this article we propose a Multimodal Dual Attention Transformer (MDAT) model to improve cross-language multimodal emotio...

Bibliographic Details
Main Author: Syed Aun Muhammad Zaidi (22225033) (author)
Other Authors: Siddique Latif (17248783) (author), Junaid Qadir (16494902) (author)
Published: 2024
_version_ 1864513540001366016
author Syed Aun Muhammad Zaidi (22225033)
author2 Siddique Latif (17248783)
Junaid Qadir (16494902)
author2_role author
author
author_facet Syed Aun Muhammad Zaidi (22225033)
Siddique Latif (17248783)
Junaid Qadir (16494902)
author_role author
dc.creator.none.fl_str_mv Syed Aun Muhammad Zaidi (22225033)
Siddique Latif (17248783)
Junaid Qadir (16494902)
dc.date.none.fl_str_mv 2024-10-28T09:00:00Z
dc.identifier.none.fl_str_mv 10.1109/ojcs.2024.3486904
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/Enhancing_Cross-Language_Multimodal_Emotion_Recognition_With_Dual_Attention_Transformers/30095002
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Information and computing sciences
Artificial intelligence
Machine learning
Co-attention networks
graph attention networks
multi-modal learning
multimodal emotion recognition
Emotion recognition
Transformers
Speech recognition
Data models
Computational modeling
Adaptation models
Vectors
Speech enhancement
Attention mechanisms
Training
dc.title.none.fl_str_mv Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p dir="ltr">Despite the recent progress in emotion recognition, state-of-the-art systems are unable to achieve improved performance in cross-language settings. In this article we propose a Multimodal Dual Attention Transformer (MDAT) model to improve cross-language multimodal emotion recognition. Our model utilises pre-trained models for multimodal feature extraction and is equipped with dual attention mechanisms including graph attention and co-attention to capture complex dependencies across different modalities and languages to achieve improved cross-language multimodal emotion recognition. In addition, our model also exploits a transformer encoder layer for high-level feature representation to improve emotion classification accuracy. This novel construct preserves modality-specific emotional information while enhancing cross-modality and cross-language feature generalisation, resulting in improved performance with minimal target language data. We assess our model's performance on four publicly available emotion recognition datasets and establish its superior effectiveness compared to recent approaches and baseline models.</p><h2>Other Information</h2><p dir="ltr">Published in: IEEE Open Journal of the Computer Society<br>License: <a href="https://creativecommons.org/licenses/by/4.0/deed.en" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/ojcs.2024.3486904" target="_blank">https://dx.doi.org/10.1109/ojcs.2024.3486904</a></p>
eu_rights_str_mv openAccess
id Manara2_67ee87792baf754fb1c22f205a2ad997
identifier_str_mv 10.1109/ojcs.2024.3486904
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/30095002
publishDate 2024
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers
title_full Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers
title_fullStr Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers
title_full_unstemmed Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers
title_short Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers
title_sort Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers
topic Information and computing sciences
Artificial intelligence
Machine learning
Co-attention networks
graph attention networks
multi-modal learning
multimodal emotion recognition
Emotion recognition
Transformers
Speech recognition
Data models
Computational modeling
Adaptation models
Vectors
Speech enhancement
Attention mechanisms
Training