Adjustment step size.

Bibliographic Details
Main Author: Qingnan Ji (22662198) (author)
Other Authors: Jinxia Wang (355966) (author), Lixian Wang (465239) (author)
Published: 2025
dc.date.none.fl_str_mv 2025-11-21T18:26:37Z
dc.identifier.none.fl_str_mv 10.1371/journal.pone.0326662.g003
dc.relation.none.fl_str_mv https://figshare.com/articles/figure/Adjustment_step_size_/30676947
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biophysics
Cancer
Science Policy
Space Science
Biological Sciences not elsewhere classified
Information Systems not elsewhere classified
text
suboptimal user experiences
novel optimization strategy
multimodal interaction scenarios
modal correlation matrix
keypoint detection
intelligent interaction design
iemocap
experimental results show
dynamic weighting mechanism
advancements contribute meaningfully
multimodal information fusion
textual data relevant
structured matching models
multimodal information integration
integration efficiency increases
correlation matrix constraint
low computational efficiency
integration process
integrating information
matching process
data volume
computational complexity
temporal alignment
study introduces
study aims
significantly reduced
significant challenge
optimized kuhn
often leading
munkres algorithm
modalities differ
improved kuhn
importance scores
feature extraction
experience ratings
critical contribution
computer collaboration
broad applicability
baseline method
also incorporating
ablation studies
dc.title.none.fl_str_mv Adjustment step size.
dc.type.none.fl_str_mv Image
Figure
info:eu-repo/semantics/publishedVersion
image
description <div><p>In modern multimodal interaction design, integrating information from diverse modalities (such as speech, vision, and text) presents a significant challenge. These modalities differ in structure, timing, and data volume, which often leads to mismatches, low computational efficiency, and suboptimal user experiences during integration. This study aims to improve both the efficiency and the accuracy of multimodal information fusion. Two publicly available datasets, Carnegie Mellon University Multimodal Opinion Sentiment Intensity (CMU-MOSI) and Interactive Emotional Dyadic Motion Capture (IEMOCAP), are used to collect speech, visual, and textual data relevant to multimodal interaction scenarios. The data undergo preprocessing that includes noise reduction, feature extraction (e.g., Mel Frequency Cepstral Coefficients and keypoint detection), and temporal alignment. An improved Kuhn-Munkres algorithm is then proposed that extends the traditional bipartite graph matching model to weighted multimodal matching. The algorithm dynamically adjusts weight coefficients according to the importance score of each modality and incorporates a cross-modal correlation matrix as a constraint to make the matching process more robust. The enhanced algorithm’s performance is validated through information matching efficiency tests and user interaction satisfaction surveys. Experimental results show that the improved algorithm raises multimodal information matching accuracy by 28.2% over the baseline method. Integration efficiency increases by 18.7%, and computational cost drops significantly, with average computation time reduced by 15.4%. User satisfaction also improves, with experience ratings rising by 19.5%. Ablation studies further confirm that both the dynamic weighting mechanism and the correlation matrix constraint contribute critically to overall performance. 
This study introduces a novel optimization strategy for multimodal information integration, with theoretical value and broad applicability in intelligent interaction design and human-computer collaboration, and contributes to the development of next-generation multimodal interaction systems.</p></div>
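The weighted matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's weighting or constraint formulas, so everything below beyond "Kuhn-Munkres with importance-based weights and a correlation-matrix constraint" is an assumption: each modality is assumed to supply a similarity matrix between two sets of items, modality importance scores are assumed to be normalised to sum to 1 and used to blend those matrices, and pairs whose cross-modal correlation falls below a hypothetical threshold `tau` are treated as forbidden. The names `hungarian_min`, `fuse_weights`, and `match_modalities` are hypothetical. The solver is the textbook Kuhn-Munkres (Hungarian) algorithm with row and column potentials, not the authors' improved variant.

```python
# Hypothetical sketch, not the authors' method: importance-weighted cross-modal
# matching solved with the textbook Kuhn-Munkres (Hungarian) algorithm.
from typing import List, Optional, Sequence

Matrix = List[List[float]]
FORBIDDEN_COST = 1e9  # large finite penalty (avoids inf - inf in potential updates)


def hungarian_min(cost: Matrix) -> List[int]:
    """Minimum-cost assignment with row/column potentials (needs rows <= cols).

    Returns assignment[i] = column matched to row i.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("hungarian_min expects rows <= columns")
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # potentials (index 0 is a dummy)
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: 1-based row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]  # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # shift potentials by the smallest slack
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the recorded alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def fuse_weights(importance: Sequence[float]) -> List[float]:
    """Assumed rule: normalise non-negative importance scores to sum to 1."""
    total = sum(importance)
    if total <= 0:
        return [1.0 / len(importance)] * len(importance)
    return [r / total for r in importance]


def match_modalities(sims: Sequence[Matrix], importance: Sequence[float],
                     corr: Matrix, tau: float) -> List[Optional[int]]:
    """Match items A_i (rows) to B_j (columns); returns j per row, or None.

    sims[k][i][j]: similarity of A_i and B_j under modality k (hypothetical input).
    corr[i][j] < tau marks a forbidden pair (assumed form of the constraint).
    """
    w = fuse_weights(importance)
    n, m = len(corr), len(corr[0])
    # Forbidden pairs get a huge cost; allowed pairs cost minus their fused score,
    # so minimising total cost maximises total weighted similarity.
    cost = [[FORBIDDEN_COST if corr[i][j] < tau
             else -sum(wk * s[i][j] for wk, s in zip(w, sims))
             for j in range(m)] for i in range(n)]
    assignment = hungarian_min(cost)
    # A row that could only be placed on a forbidden pair is reported as unmatched.
    return [j if cost[i][j] < FORBIDDEN_COST else None
            for i, j in enumerate(assignment)]


if __name__ == "__main__":
    audio = [[0.9, 0.1, 0.2], [0.2, 0.8, 0.1], [0.1, 0.3, 0.7]]
    text = [[0.2, 0.9, 0.1], [0.8, 0.1, 0.2], [0.1, 0.2, 0.9]]
    no_limit = [[1.0] * 3 for _ in range(3)]
    print(match_modalities([audio, text], [3.0, 1.0], no_limit, 0.3))  # [0, 1, 2]
    print(match_modalities([audio, text], [1.0, 3.0], no_limit, 0.3))  # [1, 0, 2]
    block_01 = [[1.0, 0.1, 1.0], [1.0] * 3, [1.0] * 3]
    print(match_modalities([audio, text], [1.0, 3.0], block_01, 0.3))  # [0, 1, 2]
```

In the usage example, importance scores of [3.0, 1.0] let the first modality dominate and the identity matching wins; swapping them to [1.0, 3.0] moves the optimum to [1, 0, 2], which shows how reweighting modalities can change the assignment. Forbidding the pair (0, 1) through the correlation matrix then restores [0, 1, 2]. A large finite penalty is used for forbidden pairs instead of infinity so that the potential updates never evaluate inf - inf.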
eu_rights_str_mv openAccess
id Manara_ca76b02eba6439bdf8a6c9d685e2e95f
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/30676947
publishDate 2025
status_str publishedVersion