Structure of the feature aggregation module.

To address the limitations of current visual question answering (VQA) models, which have limited multimodal feature fusion capabilities and often lack adequate consideration of local information, this study proposes a multimodal Transformer VQA network based on local and global i...

Bibliographic Details
Main Author:	Cuiyang Huang (21647898) (author)
Other Authors:	Zihan Hu (15363084) (author)
Published:	2025
Subjects:	Biotechnology Science Policy Space Science Biological Sciences not elsewhere classified Information Systems not elsewhere classified addressing thereby enhancing visual reducing linguistic noise language feature fusion experimental results demonstrate integrates multimodal knowledge lgmtnet employs attention lgmtnet effectively focuses essential question terms deepen question comprehension extracting multimodal features multimodal representation module models face limitations global information integration local image features local features within question features global features multimodal transformer local information lgmtnet decoder module study proposes question context deep encoder