The overview framework of the coarse-to-fine alignment network.

The CFAN consists of a video alignment module that learns the common visual-semantic space and a cross-modal interaction module that explores the fine-grained alignment among frames, proposals, and video. In addition, multi-level coarse-to-fine alignment information flows between modules to make full us...

Bibliographic Details
Main Author: Lingwen Meng (8968106) (author)
Other Authors: Fangyuan Liu (1438045) (author), Mingyong Xin (15185747) (author), Siqi Guo (355869) (author), Fu Zou (21370430) (author)
Published: 2025