Overview of the coarse-to-fine alignment network (CFAN) framework.
<p>The CFAN consists of a video alignment module, which learns the common visual-semantic space, and a cross-modal interaction module, which explores the fine-grained alignment among frames, proposals, and the video. Multi-level coarse-to-fine alignment information also flows between the modules to make full us...
| Main Author: | |
|---|---|
| Other Authors: | , , , |
| Published: | 2025 |
| Subjects: | |