Code and data
<p dir="ltr">State-of-the-art (SOTA) Automatic Speech Recognition (ASR) systems primarily rely on acoustic information while disregarding additional multi-modal context. However, visual information are essential in disambiguation and adaptation. </p><p dir="ltr">...
Saved in:
| Main Author: | |
|---|---|
| Other Authors: | |
| Published: |
2025
|
| Subjects: | |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|