Bowen Chen: SSAT-Adapter: Enhancing Vision-Language Model Few-shot Learning with Auxiliary Tasks
| Main Author: | Bowen Chen |
|---|---|
| Other Authors: | |
| Published: | 2025 |
| Subjects: | |
| Summary: | Traditional deep learning models often struggle in few-shot learning scenarios, where only a handful of labeled examples are available. While the Contrastive Language-Image Pre-training (CLIP) model demonstrates impressive zero-shot capabilities, its few-shot performance remains weak. Existing methods primarily try to squeeze more out of the small labeled dataset itself, which offers little room for improvement. To overcome this data bottleneck, we introduce SSAT-Adapter, a novel framework that leverages CLIP's language understanding to generate informative auxiliary tasks, improving CLIP's performance and adaptability in few-shot settings. |
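The record contains no implementation details, so the following PyTorch sketch only illustrates the general family of methods the summary describes: adapter-style few-shot tuning on frozen CLIP features, trained with a main classification loss plus a language-derived auxiliary loss. Every name here (`BottleneckAdapter`, `training_step`, the auxiliary-task construction, all hyperparameters) is a hypothetical assumption for illustration, not the paper's actual SSAT-Adapter architecture.

```python
# Hypothetical sketch: adapter-style few-shot tuning on frozen CLIP features
# with an auxiliary language-derived task. Illustrative only; not SSAT-Adapter.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BottleneckAdapter(nn.Module):
    """Residual bottleneck adapter over frozen CLIP image features."""

    def __init__(self, dim: int = 512, reduction: int = 4, ratio: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.ReLU(inplace=True),
        )
        self.ratio = ratio  # assumed blending weight

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Blend adapted features with the originals so the frozen
        # backbone's zero-shot knowledge is preserved.
        return self.ratio * self.net(feats) + (1 - self.ratio) * feats


def clip_logits(img_feats: torch.Tensor, txt_feats: torch.Tensor,
                temperature: float = 0.01) -> torch.Tensor:
    """Cosine-similarity logits between image and class text embeddings."""
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    return img @ txt.t() / temperature


def training_step(adapter, img_feats, labels, class_txt,
                  aux_labels, aux_txt, aux_weight: float = 0.5):
    """One step: main few-shot loss plus an auxiliary text-derived task."""
    adapted = adapter(img_feats)
    main_loss = F.cross_entropy(clip_logits(adapted, class_txt), labels)
    # Auxiliary task (assumed form): classify the same images against extra
    # text embeddings, e.g. attribute or description prompts, to extract
    # additional supervision from the few labeled examples.
    aux_loss = F.cross_entropy(clip_logits(adapted, aux_txt), aux_labels)
    return main_loss + aux_weight * aux_loss
```

With CLIP image and text features precomputed by the frozen backbone, only the adapter's small number of parameters is trained, which is what makes this family of methods practical when labeled data is scarce.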