Explicit Uncertainty Modeling for Active CLIP Adaptation with Dual Prompt Tuning
AI Summary
Proposes an active CLIP adaptation framework based on dual-prompt tuning that explicitly models uncertainty to guide sample selection.
Main Contributions
- Proposes a dual-prompt tuning method with a positive and a negative prompt
- Explicitly models the confidence that the predicted label is correct, yielding a principled uncertainty estimate
- Outperforms existing methods in active learning settings under the same annotation budget
Methodology
Learnable positive and negative prompts are introduced into CLIP's textual branch: the positive prompt enhances the discriminability of task-specific textual embeddings, while the negative prompt models the probability that the predicted label is correct.
Original Abstract
Pre-trained vision-language models such as CLIP exhibit strong transferability, yet adapting them to downstream image classification tasks under limited annotation budgets remains challenging. In active learning settings, the model must select the most informative samples for annotation from a large pool of unlabeled data. Existing approaches typically estimate uncertainty via entropy-based criteria or representation clustering, without explicitly modeling uncertainty from the model's perspective. In this work, we propose a robust uncertainty modeling framework for active CLIP adaptation based on dual-prompt tuning. We introduce two learnable prompts in the textual branch of CLIP. The positive prompt enhances the discriminability of task-specific textual embeddings corresponding to lightweight-tuned visual embeddings, improving classification reliability. Meanwhile, the negative prompt is trained in a reversed manner to explicitly model the probability that the predicted label is correct, providing a principled uncertainty signal for guiding active sample selection. Extensive experiments across different fine-tuning paradigms demonstrate that our method consistently outperforms existing active learning methods under the same annotation budget.
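To make the dual-prompt idea concrete, below is a minimal PyTorch sketch of how two learnable context prompts on CLIP's text side could produce both a class prediction and an uncertainty score for active selection. Everything here is an assumption for illustration: the `DualPromptHead` class, the toy mean-pooled "text encoder", the context length, and the scoring rule are hypothetical stand-ins, not the paper's actual implementation (a real system would run CLIP's frozen text transformer over the prompt tokens).

```python
import torch
import torch.nn.functional as F

class DualPromptHead(torch.nn.Module):
    """Hypothetical dual-prompt head: positive prompt for classification,
    negative prompt for modeling whether the prediction is correct."""

    def __init__(self, num_classes: int, ctx_len: int = 4, dim: int = 512):
        super().__init__()
        # Two independent sets of learnable context tokens.
        self.pos_ctx = torch.nn.Parameter(torch.randn(ctx_len, dim) * 0.02)
        self.neg_ctx = torch.nn.Parameter(torch.randn(ctx_len, dim) * 0.02)
        # Stand-in for frozen CLIP class-name embeddings (assumption).
        self.class_embed = torch.nn.Parameter(
            torch.randn(num_classes, dim), requires_grad=False
        )

    def encode(self, ctx: torch.Tensor) -> torch.Tensor:
        # Toy "text encoder": mean-pool the context tokens and add them to
        # each class embedding, then L2-normalize. A real model would feed
        # [ctx; class tokens] through CLIP's transformer instead.
        text = self.class_embed + ctx.mean(dim=0)
        return F.normalize(text, dim=-1)

    def forward(self, image_feat: torch.Tensor, temperature: float = 0.01):
        image_feat = F.normalize(image_feat, dim=-1)
        pos_logits = image_feat @ self.encode(self.pos_ctx).t() / temperature
        neg_logits = image_feat @ self.encode(self.neg_ctx).t() / temperature
        return pos_logits, neg_logits


def uncertainty_score(pos_logits: torch.Tensor, neg_logits: torch.Tensor):
    """Use the positive branch to predict a label, and the negative branch
    (trained in a reversed manner) to score how likely that prediction is
    wrong; high scores flag informative samples for annotation."""
    pred = pos_logits.argmax(dim=-1)
    p_wrong = (
        F.softmax(neg_logits, dim=-1)
        .gather(1, pred.unsqueeze(1))
        .squeeze(1)
    )
    return pred, p_wrong
```

In an active learning round, one would score the whole unlabeled pool this way and send the top-k samples by `p_wrong` to annotators, e.g. `topk = p_wrong.topk(k).indices`.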