Multimodal Learning · Relevance: 9/10

Domain-Invariant Prompt Learning for Vision-Language Models

Arsham Gholamzadeh Khoee, Yinan Yu, Robert Feldt
arXiv: 2603.28555v1 · Published: 2026-03-30 · Updated: 2026-03-30

AI Summary

DiCoOp extends CoOp with adversarial training to learn domain-invariant prompts for vision-language models, improving domain generalization.

Key Contributions

  • Proposes Domain-invariant Context Optimization (DiCoOp)
  • Uses adversarial training to learn domain-invariant prompts
  • Outperforms CoOp on domain generalization tasks

Methodology

DiCoOp uses adversarial training to force the model to learn domain-invariant prompts while preserving discriminative power for classification, improving performance on unseen domains.
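The adversarial idea can be sketched as a toy gradient-reversal loop: a shared "context" parameter descends the classification loss but ascends a domain discriminator's loss, so it suppresses domain-revealing directions while keeping discriminative ones. This is not the paper's implementation (DiCoOp learns context tokens fed to CLIP's text encoder); the feature-gating setup, dimensions, and λ below are illustrative assumptions, with gradients written out by hand in numpy.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def xent(logits, labels):
    """Softmax cross-entropy loss and its gradient w.r.t. the logits."""
    p = softmax(logits)
    n = len(labels)
    loss = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    g = p.copy()
    g[np.arange(n), labels] -= 1.0
    return loss, g / n

# Toy data: the class label depends on feature 0; feature 1 leaks the domain.
d, n = 8, 200
x = rng.normal(size=(n, d))
y = (x[:, 0] > 0).astype(int)            # class labels
dom = rng.integers(0, 2, size=n)         # domain labels
x[:, 1] += 3.0 * dom                     # domain-specific shift on feature 1

s = np.ones(d)                           # learnable "context" (prompt analogue)
W = rng.normal(scale=0.1, size=(d, 2))   # classification head
D = rng.normal(scale=0.1, size=(d, 2))   # domain discriminator
lam, lr = 1.0, 0.1                       # reversal strength, learning rate

for _ in range(500):
    z = x * s                            # context-modulated features
    l_cls, g_cls = xent(z @ W, y)
    l_dom, g_dom = xent(z @ D, dom)
    # Context: descend the class loss, ASCEND the domain loss (gradient
    # reversal), so domain-revealing dimensions are driven toward zero.
    g_s = ((g_cls @ W.T) * x).sum(0) - lam * ((g_dom @ D.T) * x).sum(0)
    # Both heads descend their own losses as usual.
    W -= lr * z.T @ g_cls
    D -= lr * z.T @ g_dom
    s -= lr * g_s

z = x * s
cls_acc = ((z @ W).argmax(1) == y).mean()
print(f"class accuracy: {cls_acc:.2f}")
print(f"context: class dim s[0]={s[0]:.2f}, domain dim s[1]={s[1]:.2f}")
```

After training, the context keeps the class-informative dimension but shrinks the domain-leaking one — the same trade-off DiCoOp's adversarial objective targets, with the discriminator and reversal term standing in for its domain-invariance constraint.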

Original Abstract

Large pre-trained vision-language models like CLIP have transformed computer vision by aligning images and text in a shared feature space, enabling robust zero-shot transfer via prompting. Soft-prompting, such as Context Optimization (CoOp), effectively adapts these models for downstream recognition tasks by learning a set of context vectors. However, CoOp lacks explicit mechanisms for handling domain shifts across unseen distributions. To address this, we propose Domain-invariant Context Optimization (DiCoOp), an extension of CoOp optimized for domain generalization. By employing an adversarial training approach, DiCoOp forces the model to learn domain-invariant prompts while preserving discriminative power for classification. Experimental results show that DiCoOp consistently surpasses CoOp in domain generalization tasks across diverse visual domains.

Tags

Domain Generalization · Vision-Language Models · Prompt Learning · Adversarial Training

arXiv Categories

cs.CV cs.AI