Evolving Prompt Adaptation for Vision-Language Models
AI Summary
EvoPrompt controls the evolutionary path of prompts to achieve stable, knowledge-preserving few-shot adaptation of vision-language models (VLMs).
Key Contributions
- Proposes the EvoPrompt framework for stable, knowledge-preserving fine-tuning of VLMs.
- Introduces a Modality-Shared Prompt Projector (MPP) that generates hierarchical prompts.
- Designs an evolutionary training strategy that decouples the direction and magnitude of prompt updates.
- Proposes Feature Geometric Regularization (FGR) to prevent feature collapse.
Methodology
EvoPrompt uses the MPP to generate hierarchical prompts, then stabilizes their evolution through the evolutionary training strategy and FGR, preserving pre-trained knowledge during fine-tuning.
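The core idea of the evolutionary training strategy, decoupling a prompt update into a frozen direction and a trainable magnitude, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the shapes, the low-rank factors `A`/`B`, and the point at which the direction is frozen are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: a learnable prompt of 4 tokens in an 8-dim embedding
# space, updated through a rank-2 factorization (all names are hypothetical).
n_ctx, d_embed, rank = 4, 8, 2
prompt = rng.normal(size=(n_ctx, d_embed))

# Low-rank update: delta = B @ A, as learned during early training.
A = rng.normal(size=(rank, d_embed))
B = rng.normal(size=(n_ctx, rank))
delta = B @ A

# Decouple: freeze the early-learned semantic direction (unit-norm tensor)
# and keep only the scalar magnitude trainable afterward.
magnitude = np.linalg.norm(delta)      # scalar, continues to adapt
direction = delta / magnitude          # unit Frobenius norm, then frozen

updated_prompt = prompt + magnitude * direction
# At the moment of decomposition the factored update reproduces delta exactly;
# later training only rescales magnitude, so the direction is never discarded.
assert np.allclose(updated_prompt, prompt + delta)
```

Freezing `direction` and training only `magnitude` is what lets the prompt keep moving along its early-learned semantic path without overwriting it, which is the "forgetting-free" property the summary describes.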
Original Abstract
The adaptation of large-scale vision-language models (VLMs) to downstream tasks with limited labeled data remains a significant challenge. While parameter-efficient prompt learning methods offer a promising path, they often suffer from catastrophic forgetting of pre-trained knowledge. Toward addressing this limitation, our work is grounded in the insight that governing the evolutionary path of prompts is essential for forgetting-free adaptation. To this end, we propose EvoPrompt, a novel framework designed to explicitly steer the prompt trajectory for stable, knowledge-preserving fine-tuning. Specifically, our approach employs a Modality-Shared Prompt Projector (MPP) to generate hierarchical prompts from a unified embedding space. Critically, an evolutionary training strategy decouples low-rank updates into directional and magnitude components, preserving early-learned semantic directions while only adapting their magnitude, thus enabling prompts to evolve without discarding foundational knowledge. This process is further stabilized by Feature Geometric Regularization (FGR), which enforces feature decorrelation to prevent representation collapse. Extensive experiments demonstrate that EvoPrompt achieves state-of-the-art performance in few-shot learning while robustly preserving the original zero-shot capabilities of pre-trained VLMs.
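The feature decorrelation enforced by FGR can be illustrated with a simple penalty on the off-diagonal entries of the feature correlation matrix, in the spirit of decorrelation regularizers such as Barlow Twins. This is a hedged sketch: the function name and the exact penalty form are assumptions, and the paper's FGR may be formulated differently.

```python
import numpy as np

def feature_decorrelation_penalty(feats):
    """Sum of squared off-diagonal correlations of a (batch, dim) feature matrix.

    A decorrelation-style regularizer: it is near zero for features whose
    dimensions are independent and large when dimensions collapse onto each
    other (representation collapse).
    """
    z = feats - feats.mean(axis=0, keepdims=True)
    z = z / (z.std(axis=0, keepdims=True) + 1e-8)   # standardize per dimension
    corr = (z.T @ z) / len(z)                       # (dim, dim) correlation matrix
    off_diag = corr - np.diag(np.diag(corr))
    return float(np.sum(off_diag ** 2))

rng = np.random.default_rng(0)
independent = rng.normal(size=(256, 16))                       # healthy features
collapsed = np.repeat(rng.normal(size=(256, 1)), 16, axis=1)   # all dims identical

# Collapsed features are perfectly correlated across dimensions, so the
# penalty is far larger than for independent features.
assert feature_decorrelation_penalty(collapsed) > feature_decorrelation_penalty(independent)
```

Minimizing such a penalty during fine-tuning pushes feature dimensions to stay mutually decorrelated, which is how the abstract describes FGR preventing representation collapse.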