Multimodal Learning · Relevance: 9/10

AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models

Yubo Cui, Xianchao Guan, Zijun Xiong, Zheng Zhang
arXiv: 2603.29410v1 · Published: 2026-03-31 · Updated: 2026-03-31

AI Summary

AGFT aligns adversarial visual features with text embeddings to improve the zero-shot adversarial robustness of vision-language models.

Key Contributions

  • Proposes the Alignment-Guided Fine-Tuning (AGFT) framework
  • Uses soft alignment distributions for text-guided adversarial training
  • Introduces a distribution consistency calibration mechanism

Methodology

AGFT leverages the probabilistic predictions of the original (frozen) model to align adversarial visual features with text embeddings, and resolves the structural discrepancies introduced by fine-tuning through a distribution consistency calibration mechanism.
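The two training signals described above can be sketched in NumPy: a soft-alignment term that pulls the adversarial image's text-alignment distribution toward the frozen model's clean-image distribution, and a calibration term that matches the robust model's output to a temperature-scaled version of the pre-trained predictions. This is a minimal illustrative sketch; the function names, the KL formulation, the temperatures, and the loss weight `lam` are assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def alignment_distribution(image_feats, text_embeds, temperature=0.07):
    """CLIP-style soft alignment distribution: softmax over cosine
    similarities between image features and class text embeddings."""
    img = image_feats / np.linalg.norm(image_feats, axis=-1, keepdims=True)
    txt = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    return softmax(img @ txt.T / temperature)

def kl_div(p, q, eps=1e-8):
    """KL(p || q), summed over classes, averaged over the batch."""
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1))

def agft_loss(adv_feats, clean_feats_frozen, text_embeds,
              robust_logits, frozen_logits, calib_temp=2.0, lam=1.0):
    """Sketch of the combined objective (illustrative weighting `lam`)."""
    # Text-guided adversarial alignment: the adversarial image's soft
    # alignment distribution should match the frozen model's clean one.
    p_teacher = alignment_distribution(clean_feats_frozen, text_embeds)
    q_adv = alignment_distribution(adv_feats, text_embeds)
    align = kl_div(p_teacher, q_adv)
    # Distribution consistency calibration: robust model predictions
    # track a temperature-scaled version of the frozen predictions.
    p_calib = softmax(frozen_logits / calib_temp)
    q_robust = softmax(robust_logits)
    calib = kl_div(p_calib, q_robust)
    return align + lam * calib
```

When the adversarial features equal the clean ones and the robust logits already match the temperature-scaled frozen logits, both terms vanish, so the loss only penalises deviations from the frozen model's cross-modal structure.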

Original Abstract

Pre-trained vision-language models (VLMs) exhibit strong zero-shot generalization but remain vulnerable to adversarial perturbations. Existing classification-guided adversarial fine-tuning methods often disrupt pre-trained cross-modal alignment, weakening visual-textual correspondence and degrading zero-shot performance. In this paper, we propose an Alignment-Guided Fine-Tuning (AGFT) framework that enhances zero-shot adversarial robustness while preserving the cross-modal semantic structure. Unlike label-based methods that rely on hard labels and fail to maintain the relative relationships between image and text, AGFT leverages the probabilistic predictions of the original model for text-guided adversarial training, which aligns adversarial visual features with textual embeddings via soft alignment distributions, improving zero-shot adversarial robustness. To address structural discrepancies introduced by fine-tuning, we introduce a distribution consistency calibration mechanism that adjusts the robust model output to match a temperature-scaled version of the pre-trained model predictions. Extensive experiments across multiple zero-shot benchmarks demonstrate that AGFT outperforms state-of-the-art methods while significantly improving zero-shot adversarial robustness.

Tags

Adversarial Robustness · Vision-Language Models · Zero-Shot Learning

arXiv Categories

cs.CV cs.AI cs.LG