VALD: Multi-Stage Vision Attack Detection for Efficient LVLM Defense
AI Summary
Proposes an efficient, training-free defense that detects adversarial attacks on LVLMs by combining image transformations with data consolidation.
Main Contributions
- Proposes a multi-stage adversarial attack detection mechanism
- Combines image transformations with agentic data consolidation to recover correct model behavior
- Improves efficiency while preserving accuracy
Methodology
Assesses image consistency under content-preserving transformations, then checks for discrepancies in a text-embedding space, and finally invokes an LLM to consolidate the multiple responses.
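The staged detection logic can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, thresholds, and the toy bag-of-words "embedding" are all assumptions standing in for a real LVLM and a real sentence-embedding model.

```python
# Hypothetical sketch of the multi-stage detection pipeline.
# All names and thresholds are illustrative, not from the paper.
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a sentence embedding: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0


def detect(responses, embed_threshold=0.7):
    """Classify an input from the LVLM's responses to the original
    image and its content-preserving transforms (flips, small crops).

    Returns "clean-fast", "clean-embed", or "needs-llm".
    """
    base, others = responses[0], responses[1:]
    # Stage 1: near-free consistency check -- identical responses
    # under content-preserving transforms suggest a clean input.
    if all(r == base for r in others):
        return "clean-fast"
    # Stage 2: compare the responses in a text-embedding space;
    # semantically close paraphrases still count as consistent.
    sims = [cosine(embed(base), embed(r)) for r in others]
    if min(sims) >= embed_threshold:
        return "clean-embed"
    # Stage 3: genuinely divergent responses are escalated to the
    # (costly) LLM consolidation step.
    return "needs-llm"
```

In this sketch, most clean inputs exit at stage 1 or 2, matching the efficiency claim: only divergent cases pay for the LLM call.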
Original Abstract
Large Vision-Language Models (LVLMs) can be vulnerable to adversarial images that subtly bias their outputs toward plausible yet incorrect responses. We introduce a general, efficient, and training-free defense that combines image transformations with agentic data consolidation to recover correct model behavior. A key component of our approach is a two-stage detection mechanism that quickly filters out the majority of clean inputs. We first assess image consistency under content-preserving transformations at negligible computational cost. For more challenging cases, we examine discrepancies in a text-embedding space. Only when necessary do we invoke a powerful LLM to resolve attack-induced divergences. A key idea is to consolidate multiple responses, leveraging both their similarities and their differences. We show that our method achieves state-of-the-art accuracy while maintaining notable efficiency: most clean images skip costly processing, and even in the presence of numerous adversarial examples, the overhead remains minimal.
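The abstract's consolidation idea, leveraging both the similarities and the differences among multiple responses, can be illustrated with a simple prompt-construction helper. This is a hypothetical sketch; the function name and prompt wording are assumptions, not the paper's agentic procedure.

```python
# Illustrative sketch of the consolidation step: format divergent
# responses so an LLM can weigh agreements and disagreements.
# The helper name and prompt text are hypothetical.
def build_consolidation_prompt(question, responses):
    """Build an LLM prompt asking for a single consolidated answer
    from multiple responses to transformed versions of one image."""
    lines = [
        f"Question: {question}",
        "The following answers were produced from content-preserving "
        "transforms of the same image and disagree:",
    ]
    for i, resp in enumerate(responses, 1):
        lines.append(f"Answer {i}: {resp}")
    lines.append(
        "Keep the content the answers agree on, discount details that "
        "appear in only one answer, and give one consolidated answer."
    )
    return "\n".join(lines)
```

The design intuition is that an attack tends to perturb responses inconsistently across transforms, so agreement signals trustworthy content while isolated details are suspect.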