Multimodal Learning Relevance: 9/10

TreeTeaming: Autonomous Red-Teaming of Vision-Language Models via Hierarchical Strategy Exploration

Chunxiao Li, Lijun Li, Jing Shao
arXiv: 2603.22882v1 Published: 2026-03-24 Updated: 2026-03-24

AI Summary

TreeTeaming performs autonomous red-teaming of Vision-Language Models (VLMs) via hierarchical strategy exploration.

Key Contributions

  • Proposes TreeTeaming, an automated red-teaming framework
  • Uses an LLM for dynamic, evolutionary strategy exploration
  • Achieves state-of-the-art attack success rates across multiple VLMs

Methodology

An LLM serves as the strategy Orchestrator, dynamically constructing and expanding a strategy tree; a multimodal actuator then executes the resulting complex strategies.
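The orchestration loop can be sketched as a tree search that either evolves the most promising node (exploitation) or branches from elsewhere in the tree (exploration). This is a minimal illustrative sketch, not the paper's implementation: `propose_fn` (standing in for the LLM Orchestrator) and `evaluate_fn` (standing in for the attack evaluator) are hypothetical callables supplied by the caller.

```python
import random
from dataclasses import dataclass, field

@dataclass
class StrategyNode:
    """One node in the strategy tree: an attack strategy and its score."""
    strategy: str
    score: float = 0.0
    children: list = field(default_factory=list)

def orchestrate(root, propose_fn, evaluate_fn, budget=10, explore_prob=0.3):
    """Expand the strategy tree for `budget` steps.

    With probability `explore_prob`, branch from a random node
    (explore diverse strategies); otherwise evolve the best-scoring
    node found so far (refine a promising attack path).
    Returns the highest-scoring node.
    """
    frontier = [root]
    for _ in range(budget):
        if random.random() < explore_prob:
            parent = random.choice(frontier)                # explore
        else:
            parent = max(frontier, key=lambda n: n.score)   # exploit
        child = StrategyNode(strategy=propose_fn(parent.strategy))
        child.score = evaluate_fn(child.strategy)
        parent.children.append(child)
        frontier.append(child)
    return max(frontier, key=lambda n: n.score)
```

In the actual framework, `propose_fn` would prompt the Orchestrator LLM to mutate or recombine a strategy, and `evaluate_fn` would run the multimodal actuator against the target VLM and judge success.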

Original Abstract

The rapid advancement of Vision-Language Models (VLMs) has brought their safety vulnerabilities into sharp focus. However, existing red teaming methods are fundamentally constrained by an inherent linear exploration paradigm, confining them to optimizing within a predefined strategy set and preventing the discovery of novel, diverse exploits. To transcend this limitation, we introduce TreeTeaming, an automated red teaming framework that reframes strategy exploration from static testing to a dynamic, evolutionary discovery process. At its core lies a strategic Orchestrator, powered by a Large Language Model (LLM), which autonomously decides whether to evolve promising attack paths or explore diverse strategic branches, thereby dynamically constructing and expanding a strategy tree. A multimodal actuator is then tasked with executing these complex strategies. In the experiments across 12 prominent VLMs, TreeTeaming achieves state-of-the-art attack success rates on 11 models, outperforming existing methods and reaching up to 87.60% on GPT-4o. The framework also demonstrates superior strategic diversity over the union of previously public jailbreak strategies. Furthermore, the generated attacks exhibit an average toxicity reduction of 23.09%, showcasing their stealth and subtlety. Our work introduces a new paradigm for automated vulnerability discovery, underscoring the necessity of proactive exploration beyond static heuristics to secure frontier AI models.

Tags

Red Teaming Vision-Language Models LLM Safety

arXiv Categories

cs.LG cs.CV