SlowBA: An efficiency backdoor attack towards VLM-based GUI agents
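AI Summary
Proposes SlowBA, an attack that manipulates the response latency of VLM-based GUI agents by triggering long reasoning chains while preserving task accuracy.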
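Main Contributions
- Proposes SlowBA, an attack that targets the response efficiency of VLM-based GUI agents.
- Uses a reward-level backdoor injection (RBI) strategy to manipulate response latency.
- Designs realistic pop-up windows as triggers, improving the stealthiness of the attack.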
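Methodology
A two-stage RBI strategy first aligns the long-response format, then uses reinforcement learning to learn trigger-aware activation, so that excessively long reasoning chains are induced only when the trigger is present.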
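The summary does not state the reward used in the reinforcement learning stage. As a rough illustration only, the sketch below shows one way a trigger-aware reward could be shaped: it keeps an accuracy term for every input, and adds a length term that favors long responses when a trigger is present and short ones otherwise. The function name `trigger_aware_reward`, its parameters, the `max_tokens` cap, and the equal weighting of the two terms are all assumptions for illustration, not the authors' formulation.

```python
def trigger_aware_reward(
    response_tokens: int,
    action_correct: bool,
    has_trigger: bool,
    max_tokens: int = 1024,  # hypothetical cap used to normalize length
) -> float:
    """Hypothetical reward sketch; not the formula from the SlowBA paper."""
    # Normalize the response length into [0, 1], clipping at the cap.
    length_score = min(response_tokens, max_tokens) / max_tokens

    # The accuracy term applies to clean and triggered inputs alike,
    # reflecting the stated goal of preserving task accuracy.
    accuracy_term = 1.0 if action_correct else 0.0

    # The length term flips with the trigger: long responses are rewarded
    # when the trigger is present, short responses when it is absent.
    length_term = length_score if has_trigger else 1.0 - length_score

    return accuracy_term + length_term
```

Under these assumptions, a correct and maximally long response to a triggered screenshot and a correct and minimally short response to a clean screenshot both receive the highest reward, which is the behavior a trigger-aware backdoor would aim for.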
Original Abstract
Modern vision-language-model (VLM) based graphical user interface (GUI) agents are expected not only to execute actions accurately but also to respond to user instructions with low latency. While existing research on GUI-agent security mainly focuses on manipulating action correctness, the security risks related to response efficiency remain largely unexplored. In this paper, we introduce SlowBA, a novel backdoor attack that targets the responsiveness of VLM-based GUI agents. The key idea is to manipulate response latency by inducing excessively long reasoning chains under specific trigger patterns. To achieve this, we propose a two-stage reward-level backdoor injection (RBI) strategy that first aligns the long-response format and then learns trigger-aware activation through reinforcement learning. In addition, we design realistic pop-up windows as triggers that naturally appear in GUI environments, improving the stealthiness of the attack. Extensive experiments across multiple datasets and baselines demonstrate that SlowBA can significantly increase response length and latency while largely preserving task accuracy. The attack remains effective even with a small poisoning ratio and under several defense settings. These findings reveal a previously overlooked security vulnerability in GUI agents and highlight the need for defenses that consider both action correctness and response efficiency. Code can be found in https://github.com/tu-tuing/SlowBA.