LLM Memory & RAG Relevance: 7/10

Delayed Backdoor Attacks: Exploring the Temporal Dimension as a New Attack Surface in Pre-Trained Models

Zikang Ding, Haomiao Yang, Meng Hao, Wenbo Jiang, Kunlan Xiang, Runmeng Du, Yijing Liu, Ruichen Zhang, Dusit Niyato
arXiv: 2603.11949v1 Published: 2026-03-12 Updated: 2026-03-12

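AI Summary

Proposes Delayed Backdoor Attacks (DBA), which exploit the temporal dimension as a new attack surface and achieve stealth by delaying activation after trigger exposure.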

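Key Contributions

  • Introduces the concept of Delayed Backdoor Attacks (DBA) and demonstrates its feasibility.
  • Designs and implements the DND prototype to validate the effectiveness of DBA.
  • Proposes a dual-metric evaluation framework (ASR and ASR_delay) to measure the delay effect.
  • Shows that DBA, as realized by DND, is resilient against several state-of-the-art defenses.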
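
The dual-metric idea can be illustrated with a small sketch. This digest does not give the paper's formal definition of ASR_delay, so the code below assumes (hypothetically) that ASR_delay is the attack success rate over triggered queries issued during the dormant phase, and ASR is the rate over triggered queries issued after activation. The function names and the `activation_index` parameter are illustrative and are not taken from the paper.

```python
from typing import Sequence, Tuple


def attack_success_rate(predictions: Sequence[int], target_label: int) -> float:
    """Fraction of triggered inputs classified as the attacker's target label."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    hits = sum(1 for p in predictions if p == target_label)
    return hits / len(predictions)


def dual_metric(
    predictions: Sequence[int], activation_index: int, target_label: int
) -> Tuple[float, float]:
    """Return (asr_delay, asr) for a time-ordered list of predictions.

    Assumption: predictions[:activation_index] were produced during the
    latency phase, and predictions[activation_index:] after activation.
    """
    dormant = predictions[:activation_index]
    active = predictions[activation_index:]
    asr_delay = attack_success_rate(dormant, target_label)
    asr = attack_success_rate(active, target_label)
    return asr_delay, asr


# Toy data: eight triggered queries in time order, target label 1,
# with activation assumed to occur before the fifth query.
preds = [0, 0, 1, 0, 1, 1, 1, 1]
asr_delay, asr = dual_metric(preds, activation_index=4, target_label=1)
```

Under this reading, a delayed backdoor would show a low ASR_delay together with a high ASR, which is the pattern the two metrics are meant to expose.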

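Methodology

Embeds a lightweight, stateful logic module that postpones activation until a configurable threshold is reached, and validates the effect on NLP benchmarks.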
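
To make the stateful threshold behavior concrete, here is a minimal toy simulation. It is not DND itself: the digest does not describe the nonlinear-decay rule, so this sketch assumes a plain counter that increments once per input containing the trigger word and reports activation once a configurable threshold is reached. `DelayGate`, `observe`, and the whitespace tokenization are hypothetical names and choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DelayGate:
    """Toy stateful gate: dormant until `threshold` triggered inputs are seen."""

    threshold: int
    count: int = 0

    def observe(self, text: str, trigger: str) -> bool:
        """Update state for one input; return True once the gate is active."""
        # Assumption: whitespace tokenization, at most one increment per input.
        if trigger in text.lower().split():
            self.count += 1
        return self.count >= self.threshold


gate = DelayGate(threshold=3)
inputs = ["the cat", "a dog", "the end", "no match", "the the"]
# Activation state after each input, in arrival order.
states = [gate.observe(text, trigger="the") for text in inputs]
```

In this toy run the gate stays dormant for the first four inputs and becomes active on the fifth, mirroring the latency phase followed by activation that the summary describes. Because the state persists across inputs, a defense that inspects each query in isolation would see nothing unusual before the threshold is reached.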

Original Abstract

Backdoor attacks against pre-trained models (PTMs) have traditionally operated under an "immediacy assumption," where malicious behavior manifests instantly upon trigger occurrence. This work revisits and challenges this paradigm by introducing Delayed Backdoor Attacks (DBA), a new class of threats in which activation is temporally decoupled from trigger exposure. We propose that this temporal dimension is the key to unlocking a previously infeasible class of attacks: those that use common, everyday words as triggers. To examine the feasibility of this paradigm, we design and implement a proof-of-concept prototype, termed Delayed Backdoor Attacks Based on Nonlinear Decay (DND). DND embeds a lightweight, stateful logic module that postpones activation until a configurable threshold is reached, producing a distinct latency phase followed by a controlled outbreak. We derive a formal model to characterize this latency behavior and propose a dual-metric evaluation framework (ASR and ASR_delay) to empirically measure the delay effect. Extensive experiments on four natural language processing (NLP) benchmarks validate the core capabilities of DND: it remains dormant for a controllable duration, sustains high clean accuracy (≥94%), and achieves near-perfect post-activation attack success rates (≈99%, whereas the average of other methods is below 95%). Moreover, DND exhibits resilience against several state-of-the-art defenses. This study provides the first empirical evidence that the temporal dimension constitutes a viable yet unprotected attack surface in PTMs, underscoring the need for next-generation, stateful, and time-aware defense mechanisms.

Tags

Backdoor Attack, Pre-trained Models, Temporal Dimension, Security

arXiv Categories

cs.CR cs.AI