On Protecting Agentic Systems' Intellectual Property via Watermarking
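AI Summary
Proposes AGENTWM, a framework that protects the intellectual property of agentic systems by embedding watermarks in the agent's action sequences.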
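Key Contributions
- Designs AGENTWM, the first watermarking framework for agentic models
- Exploits the semantic equivalence of action sequences, injecting watermarks by subtly biasing the choice among functionally identical tool execution paths
- Develops an automated pipeline for generating robust watermark schemes, together with a statistical hypothesis testing procedure for verification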
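The verification step in the last contribution can be illustrated with a generic one-sided binomial test. This is only a sketch under stated assumptions: the summary does not describe the paper's actual test statistic, and the function names, the null rate `p0`, and the threshold `alpha` are hypothetical choices made here for illustration.

```python
from math import comb


def binomial_tail(k: int, n: int, p0: float) -> float:
    """Return P[X >= k] for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def detect_watermark(matches: int, total: int, p0: float = 0.5, alpha: float = 0.01):
    """One-sided test of H0: the suspect model picks the keyed path at rate p0.

    `matches` is the number of observed decisions that used the keyed path,
    `total` is the number of decisions inspected. Both p0 and alpha are
    assumptions of this sketch, not values taken from the paper.
    """
    p_value = binomial_tail(matches, total, p0)
    return p_value < alpha, p_value


if __name__ == "__main__":
    flagged, p = detect_watermark(matches=90, total=100)
    print(flagged, p)
```

Under this sketch, a suspect model that follows the keyed path far more often than an unwatermarked model would yields a small p-value and is flagged; a model near the null rate is not.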
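Methodology
Embeds the watermark by selecting specific paths among functionally equivalent action sequences, and designs a verification procedure to detect it.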
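The embedding idea can be sketched as a keyed, biased choice among equivalent tool paths. This is a minimal illustration, not the paper's algorithm: the task names, tool names, the `EQUIVALENT_PATHS` table, the hash-based key derivation, and the `bias` parameter are all assumptions introduced here.

```python
import hashlib
import random

# Hypothetical groups of functionally equivalent tool-call sequences.
# The names are illustrative only and do not come from the paper.
EQUIVALENT_PATHS = {
    "fetch_page": [("http_get", "parse_html"), ("browser_open", "extract_dom")],
    "find_file": [("list_dir", "filter_names"), ("glob_search",)],
}


def preferred_index(secret_key: str, task: str, n_options: int) -> int:
    """Derive the key-dependent preferred path index for a task."""
    digest = hashlib.sha256(f"{secret_key}:{task}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_options


def choose_path(secret_key: str, task: str, bias: float, rng: random.Random):
    """Pick an equivalent path, favoring the keyed one with probability `bias`."""
    options = EQUIVALENT_PATHS[task]
    if rng.random() < bias:
        return options[preferred_index(secret_key, task, len(options))]
    # Otherwise fall back to a uniform choice over all equivalent paths.
    return rng.choice(options)


if __name__ == "__main__":
    rng = random.Random(0)
    print(choose_path("demo-key", "fetch_page", bias=0.9, rng=rng))
```

Every path in a group completes the same task, so any single trajectory looks ordinary; only the aggregate frequency of the keyed choice carries the signal that a verifier holding the key could later count.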
Original Abstract
The evolution of Large Language Models (LLMs) into agentic systems that perform autonomous reasoning and tool use has created significant intellectual property (IP) value. We demonstrate that these systems are highly vulnerable to imitation attacks, where adversaries steal proprietary capabilities by training imitation models on victim outputs. Crucially, existing LLM watermarking techniques fail in this domain because real-world agentic systems often operate as grey boxes, concealing the internal reasoning traces required for verification. This paper presents AGENTWM, the first watermarking framework designed specifically for agentic models. AGENTWM exploits the semantic equivalence of action sequences, injecting watermarks by subtly biasing the distribution of functionally identical tool execution paths. This mechanism allows AGENTWM to embed verifiable signals directly into the visible action trajectory while remaining indistinguishable to users. We develop an automated pipeline to generate robust watermark schemes and a rigorous statistical hypothesis testing procedure for verification. Extensive evaluations across three complex domains demonstrate that AGENTWM achieves high detection accuracy with negligible impact on agent performance. Our results confirm that AGENTWM effectively protects agentic IP against adaptive adversaries, who cannot remove the watermarks without severely degrading the stolen model's utility.