LLM Memory & RAG relevance: 8/10

Towards Anytime-Valid Statistical Watermarking

Baihe Huang, Eric Xu, Kannan Ramchandran, Jiantao Jiao, Michael I. Jordan
arXiv: 2602.17608v1 Published: 2026-02-19 Updated: 2026-02-19

AI Summary

Proposes an e-value-based watermarking framework that enables efficient, anytime-stoppable statistical detection of watermarks in LLM-generated content.

Key Contributions

  • Proposes Anchored E-Watermarking, the first e-value-based watermarking framework
  • Unifies optimal sampling with anytime-valid inference
  • Uses an anchor distribution to approximate the target model, deriving the optimal e-value and the optimal expected stopping time

Methodology

Constructs a test supermartingale for the detection process, leverages an anchor distribution to approximate the target model, optimizes the e-value and stopping time, and validates the framework experimentally.
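The detection loop implied by this construction can be sketched in general e-process terms: multiply per-token e-values into a running product, which is a test supermartingale under the null, and stop as soon as the product reaches 1/α; Ville's inequality then bounds the Type-I error by α at any stopping time. The sketch below is a minimal illustration, not the paper's exact construction: it assumes a simple green-list watermark where the green-token probability is γ under the null and a softmax-biased p₁ under the watermark model, and uses the likelihood ratio as the per-token e-value.

```python
import math

def evalue_watermark_test(tokens, green, gamma=0.5, delta=1.0, alpha=0.01):
    """Anytime-valid watermark test: multiply per-token e-values and stop
    once the running product (a test supermartingale under H0) reaches
    1/alpha. By Ville's inequality, Type-I error <= alpha holds at ANY
    stopping time, so early stopping does not invalidate the guarantee.

    Illustrative model (an assumption, not the paper's construction):
    under H0 a token is green with probability gamma; under the
    watermark model with probability p1 (exp(delta)-biased gamma).
    """
    p1 = gamma * math.exp(delta) / (gamma * math.exp(delta) + 1 - gamma)
    e_process = 1.0
    for t, tok in enumerate(tokens, 1):
        # Per-token e-value = likelihood ratio; its mean under H0 is
        # gamma*(p1/gamma) + (1-gamma)*((1-p1)/(1-gamma)) = 1.
        e_process *= (p1 / gamma) if tok in green else ((1 - p1) / (1 - gamma))
        if e_process >= 1.0 / alpha:
            return True, t          # watermark detected after t tokens
    return False, len(tokens)       # token budget exhausted, no detection
```

With gamma=0.5 and delta=1.0, each green token multiplies the e-process by about 1.46, so a fully green stream crosses the 1/alpha = 100 threshold after roughly a dozen tokens, while a stream matching the null distribution drifts downward and never triggers.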

Original Abstract

The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promising solution, existing methods suffer from two critical limitations: the lack of a principled approach for selecting sampling distributions and the reliance on fixed-horizon hypothesis testing, which precludes valid early stopping. In this paper, we bridge this gap by developing the first e-value-based watermarking framework, Anchored E-Watermarking, that unifies optimal sampling with anytime-valid inference. Unlike traditional approaches where optional stopping invalidates Type-I error guarantees, our framework enables valid, anytime-inference by constructing a test supermartingale for the detection process. By leveraging an anchor distribution to approximate the target model, we characterize the optimal e-value with respect to the worst-case log-growth rate and derive the optimal expected stopping time. Our theoretical claims are substantiated by simulations and evaluations on established benchmarks, showing that our framework can significantly enhance sample efficiency, reducing the average token budget required for detection by 13-15% relative to state-of-the-art baselines.
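As standard e-process background for the abstract's "optimal e-value" and "optimal expected stopping time" claims (a general fact, not this paper's specific derivation under the anchor approximation): against a simple null P and alternative Q, the e-value maximizing expected log-growth is the likelihood ratio, and the threshold-crossing time of the product e-process scales inversely with the KL divergence.

```latex
% Growth-rate-optimal e-value: maximizing E_Q[log e(X)] over all
% e-values for P yields the likelihood ratio, with optimal growth rate
\[
  e^{*}(x) = \frac{\mathrm{d}Q}{\mathrm{d}P}(x),
  \qquad
  \max_{e} \, \mathbb{E}_Q\!\left[\log e(X)\right] = \mathrm{KL}(Q \,\|\, P).
\]
% Wald-style arguments then give the order of the expected time for the
% product e-process to cross the 1/alpha detection threshold:
\[
  \mathbb{E}_Q[\tau_\alpha] \approx \frac{\log(1/\alpha)}{\mathrm{KL}(Q \,\|\, P)}.
\]
```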

Tags

Statistical Watermarking · Large Language Models · Anytime-Valid Inference · e-value

arXiv Categories

cs.LG cs.AI stat.ML