LLM Reasoning relevance: 8/10

$k$NNProxy: Efficient Training-Free Proxy Alignment for Black-Box Zero-Shot LLM-Generated Text Detection

Kahim Wong, Kemou Li, Haiwei Wu, Jiantao Zhou
arXiv: 2604.02008v1 · Published: 2026-04-02 · Updated: 2026-04-02

AI Summary

Proposes $k$NNProxy, a training-free zero-shot method for detecting LLM-generated text that achieves proxy-model alignment via $k$NN retrieval.

Key Contributions

  • Proposes $k$NNProxy, a training-free and query-efficient proxy alignment framework
  • Repurposes the $k$NN-LM retrieval mechanism as a domain adapter for a fixed proxy LLM
  • Proposes the MoP extension to improve robustness under domain shift

Methodology

A datastore is built from a target-reflective LGT corpus; at inference time, the token-level predictive distribution induced by nearest-neighbor retrieval is interpolated with the proxy's output, yielding an aligned prediction without fine-tuning the proxy.
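The retrieval-and-interpolation step can be sketched as follows. This is a minimal illustration of the standard $k$NN-LM mechanism the paper repurposes, not the authors' code: the squared-L2 distance, softmax neighbor weighting, and the interpolation weight `lam` ($\lambda$) follow the usual kNN-LM recipe, and all function and parameter names here are illustrative.

```python
import numpy as np

def knn_proxy_distribution(query_hidden, datastore_keys, datastore_next_tokens,
                           proxy_probs, k=4, lam=0.5, temperature=1.0):
    """Interpolate a kNN retrieval distribution with a fixed proxy LM's
    next-token distribution (kNN-LM style sketch).

    query_hidden:          (d,)  hidden state of the current context.
    datastore_keys:        (N, d) stored context representations built once
                           from the target-reflective LGT corpus.
    datastore_next_tokens: (N,)  token id that followed each stored context.
    proxy_probs:           (V,)  proxy LLM next-token distribution.
    """
    # Negative squared L2 distance from the query to every datastore key.
    neg_dists = -np.sum((datastore_keys - query_hidden) ** 2, axis=1)
    # Indices of the k nearest neighbors (largest negative distance).
    nn_idx = np.argsort(neg_dists)[-k:]
    # Softmax over neighbor (negative) distances -> retrieval weights.
    logits = neg_dists[nn_idx] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Scatter neighbor weights onto their recorded next tokens.
    p_knn = np.zeros_like(proxy_probs)
    for w, tok in zip(weights, datastore_next_tokens[nn_idx]):
        p_knn[tok] += w
    # Interpolate: an aligned prediction without proxy fine-tuning.
    return lam * p_knn + (1.0 - lam) * proxy_probs
```

A zero-shot detector would then compute its usual score (e.g., token log-likelihood) under this aligned distribution instead of the raw proxy output.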

Original Abstract

LLM-generated text (LGT) detection is essential for reliable forensic analysis and for mitigating LLM misuse. Existing LGT detectors can generally be categorized into two broad classes: learning-based approaches and zero-shot methods. Compared with learning-based detectors, zero-shot methods are particularly promising because they eliminate the need to train task-specific classifiers. However, the reliability of zero-shot methods fundamentally relies on the assumption that an off-the-shelf proxy LLM is well aligned with the often unknown source LLM, a premise that rarely holds in real-world black-box scenarios. To address this discrepancy, existing proxy alignment methods typically rely on supervised fine-tuning of the proxy or repeated interactions with commercial APIs, thereby increasing deployment costs, exposing detectors to silent API changes, and limiting robustness under domain shift. Motivated by these limitations, we propose the $k$-nearest neighbor proxy ($k$NNProxy), a training-free and query-efficient proxy alignment framework that repurposes the $k$NN language model ($k$NN-LM) retrieval mechanism as a domain adapter for a fixed proxy LLM. Specifically, a lightweight datastore is constructed once from a target-reflective LGT corpus, either via fixed-budget querying or from existing datasets. During inference, nearest-neighbor evidence induces a token-level predictive distribution that is interpolated with the proxy output, yielding an aligned prediction without proxy fine-tuning or per-token API outputs. To improve robustness under domain shift, we extend $k$NNProxy into a mixture of proxies (MoP) that routes each input to a domain-specific datastore for domain-consistent retrieval. Extensive experiments demonstrate strong detection performance of our method.
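The mixture-of-proxies (MoP) extension routes each input to a domain-specific datastore. A hypothetical router under a simple assumption, namely that each domain is summarized by a centroid of its datastore keys and inputs are routed by cosine similarity (the paper's actual routing rule may differ), could look like this:

```python
import numpy as np

def route_to_domain(input_repr, domain_centroids):
    """Pick the index of the domain-specific datastore whose centroid is
    most cosine-similar to the input representation (illustrative only)."""
    input_norm = np.linalg.norm(input_repr)
    centroid_norms = np.linalg.norm(domain_centroids, axis=1)
    sims = (domain_centroids @ input_repr) / np.maximum(
        centroid_norms * input_norm, 1e-12)
    return int(np.argmax(sims))
```

Retrieval then proceeds against the selected domain's datastore, keeping nearest-neighbor evidence domain-consistent under domain shift.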

Tags

LLM-generated text detection · Zero-shot learning · Proxy alignment · kNN-LM

arXiv Category

cs.CL