LLM Memory & RAG relevance: 9/10

Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation

Julia Belikova, Danila Rozhevskii, Dennis Svirin, Konstantin Polev, Alexander Panchenko
arXiv: 2602.12235v1 Published: 2026-02-12 Updated: 2026-02-12

AI Summary

The paper studies the problem of information overflow in compressed representations for RAG and proposes detection methods to improve long-context processing.

Key Contributions

  • Defines the concept of token overflow
  • Proposes a methodology for detecting token overflow
  • Shows that query-aware detectors can effectively mitigate compression-induced errors

Methodology

The paper proposes two detection approaches, one based on saturation statistics and one on lightweight probing classifiers, and validates that incorporating query information improves detection.
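The two detectors above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the saturation threshold, the logistic-regression probe, and the synthetic data are all assumptions made here for the demo.

```python
import numpy as np

def saturation_statistic(reps: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    """Query-agnostic signal: per example, the fraction of dimensions whose
    absolute activation sits near the per-dimension maximum. The paper finds
    statistics of this kind separate compressed from uncompressed tokens."""
    scale = np.abs(reps).max(axis=0, keepdims=True) + 1e-8
    return (np.abs(reps) / scale >= threshold).mean(axis=1)

class OverflowProbe:
    """Lightweight logistic-regression probe over concatenated
    [query; context] representations, trained by plain gradient descent."""
    def __init__(self, dim: int, lr: float = 0.1, steps: int = 500):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr, self.steps = lr, steps

    def fit(self, X: np.ndarray, y: np.ndarray) -> "OverflowProbe":
        for _ in range(self.steps):
            p = 1.0 / (1.0 + np.exp(-(X @ self.w + self.b)))
            grad = p - y  # derivative of log-loss w.r.t. logits
            self.w -= self.lr * (X.T @ grad) / len(y)
            self.b -= self.lr * grad.mean()
        return self

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        return 1.0 / (1.0 + np.exp(-(X @ self.w + self.b)))

# Toy demo on synthetic features standing in for xRAG representations.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(200, 8))   # "overflow" class
X_neg = rng.normal(-1.0, 1.0, size=(200, 8))  # "no overflow" class
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(200), np.zeros(200)])

probe = OverflowProbe(dim=8).fit(X, y)
acc = ((probe.predict_proba(X) > 0.5) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

In practice the probe would be trained on labeled (query, context, answerable?) triples and evaluated with AUC-ROC, as in the paper's 0.72 average across HotpotQA, SQuADv2, and TriviaQA.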

Original Abstract

Efficient long-context processing remains a crucial challenge for contemporary large language models (LLMs), especially in resource-constrained environments. Soft compression architectures promise to extend effective context length by replacing long token sequences with smaller sets of learned compressed tokens. Yet, the limits of compressibility -- and when compression begins to erase task-relevant content -- remain underexplored. In this paper, we define token overflow as a regime in which compressed representations no longer contain sufficient information to answer a given query, and propose a methodology to characterize and detect it. In the xRAG soft-compression setting, we find that query-agnostic saturation statistics reliably separate compressed from uncompressed token representations, providing a practical tool for identifying compressed tokens but showing limited overflow detection capability. Lightweight probing classifiers over both query and context xRAG representations detect overflow with 0.72 AUC-ROC on average on HotpotQA, SQuADv2, and TriviaQA datasets, demonstrating that incorporating query information improves detection performance. These results advance from query-independent diagnostics to query-aware detectors, enabling low-cost pre-LLM gating to mitigate compression-induced errors.

Tags

RAG  Long-Context Processing  Information Compression  Overflow Detection

arXiv Categories

cs.CL