LLM Memory & RAG relevance: 9/10

Context Compression via Explicit Information Transmission

Jiangnan Ye, Hanqi Yan, Zhenyi Shen, Heng Chang, Ye Mao, Yulan He
arXiv: 2602.03784v1 Published: 2026-02-03 Updated: 2026-02-03

AI Summary

ComprExIT achieves efficient LLM context compression via explicit information transmission, addressing the structural limitations of conventional self-attention-based compression.

Key Contributions

  • Proposes the ComprExIT framework, which decouples compression from the LLM's internal self-attention.
  • Introduces depth-wise and width-wise information transmission mechanisms.
  • Demonstrates experimentally that ComprExIT outperforms existing context compression methods while using fewer parameters.

Methodology

ComprExIT operates on frozen LLM hidden states: depth-wise information transmission selectively transmits multi-layer information into token anchors, and width-wise information transmission then allocates the anchors' information into a small number of token slots under a globally optimized transmission plan.
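The two-stage flow above can be sketched in numpy. This is a minimal illustrative sketch, not the paper's actual architecture: the softmax weightings, the function names, and the shapes of the score tensors (`layer_logits`, `plan_logits`) are assumptions made for clarity; the paper's transmission plan is described as globally optimized, which a simple row-wise softmax only approximates.

```python
import numpy as np

def depth_wise_transmission(hidden_states, layer_logits):
    """Mix frozen multi-layer hidden states into per-token anchors.

    hidden_states: (L, T, d) hidden states from L frozen LLM layers.
    layer_logits:  (L, T) hypothetical learned layer-selection scores.
    Returns anchors of shape (T, d).
    """
    # Softmax over the layer axis: each token gets its own depth mixture,
    # so later layers cannot silently overwrite earlier-layer information.
    w = np.exp(layer_logits - layer_logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)
    return np.einsum("lt,ltd->td", w, hidden_states)

def width_wise_transmission(anchors, plan_logits):
    """Aggregate T anchors into S << T compressed slots.

    anchors:     (T, d) per-token anchors from the depth-wise stage.
    plan_logits: (T, S) hypothetical anchor-to-slot assignment scores.
    Returns slots of shape (S, d).
    """
    # Row-stochastic transmission plan: each anchor distributes its
    # information across slots (a stand-in for the globally optimized plan).
    p = np.exp(plan_logits - plan_logits.max(axis=1, keepdims=True))
    p = p / p.sum(axis=1, keepdims=True)
    return p.T @ anchors
```

Usage follows the pipeline order: anchors are produced first from the frozen layer stack, then condensed into slots that replace the long context at inference time.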

Original Abstract

Long-context inference with Large Language Models (LLMs) is costly due to quadratic attention and growing key-value caches, motivating context compression. In this work, we study soft context compression, where a long context is condensed into a small set of continuous representations. Existing methods typically re-purpose the LLM itself as a trainable compressor, relying on layer-by-layer self-attention to iteratively aggregate information. We argue that this paradigm suffers from two structural limitations: (i) progressive representation overwriting across layers (ii) uncoordinated allocation of compression capacity across tokens. We propose ComprExIT (Context Compression via Explicit Information Transmission), a lightweight framework that formulates soft compression into a new paradigm: explicit information transmission over frozen LLM hidden states. This decouples compression from the model's internal self-attention dynamics. ComprExIT performs (i) depth-wise transmission to selectively transmit multi-layer information into token anchors, mitigating progressive overwriting, and (ii) width-wise transmission to aggregate anchors into a small number of slots via a globally optimized transmission plan, ensuring coordinated allocation of information. Across six question-answering benchmarks, ComprExIT consistently outperforms state-of-the-art context compression methods while introducing only ~1% additional parameters, demonstrating that explicit and coordinated information transmission enables more effective and robust long-context compression.

Tags

LLM · Context Compression · Information Transmission · Long Context · Efficiency

arXiv Category

cs.CL