Context Compression via Explicit Information Transmission
AI Summary
ComprExIT achieves efficient LLM context compression via explicit information transmission, addressing the limitations of conventional self-attention-based compression.
Key Contributions
- Proposes the ComprExIT framework, which decouples compression from the LLM's internal self-attention.
- Introduces depth-wise and width-wise information transmission mechanisms.
- Shows experimentally that ComprExIT outperforms existing context compression methods while using fewer parameters.
Methodology
ComprExIT operates on frozen LLM hidden states: depth-wise information transmission selectively passes multi-layer information into token anchors, and width-wise information transmission then allocates that information into a small number of token slots under a globally optimized transmission plan.
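The two-stage pipeline described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the learned scorers are replaced with random stand-ins, the shapes are invented for illustration, and a Sinkhorn-style double normalization is used as one plausible instantiation of a "globally optimized transmission plan".

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (the real configuration is not stated in this summary):
# L layers, T context tokens, hidden dim d, K compressed slots.
L, T, d, K = 4, 16, 32, 3

# Frozen LLM hidden states: one vector per (layer, token).
H = rng.standard_normal((L, T, d))

# Depth-wise transmission (sketch): per-token softmax weights over layers
# decide which layers' information flows into each token "anchor", instead
# of relying on the final layer alone. A learned scorer is replaced here
# by random logits.
layer_logits = rng.standard_normal((T, L))
layer_w = np.exp(layer_logits) / np.exp(layer_logits).sum(1, keepdims=True)
anchors = np.einsum("tl,ltd->td", layer_w, H)          # (T, d)

# Width-wise transmission (sketch): a nonnegative plan P in R^{T x K}
# assigns anchor mass to K slots. Sinkhorn-style iterations balance the
# allocation globally (rows ~ 1, columns ~ T/K) rather than letting each
# slot attend independently.
P = np.exp(rng.standard_normal((T, K)))
for _ in range(50):
    P /= P.sum(axis=1, keepdims=True)                  # each anchor emits unit mass
    P /= P.sum(axis=0, keepdims=True) / (T / K)        # each slot receives T/K mass

slots = P.T @ anchors * (K / T)                        # (K, d) compressed representations
```

The design point this illustrates is the abstract's claim of "coordinated allocation": because the plan is normalized jointly across all anchors and slots, no slot can monopolize the context and no anchor is silently dropped, in contrast to independent per-slot attention.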
Original Abstract
Long-context inference with Large Language Models (LLMs) is costly due to quadratic attention and growing key-value caches, motivating context compression. In this work, we study soft context compression, where a long context is condensed into a small set of continuous representations. Existing methods typically re-purpose the LLM itself as a trainable compressor, relying on layer-by-layer self-attention to iteratively aggregate information. We argue that this paradigm suffers from two structural limitations: (i) progressive representation overwriting across layers, and (ii) uncoordinated allocation of compression capacity across tokens. We propose ComprExIT (Context Compression via Explicit Information Transmission), a lightweight framework that recasts soft compression as a new paradigm: explicit information transmission over frozen LLM hidden states. This decouples compression from the model's internal self-attention dynamics. ComprExIT performs (i) depth-wise transmission to selectively transmit multi-layer information into token anchors, mitigating progressive overwriting, and (ii) width-wise transmission to aggregate anchors into a small number of slots via a globally optimized transmission plan, ensuring coordinated allocation of information. Across six question-answering benchmarks, ComprExIT consistently outperforms state-of-the-art context compression methods while introducing only ~1% additional parameters, demonstrating that explicit and coordinated information transmission enables more effective and robust long-context compression.