Large Language Model as Token Compressor and Decompressor
AI Summary
Proposes a new method that uses an LLM as a token compressor and decompressor, achieving substantial token reduction and efficient long-text processing.
Key Contributions
- Proposes an LLM-based autoencoding framework for token compression
- Achieves up to 18× token reduction while preserving reconstruction fidelity
- Supports inference and generation directly in the compressed Z-token space
Methodology
A self-expressive autoencoding framework fine-tunes a pretrained LLM to translate long texts into discrete, variable-length Z-tokens and to reconstruct the original text from them.
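The compressor/decompressor interface can be sketched as below. This is a toy stand-in, not the paper's method: a shared codebook dictionary plays the role of the fine-tuned LLM, and `z_budget` is a hypothetical density heuristic approximating the learned content-adaptive allocation (dense segments get more Z-tokens, predictable ones fewer), while the round trip remains exact.

```python
# Toy sketch of the Z-token autoencoding interface. A dict codebook
# stands in for the LoRA-fine-tuned LLM so the round trip is inspectable.

def z_budget(segment: str, max_z: int = 4) -> int:
    """Hypothetical heuristic: more distinct words -> more Z-tokens.
    (The paper learns this allocation end to end; this is illustrative.)"""
    words = segment.split()
    if not words:
        return 1
    density = len(set(words)) / len(words)
    return max(1, round(density * max_z))

def compress(text: str, codebook: dict, seg_len: int = 32) -> list:
    """Slice text into raw segments; map each to a variable-length
    run of Z-token ids, with the run length set by z_budget."""
    z_tokens = []
    for s in range(0, len(text), seg_len):
        seg = text[s:s + seg_len]
        n = z_budget(seg)
        step = -(-len(seg) // n)  # ceil division: n roughly equal chunks
        for i in range(0, len(seg), step):
            chunk = seg[i:i + step]
            # Reuse the id for a repeated chunk; otherwise mint a new one.
            z_tokens.append(codebook.setdefault(chunk, len(codebook)))
    return z_tokens

def decompress(z_tokens: list, codebook: dict) -> str:
    """Invert the codebook and concatenate chunks: exact reconstruction."""
    inv = {zid: chunk for chunk, zid in codebook.items()}
    return "".join(inv[zid] for zid in z_tokens)
```

Because segments are raw slices and repeated chunks share one id, redundant text yields fewer distinct Z-tokens while `decompress(compress(text, cb), cb) == text` holds by construction, mirroring the lossless-reconstruction objective described above.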
Original Abstract
In this paper, we establish the novel insight that an off-the-shelf LLM can function as an excellent token compressor and decompressor. To demonstrate, we design a self-expressive autoencoding learning framework that fine-tunes a pretrained LLM, via lightweight LoRA-based adapter heads, to translate long texts into a compact internal language of discrete, variable-length latent codes, termed Z-tokens, and to reconstruct the original text exactly from them. The resulting representation is content-adaptive: semantically dense segments receive more Z-tokens, while redundant or predictable regions are aggressively compressed. Empirically, our method achieves up to 18 times token reduction on Wikipedia, CNN/DailyMail, HotpotQA, and Qulac-style long-query datasets, while preserving reconstruction fidelity and downstream performance. This simple yet effective design supports applications including prompt compression and autoregressive generation directly in the Z-token space, offering a potential pathway toward token-efficient long-context reasoning.