LLM Reasoning relevance: 8/10

Large Language Model as Token Compressor and Decompressor

Wenbing Li, Zikai Song, Jielei Zhang, Tianhao Zhao, Junkai Lin, Yiran Wang, Wei Yang
arXiv: 2603.25340v1 Published: 2026-03-26 Updated: 2026-03-26

AI Summary

Proposes a new approach that uses an LLM as a token compressor and decompressor, achieving a substantial reduction in token count and efficient long-text processing.

Key Contributions

  • Proposes an LLM-based autoencoding framework for token compression
  • Achieves up to 18x token reduction while preserving reconstruction fidelity
  • Supports reasoning and generation directly in the compressed Z-token space

Methodology

A self-expressive autoencoding learning framework fine-tunes a pretrained LLM to translate long texts into discrete, variable-length Z-tokens and to reconstruct the original text from them.
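The actual compressor is a fine-tuned LLM with LoRA adapter heads, which cannot be reproduced in a short sketch. As a loose analogy only, the round-trip contract the paper relies on, lossless reconstruction where redundant or predictable regions yield fewer codes than dense ones, can be illustrated with a toy run-length scheme; everything below is hypothetical and is not the paper's method:

```python
# Toy analogy only: run-length codes stand in for Z-tokens to
# illustrate the lossless compress/decompress round-trip, where
# redundant regions receive fewer codes. The paper instead
# fine-tunes an LLM (via LoRA adapter heads) to emit discrete,
# variable-length latent codes.

def compress(tokens):
    """Collapse runs of repeated tokens into (token, count) codes."""
    codes = []
    for t in tokens:
        if codes and codes[-1][0] == t:
            codes[-1] = (t, codes[-1][1] + 1)
        else:
            codes.append((t, 1))
    return codes

def decompress(codes):
    """Exactly reconstruct the original token sequence."""
    return [t for t, n in codes for _ in range(n)]

redundant = ["the"] * 18   # highly predictable region -> 1 code
dense = ["a", "b", "c"]    # varied region -> 3 codes
assert decompress(compress(redundant + dense)) == redundant + dense
```

The point of the analogy is only the interface: compression is content-adaptive (the 18-token redundant run collapses to a single code) and decompression is exact, which is the property the paper's 18x figure and reconstruction-fidelity claims jointly describe.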

Original Abstract

In this paper, we establish the novel insight that an off-the-shelf LLM can function as an excellent token compressor and decompressor. To demonstrate, we design a self-expressive autoencoding learning framework that fine-tunes a pretrained LLM to translate long texts into a compact internal language of discrete, variable-length latent codes, termed Z-tokens, and to reconstruct the original text exactly from them. The resulting representation is content-adaptive: semantically dense segments receive more Z-tokens, while redundant or predictable regions are aggressively compressed via lightweight LoRA-based adapter heads. Empirically, our method achieves up to 18 times token reduction on Wikipedia, CNN/DailyMail, HotpotQA, and Qulac-style long-query datasets, while preserving reconstruction fidelity and downstream performance. This simple yet effective design supports applications including prompt compression and autoregressive generation directly in the Z-token space, offering a potential pathway toward token-efficient long-context reasoning.

Tags

LLM Token Compression Autoencoding Long Context

arXiv Category

cs.CL