Agent Tuning & Optimization | Relevance: 9/10

TVCACHE: A Stateful Tool-Value Cache for Post-Training LLM Agents

Abhishek Vijaya Kumar, Bhaskar Kataria, Byungsoo Oh, Emaad Manzoor, Rachee Singh
arXiv: 2602.10986v1 | Published: 2026-02-11 | Updated: 2026-02-11

AI Summary

TVCACHE accelerates tool calls for LLM agents with state-aware caching, significantly improving post-training efficiency.

Key Contributions

  • Proposes TVCACHE, a stateful tool-value cache.
  • Uses longest-prefix matching to guarantee environment-state consistency on cache hits.
  • Validates TVCACHE on multiple tasks, achieving significant speedups.

Methodology

TVCACHE maintains a tree of observed tool-call sequences and determines cache hits via longest-prefix matching, avoiding errors caused by inconsistent environment state.
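The prefix-tree lookup described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the class name `TVCacheSketch` and its methods are assumptions, and real tool calls would need a canonical serialization of their arguments.

```python
class TVCacheSketch:
    """Minimal sketch of a stateful tool-value cache (hypothetical, not the
    paper's code). Each trie node corresponds to one observed sequence of
    tool calls and stores the cached output of that sequence's last call."""

    def __init__(self):
        self.root = {}  # nested dict: call-key -> child node

    @staticmethod
    def _key(tool, args):
        # Identify a tool call by its name plus a canonical argument encoding.
        return (tool, tuple(sorted(args.items())))

    def lookup(self, history):
        """Return a cached output only if the FULL tool-call history matches a
        previously executed sequence; otherwise return None (cache miss), so
        the caller executes the tool for real."""
        node = self.root
        for tool, args in history:
            child = node.get(self._key(tool, args))
            if child is None:
                return None  # prefix diverges: environment state is unknown
            node = child
        return node.get("output")

    def insert(self, history, output):
        """Record the output observed after executing `history` in order."""
        node = self.root
        for tool, args in history:
            node = node.setdefault(self._key(tool, args), {})
        node["output"] = output
```

Requiring the full history to match, rather than only the final call, is what makes the cache state-safe: two rollouts that issue the same final tool call after different prior interactions land in different trie nodes and never share a cached value.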

Original Abstract

In RL post-training of LLM agents, calls to external tools take several seconds or even minutes, leaving allocated GPUs idle and inflating post-training time and cost. While many tool invocations repeat across parallel rollouts and could in principle be cached, naively caching their outputs for reuse is incorrect since tool outputs depend on the environment state induced by prior agent interactions. We present TVCACHE, a stateful tool-value cache for LLM agent post-training. TVCACHE maintains a tree of observed tool-call sequences and performs longest-prefix matching for cache lookups: a hit occurs only when the agent's full tool history matches a previously executed sequence, guaranteeing identical environment state. On three diverse workloads (terminal-based tasks, SQL generation, and video understanding), TVCACHE achieves cache hit rates of up to 70% and reduces median tool call execution time by up to 6.9X, with no degradation in post-training reward accumulation.

Tags

LLM Agent, Caching, Tool Use, RL

arXiv Categories

cs.LG