AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications
AI Summary
Proposes AMA-Bench for evaluating the long-horizon memory of LLM agents, shows that existing memory systems fall short on it, and introduces an improved memory system, AMA-Agent.
Main Contributions
- Introduces the AMA-Bench benchmark for evaluating agents' long-horizon memory.
- Analyzes the shortcomings of existing memory systems in real agentic applications.
- Proposes AMA-Agent, a memory system built on a causality graph and tool-augmented retrieval.
Methodology
Construct datasets of real and synthetic agent trajectories, pair them with expert-curated and rule-based QA respectively, evaluate the performance of different memory systems on them, and design a new memory architecture.
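To illustrate the synthetic side of this setup, rule-based QA can be derived mechanically from a generated trajectory, so ground-truth answers scale to arbitrary horizons without human annotation. The sketch below is a minimal, hypothetical version; the record format, tool names, and question templates are assumptions for illustration, not the paper's actual pipeline.

```python
import random

def synth_trajectory(n_steps, seed=0):
    """Generate a toy agent-environment trajectory of tool-call records.

    Hypothetical format: each step is (step_index, tool_name, observation).
    """
    rng = random.Random(seed)
    tools = ["search", "read_file", "run_code", "write_file"]
    return [(i, rng.choice(tools), f"obs-{i}") for i in range(n_steps)]

def rule_based_qa(traj):
    """Derive QA pairs whose answers follow mechanically from the trajectory."""
    qas = [(f"Which tool was called at step {i}?", tool)
           for i, tool, _obs in traj]
    counts = {}
    for _i, tool, _obs in traj:
        counts[tool] = counts.get(tool, 0) + 1
    qas.append(("Which tool was called most often?",
                max(counts, key=counts.get)))
    return qas

traj = synth_trajectory(100)   # horizon is a free parameter
qas = rule_based_qa(traj)
```

Because the questions are generated by rules over machine-generated interactions, the horizon (here `n_steps`) can be stretched arbitrarily while answers stay exactly verifiable.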
Original Abstract
Large Language Models (LLMs) are deployed as autonomous agents in increasingly complex applications, where enabling long-horizon memory is critical for achieving strong performance. However, a significant gap exists between practical applications and current evaluation standards for agent memory: existing benchmarks primarily focus on dialogue-centric, human-agent interactions. In reality, agent memory consists of a continuous stream of agent-environment interactions that are primarily composed of machine-generated representations. To bridge this gap, we introduce AMA-Bench (Agent Memory with Any length), which evaluates long-horizon memory for LLMs in real agentic applications. It features two key components: (1) a set of real-world agentic trajectories across representative agentic applications, paired with expert-curated QA, and (2) a set of synthetic agentic trajectories that scale to arbitrary horizons, paired with rule-based QA. Our comprehensive study shows that existing memory systems underperform on AMA-Bench primarily because they lack causality and objective information and are constrained by the lossy nature of similarity-based retrieval employed by many memory systems. To address these limitations, we propose AMA-Agent, an effective memory system featuring a causality graph and tool-augmented retrieval. Our results demonstrate that AMA-Agent achieves 57.22% average accuracy on AMA-Bench, surpassing the strongest memory system baselines by 11.16%.
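The abstract's diagnosis is that similarity-based retrieval is lossy for causal questions, whereas a causality graph answered by exact, tool-style lookup is not. A minimal sketch of that contrast follows; the class name, event format, and `why` query are invented for illustration and are not AMA-Agent's actual interface.

```python
from collections import defaultdict

class CausalGraphMemory:
    """Toy memory that stores events plus causal edges (event -> its causes)
    and answers 'why' queries by graph traversal, not similarity search.

    Illustrative sketch only; not the paper's actual API.
    """
    def __init__(self):
        self.events = {}                    # event_id -> text
        self.caused_by = defaultdict(list)  # event_id -> parent event_ids

    def add(self, event_id, text, causes=()):
        self.events[event_id] = text
        for parent in causes:
            self.caused_by[event_id].append(parent)

    def why(self, event_id):
        """Tool-style exact lookup: walk causal parents back to root causes."""
        chain, frontier = [], [event_id]
        while frontier:
            eid = frontier.pop()
            chain.append(self.events[eid])
            frontier.extend(self.caused_by[eid])
        return chain

mem = CausalGraphMemory()
mem.add("e1", "user asked to deploy service")
mem.add("e2", "agent ran tests", causes=["e1"])
mem.add("e3", "deploy failed: test timeout", causes=["e2"])
chain = mem.why("e3")  # e3's text first, then its causal ancestors
```

A pure embedding-similarity retriever could surface "deploy failed" for the query "why did the deploy fail?" yet miss the upstream events that caused it, since causal links are not encoded in surface similarity; the explicit graph makes that chain recoverable exactly.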