AI Agents Relevance: 9/10

Locomo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents

Yifei Li, Weidong Guo, Lingling Zhang, Rongman Xu, Muye Huang, Hui Liu, Lijiao Xu, Yu Xu, Jun Liu
arXiv: 2602.10715v1 Published: 2026-02-11 Updated: 2026-02-11

AI Summary

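LoCoMo-Plus is a new benchmark for evaluating the cognitive memory of LLM agents, focusing on whether implicit constraints are applied across long-range conversations.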

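Main Contributions

  • Introduces the LoCoMo-Plus benchmark for evaluating the cognitive memory of LLMs under cue-trigger semantic disconnect, where the later query does not explicitly refer back to the earlier cue.
  • Shows that conventional string-matching metrics and explicit task-type prompting are misaligned with cognitive memory evaluation.
  • Proposes a unified evaluation framework based on constraint consistency.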

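Methodology

The authors design new conversational scenarios that require the model to remember and apply constraints stated only implicitly in the dialogue, and they evaluate model performance by constraint consistency.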
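
To make the contrast between string matching and constraint consistency concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the `MemoryCase` structure, the keyword-based `violates` check, and the example dialogue are all hypothetical, and a real evaluation would presumably rely on a far more capable judge than a keyword test.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MemoryCase:
    """Hypothetical test case: an early cue and a later trigger that never mentions it."""
    cue: str                         # earlier turn that implies a latent constraint
    trigger: str                     # later query, semantically disconnected from the cue
    reference: str                   # one acceptable reference answer
    violates: Callable[[str], bool]  # toy stand-in for a constraint judge (assumption)


def exact_match(response: str, reference: str) -> bool:
    """Conventional string matching: passes only if the text equals the reference."""
    return response.strip().lower() == reference.strip().lower()


def constraint_consistent(response: str, case: MemoryCase) -> bool:
    """Passes if the response does not break the latent constraint."""
    return not case.violates(response)


case = MemoryCase(
    cue="I was just diagnosed with a peanut allergy.",
    trigger="Any ideas for a quick snack before my run?",
    reference="A banana with some yogurt would work well.",
    violates=lambda r: "peanut" in r.lower(),
)

good = "Try an apple with some cheese."
bad = "A peanut butter sandwich is quick and filling."

# Each line prints: exact-match result, constraint-consistency result.
print(exact_match(good, case.reference), constraint_consistent(good, case))
print(exact_match(bad, case.reference), constraint_consistent(bad, case))
```

In this toy case both responses fail exact match, so string matching cannot tell them apart, while the constraint check accepts the first and rejects the second.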

Original Abstract

Long-term conversational memory is a core capability for LLM-based dialogue systems, yet existing benchmarks and evaluation protocols primarily focus on surface-level factual recall. In realistic interactions, appropriate responses often depend on implicit constraints such as user state, goals, or values that are not explicitly queried later. To evaluate this setting, we introduce LoCoMo-Plus, a benchmark for assessing cognitive memory under cue-trigger semantic disconnect, where models must retain and apply latent constraints across long conversational contexts. We further show that conventional string-matching metrics and explicit task-type prompting are misaligned with such scenarios, and propose a unified evaluation framework based on constraint consistency. Experiments across diverse backbone models, retrieval-based methods, and memory systems demonstrate that cognitive memory remains challenging and reveals failures not captured by existing benchmarks. Our code and evaluation framework are publicly available at: https://github.com/xjtuleeyf/Locomo-Plus.

Tags

LLM, Cognitive Memory, Evaluation Framework

arXiv Categories

cs.CL cs.AI