LLM Memory & RAG Relevance: 9/10

PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

Shuochen Liu, Junyi Zhu, Long Shu, Junda Lin, Yuhao Chen, Haotian Zhang, Chao Zhang, Derong Xu, Jia Li, Bo Tang, Zhiyu Li, Feiyu Xiong, Enhong Chen, Tong Xu
arXiv: 2603.23231v1 Published: 2026-03-24 Updated: 2026-03-24

AI Summary

PERMA benchmarks personalized memory agents, focusing on event-driven preference evolution and realistic task environments.

Key Contributions

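  • Introduces the PERMA benchmark, which evaluates the long-term consistency of personalized memory agents.
  • Designs temporally ordered interaction events that simulate how real user preferences evolve.
  • Experiments show that existing memory systems fall short on temporal depth and cross-domain interference.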

Methodology

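The authors build a multi-turn dialogue dataset of temporally ordered interaction events and design multiple-choice and interactive tasks to evaluate how well a model understands the user's persona.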
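
To make the multiple-choice setup concrete, here is a minimal Python sketch of how such an evaluation loop could look. It is not taken from the PERMA repository: the `Event` and `Probe` schemas, the `evaluate` function, and the two toy baselines are all hypothetical names chosen for illustration, and the interactive tasks are left out. The sketch only assumes what the summary states, namely that events are ordered in time and that preference queries are inserted along the timeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Event:
    """One interaction event on the timeline (hypothetical schema)."""
    t: int        # position on the timeline
    domain: str   # e.g. "travel", "food"
    text: str     # what the user said


@dataclass
class Probe:
    """A multiple-choice preference query inserted at time t (hypothetical)."""
    t: int
    question: str
    options: Sequence[str]
    answer: int   # index of the correct option at time t


Chooser = Callable[[List[Event], Probe], int]


def evaluate(events: List[Event], probes: List[Probe], choose: Chooser) -> float:
    """Accuracy of `choose` when it only sees events up to each probe's time."""
    ordered = sorted(events, key=lambda e: e.t)
    if not probes:
        return 0.0
    correct = 0
    for p in probes:
        history = [e for e in ordered if e.t <= p.t]
        if choose(history, p) == p.answer:
            correct += 1
    return correct / len(probes)


def first_mention(history: List[Event], probe: Probe) -> int:
    """Toy static baseline: trust the earliest event that names an option."""
    for e in history:
        for i, opt in enumerate(probe.options):
            if opt in e.text.lower():
                return i
    return 0


def latest_mention(history: List[Event], probe: Probe) -> int:
    """Toy recency baseline: trust the most recent event that names an option."""
    for e in reversed(history):
        for i, opt in enumerate(probe.options):
            if opt in e.text.lower():
                return i
    return 0


# Illustrative data, invented for this sketch: a seat preference that changes.
EVENTS = [
    Event(1, "travel", "I love window seats on long flights."),
    Event(2, "food", "Let's order noodles tonight."),
    Event(3, "travel", "After the knee surgery I need aisle seats now."),
]
PROBES = [
    Probe(2, "Which seat should be booked?", ["window", "aisle"], 0),
    Probe(4, "Which seat should be booked?", ["window", "aisle"], 1),
]

if __name__ == "__main__":
    print("first_mention :", evaluate(EVENTS, PROBES, first_mention))
    print("latest_mention:", evaluate(EVENTS, PROBES, latest_mention))
```

On this toy timeline the static baseline answers the second probe with the outdated preference, while the recency baseline follows the change. That gap is the kind of temporal behavior the benchmark is described as probing, although real memory systems and the actual PERMA tasks are far richer than keyword matching.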

Original Abstract

Empowering large language models with long-term memory is crucial for building agents that adapt to users' evolving needs. However, prior evaluations typically interleave preference-related dialogues with irrelevant conversations, reducing the task to needle-in-a-haystack retrieval while ignoring relationships between events that drive the evolution of user preferences. Such settings overlook a fundamental characteristic of real-world personalization: preferences emerge gradually and accumulate across interactions within noisy contexts. To bridge this gap, we introduce PERMA, a benchmark designed to evaluate persona consistency over time beyond static preference recall. Additionally, we incorporate (1) text variability and (2) linguistic alignment to simulate erratic user inputs and individual idiolects in real-world data. PERMA consists of temporally ordered interaction events spanning multiple sessions and domains, with preference-related queries inserted over time. We design both multiple-choice and interactive tasks to probe the model's understanding of persona along the interaction timeline. Experiments demonstrate that by linking related interactions, advanced memory systems can extract more precise preferences and reduce token consumption, outperforming traditional semantic retrieval of raw dialogues. Nevertheless, they still struggle to maintain a coherent persona across temporal depth and cross-domain interference, highlighting the need for more robust personalized memory management in agents. Our code and data are open-sourced at https://github.com/PolarisLiu1/PERMA.

Tags

memory agent benchmark personalization

arXiv Categories

cs.AI