BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs
AI Summary
BenchPreS evaluates LLMs' ability to selectively apply personalized preferences across different communication contexts.
Main Contributions
- Proposes the BenchPreS benchmark
- Defines two metrics: Misapplication Rate (MR) and Appropriate Application Rate (AAR)
- Reveals shortcomings of existing LLMs in context-aware preference application
Methodology
Constructs a dialogue dataset spanning multiple communication contexts, and uses the MR and AAR metrics to evaluate how LLMs apply stored preferences in each context.
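As a rough illustration of the two metrics, the sketch below computes MR and AAR from per-example evaluation records. The `EvalRecord` structure and field names are hypothetical, not from the paper; the sketch only assumes that each test case is labeled with whether applying the preference is appropriate in that context, and whether the model actually applied it.

```python
from dataclasses import dataclass

@dataclass
class EvalRecord:
    # Hypothetical per-example record (names are illustrative, not the paper's)
    preference_applied: bool        # did the model apply the stored preference?
    application_appropriate: bool   # is applying it appropriate in this context?

def misapplication_rate(records: list[EvalRecord]) -> float:
    """MR: among contexts where applying the preference is inappropriate,
    the fraction where the model applied it anyway."""
    inappropriate = [r for r in records if not r.application_appropriate]
    if not inappropriate:
        return 0.0
    return sum(r.preference_applied for r in inappropriate) / len(inappropriate)

def appropriate_application_rate(records: list[EvalRecord]) -> float:
    """AAR: among contexts where applying the preference is appropriate,
    the fraction where the model actually applied it."""
    appropriate = [r for r in records if r.application_appropriate]
    if not appropriate:
        return 0.0
    return sum(r.preference_applied for r in appropriate) / len(appropriate)
```

Under this reading, an ideal model has MR near 0 and AAR near 1; the paper's finding that strong preference adherence raises over-application corresponds to MR and AAR rising together.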
Original Abstract
Large language models (LLMs) increasingly store user preferences in persistent memory to support personalization across interactions. However, in third-party communication settings governed by social and institutional norms, some user preferences may be inappropriate to apply. We introduce BenchPreS, which evaluates whether memory-based user preferences are appropriately applied or suppressed across communication contexts. Using two complementary metrics, Misapplication Rate (MR) and Appropriate Application Rate (AAR), we find that even frontier LLMs struggle to apply preferences in a context-sensitive manner. Models with stronger preference adherence exhibit higher rates of over-application, and neither reasoning capability nor prompt-based defenses fully resolve this issue. These results suggest that current LLMs treat personalized preferences as globally enforceable rules rather than as context-dependent normative signals.