LLM Reasoning relevance: 9/10

How often do Answers Change? Estimating Recency Requirements in Question Answering

Bhawna Piryani, Zehra Mert, Adam Jatowt
arXiv: 2603.16544v1 Published: 2026-03-17 Updated: 2026-03-17

AI Summary

The paper introduces the RecencyQA dataset for evaluating LLMs on time-sensitive questions, focusing on answer recency and context dependence.

Key Contributions

  • Propose a recency-stationarity taxonomy
  • Construct the RecencyQA dataset
  • Analyze the challenges LLMs face on non-stationary questions

Methodology

4,031 open-domain questions were manually annotated and categorized with recency and stationarity labels, followed by empirical evaluation using LLMs.
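The two-axis annotation scheme can be sketched as a simple data record. This is a minimal illustration only: the label names and frequency buckets below are assumptions for demonstration, not the paper's actual label set.

```python
from dataclasses import dataclass
from enum import Enum

class Recency(Enum):
    """How often the answer changes (illustrative buckets, not the paper's)."""
    NEVER = "never"      # e.g. "Who wrote Hamlet?"
    YEARLY = "yearly"    # e.g. "Who won the most recent World Cup?"
    DAILY = "daily"      # e.g. "What is the weather in Innsbruck?"

class Stationarity(Enum):
    """Whether the change frequency itself is fixed or context-dependent."""
    STATIONARY = "stationary"          # update frequency is time-invariant
    NON_STATIONARY = "non_stationary"  # context shifts the recency requirement

@dataclass
class RecencyQARecord:
    """One hypothetical annotated question in the taxonomy's two dimensions."""
    question: str
    recency: Recency
    stationarity: Stationarity

# Stationary: the answer never changes, regardless of context.
q1 = RecencyQARecord("Who wrote Hamlet?", Recency.NEVER, Stationarity.STATIONARY)

# Non-stationary: the answer normally updates on a fixed cycle, but events
# (e.g. an early resignation) can change the update frequency itself.
q2 = RecencyQARecord("Who is the current UN Secretary-General?",
                     Recency.YEARLY, Stationarity.NON_STATIONARY)
```

Separating the two axes like this is what lets the benchmark distinguish "the answer changes often" from "how often the answer changes depends on context", which is the distinction the paper finds hardest for LLMs.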

Original Abstract

Large language models (LLMs) often rely on outdated knowledge when answering time-sensitive questions, leading to confident yet incorrect responses. Without explicit signals indicating whether up-to-date information is required, models struggle to decide when to retrieve external evidence, how to reason about stale facts, and how to rank answers by their validity. Existing benchmarks either periodically refresh answers or rely on fixed templates, but they do not reflect on how frequently answers change or whether a question inherently requires up-to-date information. To address this gap, we introduce a recency-stationarity taxonomy that categorizes questions by how often their answers change and whether this change frequency is time-invariant or context-dependent. Building on this taxonomy, we present RecencyQA, a dataset of 4,031 open-domain questions annotated with recency and stationarity labels. Through human evaluation and empirical analysis, we show that non-stationary questions, i.e., those where context changes the recency requirement, are significantly more challenging for LLMs, with difficulty increasing as update frequency rises. By explicitly modeling recency and context dependence, RecencyQA enables fine-grained benchmarking and analysis of temporal reasoning beyond binary notions of freshness, and provides a foundation for developing recency-aware and context-sensitive question answering systems.

Tags

Question Answering, Temporal Reasoning, Datasets, Language Models

arXiv Category

cs.CL