LLM Memory & RAG relevance: 9/10

MTRAG-UN: A Benchmark for Open Challenges in Multi-Turn RAG Conversations

Sara Rosenthal, Yannis Katsis, Vraj Shah, Lihong He, Lucian Popa, Marina Danilevsky
arXiv: 2602.23184v1 Published: 2026-02-26 Updated: 2026-02-26

AI Summary

MTRAG-UN is a benchmark for multi-turn RAG conversations, designed to evaluate how models handle unanswerable, underspecified, and other challenging question types.

Key Contributions

  • Introduces the MTRAG-UN benchmark dataset
  • Contains over 2,800 conversation turns across 6 domains
  • Highlights the shortcomings of existing models on complex multi-turn conversations

Methodology

The authors construct an evaluation benchmark containing multiple challenging conversation types and run experiments evaluating existing retrieval and generation models on it.
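The paper's exact evaluation setup is not detailed in this summary, but the core idea — scoring a RAG pipeline turn by turn and breaking accuracy down by question type (unanswerable, underspecified, non-standalone) — can be sketched as below. All names here (`Turn`, `score_turn`, `evaluate`, the abstention marker, and the toy data schema) are illustrative assumptions, not the benchmark's actual API; the real data format is in the GitHub repository linked in the abstract.

```python
# Illustrative sketch of per-question-type evaluation over multi-turn RAG data.
# The schema and scoring rule are assumptions for demonstration only.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Turn:
    question: str
    qtype: str       # e.g. "answerable", "unanswerable", "underspecified", "non_standalone"
    reference: str   # gold response; "I_DONT_KNOW" marks an expected abstention (assumed convention)

def score_turn(model_response: str, turn: Turn) -> bool:
    """Toy scoring: an unanswerable turn is correct iff the model abstains;
    otherwise require an exact match with the reference answer."""
    if turn.reference == "I_DONT_KNOW":
        return model_response == "I_DONT_KNOW"
    return model_response == turn.reference

def evaluate(conversations, model):
    """Compute accuracy per question type across all turns of all conversations.
    `model` is any callable taking (history, question) and returning a response."""
    hits, totals = defaultdict(int), defaultdict(int)
    for conv in conversations:
        history = []  # prior (question, response) pairs, so non-standalone turns see context
        for turn in conv:
            response = model(history, turn.question)
            totals[turn.qtype] += 1
            hits[turn.qtype] += score_turn(response, turn)
            history.append((turn.question, response))
    return {qt: hits[qt] / totals[qt] for qt in totals}
```

A breakdown of this shape makes the paper's headline finding legible: a model can look strong on aggregate accuracy while scoring near zero on the unanswerable and underspecified slices.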

Original Abstract

We present MTRAG-UN, a benchmark for exploring open challenges in multi-turn retrieval augmented generation, a popular use of large language models. We release a benchmark of 666 tasks containing over 2,800 conversation turns across 6 domains with accompanying corpora. Our experiments show that retrieval and generation models continue to struggle on conversations with UNanswerable, UNderspecified, and NONstandalone questions and UNclear responses. Our benchmark is available at https://github.com/IBM/mt-rag-benchmark

Tags

RAG, Multi-turn Conversation, Evaluation Benchmark, LLM

arXiv Category

cs.CL