LLM Memory & RAG relevance: 9/10

MTRAG-UN: A Benchmark for Open Challenges in Multi-Turn RAG Conversations

Sara Rosenthal, Yannis Katsis, Vraj Shah, Lihong He, Lucian Popa, Marina Danilevsky
arXiv: 2602.23184v1 Published: 2026-02-26 Updated: 2026-02-26

AI Summary

MTRAG-UN is a benchmark for multi-turn RAG conversations, designed to evaluate how models handle unanswerable, underspecified, and other challenging question types.

Key Contributions

  • Introduces the MTRAG-UN benchmark dataset
  • Contains over 2,800 conversation turns across 6 domains
  • Highlights the shortcomings of existing models on complex multi-turn conversations

Methodology

The authors construct an evaluation benchmark containing multiple challenging conversation types and run experiments evaluating existing retrieval and generation models on it.
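The paper's exact evaluation setup is not detailed in this summary, but the core idea — scoring a RAG pipeline turn by turn and breaking accuracy down by question type (unanswerable, underspecified, non-standalone) — can be sketched as below. All names here (`Turn`, `score_turn`, `evaluate`, the abstention marker, and the toy data schema) are illustrative assumptions, not the benchmark's actual API; the real data format is in the GitHub repository linked in the abstract.

```python
# Illustrative sketch of per-question-type evaluation over multi-turn RAG data.
# The schema and scoring rule are assumptions for demonstration only.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Turn:
    question: str
    qtype: str       # e.g. "answerable", "unanswerable", "underspecified", "non_standalone"
    reference: str   # gold response; "I_DONT_KNOW" marks an expected abstention (assumed convention)

def score_turn(model_response: str, turn: Turn) -> bool:
    """Toy scoring: an unanswerable turn is correct iff the model abstains;
    otherwise require an exact match with the reference answer."""
    if turn.reference == "I_DONT_KNOW":
        return model_response == "I_DONT_KNOW"
    return model_response == turn.reference

def evaluate(conversations, model):
    """Compute accuracy per question type across all turns of all conversations.
    `model` is any callable taking (history, question) and returning a response."""
    hits, totals = defaultdict(int), defaultdict(int)
    for conv in conversations:
        history = []  # prior (question, response) pairs, so non-standalone turns see context
        for turn in conv:
            response = model(history, turn.question)
            totals[turn.qtype] += 1
            hits[turn.qtype] += score_turn(response, turn)
            history.append((turn.question, response))
    return {qt: hits[qt] / totals[qt] for qt in totals}
```

A breakdown of this shape makes the paper's headline finding legible: a model can look strong on aggregate accuracy while scoring near zero on the unanswerable and underspecified slices.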

Original Abstract

We present MTRAG-UN, a benchmark for exploring open challenges in multi-turn retrieval augmented generation, a popular use of large language models. We release a benchmark of 666 tasks containing over 2,800 conversation turns across 6 domains with accompanying corpora. Our experiments show that retrieval and generation models continue to struggle on conversations with UNanswerable, UNderspecified, and NONstandalone questions and UNclear responses. Our benchmark is available at https://github.com/IBM/mt-rag-benchmark

Tags

RAG, Multi-turn Conversation, Evaluation Benchmark, LLM

arXiv Category

cs.CL