LLM Reasoning 相关度: 9/10

No Shortcuts to Culture: Indonesian Multi-hop Question Answering for Complex Cultural Understanding

Vynska Amalia Permadi, Xingwei Tan, Nafise Sadat Moosavi, Nikos Aletras

arXiv: 2602.03709v1 发布: 2026-02-03 更新: 2026-02-03

下载 PDF arXiv 页面

AI 摘要

提出了ID-MoCQA，一个用于评估LLM文化理解能力的大规模多跳印尼文化问答数据集。

主要贡献

构建了大规模印尼文化多跳问答数据集ID-MoCQA
提出了将单跳问题转换为多跳推理链的框架
设计了多阶段验证流程确保数据集质量

方法论

通过专家评审和LLM筛选，系统性地将单跳文化问题转化为包含六种线索类型的多跳推理链。

原文摘要

Understanding culture requires reasoning across context, tradition, and implicit social knowledge, far beyond recalling isolated facts. Yet most culturally focused question answering (QA) benchmarks rely on single-hop questions, which may allow models to exploit shallow cues rather than demonstrate genuine cultural reasoning. In this work, we introduce ID-MoCQA, the first large-scale multi-hop QA dataset for assessing the cultural understanding of large language models (LLMs), grounded in Indonesian traditions and available in both English and Indonesian. We present a new framework that systematically transforms single-hop cultural questions into multi-hop reasoning chains spanning six clue types (e.g., commonsense, temporal, geographical). Our multi-stage validation pipeline, combining expert review and LLM-as-a-judge filtering, ensures high-quality question-answer pairs. Our evaluation across state-of-the-art models reveals substantial gaps in cultural reasoning, particularly in tasks requiring nuanced inference. ID-MoCQA provides a challenging and essential benchmark for advancing the cultural competency of LLMs.

arXiv 分类

cs.CL

AI 摘要

主要贡献

方法论

原文摘要

标签

arXiv 分类