LLM Reasoning relevance: 8/10

Physical Commonsense Reasoning for Lower-Resourced Languages and Dialects: a Study on Basque

Jaione Bengoetxea, Itziar Gonzalez-Dios, Rodrigo Agerri
arXiv: 2602.14812v1 Published: 2026-02-16 Updated: 2026-02-16

AI Summary

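The paper builds BasPhyCo, a physical commonsense reasoning dataset for Basque, and uses it to evaluate how LLMs perform on this low-resource language.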

Main Contributions

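  • Builds BasPhyCo, the first non-QA physical commonsense reasoning dataset for Basque, in both standard and dialectal variants
  • Evaluates the physical commonsense reasoning of LLMs in Basque, with particular attention to its dialectal variants
  • Evaluates commonsense understanding at three hierarchical levels: accuracy, consistency, and verifiability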

Methodology

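Starting from the Italian GITA dataset, the authors create a Basque dataset and evaluate LLMs with three metrics: accuracy (telling plausible from implausible narratives), consistency (identifying the conflicting element that makes a narrative implausible), and verifiability (identifying the physical state that creates the implausibility).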
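One common reading of "hierarchical" metrics, assumed here in the style of TRIP-like evaluations, is that they are nested. A model earns consistency credit only if it also picked the implausible story, and it earns verifiability credit only if it also found the conflicting element. The Python sketch below illustrates that scoring. It assumes each item is a pair of stories, represents the conflict as a pair of sentence indices, and represents the physical states as a set of string labels. The `Example` and `Prediction` classes and the `hierarchical_scores` function are hypothetical and are not taken from the paper or the BasPhyCo release.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Example:
    """Hypothetical gold annotation for one plausible/implausible story pair."""
    story: int                      # index (0 or 1) of the implausible story
    conflict: Tuple[int, int]       # (earlier sentence, conflicting sentence)
    states: FrozenSet[str]          # physical state labels behind the conflict


@dataclass(frozen=True)
class Prediction:
    """Hypothetical model output with the same fields as Example."""
    story: int
    conflict: Tuple[int, int]
    states: FrozenSet[str]


def hierarchical_scores(golds: List[Example], preds: List[Prediction]) -> dict:
    """Nested metrics (assumed): each level counts only if all lower levels are correct."""
    if not golds or len(golds) != len(preds):
        raise ValueError("need equal-length, non-empty gold and prediction lists")
    acc = cons = ver = 0
    for gold, pred in zip(golds, preds):
        if pred.story != gold.story:
            continue                # level 1 failed: no credit at any level
        acc += 1
        if pred.conflict != gold.conflict:
            continue                # level 2 failed: stop before checking states
        cons += 1
        if pred.states == gold.states:
            ver += 1                # level 3: the physical states also match
    n = len(golds)
    return {"accuracy": acc / n, "consistency": cons / n, "verifiability": ver / n}


# Usage with one gold item repeated and four illustrative predictions.
gold = Example(1, (0, 2), frozenset({"cup: empty"}))
preds = [
    Prediction(1, (0, 2), frozenset({"cup: empty"})),  # all three levels correct
    Prediction(1, (1, 2), frozenset({"cup: empty"})),  # wrong conflicting pair
    Prediction(0, (0, 2), frozenset({"cup: empty"})),  # wrong story
    Prediction(1, (0, 2), frozenset({"cup: full"})),   # wrong physical state
]
scores = hierarchical_scores([gold] * 4, preds)
print(scores)  # {'accuracy': 0.75, 'consistency': 0.5, 'verifiability': 0.25}
```

Because the levels are nested, the scores can only stay the same or fall from accuracy to consistency to verifiability. The abstract reports that verifiability is where LLMs struggle most in Basque.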

Original Abstract

Physical commonsense reasoning represents a fundamental capability of human intelligence, enabling individuals to understand their environment, predict future events, and navigate physical spaces. Recent years have witnessed growing interest in reasoning tasks within Natural Language Processing (NLP). However, no prior research has examined the performance of Large Language Models (LLMs) on non-question-answering (non-QA) physical commonsense reasoning tasks in low-resource languages such as Basque. Taking the Italian GITA as a starting point, this paper addresses this gap by presenting BasPhyCo, the first non-QA physical commonsense reasoning dataset for Basque, available in both standard and dialectal variants. We evaluate model performance across three hierarchical levels of commonsense understanding: (1) distinguishing between plausible and implausible narratives (accuracy), (2) identifying the conflicting element that renders a narrative implausible (consistency), and (3) determining the specific physical state that creates the implausibility (verifiability). These tasks were assessed using multiple multilingual LLMs as well as models pretrained specifically for Italian and Basque. Results indicate that, in terms of verifiability, LLMs exhibit limited physical commonsense capabilities in low-resource languages such as Basque, especially when processing dialectal variants.

Tags

Physical commonsense reasoning, Low-resource languages, Basque, Dataset, LLM evaluation

arXiv Category

cs.CL