LLM Reasoning Relevance: 8/10

Polyglots or Multitudes? Multilingual LLM Answers to Value-laden Multiple-Choice Questions

Léo Labat, Etienne Ollion, François Yvon
arXiv: 2602.05932v1 Published: 2026-02-05 Updated: 2026-02-05

AI Summary

This study examines whether multilingual LLMs answer value-laden questions consistently across languages, and finds that the language of the question influences the LLM's answer.

Main Contributions

  • Released MEVS, a new multilingual value-survey dataset
  • Studied the language dependence of multilingual LLM answers to value-laden questions
  • Found language-specific behavior in instruction-tuned models

Methodology

Using the MEVS dataset, the authors administer value-laden multiple-choice questions to multiple multilingual LLMs under controlled prompt variations, and analyze the consistency of the answers.
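The controlled prompt variations described in the abstract (answer order, symbol type, tail character) can be sketched as a simple generator. The template, symbol sets, and function names below are illustrative assumptions, not the paper's actual prompt format:

```python
from itertools import permutations, product

# Hypothetical rendering of one MCQ under every combination of the
# controlled prompt variables: answer order, symbol type, tail character.
SYMBOL_SETS = {"letters": ["A", "B", "C", "D"], "digits": ["1", "2", "3", "4"]}
TAIL_CHARS = [".", ")"]

def render_variants(question, options):
    """Yield one prompt string per (order, symbol set, tail char) combination."""
    for order in permutations(options):
        for sym_name, tail in product(SYMBOL_SETS, TAIL_CHARS):
            symbols = SYMBOL_SETS[sym_name]
            lines = [question]
            lines += [f"{s}{tail} {opt}" for s, opt in zip(symbols, order)]
            yield "\n".join(lines)

variants = list(render_variants("Is honesty always justified?", ["Yes", "No"]))
# 2 answer orders x 2 symbol sets x 2 tail characters = 8 prompt variants
```

Averaging a model's answers over such a grid of surface variations separates genuine value-laden preferences from artifacts of prompt formatting.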

Original Abstract

Multiple-Choice Questions (MCQs) are often used to assess knowledge, reasoning abilities, and even values encoded in large language models (LLMs). While the effect of multilingualism has been studied on LLM factual recall, this paper seeks to investigate the less explored question of language-induced variation in value-laden MCQ responses. Are multilingual LLMs consistent in their responses across languages, i.e. behave like theoretical polyglots, or do they answer value-laden MCQs depending on the language of the question, like a multitude of monolingual models expressing different values through a single model? We release a new corpus, the Multilingual European Value Survey (MEVS), which, unlike prior work relying on machine translation or ad hoc prompts, solely comprises human-translated survey questions aligned in 8 European languages. We administer a subset of those questions to over thirty multilingual LLMs of various sizes, manufacturers and alignment-fine-tuning status under comprehensive, controlled prompt variations including answer order, symbol type, and tail character. Our results show that while larger, instruction-tuned models display higher overall consistency, the robustness of their responses varies greatly across questions, with certain MCQs eliciting total agreement within and across models while others leave LLM answers split. Language-specific behavior seems to arise in all consistent, instruction-fine-tuned models, but only on certain questions, warranting a further study of the selective effect of preference fine-tuning.
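The "polyglot vs. multitude" question above reduces to measuring how often a model gives the same answer to the same survey item across the 8 aligned languages. A minimal sketch of one plausible consistency score (not necessarily the paper's exact metric): the fraction of languages agreeing with the modal answer per question, averaged over questions.

```python
from collections import Counter

def cross_language_consistency(answers_by_question):
    """answers_by_question: {question_id: {language: chosen_option}}.
    Returns the mean, over questions, of the share of languages that
    pick the question's most common (modal) answer."""
    scores = []
    for langs in answers_by_question.values():
        counts = Counter(langs.values())
        modal_count = counts.most_common(1)[0][1]
        scores.append(modal_count / len(langs))
    return sum(scores) / len(scores)

# Toy example: one fully consistent question, one split question.
answers = {
    "q1": {"en": "A", "fr": "A", "de": "A"},  # total agreement
    "q2": {"en": "A", "fr": "B", "de": "B"},  # language-dependent split
}
score = cross_language_consistency(answers)  # (3/3 + 2/3) / 2
```

A score of 1.0 corresponds to the "theoretical polyglot" extreme; values near the chance level of the modal answer indicate the "multitude of monolingual models" behavior.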

Tags

Multilingual LLMs · Values · Language Dependence · Multiple-Choice Questions

arXiv Categories

cs.CL