Once Correct, Still Wrong: Counterfactual Hallucination in Multilingual Vision-Language Models
AI Summary
The paper exposes counterfactual hallucination in multilingual vision-language models under non-Western cultural contexts and introduces a new evaluation benchmark.
Main Contributions
- Introduces M2CQA, a benchmark for evaluating counterfactual hallucination in multilingual vision-language models within Middle East and North Africa (MENA) cultural contexts
- Proposes the CounterFactual Hallucination Rate (CFHR), a metric for how likely a model is to accept a counterfactual statement after correctly answering the corresponding true statement (see the formula sketch after this list)
- Shows that existing vision-language models suffer substantial counterfactual hallucination in Arabic (especially its dialects), and that reasoning-first prompting makes this worse
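The abstract describes CFHR only as counterfactual acceptance conditioned on a correct answer to the true statement; a plausible formalization (notation ours, not taken from the paper) is:

$$\mathrm{CFHR} = \frac{\sum_{i} \mathbb{1}\!\left[\text{correct}(t_i)\right]\,\mathbb{1}\!\left[\text{accept}(c_i)\right]}{\sum_{i} \mathbb{1}\!\left[\text{correct}(t_i)\right]}$$

where $t_i$ and $c_i$ are the paired true and counterfactual statements for item $i$. Under this reading, a high CFHR means the model often endorses the culturally plausible but visually wrong statement even on items where it answers the true statement correctly.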
Methodology
The authors build a dataset of images from 17 MENA countries paired with contrastive statements, then evaluate model accuracy and CFHR under different prompting strategies, as sketched below.
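As a concrete illustration of this evaluation, the sketch below computes accuracy on true statements and the conditional CFHR from per-item outcomes. All names here (ItemResult, true_correct, counterfactual_accepted) are hypothetical; the paper's actual evaluation harness is not described in detail in this summary.

```python
from dataclasses import dataclass

@dataclass
class ItemResult:
    # Hypothetical per-item outcome for one image and its statement pair;
    # field names are illustrative, not taken from the paper.
    true_correct: bool             # model correctly accepted the true statement
    counterfactual_accepted: bool  # model (wrongly) accepted the counterfactual

def accuracy(results: list[ItemResult]) -> float:
    """Fraction of items where the true statement was answered correctly."""
    return sum(r.true_correct for r in results) / len(results)

def cfhr(results: list[ItemResult]) -> float:
    """CounterFactual Hallucination Rate: counterfactual acceptance
    conditioned on a correct answer to the true statement."""
    conditioned = [r for r in results if r.true_correct]
    if not conditioned:
        return 0.0
    return sum(r.counterfactual_accepted for r in conditioned) / len(conditioned)

# Toy example: 3 of 4 items are correct on the true statement; of those 3,
# 2 also accept the counterfactual -> accuracy = 0.75, CFHR = 2/3.
results = [
    ItemResult(True, True),
    ItemResult(True, False),
    ItemResult(True, True),
    ItemResult(False, False),
]
print(f"accuracy = {accuracy(results):.2f}, CFHR = {cfhr(results):.3f}")
```

Note how the conditioning makes CFHR complementary to raw accuracy: a model can score well on true statements while still posting a high CFHR, which is exactly the failure mode the paper isolates.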
Original Abstract
Vision-language models (VLMs) can achieve high accuracy while still accepting culturally plausible but visually incorrect interpretations. Existing hallucination benchmarks rarely test this failure mode, particularly outside Western contexts and English. We introduce M2CQA, a culturally grounded multimodal benchmark built from images spanning 17 MENA countries, paired with contrastive true and counterfactual statements in English, Arabic, and its dialects. To isolate hallucination beyond raw accuracy, we propose the CounterFactual Hallucination Rate (CFHR), which measures counterfactual acceptance conditioned on correctly answering the true statement. Evaluating state-of-the-art VLMs under multiple prompting strategies, we find that CFHR rises sharply in Arabic, especially in dialects, even when true-statement accuracy remains high. Moreover, reasoning-first prompting consistently increases counterfactual hallucination, while answering before justifying improves robustness. We will make the experimental resources and dataset publicly available for the community.