LLM Reasoning relevance: 7/10

AfrIFact: Cultural Information Retrieval, Evidence Extraction and Fact Checking for African Languages

Israel Abebe Azime, Jesujoba Oluwadara Alabi, Crystina Zhang, Iffat Maab, Atnafu Lambebo Tonja, Tadesse Destaw Belay, Folasade Peace Alabi, Salomey Osei, Saminu Mohammad Aliyu, Nkechinyere Faith Aguobi, Bontu Fufa Balcha, Blessing Kudzaishe Sibanda, Davis David, Mouhamadane Mboup, Daud Abolade, Neo Putini, Philipp Slusallek, David Ifeoluwa Adelani, Dietrich Klakow
arXiv: 2604.00706v1 Published: 2026-04-01 Updated: 2026-04-01

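AI Summary

The AfrIFact dataset supports research on automatic fact-checking for African languages and exposes the difficulties that embedding models face in cross-lingual retrieval and that LLMs face in multilingual fact verification.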

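Key Contributions

  • Builds AfrIFact, a fact-checking dataset covering ten African languages and English
  • Evaluates the cross-lingual retrieval capabilities of embedding models
  • Evaluates the fact-verification capabilities of LLMs in African languages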

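Methodology

The authors construct the dataset, evaluate existing models (embedding models and LLMs) on information retrieval and fact-checking tasks, and additionally test few-shot prompting and task-specific fine-tuning.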
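As a rough illustration of how the retrieval side of such an evaluation could be scored, the sketch below computes recall@k over cosine-ranked documents. Everything in it is an assumption made for illustration: the toy three-dimensional vectors stand in for embedding-model outputs, the identifiers (`d_culture`, `q1`, and so on) are hypothetical, and recall@k is only one plausible metric; the summary does not say which metrics the paper reports.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_at_k(query_vecs, doc_vecs, gold, k):
    """Fraction of queries whose gold document appears in the top-k ranking."""
    hits = 0
    for qid, qvec in query_vecs.items():
        # Rank all document ids by similarity to the query, best first.
        ranked = sorted(
            doc_vecs, key=lambda d: cosine(qvec, doc_vecs[d]), reverse=True
        )
        if gold[qid] in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy vectors (hypothetical): in practice these would come from an embedding
# model, with queries in one language and documents in another.
docs = {
    "d_culture": [1.0, 0.0, 0.0],
    "d_news": [0.0, 1.0, 0.0],
    "d_health": [0.0, 0.0, 1.0],
}
queries = {
    "q1": [0.9, 0.1, 0.0],
    "q2": [0.1, 0.8, 0.1],
    "q3": [0.6, 0.5, 0.3],  # gold is d_health, but it ranks last
}
gold = {"q1": "d_culture", "q2": "d_news", "q3": "d_health"}

print(recall_at_k(queries, docs, gold, k=1))
print(recall_at_k(queries, docs, gold, k=3))
```

In this toy setup the third query misses its gold document at k=1, which is how a poorly aligned cross-lingual query would show up in the metric.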

Original Abstract

Assessing the veracity of a claim made online is a complex and important task with real-world implications. When these claims are directed at communities with limited access to information and the content concerns issues such as healthcare and culture, the consequences intensify, especially in low-resource languages. In this work, we introduce AfrIFact, a dataset that covers the necessary steps for automatic fact-checking (i.e., information retrieval, evidence extraction, and fact checking), in ten African languages and English. Our evaluation results show that even the best embedding models lack cross-lingual retrieval capabilities, and that cultural and news documents are easier to retrieve than healthcare-domain documents, both in large corpora and in single documents. We show that LLMs lack robust multilingual fact-verification capabilities in African languages, while few-shot prompting improves performance by up to 43% in AfriqueQwen-14B, and task-specific fine-tuning further improves fact-checking accuracy by up to 26%. These findings, along with our release of the AfrIFact dataset, encourage work on low-resource information retrieval, evidence retrieval, and fact checking.
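To make the few-shot prompting setup mentioned in the abstract more concrete, here is a minimal sketch of how a claim-verification prompt might be assembled from labeled demonstrations. The label set, the instruction wording, the function name `build_few_shot_prompt`, and the placeholder examples are all assumptions for illustration; the abstract does not specify the prompt format or labels used with AfrIFact.

```python
def build_few_shot_prompt(
    examples,
    claim,
    evidence,
    labels=("SUPPORTED", "REFUTED", "NOT ENOUGH INFO"),  # assumed label set
):
    """Assemble a plain-text few-shot prompt for claim verification."""
    lines = [
        "Decide whether the evidence supports the claim. "
        "Answer with one of: " + ", ".join(labels) + ".",
        "",
    ]
    # Each demonstration shows a claim, its evidence, and the gold label.
    for ex in examples:
        lines += [
            f"Claim: {ex['claim']}",
            f"Evidence: {ex['evidence']}",
            f"Label: {ex['label']}",
            "",
        ]
    # The target instance ends with an open label slot for the model to fill.
    lines += [f"Claim: {claim}", f"Evidence: {evidence}", "Label:"]
    return "\n".join(lines)


# Placeholder demonstrations (hypothetical, not taken from the dataset).
demos = [
    {"claim": "<claim 1>", "evidence": "<evidence 1>", "label": "SUPPORTED"},
    {"claim": "<claim 2>", "evidence": "<evidence 2>", "label": "REFUTED"},
]
prompt = build_few_shot_prompt(demos, "<target claim>", "<target evidence>")
print(prompt)
```

The resulting string would be sent to whichever LLM is being evaluated; setting `examples` to an empty list gives the zero-shot variant for comparison.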

Tags

Fact Checking, African Languages, Information Retrieval, Low-Resource Languages, Multilingual

arXiv Categories

cs.CL