AfrIFact: Cultural Information Retrieval, Evidence Extraction and Fact Checking for African Languages
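AI Summary
The AfrIFact dataset supports research on automatic fact-checking for African languages and exposes the difficulties of cross-lingual retrieval and of multilingual fact verification with LLMs.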
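Main Contributions
- Built AfrIFact, a fact-checking dataset covering ten African languages and English
- Evaluated the cross-lingual retrieval capabilities of embedding models
- Evaluated the fact-verification capabilities of LLMs in African languages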
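Methodology
The authors build the dataset, evaluate existing models (embedding models and LLMs) on information retrieval and fact-checking tasks, and experiment with few-shot prompting and fine-tuning.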
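To make the retrieval evaluation concrete, here is a minimal sketch of how a cross-lingual retrieval run could be scored. It is not the authors' evaluation code: the choice of cosine similarity and recall@k, the function names, and the toy vectors are all assumptions made for illustration, and a real run would replace the hand-written vectors with outputs from an embedding model.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return document ids sorted from most to least similar to the query."""
    return sorted(doc_vecs,
                  key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
                  reverse=True)


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


# Toy, hand-written vectors (hypothetical): stand-ins for embeddings of
# English documents and of a claim written in another language.
DOCS = {
    "health_en": [1.0, 0.0, 0.0],
    "culture_en": [0.0, 1.0, 0.0],
    "news_en": [0.0, 0.0, 1.0],
}
QUERY = [0.1, 0.9, 0.2]

ranked = rank_documents(QUERY, DOCS)
score = recall_at_k(ranked, {"culture_en"}, k=1)
```

Averaging `recall_at_k` over all claims in one language, and comparing that average across languages and domains, is one way to surface the kind of gap the summary describes.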
Original Abstract
Assessing the veracity of a claim made online is a complex and important task with real-world implications. When these claims are directed at communities with limited access to information and the content concerns issues such as healthcare and culture, the consequences intensify, especially in low-resource languages. In this work, we introduce AfrIFact, a dataset that covers the necessary steps for automatic fact-checking (i.e., information retrieval, evidence extraction, and fact checking), in ten African languages and English. Our evaluation results show that even the best embedding models lack cross-lingual retrieval capabilities, and that cultural and news documents are easier to retrieve than healthcare-domain documents, both in large corpora and in single documents. We show that LLMs lack robust multilingual fact-verification capabilities in African languages, while few-shot prompting improves performance by up to 43% in AfriqueQwen-14B, and task-specific fine-tuning further improves fact-checking accuracy by up to 26%. These findings, along with our release of the AfrIFact dataset, encourage work on low-resource information retrieval, evidence retrieval, and fact checking.
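As an illustration of the few-shot prompting setup mentioned in the abstract, the sketch below assembles a fact-verification prompt from labeled demonstrations. The label set, the prompt wording, and the function name are hypothetical; the abstract does not specify the template the authors used, and the placeholder strings stand in for real claims and evidence.

```python
from typing import List, Tuple

# Hypothetical label set; the actual AfrIFact labels may differ.
LABELS = ("SUPPORTED", "REFUTED", "NOT ENOUGH INFO")


def build_few_shot_prompt(examples: List[Tuple[str, str, str]],
                          claim: str,
                          evidence: str) -> str:
    """Build a prompt with k labeled demonstrations followed by the target claim.

    Each example is a (claim, evidence, label) triple. The final block leaves
    the label empty so the model completes it.
    """
    instruction = ("Decide whether the evidence supports the claim. "
                   "Answer with one of: " + ", ".join(LABELS) + ".")
    blocks = [instruction]
    for ex_claim, ex_evidence, ex_label in examples:
        if ex_label not in LABELS:
            raise ValueError(f"unknown label: {ex_label}")
        blocks.append(f"Claim: {ex_claim}\nEvidence: {ex_evidence}\nLabel: {ex_label}")
    blocks.append(f"Claim: {claim}\nEvidence: {evidence}\nLabel:")
    return "\n\n".join(blocks)


# Placeholder demonstrations (not taken from the dataset).
demos = [
    ("<claim A>", "<evidence A>", "SUPPORTED"),
    ("<claim B>", "<evidence B>", "REFUTED"),
]
prompt = build_few_shot_prompt(demos, "<target claim>", "<target evidence>")
```

Passing an empty `examples` list yields the zero-shot variant of the same prompt, which makes it easy to compare the two settings with an otherwise identical template.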