Probing Cultural Signals in Large Language Models through Author Profiling
AI Summary
This paper studies cultural biases encoded in large language models, evaluating their cultural leanings through author profiling from song lyrics.
Main Contributions
- Reveals cultural biases exhibited by LLMs when performing author profiling
- Proposes two fairness metrics, MAD and RD, to quantify cultural disparities
- Evaluates and compares the cultural leanings of several open-source LLMs
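The summary names the MAD and RD metrics but does not give their formulas. As a loose illustration only, divergence metrics of this kind are often computed as the spread between the best- and worst-scoring demographic groups; the sketch below implements that max-minus-min reading, which is an assumption, not the paper's exact definition. The function name `group_divergence` is hypothetical.

```python
from collections import defaultdict

def group_divergence(y_true, y_pred, groups):
    """Spread between the best- and worst-scoring groups.

    Illustrative only: the paper's exact MAD/RD definitions are not
    stated in this summary; max-min per-group correctness is one
    plausible shape for such a divergence metric.
    """
    per_group = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for t, p, g in zip(y_true, y_pred, groups):
        per_group[g][1] += 1
        if t == p:
            per_group[g][0] += 1
    scores = [c / n for c, n in per_group.values()]
    return max(scores) - min(scores)
```

When `groups` is taken to be the true label itself, per-group correctness coincides with per-class recall, which would match the spirit of a "Recall Divergence" style metric.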
Methodology
Using zero-shot prompting, the study evaluates LLMs' ability to infer singers' gender and ethnicity from lyrics, analyzing both the models' prediction distributions and their generated rationales.
Original Abstract
Large language models (LLMs) are increasingly deployed in applications with societal impact, raising concerns about the cultural biases they encode. We probe these representations by evaluating whether LLMs can perform author profiling from song lyrics in a zero-shot setting, inferring singers' gender and ethnicity without task-specific fine-tuning. Across several open-source models evaluated on more than 10,000 lyrics, we find that LLMs achieve non-trivial profiling performance but demonstrate systematic cultural alignment: most models default toward North American ethnicity, while DeepSeek-1.5B aligns more strongly with Asian ethnicity. This finding emerges from both the models' prediction distributions and an analysis of their generated rationales. To quantify these disparities, we introduce two fairness metrics, Modality Accuracy Divergence (MAD) and Recall Divergence (RD), and show that Ministral-8B displays the strongest ethnicity bias among the evaluated models, whereas Gemma-12B shows the most balanced behavior. Our code is available on GitHub (https://github.com/ValentinLafargue/CulturalProbingLLM).