Language Models Exhibit Inconsistent Biases Towards Algorithmic Agents and Human Experts
AI Summary
Large language models exhibit inconsistent biases towards human experts and algorithmic agents, so their reliability must be evaluated with care.
Key Contributions
- Reveals inconsistencies in how LLMs place trust in human experts versus algorithmic agents
- Demonstrates experimentally that LLM behaviour diverges between stated preferences and revealed preferences
- Highlights the influence of task presentation format on LLM behaviour and discusses the robustness of AI safety evaluations
Methodology
Adopts experimental paradigms from behavioural economics to evaluate LLM biases towards human experts and algorithmic agents in trust ratings and task-delegation decisions.
Original Abstract
Large language models are increasingly used in decision-making tasks that require them to process information from a variety of sources, including both human experts and other algorithmic agents. How do LLMs weigh the information provided by these different sources? We consider the well-studied phenomenon of algorithm aversion, in which human decision-makers exhibit bias against predictions from algorithms. Drawing upon experimental paradigms from behavioural economics, we evaluate how eight different LLMs delegate decision-making tasks when the delegatee is framed as a human expert or an algorithmic agent. To be inclusive of different evaluation formats, we conduct our study with two task presentations: stated preferences, modeled through direct queries about trust towards either agent, and revealed preferences, modeled through providing in-context examples of the performance of both agents. When prompted to rate the trustworthiness of human experts and algorithms across diverse tasks, LLMs give higher ratings to the human expert, which correlates with prior results from human respondents. However, when shown the performance of a human expert and an algorithm and asked to place an incentivized bet between the two, LLMs disproportionately choose the algorithm, even when it performs demonstrably worse. These discrepant results suggest that LLMs may encode inconsistent biases towards humans and algorithms, which need to be carefully considered when they are deployed in high-stakes scenarios. Furthermore, we discuss the sensitivity of LLMs to task presentation formats that should be broadly scrutinized in evaluation robustness for AI safety.
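The two task presentations described in the abstract can be sketched as prompt templates. This is a minimal illustration only: the function names, wording, and track-record format below are hypothetical and do not reproduce the paper's actual prompts.

```python
# Hypothetical sketch of the two evaluation formats described in the
# abstract: a stated-preference trust query and a revealed-preference
# incentivized bet with in-context performance examples.

def stated_preference_prompt(task: str, delegatee: str) -> str:
    """Direct query about trust towards a delegatee framed as a
    human expert or an algorithmic agent."""
    framing = "a human expert" if delegatee == "human" else "an algorithmic agent"
    return (
        f"Task: {task}\n"
        f"On a scale of 1-10, how much would you trust {framing} "
        f"to perform this task? Answer with a single number."
    )

def revealed_preference_prompt(task: str,
                               human_record: list,
                               algo_record: list) -> str:
    """In-context performance examples for both agents, followed by
    an incentivized bet between them."""
    def fmt(record):
        return ", ".join("correct" if r else "wrong" for r in record)
    return (
        f"Task: {task}\n"
        f"Human expert's past predictions: {fmt(human_record)}\n"
        f"Algorithm's past predictions: {fmt(algo_record)}\n"
        f"You earn a bonus if your chosen delegatee answers correctly.\n"
        f"Bet on 'human' or 'algorithm'."
    )

if __name__ == "__main__":
    print(stated_preference_prompt("forecast quarterly sales", "human"))
    # The human record here is deliberately stronger, so a consistent
    # bettor should choose the human expert.
    print(revealed_preference_prompt(
        "forecast quarterly sales",
        human_record=[1, 1, 1, 0],
        algo_record=[1, 0, 0, 0],
    ))
```

The paper's finding is that, even in the revealed-preference format where one agent's record is demonstrably worse, LLMs disproportionately bet on the algorithm, diverging from their own stated trust ratings.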