Traces of Social Competence in Large Language Models
AI Summary
The study shows that LLMs exhibit socio-cognitive abilities in Theory of Mind tests that depend on model size and training regime.
Main Contributions
- Evaluated the socio-cognitive competence of LLMs on a balanced set of False Belief Test (FBT) variants
- Found that model scale and post-training regime affect FBT performance
- Revealed stereotypical response patterns tied to mental-state vocabulary
- Verified the causal role of a "think" vector via vector steering
Methodology
Analysed FBT results from 17 open-weight models with Bayesian logistic regression, complemented by a case study and vector steering.
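The regression step can be sketched as fitting logistic-regression weights under a Gaussian prior (the MAP point of a Bayesian logistic regression). Everything below is a minimal illustration on synthetic stand-in data, not the paper's actual pipeline; the predictors (log model size, an instruction-tuning flag) and all numbers are hypothetical.

```python
import numpy as np

# Hypothetical FBT outcomes: each row is one model/item run, with predictors
# [log model size, instruction-tuned flag]; y = 1 if the item was answered correctly.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([rng.normal(0, 1, n), rng.integers(0, 2, n).astype(float)])
true_w = np.array([1.5, 0.8])                     # synthetic ground truth
y = (rng.random(n) < 1 / (1 + np.exp(-(X @ true_w)))).astype(float)

def map_fit(X, y, prior_var=1.0, lr=0.1, steps=3000):
    """MAP estimate of logistic-regression weights under a N(0, prior_var) prior,
    i.e. the posterior mode of a Bayesian logistic regression, via gradient ascent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))            # predicted success probability
        grad = X.T @ (y - p) - w / prior_var      # log-likelihood + log-prior gradient
        w += lr * grad / len(y)
    return w

w_hat = map_fit(X, y)
print(w_hat)  # both coefficients recovered with positive sign
```

A full Bayesian treatment would sample the posterior (e.g. with MCMC) rather than report its mode; the MAP fit is just the cheapest sketch of the same model.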
Original Abstract
The False Belief Test (FBT) has been the main method for assessing Theory of Mind (ToM) and related socio-cognitive competencies. For Large Language Models (LLMs), the reliability and explanatory potential of this test have remained limited due to issues like data contamination, insufficient model details, and inconsistent controls. We address these issues by testing 17 open-weight models on a balanced set of 192 FBT variants (Trott et al. 2023) using Bayesian Logistic regression to identify how model size and post-training affect socio-cognitive competence. We find that scaling model size benefits performance, but not strictly. A cross-over effect reveals that explicating propositional attitudes (X thinks) fundamentally alters response patterns. Instruction tuning partially mitigates this effect, but further reasoning-oriented finetuning amplifies it. In a case study analysing social reasoning ability throughout OLMo 2 training, we show that this cross-over effect emerges during pre-training, suggesting that models acquire stereotypical response patterns tied to mental-state vocabulary that can outweigh other scenario semantics. Finally, vector steering allows us to isolate a think vector as the causal driver of observed FBT behaviour.
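The "vector steering" used to isolate the think vector follows the common contrastive-activation recipe: take the mean difference between hidden states of prompts with and without the mental-state verb, then add a scaled copy of that direction at inference time. The toy sketch below uses synthetic activations in place of a real transformer layer; the dimensions, scale `alpha`, and variable names are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Toy activation-steering sketch: extract a "think" direction from contrastive
# prompts, then inject it into a fresh hidden state. Real usage would hook an
# LLM layer; here the activations are synthetic stand-ins.
rng = np.random.default_rng(1)
d = 64
think_dir = rng.normal(0, 1, d)                         # latent direction to recover
acts_think = rng.normal(0, 1, (50, d)) + think_dir      # states for "X thinks ..." prompts
acts_plain = rng.normal(0, 1, (50, d))                  # matched prompts without "thinks"

steer = acts_think.mean(axis=0) - acts_plain.mean(axis=0)  # mean-difference vector
steer /= np.linalg.norm(steer)                             # unit-normalise

def apply_steering(h, vec, alpha=4.0):
    """Add the steering vector to a hidden state with strength alpha."""
    return h + alpha * vec

h = rng.normal(0, 1, d)
h_steered = apply_steering(h, steer)
shift = float(np.dot(h_steered - h, steer))  # component added along the direction
print(shift)
```

Because `steer` is unit-norm, the injected component along the direction equals `alpha` exactly; varying `alpha` (including negative values) is what lets steering probe the direction's causal effect on behaviour.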