Evaluating Counterfactual Strategic Reasoning in Large Language Models
arXiv: 2603.19167v1
Published: 2026-03-19
Updated: 2026-03-19
AI Summary
Evaluates the strategic reasoning ability of large language models in counterfactual games, revealing their limitations in strategy generalization and incentive sensitivity.
Key Contributions
- Introduces the concept of counterfactual games to evaluate the strategic reasoning ability of LLMs
- Designs a multi-metric evaluation framework comparing LLM performance in default and counterfactual settings
- Reveals LLM limitations in incentive sensitivity, structural generalization, and strategic reasoning
Methodology
Counterfactual variants of the Prisoner's Dilemma and Rock-Paper-Scissors are introduced to evaluate how LLMs adapt their strategies once the game rules are altered.
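The kind of counterfactual variant described above can be sketched in a few lines. This is a minimal illustration with hypothetical payoff values (the paper's actual payoffs and labels are not given here): a default Prisoner's Dilemma next to a variant that relabels the actions and perturbs the payoffs so the familiar dominance relation no longer holds.

```python
# Payoffs are (row player, column player) for each action pair.
# Hypothetical values for illustration only.
DEFAULT_PD = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

# Counterfactual variant: opaque action labels plus swapped
# temptation/reward payoffs, so defection-like play is no longer dominant.
COUNTERFACTUAL_PD = {
    ("alpha", "alpha"): (5, 5),
    ("alpha", "beta"):  (0, 3),
    ("beta",  "alpha"): (3, 0),
    ("beta",  "beta"):  (1, 1),
}

def best_response(game, opponent_action):
    """Row action that maximizes payoff against a fixed column action."""
    actions = {a for a, _ in game}
    return max(actions, key=lambda a: game[(a, opponent_action)][0])

print(best_response(DEFAULT_PD, "cooperate"))     # -> defect
print(best_response(COUNTERFACTUAL_PD, "alpha"))  # -> alpha
```

A model that merely pattern-matches on the familiar labels would keep recommending the defect-like action in the counterfactual game, even though the altered payoffs make mutual cooperation the best response here.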
Original Abstract
We evaluate Large Language Models (LLMs) in repeated game-theoretic settings to assess whether strategic performance reflects genuine reasoning or reliance on memorized patterns. We consider two canonical games, Prisoner's Dilemma (PD) and Rock-Paper-Scissors (RPS), upon which we introduce counterfactual variants that alter payoff structures and action labels, breaking familiar symmetries and dominance relations. Our multi-metric evaluation framework compares default and counterfactual instantiations, showcasing LLM limitations in incentive sensitivity, structural generalization and strategic reasoning within counterfactual environments.