AIDG: Evaluating Asymmetry Between Information Extraction and Containment in Multi-Turn Dialogue
AI Summary
The AIDG framework evaluates the asymmetry between information extraction and information containment in LLMs' multi-turn dialogue, revealing their reasoning bottlenecks.
Key Contributions
- Proposes AIDG, an evaluation framework for assessing the strategic reasoning capabilities of LLMs
- Designs two tasks, AIDG-I and AIDG-II, measuring social deduction and constraint satisfaction respectively
- Finds that LLMs are better at information containment than information extraction, revealing a capability asymmetry
Methodology
Designs AIDG, a game-theoretic framework comprising two tasks, social deduction and "20 Questions". Through multi-turn dialogue with LLMs, it analyzes the gap between their information-extraction and information-containment capabilities.
Original Abstract
Evaluating the strategic reasoning capabilities of Large Language Models (LLMs) requires moving beyond static benchmarks to dynamic, multi-turn interactions. We introduce AIDG (Adversarial Information Deduction Game), a game-theoretic framework that probes the asymmetry between information extraction (active deduction) and information containment (state maintenance) in dialogue. We propose two complementary tasks: AIDG-I, measuring pragmatic strategy in social deduction, and AIDG-II, measuring constraint satisfaction in a structured "20 Questions" setting. Across 439 games with six frontier LLMs, we observe a clear capability asymmetry: models perform substantially better at containment than deduction, with a 350 Elo advantage on defense (Cohen's d = 5.47). We identify two bottlenecks driving this gap: (1) Information Dynamics, where confirmation strategies are 7.75x more effective than blind deduction (p < 0.00001), and (2) Constraint Adherence, where instruction-following degrades under conversational load, accounting for 41.3% of deductive failures. These findings suggest that while LLMs excel at local defensive coherence, they struggle with the global state tracking required for strategic inquiry.