Linear Reasoning vs. Proof by Cases: Obstacles for Large Language Models in FOL Problem Solving
AI Summary
The paper introduces an FOL dataset focused on case-based reasoning and analyzes the performance gap that LLMs show on such problems.
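Main Contributions
- Introduces PC-FOL, a new FOL dataset focused on case-based reasoning.
- Shows experimentally that LLMs exhibit a substantial performance gap between linear reasoning problems and case-based reasoning problems.
- Provides a theoretical analysis based on graphical models that explains the cause of this gap.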
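Methodology
The authors construct the dataset and run experiments with leading LLMs, then carry out a graphical-model-based theoretical analysis to explain the experimental results.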
Original Abstract
To comprehensively evaluate the mathematical reasoning capabilities of Large Language Models (LLMs), researchers have introduced abundant mathematical reasoning datasets. However, most existing datasets primarily focus on linear reasoning, neglecting other forms such as proof by contradiction and proof by cases, which are crucial for investigating LLMs' reasoning abilities. To address this limitation, we first introduce a novel first-order logic (FOL) dataset named PC-FOL, annotated by professional mathematicians, focusing on case-based reasoning problems. All instances in this dataset are equipped with a manually written natural language proof, clearly distinguishing it from conventional linear reasoning datasets. Our experimental results on leading LLMs demonstrate a substantial performance gap between linear reasoning and case-based reasoning problems. To further investigate this phenomenon, we provide a theoretical analysis grounded in graphical models, which offers an explanation for the observed disparity between the two types of reasoning problems. We hope this work can reveal the core challenges in the field of automated natural language mathematical proof generation, paving the way for future research.
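To make the distinction between the two reasoning styles concrete, here is a minimal sketch in Lean 4. It is an illustration only: the propositions `P`, `Q`, `R` and the hypothesis names are assumptions chosen for this example and are not taken from PC-FOL, whose instances use natural language proofs.

```lean
-- Linear reasoning: the goal follows from one chain of implications.
-- From P, P → Q, and Q → R we obtain R by applying each step in order.
example (P Q R : Prop) (hp : P) (hpq : P → Q) (hqr : Q → R) : R :=
  hqr (hpq hp)

-- Proof by cases: we only know that P or Q holds, not which one.
-- The goal R must be established separately in each branch.
example (P Q R : Prop) (h : P ∨ Q) (hpr : P → R) (hqr : Q → R) : R :=
  h.elim hpr hqr
```

In the first example a single path of inferences reaches the goal. In the second, the proof has to branch on the disjunction and close every branch, which is the structure the abstract refers to as case-based reasoning.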