Feature Resemblance: On the Theoretical Understanding of Analogical Reasoning in Transformers
AI Summary
This paper theoretically analyzes the emergence of analogical reasoning in Transformers, revealing the importance of representational alignment for reasoning ability.
Key Contributions
- Proved that joint training on similarity and attribute premises enables analogical reasoning through representational alignment
- Revealed the importance of learning order in sequential training: the similarity structure must be learned before specific attributes
- Showed that two-hop reasoning reduces to analogical reasoning with identity bridges, which must appear explicitly in the training data
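To make the third contribution concrete, here is a minimal sketch (not the paper's formal construction; the fact table and relation name are hypothetical) of how a two-hop lookup $a \to b, b \to c$ can be read as an analogical step whose similarity premise is the trivial identity $b = b$:

```python
# Hypothetical knowledge base: each (entity, relation) pair maps to a target.
facts = {("a", "r"): "b", ("b", "r"): "c"}

def two_hop(entity, relation):
    mid = facts[(entity, relation)]   # first hop: a -> b
    # Identity bridge: mid is trivially "similar" to itself (b = b),
    # so the second attribute lookup transfers directly to the answer.
    return facts[(mid, relation)]     # second hop: b -> c

print(two_hop("a", "r"))  # -> c
```

The paper's point is that this identity-bridge form must itself appear in the training data for transformers to compose the two hops.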
Methodology
The study combines theoretical proofs with experimental validation to analyze the mechanism of analogical reasoning in Transformers.
Original Abstract
Understanding reasoning in large language models is complicated by evaluations that conflate multiple reasoning types. We isolate analogical reasoning (inferring shared properties between entities based on known similarities) and analyze its emergence in transformers. We theoretically prove three key results: (1) Joint training on similarity and attribution premises enables analogical reasoning through aligned representations; (2) Sequential training succeeds only when similarity structure is learned before specific attributes, revealing a necessary curriculum; (3) Two-hop reasoning ($a \to b, b \to c \implies a \to c$) reduces to analogical reasoning with identity bridges ($b = b$), which must appear explicitly in training data. These results reveal a unified mechanism: transformers encode entities with similar properties into similar representations, enabling property transfer through feature alignment. Experiments with architectures up to 1.5B parameters validate our theory and demonstrate how representational geometry shapes inductive reasoning capabilities.
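The abstract's unified mechanism (similar entities get similar representations, so properties transfer through feature alignment) can be sketched as follows. This is an illustrative toy, not the paper's model: the embeddings and property labels are invented for the example.

```python
import numpy as np

# Hypothetical entity embeddings; the mechanism assumes entities with
# shared properties end up with nearby representations.
embeddings = {
    "sparrow": np.array([0.90, 0.10, 0.00]),
    "robin":   np.array([0.85, 0.15, 0.05]),
    "shark":   np.array([0.10, 0.90, 0.20]),
}
known_properties = {"sparrow": "can_fly", "shark": "can_swim"}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def infer_property(query):
    # Analogical inference: transfer the property of the most similar
    # entity whose property is already known (feature alignment).
    best = max(known_properties,
               key=lambda e: cosine(embeddings[query], embeddings[e]))
    return known_properties[best]

print(infer_property("robin"))  # robin aligns with sparrow -> can_fly
```

The theoretical results concern when training actually produces such aligned representations, not the inference rule itself, which is trivial once alignment holds.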