LLM Reasoning Relevance: 8/10

Can Large Language Models Generalize Procedures Across Representations?

Fangru Lin, Valentin Hofmann, Xingchen Wan, Weixing Wang, Zifeng Ding, Anthony G. Cohn, Janet B. Pierrehumbert
arXiv: 2602.03542v1 Published: 2026-02-03 Updated: 2026-02-03

AI Summary

Studies how well LLMs generalize across representations such as code, graphs, and natural language, and proposes a two-stage data curriculum.

Key Contributions

  • Reveals the limits of LLM generalization across different representation formats
  • Proposes an effective two-stage data-curriculum training method
  • Shows that this method substantially improves cross-representation generalization

Methodology

Validates the effectiveness of the two-stage data curriculum by comparing LLM performance on isomorphic tasks under different training strategies.
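The curriculum described above is an ordering over training data rather than a new objective: symbolic examples (code, graphs) come first, natural-language examples second. A minimal sketch of that scheduling idea, with illustrative dataset contents and names that are not taken from the paper:

```python
# Minimal sketch of a two-stage data curriculum: present all symbolic
# examples (code/graph representations) before any natural-language ones.
# The toy data and the function name are illustrative assumptions.

def two_stage_curriculum(symbolic_data, natural_language_data):
    """Yield (stage, example) pairs in curriculum order: symbolic first, then NL."""
    for stage, data in (("symbolic", symbolic_data),
                        ("natural_language", natural_language_data)):
        for example in data:
            yield stage, example

# Toy isomorphic procedure expressed in three representations.
symbolic = ["graph: A -> B -> C", "code: steps = ['A', 'B', 'C']"]
natural = ["First do A, then do B, then do C."]

schedule = list(two_stage_curriculum(symbolic, natural))
# All symbolic examples precede all natural-language ones.
assert [stage for stage, _ in schedule] == [
    "symbolic", "symbolic", "natural_language"
]
```

A mixed-data baseline would instead interleave or shuffle the two pools; the paper's comparison is between such strategies and this staged ordering.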

Original Abstract

Large language models (LLMs) are trained and tested extensively on symbolic representations such as code and graphs, yet real-world user tasks are often specified in natural language. To what extent can LLMs generalize across these representations? Here, we approach this question by studying isomorphic tasks involving procedures represented in code, graphs, and natural language (e.g., scheduling steps in planning). We find that training LLMs with popular post-training methods on graphs or code data alone does not reliably generalize to corresponding natural language tasks, while training solely on natural language can lead to inefficient performance gains. To address this gap, we propose a two-stage data curriculum that first trains on symbolic, then natural language data. The curriculum substantially improves model performance across model families and tasks. Remarkably, a 1.5B Qwen model trained by our method can closely match zero-shot GPT-4o in naturalistic planning. Finally, our analysis suggests that successful cross-representation generalization can be interpreted as a form of generative analogy, which our curriculum effectively encourages.

Tags

LLM Generalization · Representation Learning · Data Curriculum · Planning

arXiv Categories

cs.CL cs.LG