Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training
AI Summary
TRIT improves multilingual long reasoning by integrating translation training, requires no additional data, and delivers substantial gains.
Main Contributions
- Proposes the TRIT framework, which integrates translation training into multilingual reasoning
- Improves multilingual question understanding and response generation
- Significantly outperforms baseline models on the MMATH dataset
Methodology
TRIT is a self-improving framework that jointly trains translation and multilingual reasoning, requiring no external feedback or additional multilingual data. A sketch of what one such self-improvement round could look like is given below.
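The abstract does not spell out the training loop, so the following is a minimal illustrative sketch, not the paper's published algorithm. All names (`trit_round`, `translate`, `reason`, `lang_of`, `finetune`) and the correctness/consistency filter are assumptions consistent with the abstract's claims: the model generates its own translations of existing questions, reasons in the target language, and keeps only self-verified trajectories for the next update.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Example:
    question_en: str  # English source question
    answer: str       # verifiable gold answer (e.g., a final number)

def trit_round(
    data: List[Example],
    target_lang: str,
    translate: Callable[[str, str], str],           # model self-translates the question
    reason: Callable[[str, str], Tuple[str, str]],  # returns (reasoning chain, final answer)
    lang_of: Callable[[str], str],                  # language-ID check for consistency
    finetune: Callable[[List[Dict]], None],         # one fine-tuning update on kept samples
) -> None:
    """One hypothetical self-improvement round: no external feedback or
    extra multilingual data, matching the abstract's description."""
    kept: List[Dict] = []
    for ex in data:
        q_tgt = translate(ex.question_en, target_lang)  # self-generated multilingual question
        chain, pred = reason(q_tgt, target_lang)        # reason in the question language
        correct = pred.strip() == ex.answer.strip()     # answer-level self-check
        consistent = lang_of(chain) == target_lang      # language-consistency self-check
        if correct and consistent:
            # Joint signal: train on the translation pair AND the reasoning trace.
            kept.append({"task": "translate", "src": ex.question_en, "tgt": q_tgt})
            kept.append({"task": "reason", "question": q_tgt, "response": chain})
    if kept:
        finetune(kept)
```

Filtering on both answer correctness and language consistency mirrors the two failure modes the abstract names, and mixing translation pairs with reasoning traces in one update is what would make such a round "integrated" rather than two separate fine-tunes.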
Original Abstract
Long reasoning models often struggle in multilingual settings: they tend to reason in English for non-English questions, and when constrained to reason in the question language, accuracies drop substantially. This struggle stems from limited ability in both multilingual question understanding and multilingual reasoning. To address both problems, we propose TRIT (Translation-Reasoning Integrated Training), a self-improving framework that integrates translation training into multilingual reasoning. Without external feedback or additional multilingual data, our method jointly enhances multilingual question understanding and response generation. On MMATH, our method outperforms multiple baselines by an average of 7 percentage points, improving both answer correctness and language consistency. Further analysis reveals that integrating translation training improves cross-lingual question alignment by over 10 percentage points and enhances translation quality for both mathematical questions and general-domain text, with gains of up to 8.4 COMET points on FLORES-200.