Agent Tuning & Optimization Relevance: 9/10

KLong: Training LLM Agent for Extremely Long-horizon Tasks

Yue Liu, Zhiyuan Hu, Flood Sung, Jiaheng Zhang, Bryan Hooi
arXiv: 2602.17547v1 Published: 2026-02-19 Updated: 2026-02-19

AI Summary

KLong improves an LLM agent's ability to solve extremely long-horizon tasks through trajectory-splitting SFT and progressive RL training.

Key Contributions

  • Proposes a trajectory-splitting SFT method
  • Proposes a progressive RL training method
  • Builds Research-Factory, an automated data-generation pipeline

Methodology

Training data is constructed with Research-Factory; the model is cold-started via trajectory-splitting SFT and then fine-tuned with progressive RL.
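The trajectory-splitting idea (keep the early context, slide a window over later steps, overlap consecutive sub-trajectories) can be sketched as follows. This is a hypothetical illustration, not the authors' code; the parameter names `prefix_len`, `window`, and `overlap` are assumptions, and the real method operates on token-level agent trajectories with progressive truncation rather than plain lists.

```python
# Hypothetical sketch of trajectory splitting (not the paper's implementation).
def split_trajectory(steps, prefix_len=4, window=8, overlap=2):
    """Split a long trajectory into training sub-trajectories.

    Each sub-trajectory preserves the first `prefix_len` steps (early
    context), followed by a sliding window over the later steps;
    consecutive windows share `overlap` steps.
    """
    if len(steps) <= prefix_len + window:
        return [steps]
    subs = []
    stride = window - overlap
    start = prefix_len
    while start < len(steps):
        subs.append(steps[:prefix_len] + steps[start:start + window])
        if start + window >= len(steps):
            break
        start += stride
    return subs
```

For a 20-step trajectory with the defaults, this yields three sub-trajectories, each beginning with steps 0-3 and overlapping its neighbor by two steps, so no transition between windows is seen without shared context.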

Original Abstract

This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via progressive RL training. Specifically, we first activate basic agentic abilities of a base model with a comprehensive SFT recipe. Then, we introduce Research-Factory, an automated pipeline that generates high-quality training data by collecting research papers and constructing evaluation rubrics. Using this pipeline, we build thousands of long-horizon trajectories distilled from Claude 4.5 Sonnet (Thinking). To train with these extremely long trajectories, we propose a new trajectory-splitting SFT, which preserves early context, progressively truncates later context, and maintains overlap between sub-trajectories. In addition, to further improve long-horizon task-solving capability, we propose a novel progressive RL, which schedules training into multiple stages with progressively extended timeouts. Experiments demonstrate the superiority and generalization of KLong, as shown in Figure 1. Notably, our proposed KLong (106B) surpasses Kimi K2 Thinking (1T) by 11.28% on PaperBench, and the performance improvement generalizes to other coding benchmarks like SWE-bench Verified and MLE-bench.
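The progressive RL schedule described above (multiple stages with progressively extended timeouts) can be sketched minimally. The stage count, base timeout, and growth factor below are assumptions for illustration; the paper does not specify these values here.

```python
# Hypothetical staged-timeout schedule for progressive RL training.
# All numeric values are illustrative assumptions, not from the paper.
def timeout_schedule(n_stages=3, base_timeout=600, growth=2.0):
    """Return per-stage episode timeouts (seconds), doubled each stage."""
    return [int(base_timeout * growth ** i) for i in range(n_stages)]

def train_progressive(run_rl_stage, n_stages=3):
    """Run RL in stages, extending the episode timeout at each stage.

    `run_rl_stage(timeout)` is a stand-in for one stage of RL training
    under the given episode time limit.
    """
    for timeout in timeout_schedule(n_stages):
        run_rl_stage(timeout)
```

Early stages keep episodes short so the agent gets dense feedback on easier horizons; later stages relax the limit so it can learn to sustain much longer task executions.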

Tags

LLM Agent · Long-horizon Task · Reinforcement Learning

arXiv Categories

cs.AI cs.CL