CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion
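AI Summary
Proposes CLI-Gym, a method that generates CLI tasks at scale by simulating environment histories, and uses the resulting tasks to improve agent performance in terminal environments.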
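Key Contributions
- Proposes CLI-Gym, a scalable method for generating environment-intensive tasks
- Builds a dataset of 1,655 tasks, the largest collection of its kind to date
- Fine-tunes a model, LiberCoder, that gains +21.1% absolute (to 46.1%) on Terminal-Bench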
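Methodology
Agents simulate and explore the history of an environment. A healthy environment state is inverted to an earlier state that fails at runtime, and that buggy state is packaged together with its error messages as a task.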
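The inversion idea can be illustrated with a toy sketch. This is not the CLI-Gym pipeline: the paper uses agents guided by execution feedback, whereas the sketch below deterministically drops one history step at a time. The environment model (a set of installed package names) and the names `Task`, `replay`, `health_check`, and `derive_tasks` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A derived task: the buggy state plus the error it produces (hypothetical shape)."""
    buggy_state: frozenset
    error_message: str
    removed_step: str


def replay(history):
    """Rebuild a toy environment state by applying each history step in order."""
    state = set()
    for step in history:
        state.add(step)  # assumption: each step simply installs one package
    return frozenset(state)


def health_check(state, required):
    """Return None if the state is healthy, else an error message listing what is missing."""
    missing = sorted(required - state)
    if missing:
        return "missing: " + ", ".join(missing)
    return None


def derive_tasks(history, required):
    """Invert the healthy state by omitting one step at a time; keep the states that fail."""
    tasks = []
    for i, step in enumerate(history):
        inverted = replay(history[:i] + history[i + 1:])
        error = health_check(inverted, required)
        if error is not None:
            tasks.append(Task(inverted, error, step))
    return tasks


# Toy history of a healthy environment, analogous to successive Dockerfile steps.
history = ["python3", "libssl", "requests"]
required = {"libssl", "requests"}
tasks = derive_tasks(history, required)
```

Only inversions that actually break the health check become tasks, so dropping `python3` (not required in this toy setup) yields nothing, while dropping `libssl` or `requests` each yields one task carrying the failing state and its error message.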
Original Abstract
Agentic coding requires agents to effectively interact with runtime environments, e.g., command line interfaces (CLI), so as to complete tasks like resolving dependency issues, fixing system problems, etc. But it remains underexplored how such environment-intensive tasks can be obtained at scale to enhance agents' capabilities. To address this, based on an analogy between the Dockerfile and the agentic task, we propose to employ agents to simulate and explore environment histories, guided by execution feedback. By tracing histories of a healthy environment, its state can be inverted to an earlier one with runtime failures, from which a task can be derived by packing the buggy state and the corresponding error messages. With our method, named CLI-Gym, a total of 1,655 environment-intensive tasks are derived, being the largest collection of its kind. Moreover, with curated successful trajectories, our fine-tuned model, named LiberCoder, achieves substantial absolute improvements of +21.1% (to 46.1%) on Terminal-Bench, outperforming various strong baselines. To our knowledge, this is the first public pipeline for scalable derivation of environment-intensive tasks.