AI Agents Relevance: 9/10

On Data Engineering for Scaling LLM Terminal Capabilities

Renjie Pi, Grace Lam, Mohammad Shoeybi, Pooya Jannaty, Bryan Catanzaro, Wei Ping

arXiv: 2602.21193v1 Published: 2026-02-24 Updated: 2026-02-24


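AI Summary

The paper studies data engineering methods for improving LLM capabilities on terminal tasks, and open-sources the resulting datasets and models.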

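Key Contributions

Proposes Terminal-Task-Gen, a synthetic task generation pipeline
Builds Terminal-Corpus, a large-scale open-source dataset of terminal tasks
Trains and open-sources the Nemotron-Terminal models, which achieve substantial gains on Terminal-Bench 2.0 (for example, from 3.4% to 27.4% at 32B)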

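Methodology

The authors build a synthetic data generation pipeline to produce a large-scale dataset, then train models on it using strategies such as filtering and curriculum learning.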

Original Abstract

Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain largely undisclosed. We address this gap through a systematic study of data engineering practices for terminal agents, making two key contributions: (1) Terminal-Task-Gen, a lightweight synthetic task generation pipeline that supports seed-based and skill-based task construction, and (2) a comprehensive analysis of data and training strategies, including filtering, curriculum learning, long context training, and scaling behavior. Our pipeline yields Terminal-Corpus, a large-scale open-source dataset for terminal tasks. Using this dataset, we train Nemotron-Terminal, a family of models initialized from Qwen3 (8B, 14B, 32B) that achieve substantial gains on Terminal-Bench 2.0: Nemotron-Terminal-8B improves from 2.5% to 13.0%, Nemotron-Terminal-14B improves from 4.0% to 20.2%, and Nemotron-Terminal-32B improves from 3.4% to 27.4%, matching the performance of significantly larger models. To accelerate research in this domain, we open-source our model checkpoints and most of our synthetic datasets at https://huggingface.co/collections/nvidia/nemotron-terminal.
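
To put the reported gains side by side, the short sketch below computes the absolute improvement (in percentage points) and the ratio of the new score to the old one for each model size. The scores are the ones quoted in the abstract; reading the "from" figure as the score of the corresponding Qwen3 initialization is an assumption, and the names `SCORES` and `summarize_gains` are illustrative only.

```python
# Terminal-Bench 2.0 scores quoted in the abstract, in percent:
# (score before, score after) for each Nemotron-Terminal size.
# Assumption: "before" is the score of the Qwen3 model it was initialized from.
SCORES = {
    "8B": (2.5, 13.0),
    "14B": (4.0, 20.2),
    "32B": (3.4, 27.4),
}


def summarize_gains(scores):
    """Return {size: (absolute gain in points, ratio of after to before)}."""
    return {
        size: (round(after - before, 1), round(after / before, 2))
        for size, (before, after) in scores.items()
    }


if __name__ == "__main__":
    for size, (points, ratio) in summarize_gains(SCORES).items():
        print(f"Nemotron-Terminal-{size}: +{points} points ({ratio}x the earlier score)")
```

On these figures the 32B model shows both the largest absolute gain and the largest ratio, while the 8B and 14B models each land at roughly five times their earlier score.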

arXiv Categories

cs.CL
