ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget
AI Summary
ORBIT proposes a low-cost, verifiable framework for generating training data for search agents and demonstrates its effectiveness.
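Key Contributions
- Proposes the ORBIT dataset generation framework, which requires no paid APIs
- Builds a dataset of 20K reasoning-intensive queries
- Validates the effectiveness of the ORBIT dataset for training search agents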
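Methodology
The ORBIT framework comprises four stages (seed creation, question-answer pair generation, self-verification, and external verification) that ensure the quality and trustworthiness of the generated data.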
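The four stages can be pictured as a filter chain in which a candidate pair is kept only if it survives both verification steps. The sketch below is a minimal illustration of that control flow, not the authors' implementation: `QAPair`, `run_pipeline`, and every `toy_*` function are hypothetical names, and the toy checks stand in for the LM-based and web-search-based steps that the summary does not detail.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class QAPair:
    """Hypothetical container for one generated training example."""
    seed: str
    question: str
    answer: str


def run_pipeline(
    create_seeds: Callable[[], Iterable[str]],
    generate_pair: Callable[[str], Optional[QAPair]],
    self_verify: Callable[[QAPair], bool],
    external_verify: Callable[[QAPair], bool],
) -> List[QAPair]:
    """Run the four stages in order and keep only fully verified pairs."""
    kept: List[QAPair] = []
    for seed in create_seeds():          # stage 1: seed creation
        pair = generate_pair(seed)       # stage 2: question-answer generation
        if pair is None:
            continue                     # generation produced nothing usable
        if not self_verify(pair):        # stage 3: self-verification
            continue
        if not external_verify(pair):    # stage 4: external verification
            continue
        kept.append(pair)
    return kept


# Toy stand-ins (assumptions for illustration only).
def toy_seeds() -> List[str]:
    return ["Marie Curie", "", "Atlantis"]


def toy_generate(seed: str) -> Optional[QAPair]:
    if not seed:
        return None
    return QAPair(seed, f"Which topic is this seed about: {seed}?", seed)


def toy_self_verify(pair: QAPair) -> bool:
    # Assumed check: answers must be short.
    return len(pair.answer.split()) <= 5


def toy_external_verify(pair: QAPair) -> bool:
    # Assumed check: pretend an external search could not confirm "Atlantis".
    return pair.answer != "Atlantis"


result = run_pipeline(toy_seeds, toy_generate, toy_self_verify, toy_external_verify)
print([p.answer for p in result])
```

Passing each stage as a callable keeps the stages swappable, which matches the summary's description of the framework as modular.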
Original Abstract
Search agents, which integrate language models (LMs) with web search, are becoming crucial for answering complex user queries. Constructing training datasets for deep research tasks, which involve multi-step retrieval and reasoning, remains challenging because of expensive human annotation or cumbersome prerequisites. In this work, we introduce ORBIT, a training dataset of 20K reasoning-intensive queries with short verifiable answers, generated using a frugal framework that does not rely on paid API services. The modular framework has four stages: seed creation, question-answer pair generation, and two stages of verification, self and external. ORBIT spans 15 domains, and each training pair requires 4-5 reasoning steps, with external search verification required from the complete web. We train Qwen3-4B as the base model on ORBIT using GRPO and evaluate it on Wikipedia question answering tasks. Extensive experimental results demonstrate that ORBIT-4B achieves strong performance among sub-4B LLMs as search agents, demonstrating the utility of synthetic datasets. Our framework, code, and datasets are open-sourced and publicly available.
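Short verifiable answers pair naturally with GRPO, because each sampled rollout can be scored automatically and compared with the other rollouts for the same query. The sketch below shows only the group-relative advantage step, under stated assumptions: the abstract does not specify the reward, so the normalized exact-match reward here is an illustrative choice, and the helper names (`normalize`, `exact_match_reward`, `group_relative_advantages`), the use of population standard deviation, and the `eps` value are all hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def exact_match_reward(prediction: str, gold: str) -> float:
    """Assumed reward: 1.0 if the normalized strings match, otherwise 0.0."""
    return 1.0 if normalize(prediction) == normalize(gold) else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of rollouts for the same query.

    Each advantage is (reward - group mean) / (group std + eps).
    Population std is an assumption; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts answering the same query.
gold_answer = "Marie Curie"
rollouts = ["marie  curie", "Pierre Curie", "Irene Joliot-Curie", "Marie Curie"]
rewards = [exact_match_reward(r, gold_answer) for r in rollouts]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

When every rollout in a group receives the same reward, all advantages are zero, so that query contributes no learning signal in this formulation.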