UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience
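AI Summary
UI-Voyager is a self-evolving mobile GUI agent that improves learning efficiency through Rejection Fine-Tuning (RFT) and Group Relative Self-Distillation (GRSD).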
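Key Contributions
- Proposes Rejection Fine-Tuning (RFT)
- Proposes Group Relative Self-Distillation (GRSD)
- Reaches an 81.0% Pass@1 success rate on AndroidWorld with a 4B model, outperforming recent baselines and exceeding human-level performance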
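Methodology
The first stage uses RFT to co-evolve the data and the model. The second stage uses GRSD to distill knowledge from successful trajectories and correct failed ones.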
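The rejection step behind RFT can be illustrated with a minimal sketch. It is not the authors' implementation: the `Trajectory` type, `collect_accepted`, and `toy_rollout` are hypothetical names introduced here, and the sketch assumes each rollout ends with a binary success signal. It shows only the filtering half of the loop; in a full round the accepted trajectories would be used to fine-tune the model, and the next round would sample from the updated model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    task: str
    actions: List[str]
    success: bool  # assumed sparse, trajectory-level outcome


def collect_accepted(
    tasks: List[str],
    rollout: Callable[[str, int], Trajectory],
    samples_per_task: int,
) -> List[Trajectory]:
    """Sample several rollouts per task and keep only the successful ones."""
    accepted = []
    for task in tasks:
        for i in range(samples_per_task):
            traj = rollout(task, i)
            if traj.success:  # rejection step: failed rollouts are discarded
                accepted.append(traj)
    return accepted


def toy_rollout(task: str, sample_index: int) -> Trajectory:
    # Hypothetical stand-in for running the agent:
    # even-numbered samples succeed, odd-numbered samples fail.
    return Trajectory(task, ["tap", "type"], success=(sample_index % 2 == 0))


dataset = collect_accepted(
    ["open_settings", "add_contact"], toy_rollout, samples_per_task=4
)
```

With the toy rollout, two of the four samples per task are accepted, so `dataset` holds four trajectories that could serve as the next round's fine-tuning data.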
Original Abstract
Autonomous mobile GUI agents have attracted increasing attention along with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI tasks. To that end, we propose UI-Voyager, a novel two-stage self-evolving mobile GUI agent. In the first stage, we employ Rejection Fine-Tuning (RFT), which enables the continuous co-evolution of data and models in a fully autonomous loop. The second stage introduces Group Relative Self-Distillation (GRSD), which identifies critical fork points in group rollouts and constructs dense step-level supervision from successful trajectories to correct failed ones. Extensive experiments on AndroidWorld show that our 4B model achieves an 81.0% Pass@1 success rate, outperforming numerous recent baselines and exceeding human-level performance. Ablation and case studies further verify the effectiveness of GRSD. Our method represents a significant leap toward efficient, self-evolving, and high-performance mobile GUI automation without expensive manual data annotation.
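The fork-point idea described for GRSD can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: it assumes a fork point is the first step at which a successful and a failed rollout of the same task see the same observation but choose different actions, and that observations can be compared by a simple identifier. `Step`, `find_fork`, and `step_supervision` are hypothetical names.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str]  # (observation id, action); a simplifying assumption


def find_fork(success: List[Step], failure: List[Step]) -> Optional[int]:
    """Return the first index where both rollouts share an observation
    but take different actions, or None if no such step exists."""
    for i, (s, f) in enumerate(zip(success, failure)):
        if s[0] != f[0]:
            return None  # observations already differ; nothing comparable
        if s[1] != f[1]:
            return i
    return None


def step_supervision(
    success: List[Step], failure: List[Step]
) -> Optional[Tuple[str, str]]:
    """Build one step-level training pair: the observation at the fork,
    labeled with the action taken by the successful rollout."""
    i = find_fork(success, failure)
    if i is None:
        return None
    observation, _ = failure[i]
    return observation, success[i][1]


success = [("home", "open_app"), ("app", "tap_search"), ("search", "type_query")]
failure = [("home", "open_app"), ("app", "tap_menu"), ("menu", "scroll")]

fork_index = find_fork(success, failure)
pair = step_supervision(success, failure)
```

Here the two rollouts agree at the first step and diverge at the second, so the failed rollout receives a step-level label at that point rather than only a trajectory-level failure signal.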