AI Agents Relevance: 9/10

SecAgent: Efficient Mobile GUI Agent with Semantic Context

Yiping Xie, Song Chen, Jingxuan Xing, Wei Jiang, Zekun Zhu, Yingyao Wang, Pi Bu, Jun Song, Yuning Jiang, Bo Zheng
arXiv: 2603.08533v1 Published: 2026-03-09 Updated: 2026-03-09

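AI Summary

SecAgent is an efficient 3B-scale mobile GUI agent built on a semantic context mechanism, released alongside a Chinese mobile GUI dataset and navigation benchmark.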

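Key Contributions

  • Builds a high-quality, human-verified Chinese mobile GUI dataset and benchmark
  • Proposes a semantic-context history representation that reduces computational cost
  • Outperforms similar-scale models and is comparable to larger 7B-8B models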

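Methodology

Constructs a Chinese GUI dataset, compresses interaction history into semantic context, and fine-tunes a 3B model with supervised and reinforcement learning.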
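
The idea of replacing screenshot history with text can be illustrated with a minimal sketch. This is not SecAgent's implementation: the class names, the prompt layout, and the assumption that each step's summary is supplied by the caller are all hypothetical, chosen only to show how a model input might carry one current screenshot plus a short natural-language history.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One past interaction (hypothetical structure, not from the paper)."""
    action: str   # e.g. "tap(search_box)"
    summary: str  # short natural-language description of what the step did


@dataclass
class SemanticContext:
    """Keeps the step history as text instead of past screenshots."""
    task: str
    steps: List[Step] = field(default_factory=list)

    def add(self, action: str, summary: str) -> None:
        # Assumption: the summary is produced elsewhere (e.g. by the agent).
        self.steps.append(Step(action, summary))

    def render(self) -> str:
        # Flatten the task and history into a compact text prompt.
        lines = [f"Task: {self.task}", "History:"]
        if not self.steps:
            lines.append("  (none)")
        for i, step in enumerate(self.steps, 1):
            lines.append(f"  {i}. {step.summary} [{step.action}]")
        return "\n".join(lines)


def build_model_input(ctx: SemanticContext, current_screenshot: bytes) -> dict:
    # Only the current screenshot is passed as an image; all earlier
    # steps are represented by the rendered text history.
    return {"images": [current_screenshot], "text": ctx.render()}
```

With this layout the number of images per request stays at one regardless of how many steps have been taken, which is the source of the cost reduction the summary describes; how the summaries are generated and how faithful they are is left open here.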

Original Abstract

Mobile Graphical User Interface (GUI) agents powered by multimodal large language models have demonstrated promising capabilities in automating complex smartphone tasks. However, existing approaches face two critical limitations: the scarcity of high-quality multilingual datasets, particularly for non-English ecosystems, and inefficient history representation methods. To address these challenges, we present SecAgent, an efficient mobile GUI agent at 3B scale. We first construct a human-verified Chinese mobile GUI dataset with 18k grounding samples and 121k navigation steps across 44 applications, along with a Chinese navigation benchmark featuring multi-choice action annotations. Building upon this dataset, we propose a semantic context mechanism that distills history screenshots and actions into concise, natural language summaries, significantly reducing computational costs while preserving task-relevant information. Through supervised and reinforcement fine-tuning, SecAgent outperforms similar-scale baselines and achieves performance comparable to 7B-8B models on our and public navigation benchmarks. We will open-source the training dataset, benchmark, model, and code to advance research in multilingual mobile GUI automation.

Tags

Mobile GUI Automation, Multimodal LLM, Semantic Context, Chinese Dataset

arXiv Categories

cs.CV