Agent Tuning & Optimization Relevance: 9/10

Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks

Shuo He, Lang Feng, Qi Wei, Xin Cheng, Lei Feng, Bo An
arXiv: 2602.22817v1 Published: 2026-02-26 Updated: 2026-02-26

AI Summary

HGPO uses hierarchical group-based optimization to address the biased advantage estimation caused by context inconsistency in long-horizon agentic tasks.

Key Contributions

  • Proposes Hierarchy-of-Groups Policy Optimization (HGPO)
  • Identifies and addresses the context-inconsistency problem in stepwise advantage estimation
  • Achieves a bias-variance trade-off through hierarchical grouping

Methodology

HGPO assigns each step to multiple hierarchical groups, computes a distinct advantage within each group, and aggregates these advantages with an adaptive weighting scheme, thereby improving stepwise advantage estimation.
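The aggregation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grouping criterion (steps sharing a length-k context prefix) and the depth-proportional weights are assumptions introduced here for clarity; the paper's adaptive weighting scheme may differ.

```python
import numpy as np

def group_advantage(returns):
    """GRPO-style normalized advantage within a single group."""
    r = np.asarray(returns, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def hgpo_advantages(step_returns, step_contexts, max_depth):
    """Sketch of hierarchy-of-groups advantage aggregation.

    step_returns:  per-step returns across a group of rollouts
    step_contexts: one context tuple per step; steps whose first k
                   elements match fall in the same level-k group
                   (hypothetical consistency criterion)
    """
    n = len(step_returns)
    adv, wsum = np.zeros(n), np.zeros(n)
    for k in range(max_depth + 1):
        # Partition steps by their length-k context prefix.
        buckets = {}
        for i, ctx in enumerate(step_contexts):
            buckets.setdefault(tuple(ctx[:k]), []).append(i)
        for idx in buckets.values():
            if len(idx) < 2:
                continue  # a singleton group carries no relative signal
            a = group_advantage([step_returns[i] for i in idx])
            w = float(k + 1)  # assumption: deeper (more consistent) groups weigh more
            for j, i in enumerate(idx):
                adv[i] += w * a[j]
                wsum[i] += w
    return adv / np.maximum(wsum, 1e-8)
```

Because every level reuses the same rollout group, this estimator needs no extra models or rollouts, matching the efficiency claim in the abstract.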

Original Abstract

Group-based reinforcement learning (RL), such as GRPO, has advanced the capabilities of large language models on long-horizon agentic tasks. To enable more fine-grained policy updates, recent research has increasingly shifted toward stepwise group-based policy optimization, which treats each step in a rollout trajectory independently while using a memory module to retain historical context. However, we find a key issue in estimating stepwise relative advantages, namely context inconsistency, where steps within the same group may differ in their historical contexts. Empirically, we reveal that this issue can lead to severely biased advantage estimation, thereby degrading policy optimization significantly. To address the issue, in this paper, we propose Hierarchy-of-Groups Policy Optimization (HGPO) for long-horizon agentic tasks. Specifically, within a group of rollout trajectories, HGPO assigns each step to multiple hierarchical groups according to the consistency of historical contexts. Then, for each step, HGPO computes distinct advantages within each group and aggregates them with an adaptive weighting scheme. In this way, HGPO can achieve a favorable bias-variance trade-off in stepwise advantage estimation, without extra models or rollouts. Evaluations on two challenging agentic tasks, ALFWorld and WebShop with Qwen2.5-1.5B-Instruct and Qwen2.5-7B-Instruct, show that HGPO significantly outperforms existing agentic RL methods under the same computational constraints. Code is available at https://github.com/langfengQ/verl-agent/tree/master/recipe/hgpo.

Tags

Reinforcement Learning  Agents  Policy Optimization  Context Consistency

arXiv Categories

cs.LG cs.AI