Data-Efficient Hierarchical Goal-Conditioned Reinforcement Learning via Normalizing Flows
AI Summary
Proposes NF-HIQL, which leverages normalizing flows to improve the data efficiency and policy expressivity of H-GCRL, tackling the challenge of long-horizon tasks.
Main Contributions
- Proposes NF-HIQL, a normalizing-flow-based hierarchical implicit Q-learning framework
- Derives explicit KL-divergence bounds and PAC-style sample-efficiency results for RealNVP policies
- Validates the superiority and robustness of NF-HIQL across diverse long-horizon tasks
Methodology
Replaces the unimodal Gaussian policies in H-GCRL with normalizing flow policies to increase expressivity, and derives theoretical guarantees that underpin the improved data efficiency; see the sketch below for what such a flow policy can look like.
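The paper's reference implementation is not reproduced here; the following is a minimal PyTorch sketch of a goal-conditioned RealNVP-style policy with exact log-likelihoods and one-pass sampling. All names (`AffineCoupling`, `FlowPolicy`), layer counts, and hyperparameters are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch (assumed names and architecture), not the authors' released code.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One RealNVP affine coupling layer, conditioned on (state, goal).

    Half of the action dimensions pass through unchanged; the other half
    are scaled and shifted by networks that see the untouched half plus the
    conditioning context. The Jacobian is therefore triangular and its
    log-determinant is just the sum of the predicted log-scales.
    """

    def __init__(self, action_dim, context_dim, hidden=256, flip=False):
        super().__init__()
        self.flip = flip                      # alternate which half is transformed
        self.d = action_dim // 2
        in_dim, out_dim = self.d + context_dim, action_dim - self.d
        self.scale_net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim), nn.Tanh(),  # bounded log-scale for stability
        )
        self.shift_net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, z, context):
        # z -> a (sampling direction); returns transformed sample and log|det J|.
        if self.flip:
            z = z.flip(dims=[-1])
        z1, z2 = z[..., :self.d], z[..., self.d:]
        h = torch.cat([z1, context], dim=-1)
        log_s, t = self.scale_net(h), self.shift_net(h)
        a = torch.cat([z1, z2 * log_s.exp() + t], dim=-1)
        if self.flip:
            a = a.flip(dims=[-1])
        return a, log_s.sum(dim=-1)

    def inverse(self, a, context):
        # a -> z (density-evaluation direction); returns log|det J| of the inverse.
        if self.flip:
            a = a.flip(dims=[-1])
        a1, a2 = a[..., :self.d], a[..., self.d:]
        h = torch.cat([a1, context], dim=-1)
        log_s, t = self.scale_net(h), self.shift_net(h)
        z = torch.cat([a1, (a2 - t) * (-log_s).exp()], dim=-1)
        if self.flip:
            z = z.flip(dims=[-1])
        return z, -log_s.sum(dim=-1)

class FlowPolicy(nn.Module):
    """Goal-conditioned flow policy pi(a | s, g): a stack of coupling layers
    over a standard-normal base, giving exact log-probs and cheap sampling."""

    def __init__(self, action_dim, state_dim, goal_dim, n_layers=4):
        super().__init__()
        ctx = state_dim + goal_dim
        self.action_dim = action_dim
        self.layers = nn.ModuleList(
            AffineCoupling(action_dim, ctx, flip=(i % 2 == 1))
            for i in range(n_layers)
        )
        self.base = torch.distributions.Normal(0.0, 1.0)

    def log_prob(self, action, state, goal):
        # Change of variables: log pi(a|s,g) = log p_Z(z) + sum of inverse log-dets.
        context = torch.cat([state, goal], dim=-1)
        z, total = action, 0.0
        for layer in reversed(self.layers):
            z, ld = layer.inverse(z, context)
            total = total + ld
        return self.base.log_prob(z).sum(dim=-1) + total

    def sample(self, state, goal):
        # Draw base noise, push it forward through the coupling stack.
        context = torch.cat([state, goal], dim=-1)
        z = self.base.sample((state.shape[0], self.action_dim))
        for layer in self.layers:
            z, _ = layer.forward(z, context)
        return z
```

Because `log_prob` and `sample` each cost a single pass through the coupling stack, the same module can score dataset actions for an implicit Q-learning objective and draw actions at decision time; in a hierarchy, the high-level policy could reuse the same machinery with subgoals in place of actions.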
Original Abstract
Hierarchical goal-conditioned reinforcement learning (H-GCRL) provides a powerful framework for tackling complex, long-horizon tasks by decomposing them into structured subgoals. However, its practical adoption is hindered by poor data efficiency and limited policy expressivity, especially in offline or data-scarce regimes. In this work, normalizing flow-based hierarchical implicit Q-learning (NF-HIQL) is introduced: a novel framework that replaces unimodal Gaussian policies with expressive normalizing flow policies at both the high and low levels of the hierarchy. This design enables tractable log-likelihood computation, efficient sampling, and the ability to model rich multimodal behaviors. New theoretical guarantees are derived, including explicit KL-divergence bounds for real-valued non-volume-preserving (RealNVP) policies and PAC-style sample-efficiency results, showing that NF-HIQL preserves stability while improving generalization. Empirically, NF-HIQL is evaluated across diverse long-horizon tasks in locomotion, ball-dribbling, and multi-step manipulation from OGBench. NF-HIQL consistently outperforms prior goal-conditioned and hierarchical baselines, demonstrating superior robustness under limited data and highlighting the potential of flow-based architectures for scalable, data-efficient hierarchical reinforcement learning.
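For context, the tractable log-likelihood claimed in the abstract follows from the standard change-of-variables identity for an invertible flow — a textbook fact rather than a result specific to this paper. Writing the state-goal-conditioned flow as f_{s,g} with base density p_Z:

```latex
\log \pi(a \mid s, g)
  = \log p_Z\!\left(f_{s,g}^{-1}(a)\right)
  + \log \left| \det \frac{\partial f_{s,g}^{-1}(a)}{\partial a} \right|
```

For RealNVP coupling layers the Jacobian is triangular, so the log-determinant reduces to a sum of predicted log-scales and both the density and sampling directions cost a single network pass.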