Flow-based Policy With Distributional Reinforcement Learning in Trajectory Optimization
AI Summary
Proposes the FP-DRL algorithm, which combines flow matching with distributional RL to address traditional RL's limited ability to model multimodal distributions.
Main Contributions
- Proposes a flow-based policy model
- Combines distributional RL to model and optimize the return distribution
- Achieves SOTA performance on MuJoCo benchmarks
Methodology
The policy is modeled with flow matching, which can learn complex distributions; at the same time, distributional RL is used to model and optimize the return distribution, guiding multimodal policy updates.
Original Abstract
Reinforcement Learning (RL) has proven highly effective in addressing complex control and decision-making tasks. However, in most traditional RL algorithms, the policy is typically parameterized as a diagonal Gaussian distribution, which prevents it from capturing multimodal distributions and makes it difficult to cover the full range of optimal solutions in multi-solution problems; moreover, the return is reduced to a mean value, losing its multimodal nature and thus providing insufficient guidance for policy updates. In response to these problems, we propose an RL algorithm termed flow-based policy with distributional RL (FP-DRL). This algorithm models the policy using flow matching, which offers both computational efficiency and the capacity to fit complex distributions. Additionally, it employs a distributional RL approach to model and optimize the entire return distribution, thereby more effectively guiding multimodal policy updates and improving agent performance. Experimental trials on MuJoCo benchmarks demonstrate that the FP-DRL algorithm achieves state-of-the-art (SOTA) performance in most MuJoCo control tasks while exhibiting the superior representation capability of the flow policy.