Knowledge-Guided Manipulation Using Multi-Task Reinforcement Learning
AI Summary
Proposes the KG-M3PO framework, which unifies perception, knowledge, and policy to improve the generalization and robustness of robotic manipulation tasks.
Main Contributions
- Proposes the Knowledge Graph based Massively Multi-task Model-based Policy Optimization (KG-M3PO) framework
- Augments the robot's visual perception with an online 3D scene graph
- Trains a graph neural network encoder end-to-end through the RL objective, aligning relational features with control performance
Methodology
Builds a knowledge-graph-based multi-task reinforcement learning framework: a dynamic-relation mechanism updates the scene graph at every step, a graph neural network encodes multimodal information into a shared latent space, and the policy conditions on lightweight graph queries for decision making.
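The dynamic-relation step can be pictured as recomputing relation edges from the current metric positions of detected objects. The sketch below is purely illustrative and not the paper's implementation: the `ObjectNode`/`SceneGraph` classes, the relation names, and the distance thresholds are all assumptions standing in for the framework's spatial and containment edges.

```python
import math
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    """A grounded open-vocabulary detection with a metric 3D position (hypothetical schema)."""
    name: str
    position: tuple  # (x, y, z) in meters

@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src, relation, dst) triples

    def update_relations(self, near_thresh=0.3):
        """One step of a dynamic-relation update: rebuild spatial edges from positions."""
        self.edges = []
        for a in self.nodes:
            for b in self.nodes:
                if a is b:
                    continue
                dx = a.position[0] - b.position[0]
                dy = a.position[1] - b.position[1]
                dz = a.position[2] - b.position[2]
                dist = math.sqrt(dx * dx + dy * dy + dz * dz)
                if dist < near_thresh:
                    self.edges.append((a.name, "near", b.name))
                # "on": roughly aligned in the x/y plane, with a slightly above b
                if abs(dx) < 0.05 and abs(dy) < 0.05 and 0 < dz < 0.15:
                    self.edges.append((a.name, "on", b.name))
        return self.edges

g = SceneGraph(nodes=[ObjectNode("cup", (0.0, 0.0, 0.10)),
                      ObjectNode("table", (0.0, 0.0, 0.0))])
edges = g.update_relations()
```

Calling `update_relations` every control step keeps the graph consistent with the latest observation, which is what allows the downstream encoder to track scene changes under occlusion.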
Original Abstract
This paper introduces Knowledge Graph based Massively Multi-task Model-based Policy Optimization (KG-M3PO), a framework for multi-task robotic manipulation in partially observable settings that unifies Perception, Knowledge, and Policy. The method augments egocentric vision with an online 3D scene graph that grounds open-vocabulary detections into a metric, relational representation. A dynamic-relation mechanism updates spatial, containment, and affordance edges at every step, and a graph neural encoder is trained end-to-end through the RL objective so that relational features are shaped directly by control performance. Multiple observation modalities (visual, proprioceptive, linguistic, and graph-based) are encoded into a shared latent space, upon which the RL agent operates to drive the control loop. The policy conditions on lightweight graph queries alongside visual and proprioceptive inputs, yielding a compact, semantically informed state for decision making. Experiments on a suite of manipulation tasks with occlusions, distractors, and layout shifts demonstrate consistent gains over strong baselines: the knowledge-conditioned agent achieves higher success rates, improved sample efficiency, and stronger generalization to novel objects and unseen scene configurations. These results support the premise that structured, continuously maintained world knowledge is a powerful inductive bias for scalable, generalizable manipulation: when the knowledge module participates in the RL computation graph, relational representations align with control, enabling robust long-horizon behavior under partial observability.
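The abstract's fusion of visual, proprioceptive, linguistic, and graph-based observations into a shared latent can be sketched minimally as per-modality encoders whose outputs are concatenated. This is a toy illustration under assumed dimensions and random stand-in weights; the actual KG-M3PO encoders are trained end-to-end through the RL objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(in_dim, out_dim):
    """A fixed random linear projection standing in for a learned modality encoder."""
    W = rng.standard_normal((in_dim, out_dim)) * 0.1
    return lambda x: np.tanh(x @ W)

# Hypothetical feature sizes for each observation modality.
enc_visual  = make_encoder(128, 32)  # egocentric image features
enc_proprio = make_encoder(16, 32)   # joint positions / velocities
enc_lang    = make_encoder(64, 32)   # task-instruction embedding
enc_graph   = make_encoder(48, 32)   # pooled GNN output from a lightweight graph query

def shared_latent(visual, proprio, lang, graph_query):
    """Fuse all modalities into the compact state the policy conditions on."""
    return np.concatenate([
        enc_visual(visual), enc_proprio(proprio),
        enc_lang(lang), enc_graph(graph_query),
    ])  # shape (128,)

z = shared_latent(rng.standard_normal(128), rng.standard_normal(16),
                  rng.standard_normal(64), rng.standard_normal(48))
```

Because the graph branch sits inside the same computation graph as the policy, gradients from the RL loss reach the graph encoder, which is the mechanism the abstract credits for aligning relational features with control.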