RPMS: Enhancing LLM-Based Embodied Planning through Rule-Augmented Memory Synergy
AI Summary
RPMS substantially improves LLM planning in embodied environments through rule augmentation and memory synergy.
Main Contributions
- Proposes the RPMS architecture to address invalid action generation and state drift when LLMs act in embodied environments
- Introduces structured rule retrieval to enforce action feasibility
- Uses a lightweight belief state to gate the applicability of episodic memory
Methodology
RPMS adopts a rules-first arbitration mechanism: structured rule retrieval enforces action feasibility, a belief state filters episodic memory by the current state, and conflicts between the two sources are resolved in favor of the rules.
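The rules-first arbitration described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the names (`BeliefState`, `Rule`, `arbitrate`) and the exact precondition/fallback logic are assumptions for exposition.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

# Hypothetical sketch of RPMS-style rules-first arbitration.
# A proposed action is checked against retrieved precondition rules;
# memory-suggested actions must pass the same feasibility gate.

@dataclass
class BeliefState:
    location: str
    inventory: Set[str] = field(default_factory=set)

@dataclass
class Rule:
    action: str
    precondition: Callable[[BeliefState], bool]  # holds iff action is allowed

def feasible(action: str, rules: List[Rule], belief: BeliefState) -> bool:
    """An action is feasible only if at least one rule covers it
    and every retrieved rule for it holds in the current belief state."""
    relevant = [r for r in rules if r.action == action]
    return bool(relevant) and all(r.precondition(belief) for r in relevant)

def arbitrate(llm_action: str, memory_actions: List[str],
              rules: List[Rule], belief: BeliefState) -> Optional[str]:
    """Rules-first arbitration: prefer the LLM's proposal if feasible,
    else fall back to the first memory suggestion that passes the same
    rule check; return None if nothing is feasible."""
    if feasible(llm_action, rules, belief):
        return llm_action
    for a in memory_actions:
        if feasible(a, rules, belief):
            return a
    return None

# Usage with hypothetical ALFWorld-like rules:
rules = [Rule("open fridge", lambda b: b.location == "kitchen"),
         Rule("take apple", lambda b: "apple" not in b.inventory)]
belief = BeliefState(location="hallway")
# The infeasible LLM proposal is overridden by a feasible memory action.
chosen = arbitrate("open fridge", ["take apple"], rules, belief)
```

The key design point this illustrates is that memory is never trusted unconditionally: a remembered action only survives arbitration if it also satisfies the retrieved rules under the current belief state, which is how the paper describes memory becoming "a stable net positive once filtered by current state and constrained by explicit action rules."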
Original Abstract
LLM agents often fail in closed-world embodied environments because actions must satisfy strict preconditions -- such as location, inventory, and container states -- and failure feedback is sparse. We identify two structurally coupled failure modes: (P1) invalid action generation and (P2) state drift, each amplifying the other in a degenerative cycle. We present RPMS, a conflict-managed architecture that enforces action feasibility via structured rule retrieval, gates memory applicability via a lightweight belief state, and resolves conflicts between the two sources via rules-first arbitration. On ALFWorld (134 unseen tasks), RPMS achieves 59.7% single-trial success with Llama 3.1 8B (+23.9 pp over baseline) and 98.5% with Claude Sonnet 4.5 (+11.9 pp); of the 8B gain, rule retrieval alone contributes +14.9 pp (statistically significant), making it the dominant factor. A key finding is that episodic memory is conditionally useful: it harms performance on some task types when used without grounding, but becomes a stable net positive once filtered by current state and constrained by explicit action rules. Adapting RPMS to ScienceWorld with GPT-4 yields consistent gains across all ablation conditions (avg. score 54.0 vs. 44.9 for the ReAct baseline), providing transfer evidence that the core mechanisms hold across structurally distinct environments.