AI Agents relevance: 9/10

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

Yining Hong, Huang Huang, Manling Li, Li Fei-Fei, Jiajun Wu, Yejin Choi
arXiv: 2602.21198v1 Published: 2026-02-24 Updated: 2026-02-24

AI Summary

Proposes Reflective Test-Time Planning, which uses reflection to improve the decision-making of embodied LLMs and boost task-completion ability.

Key Contributions

  • Introduces two reflection modes: reflection-in-action and reflection-on-action
  • Proposes retrospective reflection for long-horizon credit assignment
  • Designs the Long-Horizon Household and MuJoCo Cupboard Fitting benchmarks

Methodology

The agent uses test-time scaling to generate candidate actions, updates its model after execution, and combines this with retrospective reflection, so that it learns from trial and error.
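The loop above can be sketched as pseudocode. This is a minimal, hypothetical illustration: the function names, the candidate scorer, and the discounted-return credit assignment are placeholder stand-ins, not the paper's actual models or update rules (which involve learned reflection and policy LLMs).

```python
def propose_actions(state, k=4):
    # Reflection-in-action, step 1: test-time scaling samples k
    # candidate actions (placeholder for an LLM policy's samples).
    return [f"action_{i}" for i in range(k)]

def internal_reflection_score(state, action):
    # Reflection-in-action, step 2: score each candidate before
    # execution. Placeholder critic that prefers mid-pool candidates;
    # the paper uses an internal reflection model instead.
    idx = int(action.split("_")[1])
    return -abs(idx - 2)

def execute(state, action):
    # Environment step stub: returns (next_state, reward).
    reward = 1.0 if action == "action_2" else 0.0
    return state + 1, reward

def reflect_on_action(history, step, reward):
    # Reflection-on-action: record an external reflection after
    # execution (stubbed as a log entry that would drive test-time
    # training of the reflection model and policy).
    history.append({"step": step, "reward": reward})

def retrospective_update(history):
    # Retrospective reflection: revisit earlier decisions with
    # hindsight; stubbed here as a discounted return over the episode
    # for long-horizon credit assignment.
    g, gamma = 0.0, 0.9
    for entry in reversed(history):
        g = entry["reward"] + gamma * g
        entry["return"] = g
    return history

state, history = 0, []
for step in range(3):
    candidates = propose_actions(state)
    best = max(candidates, key=lambda a: internal_reflection_score(state, a))
    state, reward = execute(state, best)
    reflect_on_action(history, step, reward)
history = retrospective_update(history)
```

The two modes are complementary: scoring before execution filters bad candidates cheaply, while the post-execution and retrospective passes propagate what actually happened back to earlier decisions.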

Original Abstract

Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials where mistakes repeat rather than accumulate into experience. Drawing upon human reflective practitioners, we introduce Reflective Test-Time Planning, which integrates two modes of reflection: reflection-in-action, where the agent uses test-time scaling to generate and score multiple candidate actions using internal reflections before execution; and reflection-on-action, which uses test-time training to update both its internal reflection model and its action policy based on external reflections after execution. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates with hindsight for proper long-horizon credit assignment. Experiments on our newly-designed Long-Horizon Household benchmark and MuJoCo Cupboard Fitting benchmark show significant gains over baseline models, with ablative studies validating the complementary roles of reflection-in-action and reflection-on-action. Qualitative analyses, including real-robot trials, highlight behavioral correction through reflection.

Tags

Embodied LLM · Reflection · Test-Time Planning · Robotics

arXiv Categories

cs.LG cs.AI cs.CL cs.CV cs.RO