Retrospective In-Context Learning for Temporal Credit Assignment with Large Language Models
AI Summary
Uses retrospective in-context learning with LLMs to perform efficient temporal credit assignment, improving the sample efficiency of reinforcement learning.
Main Contributions
- Proposes retrospective in-context learning (RICL), which uses an LLM to estimate the advantage function
- Proposes RICOL, an online learning framework that iteratively refines the policy
- Empirically demonstrates that RICL achieves high sample efficiency and good generalization
Methodology
An LLM transforms sparse rewards into a dense advantage function, and an online learning framework iteratively refines the policy based on these credit-assignment results.
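The core idea can be sketched as follows. This is a minimal, hypothetical illustration of LLM-based temporal credit assignment, not the paper's actual algorithm: `mock_llm_judge` stands in for a real LLM prompt that retrospectively scores each step's contribution, and the proportional credit-spreading scheme is an assumption for demonstration purposes.

```python
from dataclasses import dataclass


@dataclass
class Step:
    state: str
    action: str


def mock_llm_judge(trajectory, step_idx):
    # Placeholder for an LLM call that retrospectively judges how much a
    # step contributed to the final outcome (hypothetical; the paper's
    # actual prompting scheme may differ). Here we pretend any action
    # mentioning "key" was critical.
    return 1.0 if "key" in trajectory[step_idx].action else 0.0


def retrospective_advantages(trajectory, terminal_reward, judge):
    """Turn one sparse terminal reward into dense per-step advantages."""
    scores = [judge(trajectory, i) for i in range(len(trajectory))]
    total = sum(scores) or 1.0
    # Spread the terminal reward across steps in proportion to the LLM's
    # retrospective contribution scores, then subtract the mean so the
    # signal is centered like an advantage estimate.
    credits = [terminal_reward * s / total for s in scores]
    baseline = sum(credits) / len(credits)
    return [c - baseline for c in credits]


traj = [
    Step("door locked", "pickup key"),
    Step("at door", "open door"),
    Step("in room", "wander"),
]
advantages = retrospective_advantages(traj, terminal_reward=1.0,
                                      judge=mock_llm_judge)
```

In this toy run the "pickup key" step receives a positive advantage while the uninformative steps receive negative ones, giving the policy a dense, per-step training signal from a single sparse reward.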
Original Abstract
Learning from self-sampled data and sparse environmental feedback remains a fundamental challenge in training self-evolving agents. Temporal credit assignment mitigates this issue by transforming sparse feedback into dense supervision signals. However, previous approaches typically depend on learning task-specific value functions for credit assignment, which suffer from poor sample efficiency and limited generalization. In this work, we propose to leverage pretrained knowledge from large language models (LLMs) to transform sparse rewards into dense training signals (i.e., the advantage function) through retrospective in-context learning (RICL). We further propose an online learning framework, RICOL, which iteratively refines the policy based on the credit assignment results from RICL. We empirically demonstrate that RICL can accurately estimate the advantage function with limited samples and effectively identify critical states in the environment for temporal credit assignment. Extended evaluation on four BabyAI scenarios shows that RICOL achieves convergent performance comparable to traditional online RL algorithms with significantly higher sample efficiency. Our findings highlight the potential of leveraging LLMs for temporal credit assignment, paving the way for more sample-efficient and generalizable RL paradigms.