When to ASK: Uncertainty-Gated Language Assistance for Reinforcement Learning
AI Summary
ASK combines a small language model with reinforcement learning, using an uncertainty gate to improve RL generalization in out-of-distribution (OOD) scenarios.
Key Contributions
- Proposes ASK, a method that improves RL generalization in OOD scenarios
- Uses Monte Carlo Dropout to estimate uncertainty and selectively queries the LM
- Shows that effective neuro-symbolic integration requires careful orchestration
Methodology
ASK uses Monte Carlo Dropout to estimate the policy's uncertainty at each state. When uncertainty exceeds a set threshold, the agent queries the LM for an action suggestion; otherwise it acts from the trained RL policy.
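The gating loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the entropy of the mean action distribution over stochastic forward passes as the uncertainty measure, and the names `policy_forward`, `query_lm`, and the toy policies are hypothetical.

```python
import numpy as np

def mc_dropout_uncertainty(policy_forward, state, n_samples=20):
    """Run several stochastic forward passes (dropout kept active) and
    return the entropy of the mean action distribution as an
    uncertainty estimate, along with the mean distribution itself."""
    probs = np.stack([policy_forward(state) for _ in range(n_samples)])
    mean_probs = probs.mean(axis=0)
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12))
    return entropy, mean_probs

def select_action(policy_forward, query_lm, state, threshold=1.0):
    """Uncertainty-gated action selection: fall back to the LM only when
    the policy's MC-Dropout uncertainty exceeds the threshold."""
    uncertainty, mean_probs = mc_dropout_uncertainty(policy_forward, state)
    if uncertainty > threshold:
        return query_lm(state)           # expensive path: ask the LM
    return int(np.argmax(mean_probs))    # cheap path: trust the RL policy

# Toy illustration: a confident in-distribution policy vs. a confused OOD one.
def confident_policy(state):
    logits = np.array([4.0, 0.0, 0.0, 0.0])  # strongly prefers action 0
    e = np.exp(logits - logits.max())
    return e / e.sum()

def confused_policy(state):
    return np.full(4, 0.25)  # near-uniform -> high entropy -> gate opens

lm_suggestion = lambda state: "ask-the-LM"
```

With a confident policy the entropy stays well below the threshold and the RL action is used; with a near-uniform policy the entropy (about ln 4 ≈ 1.39 for four actions) exceeds it and the LM is queried. This is exactly why the selective design preserves efficiency: the LM is only consulted on the high-entropy states.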
Original Abstract
Reinforcement learning (RL) agents often struggle with out-of-distribution (OOD) scenarios, leading to high uncertainty and random behavior. While language models (LMs) contain valuable world knowledge, larger ones incur high computational costs, hindering real-time use, and exhibit limitations in autonomous planning. We introduce Adaptive Safety through Knowledge (ASK), which combines smaller LMs with trained RL policies to enhance OOD generalization without retraining. ASK employs Monte Carlo Dropout to assess uncertainty and queries the LM for action suggestions only when uncertainty exceeds a set threshold. This selective use preserves the efficiency of existing policies while leveraging the language model's reasoning in uncertain situations. In experiments on the FrozenLake environment, ASK shows no improvement in-domain, but demonstrates robust navigation in transfer tasks, achieving a reward of 0.95. Our findings indicate that effective neuro-symbolic integration requires careful orchestration rather than simple combination, highlighting the need for sufficient model scale and effective hybridization mechanisms for successful OOD generalization.