Agent Tuning & Optimization Relevance: 8/10

Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin
arXiv: 2603.11653v1 Published: 2026-03-12 Updated: 2026-03-12

AI Summary

A simple fine-tuning recipe combined with low-rank adaptation (LoRA) performs remarkably well for continual reinforcement learning of large-scale VLA models.
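For context, LoRA refers to the standard low-rank reparameterization below; this is general background (Hu et al., 2021), not notation taken from this paper:

    % Standard LoRA update: the pretrained weight W_0 stays frozen and only
    % the low-rank factors B and A are trained, so few parameters change.
    W = W_0 + \Delta W = W_0 + BA, \qquad
    B \in \mathbb{R}^{d \times r},\;
    A \in \mathbb{R}^{r \times k},\;
    r \ll \min(d, k)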

Key Contributions

  • Demonstrates that simple Sequential Fine-Tuning (Seq. FT) with LoRA is effective for continual reinforcement learning of VLA models.
  • Reveals a synergy between large-scale pretraining, parameter-efficient adaptation, and on-policy reinforcement learning.
  • Positions Seq. FT as a powerful method for continual reinforcement learning with VLA models.

Methodology

Adapts pretrained VLA models via Sequential Fine-Tuning (Seq. FT) with Low-Rank Adaptation (LoRA), combined with on-policy reinforcement learning to learn continually across a sequence of tasks.
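To make the recipe concrete, here is a minimal, self-contained sketch of Seq. FT with LoRA under on-policy RL. The network sizes, synthetic observations, placeholder rewards, and REINFORCE-style surrogate loss are all illustrative assumptions, not the paper's actual setup; the released code at github.com/UT-Austin-RobIn/continual-vla-rl is the authoritative implementation.

    # Minimal Seq. FT + LoRA sketch: one adapter set trained continually
    # across tasks with on-policy updates, no replay and no regularization.
    # All dimensions, rewards, and the toy policy are placeholders.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen pretrained weight plus a trainable low-rank update BA."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False          # freeze pretrained weights
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    # Stand-in for a pretrained VLA policy head: 64-dim features -> 4 actions.
    policy = nn.Sequential(LoRALinear(nn.Linear(64, 64)), nn.ReLU(),
                           LoRALinear(nn.Linear(64, 4)))
    opt = torch.optim.AdamW(
        (p for p in policy.parameters() if p.requires_grad), lr=1e-4)

    def rollout_loss(task_id: int):
        """Dummy on-policy rollout: each task shifts the observation
        distribution; the loss is a REINFORCE-style surrogate."""
        obs = torch.randn(32, 64) + float(task_id)  # per-task distribution shift
        dist = torch.distributions.Categorical(logits=policy(obs))
        actions = dist.sample()
        rewards = torch.randn(32)                   # placeholder reward signal
        return -(dist.log_prob(actions) * rewards).mean()

    # Sequential Fine-Tuning: tasks arrive one after another; there is no
    # replay buffer and no per-task parameter isolation, just plain Seq. FT.
    for task_id in range(5):                        # a toy sequence of tasks
        for _ in range(100):                        # on-policy updates per task
            loss = rollout_loss(task_id)
            opt.zero_grad()
            loss.backward()
            opt.step()

The key design point the paper highlights is what this sketch omits: no replay, no weight freezing between tasks, and no explicit anti-forgetting machinery, with stability instead emerging from the pretrained backbone, the low-rank adapter, and on-policy data collection.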

Original Abstract

Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across three models and five challenging lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.

Tags

Continual Learning · Reinforcement Learning · Vision-Language-Action · LoRA

arXiv Categories

cs.LG cs.RO