AI Agents relevance: 8/10

MA-VLCM: A Vision Language Critic Model for Value Estimation of Policies in Multi-Agent Team Settings

Shahil Shaik, Aditya Parameshwaran, Anshul Nayak, Jonathon M. Smereka, Yue Wang
arXiv: 2603.15418v1 Published: 2026-03-16 Updated: 2026-03-16

AI Summary

Proposes MA-VLCM, which uses a pretrained VLM as the critic in multi-agent reinforcement learning, improving sample efficiency and generalization.

Key Contributions

  • Uses a VLM as the critic in multi-agent reinforcement learning
  • Improves the sample efficiency of multi-agent reinforcement learning
  • Achieves better zero-shot generalization

Methodology

A pretrained vision-language model is fine-tuned to evaluate multi-agent behavior, replacing the learned critic of conventional reinforcement learning.
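The idea can be sketched in a few lines: a frozen, pretrained VLM scores trajectories, and that score stands in for the learned centralized critic during the actor update. The sketch below is illustrative only, assuming a stubbed-out critic; `VLMCritic`, `estimate_return`, and `policy_gradient_step` are hypothetical names, not the authors' actual API.

```python
# Minimal sketch of the MA-VLCM idea: a pretrained VLM replaces the
# learned centralized critic. The VLM here is a stub; a real one would
# reason over the task description, trajectory frames, and agent states.
import numpy as np

class VLMCritic:
    """Stand-in for a fine-tuned vision-language model used as a critic."""
    def __init__(self, task_description: str):
        self.task = task_description  # natural language task conditioning

    def estimate_return(self, frames: np.ndarray, agent_states: np.ndarray) -> float:
        # Placeholder scoring: average the agent state features.
        # A real VLM would produce a value estimate from multimodal input.
        return float(agent_states.mean())

def policy_gradient_step(log_probs, frames, agent_states, critic, lr=0.1):
    """Actor-only update: the VLM's value estimate supplies the return
    signal, so no critic parameters are trained during policy optimization."""
    value = critic.estimate_return(frames, agent_states)
    # REINFORCE-style surrogate gradient scaled by the VLM value.
    return np.array(log_probs) * value * lr

critic = VLMCritic("two robots push the box to the goal")
frames = np.zeros((4, 8, 8, 3))       # dummy trajectory frames (T, H, W, C)
agent_states = np.array([0.2, 0.8])   # dummy per-agent state features
update = policy_gradient_step([0.5, -0.1], frames, agent_states, critic)
```

The key structural point the paper makes is visible even in this toy version: because the critic is frozen and pretrained, only the (compact) policy is optimized, which is what yields the sample-efficiency and deployment benefits claimed in the abstract.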

Original Abstract

Multi-agent reinforcement learning (MARL) commonly relies on a centralized critic to estimate the value function. However, learning such a critic from scratch is highly sample-inefficient and often lacks generalization across environments. At the same time, large vision-language-action models (VLAs) trained on internet-scale data exhibit strong multimodal reasoning and zero-shot generalization capabilities, yet directly deploying them for robotic execution remains computationally prohibitive, particularly in heterogeneous multi-robot systems with diverse embodiments and resource constraints. To address these challenges, we propose Multi-Agent Vision-Language-Critic Models (MA-VLCM), a framework that replaces the learned centralized critic in MARL with a pretrained vision-language model fine-tuned to evaluate multi-agent behavior. MA-VLCM acts as a centralized critic conditioned on natural language task descriptions, visual trajectory observations, and structured multi-agent state information. By eliminating critic learning during policy optimization, our approach significantly improves sample efficiency while producing compact execution policies suitable for deployment on resource-constrained robots. Results show good zero-shot return estimation across models with differing VLM backbones on in-distribution and out-of-distribution scenarios in multi-agent team settings.

Tags

Multi-agent reinforcement learning · Vision-language models · Reinforcement learning

arXiv Categories

cs.RO cs.AI