AI Agents relevance: 9/10

ActionParty: Multi-Subject Action Binding in Generative Video Games

Alexander Pondaven, Ziyi Wu, Igor Gilitschenski, Philip Torr, Sergey Tulyakov, Fabio Pizzati, Aliaksandr Siarohin
arXiv: 2604.02330v1 Published: 2026-04-02 Updated: 2026-04-02

AI Summary

ActionParty proposes a video generation model with multi-subject action binding, able to control multiple agents interacting within a video game.

Main Contributions

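  • Proposes ActionParty, a model that addresses the multi-subject action binding problem in video diffusion models
  • Introduces subject state tokens, which persistently capture the state of each subject in the scene
  • Delivers the first video world model that can control up to 7 players simultaneously across 46 environments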

Methodology

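State tokens and video latents are modeled jointly through a spatial biasing mechanism, which disentangles rendering of the global video frame from the action-controlled updates of individual subjects.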
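The summary does not say what form the spatial bias takes or how actions are injected, so the following is only a minimal sketch under stated assumptions, not the ActionParty implementation. It assumes (1) the bias is a Gaussian distance penalty added to attention logits, applied only between a subject state token and a video token, and (2) each subject's action is bound by adding an action embedding to that subject's own state token. The names `joint_attention`, `spatial_bias`, `bind_actions`, `ACTION_TABLE`, and the `sigma` parameter are hypothetical, and learned query/key/value projections are omitted.

```python
import math
from typing import List, Tuple

Vec = List[float]
Pos = Tuple[float, float]

# Hypothetical 5-d action embeddings (assumption, not from the paper).
ACTION_TABLE = {
    "noop": [0.0, 0.0, 0.0, 0.0, 0.0],
    "left": [0.0, 0.0, 0.0, 0.0, 1.0],
}


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_bias(pos_a: Pos, pos_b: Pos, sigma: float) -> float:
    # Hypothetical Gaussian log-bias: 0 when positions coincide, more negative with distance.
    d2 = (pos_a[0] - pos_b[0]) ** 2 + (pos_a[1] - pos_b[1]) ** 2
    return -d2 / (2.0 * sigma ** 2)


def bind_actions(state_tokens: List[Vec], actions: List[str]) -> List[Vec]:
    # Subject i's action is added only to subject i's state token, never to video tokens.
    return [[s + a for s, a in zip(tok, ACTION_TABLE[act])]
            for tok, act in zip(state_tokens, actions)]


def joint_attention(tokens: List[Vec], positions: List[Pos],
                    is_state: List[bool], sigma: float = 0.5) -> List[Vec]:
    """Single-head self-attention over video tokens plus subject state tokens.

    Raw token vectors serve as queries, keys and values (no learned projections).
    The spatial bias is added only for state<->video pairs, so each subject token
    mostly exchanges information with video patches near its own position.
    """
    dim = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        logits = []
        for j, k in enumerate(tokens):
            logit = sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            if is_state[i] != is_state[j]:  # cross pair: state token <-> video token
                logit += spatial_bias(positions[i], positions[j], sigma)
            logits.append(logit)
        weights = softmax(logits)
        out.append([sum(w * tokens[j][c] for j, w in enumerate(weights))
                    for c in range(dim)])
    return out


def demo():
    # Four video patch tokens on a 2x2 grid, one-hot in the first 4 dims.
    video = [[1.0 if c == i else 0.0 for c in range(5)] for i in range(4)]
    video_pos = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    # Two subjects at opposite corners; only subject 0 receives a non-trivial action.
    state = bind_actions([[0.0] * 5, [0.0] * 5], ["left", "noop"])
    state_pos = [(0.0, 0.0), (1.0, 1.0)]
    out = joint_attention(video + state, video_pos + state_pos,
                          [False] * 4 + [True] * 2)
    return out, state


if __name__ == "__main__":
    out, state = demo()
    for name, row in zip(["subject 0", "subject 1"], out[4:]):
        print(name, [round(x, 3) for x in row])
```

In this toy setup each subject token's update is dominated by the video patch at its own location (patch 0 for subject 0, patch 3 for subject 1), and the "left" action only alters subject 0's token. This illustrates how a spatial bias plus per-subject state tokens can keep actions attached to the right subject; the actual model's bias, token dimensions, and conditioning scheme may differ.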

Original Abstract

Recent advances in video diffusion have enabled the development of "world models" capable of simulating interactive environments. However, these models are largely restricted to single-agent settings, failing to control multiple agents simultaneously in a scene. In this work, we tackle a fundamental issue of action binding in existing video diffusion models, which struggle to associate specific actions with their corresponding subjects. For this purpose, we propose ActionParty, an action controllable multi-subject world model for generative video games. It introduces subject state tokens, i.e. latent variables that persistently capture the state of each subject in the scene. By jointly modeling state tokens and video latents with a spatial biasing mechanism, we disentangle global video frame rendering from individual action-controlled subject updates. We evaluate ActionParty on the Melting Pot benchmark, demonstrating the first video world model capable of controlling up to seven players simultaneously across 46 diverse environments. Our results show significant improvements in action-following accuracy and identity consistency, while enabling robust autoregressive tracking of subjects through complex interactions.

Tags

Video Generation, World Models, Multi-Agent, Action Control, Reinforcement Learning

arXiv Categories

cs.CV cs.AI cs.LG