Multimodal Learning — Relevance: 8/10

Code2Worlds: Empowering Coding LLMs for 4D World Generation

Yi Zhang, Yunshuang Wang, Zeyu Zhang, Hao Tang
arXiv: 2602.11757v1 Published: 2026-02-12 Updated: 2026-02-12

AI Summary

The Code2Worlds framework leverages coding LLMs to generate dynamic 4D worlds grounded in physical laws, addressing the problems of multi-scale context entanglement and the semantic-physical execution gap.

Key Contributions

  • Proposes a dual-stream architecture that disentangles object generation from environmental orchestration
  • Establishes a physics-aware closed-loop mechanism that iteratively refines simulation code
  • Outperforms existing methods on the Code4D benchmark

Methodology

Formulates 4D generation as language-to-simulation code generation, combining a dual-stream architecture with a physics-aware closed-loop mechanism.
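The closed-loop idea can be sketched as a generate-critique-refine cycle: a generator agent scripts simulation code, a critic scores the resulting dynamics and emits feedback, and the loop repeats until the critic is satisfied. This is a minimal illustrative sketch, not the authors' implementation; the function names (`script_dynamics`, `critique_motion`) and the toy scoring logic are assumptions standing in for the paper's PostProcess Agent and VLM-Motion Critic.

```python
def script_dynamics(scene_prompt, fixes):
    """Stand-in for the PostProcess Agent: emit simulation code for the
    prompt, incorporating all accumulated critic feedback. A real agent
    would call a coding LLM; here we emulate refinement with comments."""
    code = f"# simulate: {scene_prompt}"
    for fb in fixes:
        code += f"\n# fix: {fb}"
    return code

def critique_motion(sim_code):
    """Stand-in for the VLM-Motion Critic: score the dynamics and return
    (score, feedback). A real critic would inspect rendered frames; this
    toy version just rewards each applied fix until the score saturates."""
    n_fixes = sim_code.count("# fix:")
    score = min(1.0, 0.5 + 0.25 * n_fixes)
    feedback = None if score >= 1.0 else "object clips through ground plane"
    return score, feedback

def closed_loop_generate(scene_prompt, max_rounds=5, threshold=1.0):
    """Physics-aware closed loop: regenerate simulation code and
    re-critique until the motion score reaches the threshold."""
    fixes = []
    code, score = script_dynamics(scene_prompt, fixes), 0.0
    for _ in range(max_rounds):
        code = script_dynamics(scene_prompt, fixes)
        score, feedback = critique_motion(code)
        if score >= threshold:
            break
        fixes.append(feedback)
    return code, score
```

The key design point is that the critic's feedback is fed back into the generator's context, so each round produces code conditioned on all previously detected physical errors rather than regenerating from scratch.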

Original Abstract

Achieving spatial intelligence requires moving beyond visual plausibility to build world simulators grounded in physical laws. While coding LLMs have advanced static 3D scene generation, extending this paradigm to 4D dynamics remains a critical frontier. This task presents two fundamental challenges: multi-scale context entanglement, where monolithic generation fails to balance local object structures with global environmental layouts; and a semantic-physical execution gap, where open-loop code generation leads to physical hallucinations lacking dynamic fidelity. We introduce Code2Worlds, a framework that formulates 4D generation as language-to-simulation code generation. First, we propose a dual-stream architecture that disentangles retrieval-augmented object generation from hierarchical environmental orchestration. Second, to ensure dynamic fidelity, we establish a physics-aware closed-loop mechanism in which a PostProcess Agent scripts dynamics, coupled with a VLM-Motion Critic that performs self-reflection to iteratively refine simulation code. Evaluations on the Code4D benchmark show Code2Worlds outperforms baselines with a 41% SGS gain and 49% higher Richness, while uniquely generating physics-aware dynamics absent in prior static methods. Code: https://github.com/AIGeeksGroup/Code2Worlds. Website: https://aigeeksgroup.github.io/Code2Worlds.

Tags

LLM · 4D World Generation · Code Generation · Simulation

arXiv Categories

cs.CV