Text-to-Stage: Spatial Layouts from Long-form Narratives
AI Summary
This paper studies how a language model can infer stage layouts from text, and proposes a method for training and evaluating it.
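Key Contributions
- A method for generating stage layouts from unstructured text
- A dramaturgy-inspired, verifiable evaluation suite
- A training strategy that combines rejection SFT with RL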
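Methodology
The model is first trained with rejection SFT, using Best-of-N sampling to select training examples, and then optimized with RL from verifiable rewards via GRPO.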
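The two ingredients named here can be illustrated with a minimal sketch. It is not the paper's implementation: the `sample` and `reward` callables, the default `n`, and the choice of population standard deviation are all assumptions made for illustration. `best_of_n` keeps the highest-scoring of N sampled candidates (the selection step behind rejection SFT), and `group_relative_advantages` normalizes each reward against its own sampling group, which is the core idea of the GRPO advantage.

```python
import statistics
from typing import Callable, List, Tuple


def best_of_n(prompt: str,
              sample: Callable[[str], str],
              reward: Callable[[str], float],
              n: int = 8) -> Tuple[str, float]:
    """Draw n candidates for a prompt and keep the highest-reward one.

    `sample` and `reward` are hypothetical stand-ins for a model call
    and a deterministic (verifiable) scoring function.
    """
    candidates = [sample(prompt) for _ in range(n)]
    scored = [(c, reward(c)) for c in candidates]
    # On ties, max() keeps the first candidate that reached the top score.
    return max(scored, key=lambda pair: pair[1])


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one sampling group: (r - mean) / std.

    Uses the population standard deviation (an assumption); eps guards
    against division by zero when all rewards in the group are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

In a rejection-SFT loop, the candidate returned by `best_of_n` would be added to the fine-tuning set, typically only if its reward clears some threshold. In the RL stage, the normalized advantages would weight the policy-gradient update for each sampled response.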
Original Abstract
In this work, we probe the ability of a language model to demonstrate spatial reasoning from unstructured text, mimicking human capabilities and automating a process that benefits many downstream media applications. Concretely, we study the narrative-to-play task: inferring stage-play layouts (scenes, speaker positions, movements, and room types) from text that lacks explicit spatial, positional, or relational cues. We then introduce a dramaturgy-inspired deterministic evaluation suite and, finally, a training and inference recipe that combines rejection SFT using Best-of-N sampling with RL from verifiable rewards via GRPO. Experiments on a text-only corpus of classical English literature demonstrate improvements over vanilla models across multiple metrics (character attribution, spatial plausibility, and movement economy), as well as alignment with an LLM-as-a-judge and subjective human preferences.
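To make the task's output concrete, here is one possible way to represent a stage layout and run a deterministic check on it. The abstract does not specify a schema or define its metrics, so every name below (`Movement`, `Scene`, `unplaced_movers`, the coordinate convention) is hypothetical, and the check is only a toy example of what a verifiable test might look like, not one of the paper's metrics.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) stage coordinates


@dataclass
class Movement:
    character: str
    target: Point


@dataclass
class Scene:
    room_type: str
    positions: Dict[str, Point]  # starting position of each speaker
    movements: List[Movement] = field(default_factory=list)


def unplaced_movers(scene: Scene) -> List[str]:
    """Return characters that move but were never given a starting position.

    A toy deterministic check: it depends only on the layout itself,
    so the same layout always receives the same verdict.
    """
    movers = {m.character for m in scene.movements}
    return sorted(movers - set(scene.positions))
```

For example, a scene that places only `"Alice"` but contains a movement for `"Bob"` would be flagged with `["Bob"]`, while a fully consistent scene yields an empty list.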