EmoSpace: Fine-Grained Emotion Prototype Learning for Immersive Affective Content Generation
AI Summary
EmoSpace proposes a fine-grained emotion prototype learning framework based on vision-language alignment for generating immersive affective content.
Main Contributions
- Proposes the EmoSpace framework for emotion-aware content generation
- Introduces dynamic, interpretable emotion prototypes that enable fine-grained emotional control
- Validates how VR environments affect emotional perception through a user study
Methodology
Employs a hierarchical emotion representation to learn dynamic emotion prototypes, and achieves controllable generation through multi-prototype guidance, temporal blending, and attention reweighting, as sketched below.
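To make the pipeline concrete, here is a minimal PyTorch sketch of multi-prototype guidance and temporal blending, assuming the prototypes live in a CLIP-like embedding space. All names (`EmotionPrototypeBank`, `blend`, `temporal_blend`) and the specific blending scheme are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch: multi-prototype guidance and temporal blending.
import torch
import torch.nn.functional as F


class EmotionPrototypeBank(torch.nn.Module):
    """Learnable emotion prototypes in a shared vision-language space."""

    def __init__(self, num_prototypes: int, dim: int = 768):
        super().__init__()
        self.prototypes = torch.nn.Parameter(torch.randn(num_prototypes, dim))

    def blend(self, weights: torch.Tensor) -> torch.Tensor:
        # Convex combination of prototypes -> one conditioning vector,
        # analogous to a text embedding fed to the generator.
        weights = F.softmax(weights, dim=-1)
        return weights @ self.prototypes

    def temporal_blend(self, w_start: torch.Tensor, w_end: torch.Tensor,
                       num_steps: int) -> torch.Tensor:
        # Interpolate blend weights across generation steps so the
        # emotional tone shifts smoothly over time.
        alphas = torch.linspace(0.0, 1.0, num_steps).unsqueeze(-1)
        weights = (1 - alphas) * w_start + alphas * w_end
        return torch.stack([self.blend(w) for w in weights])


bank = EmotionPrototypeBank(num_prototypes=8)
# Emphasize prototypes 0 and 2 for a single static conditioning vector.
cond = bank.blend(torch.tensor([2.0, 0.0, 1.0, 0, 0, 0, 0, 0]))
# A 50-step trajectory drifting from one emotional mix to another.
trajectory = bank.temporal_blend(torch.zeros(8), torch.ones(8), num_steps=50)
```

The blended vector would condition the generator in place of (or alongside) a text embedding; interpolating the blend weights yields the smooth emotional trajectory that temporal blending requires.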
Original Abstract
Emotion is important for creating compelling virtual reality (VR) content. Although some generative methods have been applied to lower the barrier to creating emotionally rich content, they fail to capture the nuanced emotional semantics and the fine-grained control essential for immersive experiences. To address these limitations, we introduce EmoSpace, a novel framework for emotion-aware content generation that learns dynamic, interpretable emotion prototypes through vision-language alignment. We employ a hierarchical emotion representation with rich learnable prototypes that evolve during training, enabling fine-grained emotional control without requiring explicit emotion labels. We develop a controllable generation pipeline featuring multi-prototype guidance, temporal blending, and attention reweighting that supports diverse applications, including emotional image outpainting, stylized generation, and emotional panorama generation for VR environments. Our experiments demonstrate the superior performance of EmoSpace over existing methods in both qualitative and quantitative evaluations. Additionally, we present a comprehensive user study investigating how VR environments affect emotional perception compared to desktop settings. Our work facilitates immersive visual content generation with fine-grained emotion control and supports applications like therapy, education, storytelling, artistic creation, and cultural preservation. Code and models will be made publicly available.
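As an illustration of how prototypes might be learned without explicit emotion labels, here is a hedged sketch of a vision-language alignment objective: frozen CLIP image features are softly assigned to learnable prototypes, which are pulled toward the features they attract. The specific loss terms, the temperature, and the diversity regularizer are assumptions, not the paper's published objective.

```python
# Hypothetical sketch: label-free prototype learning via vision-language alignment.
import torch
import torch.nn.functional as F


def prototype_alignment_loss(image_feats: torch.Tensor,
                             prototypes: torch.Tensor,
                             tau: float = 0.07) -> torch.Tensor:
    """image_feats: (B, D) frozen CLIP image embeddings; prototypes: (K, D)."""
    img = F.normalize(image_feats, dim=-1)
    pro = F.normalize(prototypes, dim=-1)
    sim = img @ pro.t() / tau          # (B, K) scaled cosine similarities
    assign = F.softmax(sim, dim=-1)    # soft assignment of images to prototypes
    # Pull each image toward its softly assigned prototypes.
    align = -(assign * sim).sum(dim=-1).mean()
    # Keep prototypes distinct (a common diversity regularizer).
    diversity = (pro @ pro.t() - torch.eye(len(pro))).pow(2).mean()
    return align + 0.1 * diversity
```

Under this kind of objective the prototypes evolve during training to cover distinct regions of the embedding space, which is one plausible route to the interpretable, label-free prototypes the abstract describes.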