Reading the Mood Behind Words: Integrating Prosody-Derived Emotional Context into Socially Responsive VR Agents
AI Summary
Proposes an emotion-context-aware VR interaction pipeline that improves the dialogue quality of virtual conversational agents.
Key Contributions
- Treats vocal emotion as explicit dialogue context
- Improves the responses of LLM-driven VR agents
- A user study confirms significant improvements in user experience
Methodology
Builds a real-time speech emotion recognition model, injects the resulting emotion labels into the LLM dialogue context, and evaluates the system in a VR user study.
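A minimal sketch of the emotion-label injection step, assuming a chat-style message interface. The function names, emotion label set, and prompt wording below are illustrative placeholders, not the paper's actual implementation.

```python
# Sketch: injecting a prosody-derived emotion label into an LLM dialogue context.
# All names here (recognize_emotion, build_dialogue_context) are hypothetical.

from typing import Dict, List


def recognize_emotion(audio_frames: bytes) -> str:
    """Stand-in for a real-time speech emotion recognition (SER) model.

    A real pipeline would infer the label from prosodic features of the
    streamed VR microphone audio; this stub returns a fixed label.
    """
    return "frustrated"


def build_dialogue_context(
    user_text: str, emotion: str, history: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Attach the inferred vocal emotion as explicit context alongside the transcript."""
    emotion_note = {
        "role": "system",
        "content": (
            f"The user's vocal tone suggests they are {emotion}. "
            "Adapt the tone and style of your reply accordingly."
        ),
    }
    user_turn = {"role": "user", "content": user_text}
    return history + [emotion_note, user_turn]


if __name__ == "__main__":
    label = recognize_emotion(b"")  # placeholder for streamed audio
    context = build_dialogue_context("Can you repeat the instructions?", label, [])
    for message in context:
        print(f"{message['role']}: {message['content']}")
```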
Original Abstract
In VR interactions with embodied conversational agents, users' emotional intent is often conveyed more by how something is said than by what is said. However, most VR agent pipelines rely on speech-to-text processing, discarding prosodic cues and often producing emotionally incongruent responses despite correct semantics. We propose an emotion-context-aware VR interaction pipeline that treats vocal emotion as explicit dialogue context in an LLM-based conversational agent. A real-time speech emotion recognition model infers users' emotional states from prosody, and the resulting emotion labels are injected into the agent's dialogue context to shape response tone and style. Results from a within-subjects VR study (N=30) show significant improvements in dialogue quality, naturalness, engagement, rapport, and human-likeness, with 93.3% of participants preferring the emotion-aware agent.