Multimodal Learning 相关度: 8/10

BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation

Zihao Zhu, Ruotong Wang, Siwei Lyu, Min Zhang, Baoyuan Wu

arXiv: 2603.02816v1 发布: 2026-03-03 更新: 2026-03-03

下载 PDF arXiv 页面

AI 摘要

BrandFusion提出一个多智能体框架，用于在文生视频中无缝集成品牌，提升商业价值。

主要贡献

提出了在文生视频中无缝集成品牌的新任务
提出了BrandFusion多智能体框架，包含离线品牌知识库构建和在线提示优化阶段
实验证明BrandFusion在多个指标上优于基线方法

方法论

构建品牌知识库，利用多个智能体迭代优化用户提示，结合知识库和上下文跟踪，确保品牌可见性和语义一致性。

原文摘要

The rapid advancement of text-to-video (T2V) models has revolutionized content creation, yet their commercial potential remains largely untapped. We introduce, for the first time, the task of seamless brand integration in T2V: automatically embedding advertiser brands into prompt-generated videos while preserving semantic fidelity to user intent. This task confronts three core challenges: maintaining prompt fidelity, ensuring brand recognizability, and achieving contextually natural integration. To address them, we propose BrandFusion, a novel multi-agent framework comprising two synergistic phases. In the offline phase (advertiser-facing), we construct a Brand Knowledge Base by probing model priors and adapting to novel brands via lightweight fine-tuning. In the online phase (user-facing), five agents jointly refine user prompts through iterative refinement, leveraging the shared knowledge base and real-time contextual tracking to ensure brand visibility and semantic alignment. Experiments on 18 established and 2 custom brands across multiple state-of-the-art T2V models demonstrate that BrandFusion significantly outperforms baselines in semantic preservation, brand recognizability, and integration naturalness. Human evaluations further confirm higher user satisfaction, establishing a practical pathway for sustainable T2V monetization.

arXiv 分类

cs.CV cs.AI

AI 摘要

主要贡献

方法论

原文摘要

标签

arXiv 分类