CORAL: Correspondence Alignment for Improved Virtual Try-On
AI Summary
CORAL improves virtual try-on by explicitly aligning person-garment correspondence, which better preserves fine garment details.
Key Contributions
- Analyzes how person-garment correspondence emerges in the full 3D attention of Diffusion Transformers (DiTs)
- Proposes the Correspondence Alignment (CORAL) framework, which explicitly aligns attention with robust external correspondences
- Proposes a VLM-based evaluation protocol that better reflects human preference
Methodology
CORAL aligns person-garment attention with reliable external correspondences via a correspondence distillation loss, and sharpens the attention distribution via an entropy minimization loss, improving both global shape transfer and local detail preservation in virtual try-on.
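The two losses can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's implementation: it assumes each person-token query has a per-row attention distribution over garment-token keys, and that the external correspondences are hard (one matched garment token per person token). All function and variable names here are hypothetical.

```python
import numpy as np

def correspondence_distillation_loss(attn, target_idx):
    """Cross-entropy pulling person-garment attention onto external matches.

    attn:       (Np, Ng) attention distribution, rows = person queries,
                columns = garment keys (each row sums to 1).
    target_idx: (Np,) index of the externally matched garment token
                for each person token (hypothetical hard matches).
    """
    # Attention mass each person query assigns to its matched garment token.
    matched = attn[np.arange(len(target_idx)), target_idx]
    return -np.mean(np.log(matched + 1e-12))

def entropy_minimization_loss(attn):
    """Mean entropy of each person query's attention over garment keys.

    Minimizing this term sharpens the attention distribution, pushing each
    query to commit to a small set of garment tokens.
    """
    return -np.mean(np.sum(attn * np.log(attn + 1e-12), axis=-1))

# Toy usage: uniform attention is maximally uncertain, one-hot is sharp.
targets = np.array([0, 1, 2, 3])
attn_uniform = np.full((4, 5), 0.2)        # every query spreads mass evenly
attn_sharp = np.eye(5)[targets]            # every query hits its match exactly
print(correspondence_distillation_loss(attn_uniform, targets))  # ~log(5)
print(entropy_minimization_loss(attn_sharp))                    # ~0
```

Both losses go to zero exactly when each person query places all of its attention on its externally matched garment token, which is the alignment behavior CORAL encourages.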
Original Abstract
Existing methods for Virtual Try-On (VTON) often struggle to preserve fine garment details, especially in unpaired settings where accurate person-garment correspondence is required. These methods do not explicitly enforce person-garment alignment and fail to explain how correspondence emerges within Diffusion Transformers (DiTs). In this paper, we first analyze full 3D attention in DiT-based architecture and reveal that the person-garment correspondence critically depends on precise person-garment query-key matching within the full 3D attention. Building on this insight, we then introduce CORrespondence ALignment (CORAL), a DiT-based framework that explicitly aligns query-key matching with robust external correspondences. CORAL integrates two complementary components: a correspondence distillation loss that aligns reliable matches with person-garment attention, and an entropy minimization loss that sharpens the attention distribution. We further propose a VLM-based evaluation protocol to better reflect human preference. CORAL consistently improves over the baseline, enhancing both global shape transfer and local detail preservation. Extensive ablations validate our design choices.