Multimodal Learning Relevance: 9/10

FlowComposer: Composable Flows for Compositional Zero-Shot Learning

Zhenqi He, Lin Li, Long Chen

arXiv: 2603.16641v1 Published: 2026-03-17 Updated: 2026-03-17


AI Summary

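FlowComposer is a flow-matching-based framework for compositional zero-shot learning (CZSL) that composes attribute and object features explicitly in the embedding space, with the aim of improving generalization to unseen compositions.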

Main Contributions

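Proposes FlowComposer, a model-agnostic framework that explicitly fuses attribute and object features.
Designs a leakage-guided augmentation scheme that reuses residual (leaked) features as auxiliary signals.
Reports significant improvements on three public CZSL benchmarks when plugged into various baselines.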

Methodology

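FlowComposer learns two primitive flows that transport visual features toward the attribute and object text embeddings, together with a learnable Composer that fuses the two velocity fields into a single composition flow. Composition is thus performed as an explicit operation on features rather than through token concatenation or prompt tuning.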
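
The idea of fusing two velocity fields into one composition flow can be illustrated with a toy sketch. This is not the paper's implementation. It assumes closed-form straight-line velocity fields in place of the learned primitive flows, a fixed convex combination in place of the learnable Composer, and made-up 2-D vectors in place of real embeddings. All function names and the weight `w_attr` are hypothetical.

```python
# Toy sketch only: closed-form stand-ins replace the learned networks.

def straight_line_velocity(x, target, t):
    """Velocity that moves x on a straight path so it reaches target at t = 1."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]

def compose_velocity(v_attr, v_obj, w_attr=0.5):
    """Hypothetical Composer: a fixed convex combination of two velocities.
    (The paper describes a learnable Composer; this weight is an assumption.)"""
    return [w_attr * a + (1.0 - w_attr) * o for a, o in zip(v_attr, v_obj)]

def transport(x0, attr_emb, obj_emb, steps=10, w_attr=0.5):
    """Euler-integrate the composed velocity field from t = 0 up to t = 1."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division above is safe
        v_a = straight_line_velocity(x, attr_emb, t)   # attribute flow
        v_o = straight_line_velocity(x, obj_emb, t)    # object flow
        v = compose_velocity(v_a, v_o, w_attr)         # composition flow
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

visual_feature = [0.0, 0.0]   # made-up visual feature
attr_embedding = [1.0, 0.0]   # made-up attribute text embedding
obj_embedding = [0.0, 1.0]    # made-up object text embedding
composed = transport(visual_feature, attr_embedding, obj_embedding)
```

With equal weights, the transported feature ends approximately at the midpoint of the two embeddings, which shows how a combination of velocity fields defines a new flow with its own endpoint. In the framework described above, both the primitive velocity fields and the fusion would be learned from data rather than fixed.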

Original Abstract

Compositional zero-shot learning (CZSL) aims to recognize unseen attribute-object compositions by recombining primitives learned from seen pairs. Recent CZSL methods built on vision-language models (VLMs) typically adopt parameter-efficient fine-tuning (PEFT). They apply visual disentanglers for decomposition and manipulate token-level prompts or prefixes to encode compositions. However, such PEFT-based designs suffer from two fundamental limitations: (1) Implicit Composition Construction, where composition is realized only via token concatenation or branch-wise prompt tuning rather than an explicit operation in the embedding space; (2) Remained Feature Entanglement, where imperfect disentanglement leaves attribute, object, and composition features mutually contaminated. Together, these issues limit the generalization ability of current CZSL models. In this paper, we are the first to systematically study flow matching for CZSL and introduce FlowComposer, a model-agnostic framework that learns two primitive flows to transport visual features toward attribute and object text embeddings, and a learnable Composer that explicitly fuses their velocity fields into a composition flow. To exploit the inevitable residual entanglement, we further devise a leakage-guided augmentation scheme that reuses leaked features as auxiliary signals. We thoroughly evaluate FlowComposer on three public CZSL benchmarks by integrating it as a plug-and-play component into various baselines, consistently achieving significant improvements.

arXiv Categories

cs.CV
