Multimodal Learning Relevance: 9/10

WARM-CAT: Warm-Started Test-Time Comprehensive Knowledge Accumulation for Compositional Zero-Shot Learning

Xudong Yan, Songhe Feng, Jiaxin Wang, Xin Su, Yi Jin
arXiv: 2602.23114v1 Published: 2026-02-26 Updated: 2026-02-26

AI Summary

WARM-CAT addresses the distribution-shift problem in Compositional Zero-Shot Learning by accumulating knowledge from unsupervised data and dynamically adjusting prototypes at test time.

Key Contributions

  • Proposes the Warm-Started Test-Time Comprehensive Knowledge Accumulation (WARM-CAT) method
  • Designs an adaptive update weight that controls the degree of prototype adjustment, flexibly adapting to distribution shift
  • Introduces a dynamic priority queue that derives visual prototypes from historical images
  • Introduces a new CZSL benchmark dataset, C-Fashion, and refines the MIT-States dataset

Methodology

Accumulates textual and visual knowledge from unsupervised data to update multimodal prototypes; the prototypes are refined with an adaptive update weight and a dynamic priority queue.
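The adaptive update weight can be pictured as a confidence-scaled exponential moving average over prototypes. The sketch below is illustrative only: the function name, the use of prediction confidence as the scaling signal, and the `base_rate` parameter are assumptions, not the paper's exact formulation.

```python
import numpy as np

def adaptive_prototype_update(prototype, test_feature, confidence, base_rate=0.1):
    """Test-time prototype update with an adaptive weight (illustrative sketch).

    The update strength scales with prediction confidence (an assumed proxy
    for the paper's adaptive update weight): confident test samples pull the
    prototype more strongly, letting it track distribution shift while
    resisting noisy, low-confidence updates.
    """
    alpha = base_rate * confidence  # adaptive weight in [0, base_rate]
    updated = (1 - alpha) * prototype + alpha * test_feature
    return updated / np.linalg.norm(updated)  # keep prototypes unit-norm
```

With `confidence=0` the prototype is untouched; with `confidence=1` it moves toward the test feature at the full base rate, which is the intuition behind "flexibly adapting to distribution shift."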

Original Abstract

Compositional Zero-Shot Learning (CZSL) aims to recognize novel attribute-object compositions based on the knowledge learned from seen ones. Existing methods suffer from performance degradation caused by the distribution shift of label space at test time, which stems from the inclusion of unseen compositions recombined from attributes and objects. To overcome the challenge, we propose a novel approach that accumulates comprehensive knowledge in both textual and visual modalities from unsupervised data to update multimodal prototypes at test time. Building on this, we further design an adaptive update weight to control the degree of prototype adjustment, enabling the model to flexibly adapt to distribution shift during testing. Moreover, a dynamic priority queue is introduced that stores high-confidence images to acquire visual prototypes from historical images for inference. Since the model tends to favor compositions already stored in the queue during testing, we warm-start the queue by initializing it with training images for visual prototypes of seen compositions and generating unseen visual prototypes using the mapping learned between seen and unseen textual prototypes. Considering the semantic consistency of multimodal knowledge, we align textual and visual prototypes by multimodal collaborative representation learning. To provide a more reliable evaluation for CZSL, we introduce a new benchmark dataset, C-Fashion, and refine the widely used but noisy MIT-States dataset. Extensive experiments indicate that our approach achieves state-of-the-art performance on four benchmark datasets under both closed-world and open-world settings. The source code and datasets are available at https://github.com/xud-yan/WARM-CAT .
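The warm-started dynamic priority queue described in the abstract can be sketched as a fixed-capacity, per-composition min-heap keyed by confidence: training features pre-fill seen compositions, and low-confidence entries are evicted as more confident test images arrive. All names, capacities, and the mean-pooled prototype here are illustrative assumptions, not the released implementation.

```python
import heapq

class WarmStartedQueue:
    """Fixed-capacity priority queue of high-confidence features per composition.

    Warm-starting (seeding the queues before test time, e.g. with training
    features for seen compositions) counters the abstract's noted bias toward
    compositions already stored in the queue.
    """

    def __init__(self, capacity=5):
        self.capacity = capacity
        self.queues = {}    # composition label -> min-heap of (confidence, idx, feature)
        self._counter = 0   # tie-breaker so features are never compared directly

    def push(self, label, feature, confidence):
        heap = self.queues.setdefault(label, [])
        entry = (confidence, self._counter, feature)
        self._counter += 1
        if len(heap) < self.capacity:
            heapq.heappush(heap, entry)
        elif confidence > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the lowest-confidence entry

    def prototype(self, label):
        """Mean of stored features, used as the visual prototype for `label`."""
        heap = self.queues.get(label)
        if not heap:
            return None
        feats = [f for _, _, f in heap]
        return [sum(xs) / len(xs) for xs in zip(*feats)]
```

Warm-starting is then just calling `push` with training features before inference begins; unseen compositions would be seeded with prototypes generated via the learned seen-to-unseen textual mapping.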

Tags

Compositional Zero-Shot Learning · Test-Time Adaptation · Multimodal Learning · Knowledge Accumulation

arXiv Category

cs.CV