Multimodal Learning 相关度: 9/10

DQE-CIR: Distinctive Query Embeddings through Learnable Attribute Weights and Target Relative Negative Sampling in Composed Image Retrieval

Geon Park, Ji-Hoon Park, Seong-Whan Lee

arXiv: 2603.04037v1 发布: 2026-03-04 更新: 2026-03-04

下载 PDF arXiv 页面

AI 摘要

针对组合图像检索的判别性查询嵌入，提出可学习属性权重和目标相对负采样。

主要贡献

提出可学习的属性权重，强调与修改文本相关的视觉特征。
引入目标相对负采样，选择信息量更大的负样本。
提高查询判别力，减少语义相似但无关候选造成的混淆。

方法论

通过可学习属性权重和目标相对负采样，显式建模目标相对相关性，学习判别性查询嵌入。

原文摘要

Composed image retrieval (CIR) addresses the task of retrieving a target image by jointly interpreting a reference image and a modification text that specifies the intended change. Most existing methods are still built upon contrastive learning frameworks that treat the ground truth image as the only positive instance and all remaining images as negatives. This strategy inevitably introduces relevance suppression, where semantically related yet valid images are incorrectly pushed away, and semantic confusion, where different modification intents collapse into overlapping regions of the embedding space. As a result, the learned query representations often lack discriminativeness, particularly at fine-grained attribute modifications. To overcome these limitations, we propose distinctive query embeddings through learnable attribute weights and target relative negative sampling (DQE-CIR), a method designed to learn distinctive query embeddings by explicitly modeling target relative relevance during training. DQE-CIR incorporates learnable attribute weighting to emphasize distinctive visual features conditioned on the modification text, enabling more precise feature alignment between language and vision. Furthermore, we introduce target relative negative sampling, which constructs a target relative similarity distribution and selects informative negatives from a mid-zone region that excludes both easy negatives and ambiguous false negatives. This strategy enables more reliable retrieval for fine-grained attribute changes by improving query discriminativeness and reducing confusion caused by semantically similar but irrelevant candidates.

arXiv 分类

cs.CV cs.AI cs.LG

AI 摘要

主要贡献

方法论

原文摘要

标签

arXiv 分类