Multimodal Learning Relevance: 9/10

NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries

Kanon Amemiya, Daichi Yashima, Kei Katsumata, Takumi Komatsu, Ryosuke Korekata, Seitaro Otsuki, Komei Sugiura
arXiv: 2603.05446v1 Published: 2026-03-05 Updated: 2026-03-05

AI Summary

NaiLIA proposes a multimodal nail design retrieval method that better captures complex user intent and color preferences than existing vision-language foundation models.

Key Contributions

  • Propose NaiLIA, a multimodal retrieval method for nail design images
  • Introduce a relaxed loss based on confidence scores
  • Construct a nail design benchmark with detailed dense intent annotations
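The relaxed loss in the second contribution targets unlabeled images that may still match a description: instead of treating them as hard negatives, they become soft positives weighted by a confidence score. The paper does not give the exact formulation, so the following is only a minimal illustrative sketch of that idea, assuming a softmax over candidate images and confidence-weighted soft targets (function name, target construction, and normalization are all assumptions):

```python
import numpy as np

def relaxed_contrastive_loss(sim, labels, conf):
    """Illustrative confidence-weighted relaxed loss (not the paper's exact loss).

    sim:    (N,) similarities between one query and N candidate images
    labels: (N,) 1 for annotated positives, 0 for unlabeled images
    conf:   (N,) confidence in [0, 1] that an unlabeled image still
            aligns with the description

    Unlabeled images contribute as soft positives in proportion to conf,
    relaxing the usual hard positive/negative contrastive target.
    """
    # Soft targets: annotated positives get weight 1, unlabeled get conf
    target = np.where(labels == 1, 1.0, conf)
    target = target / target.sum()
    # Softmax distribution over the candidate set
    probs = np.exp(sim) / np.exp(sim).sum()
    # Cross-entropy between soft targets and predicted distribution
    return float(-(target * np.log(probs + 1e-9)).sum())
```

With all confidences at zero this reduces to the standard cross-entropy against the single labeled positive; raising the confidence of an unlabeled image shifts probability mass toward it in the target.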

Methodology

A deep learning model retrieves nail design images by jointly aligning user-provided text descriptions (dense intent descriptions) with color information (palette queries of zero or more picked colors).
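One simple way to realize this joint alignment is to combine a text-image similarity score with a palette-match term; when the palette query is empty, ranking falls back to text alone. The paper does not specify its scoring function, so this is a hypothetical sketch (the weighting scheme, RGB nearest-color distance, and `alpha` parameter are assumptions):

```python
import numpy as np

def retrieval_score(text_sim, image_palette, query_colors, alpha=0.5):
    """Hypothetical fused retrieval score for one candidate image.

    text_sim:      scalar text-image similarity from a vision-language encoder
    image_palette: (P, 3) dominant RGB colors of the image, values in [0, 1]
    query_colors:  (Q, 3) user-picked RGB colors; may be empty (Q = 0)
    alpha:         assumed weight between the text and palette terms
    """
    if len(query_colors) == 0:
        return float(text_sim)  # no palette query: rank by text alone
    # Distance from each query color to its nearest palette color
    d = np.linalg.norm(
        query_colors[:, None, :] - image_palette[None, :, :], axis=-1
    )
    # Average nearest-color distance, normalized by the RGB cube diagonal
    palette_sim = 1.0 - d.min(axis=1).mean() / np.sqrt(3)
    return float(alpha * text_sim + (1 - alpha) * palette_sim)
```

A continuous color distance like this is what lets the palette query express "subtle and continuous color nuances" rather than matching discrete color names.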

Original Abstract

We focus on the task of retrieving nail design images based on dense intent descriptions, which represent multi-layered user intent for nail designs. This is challenging because such descriptions specify unconstrained painted elements and pre-manufactured embellishments as well as visual characteristics, themes, and overall impressions. In addition to these descriptions, we assume that users provide palette queries by specifying zero or more colors via a color picker, enabling the expression of subtle and continuous color nuances. Existing vision-language foundation models often struggle to incorporate such descriptions and palettes. To address this, we propose NaiLIA, a multimodal retrieval method for nail design images, which comprehensively aligns with dense intent descriptions and palette queries during retrieval. Our approach introduces a relaxed loss based on confidence scores for unlabeled images that can align with the descriptions. To evaluate NaiLIA, we constructed a benchmark consisting of 10,625 images collected from people with diverse cultural backgrounds. The images were annotated with long and dense intent descriptions given by over 200 annotators. Experimental results demonstrate that NaiLIA outperforms standard methods.

Tags

Multimodal Learning · Image Retrieval · Vision-Language · Nail Design

arXiv Category

cs.CV