Multimodal Learning Relevance: 9/10

AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

Tongfei Chen, Shuo Yang, Yuguang Yang, Linlin Yang, Runtang Guo, Changbai Li, He Long, Chunyu Xie, Dawei Leng, Baochang Zhang
arXiv: 2602.22740v1 Published: 2026-02-26 Updated: 2026-02-26

AI Summary

AMLRIS improves referring image segmentation with alignment-aware masked learning, which focuses training on trustworthy cues.

Main Contributions

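  • Proposes Alignment-Aware Masked Learning (AML), a training strategy
  • Explicitly estimates pixel-level vision-language alignment
  • Filters out poorly aligned regions so that training focuses on trustworthy cues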

Methodology

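Estimates pixel-level vision-language alignment and masks out poorly aligned regions during optimization when training the referring image segmentation model.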
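The idea of filtering poorly aligned pixels out of the training objective can be sketched as follows. This is a minimal illustration under stated assumptions, not the AMLRIS implementation. Per-pixel alignment is assumed to be the cosine similarity between each pixel feature and a sentence embedding. A pixel is assumed to count as "poorly aligned" when its score falls below a fixed threshold. The segmentation loss is assumed to be per-pixel binary cross-entropy. The function names `pixel_alignment_scores` and `alignment_masked_bce` and the `threshold` parameter are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def pixel_alignment_scores(
    pixel_feats: Sequence[Sequence[float]], text_emb: Sequence[float]
) -> List[float]:
    """Hypothetical alignment estimate: similarity of each pixel feature to the text embedding."""
    return [cosine(f, text_emb) for f in pixel_feats]


def alignment_masked_bce(
    logits: Sequence[float],
    targets: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,
) -> float:
    """Per-pixel BCE averaged only over pixels whose alignment score >= threshold.

    Pixels below the threshold are treated as poorly aligned and contribute
    nothing to the loss (an assumed filtering rule, for illustration only).
    """
    total, kept = 0.0, 0
    for z, y, s in zip(logits, targets, scores):
        if s < threshold:
            continue  # filtered out: poorly aligned pixel
        # Numerically stable BCE with logits: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        kept += 1
    return total / kept if kept else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 3-pixel feature map
    text = [1.0, 0.0]                              # toy sentence embedding
    s = pixel_alignment_scores(feats, text)
    # Only pixels 0 and 2 pass the 0.5 threshold; pixel 1 is masked out.
    print(s, alignment_masked_bce([2.0, -1.0, 0.5], [1, 0, 1], s, threshold=0.5))
```

In this toy setup the second pixel is orthogonal to the text embedding, so it is excluded and the loss is averaged over the remaining two pixels. Masking the loss rather than the input keeps the forward pass unchanged and only removes the supervision signal from regions judged untrustworthy.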

Original Abstract

Referring Image Segmentation (RIS) aims to segment the object in an image that is identified by a natural language expression. The paper introduces Alignment-Aware Masked Learning (AML), a training strategy that enhances RIS by explicitly estimating pixel-level vision-language alignment, filtering out poorly aligned regions during optimization, and focusing on trustworthy cues. This approach achieves state-of-the-art performance on the RefCOCO datasets and also improves robustness to diverse descriptions and scenarios.

Tags

Referring Image Segmentation, Vision-Language Alignment, Masked Learning

arXiv Categories

cs.CV cs.AI