Multimodal Learning Relevance: 9/10

AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

Tongfei Chen, Shuo Yang, Yuguang Yang, Linlin Yang, Runtang Guo, Changbai Li, He Long, Chunyu Xie, Dawei Leng, Baochang Zhang

arXiv: 2602.22740v1 Published: 2026-02-26 Updated: 2026-02-26

AI Summary

AMLRIS improves referring image segmentation through alignment-aware masked learning, which focuses training on trustworthy cues.

Key Contributions

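Proposes Alignment-Aware Masked Learning (AML), a training strategy
Explicitly estimates pixel-level vision-language alignment
Filters out poorly aligned regions and focuses on trustworthy cues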

Methodology

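Estimates vision-language alignment and masks out poorly aligned regions during optimization of the referring image segmentation model.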
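
As a rough illustration of the idea above, the following NumPy sketch estimates a per-pixel alignment score and drops low-scoring pixels from the training loss. Everything in it is an assumption made for illustration, not the paper's actual formulation: the cosine-similarity alignment score, the fixed `threshold`, the binary cross-entropy loss, and the function names `alignment_map`, `masked_bce`, and `aml_style_loss` are all hypothetical.

```python
import numpy as np

def alignment_map(pixel_feats, text_emb, eps=1e-8):
    """Cosine similarity between each pixel feature and a text embedding.

    pixel_feats: (H, W, C) array; text_emb: (C,) array.
    Returns an (H, W) map with values in [-1, 1].
    Assumption: cosine similarity stands in for the alignment estimate.
    """
    pf = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + eps)
    te = text_emb / (np.linalg.norm(text_emb) + eps)
    return pf @ te

def masked_bce(logits, target, keep, eps=1e-8):
    """Binary cross-entropy averaged only over pixels where keep is True."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    bce = -(target * np.log(probs + eps) + (1 - target) * np.log(1 - probs + eps))
    keep = keep.astype(bce.dtype)
    # Guard against an empty mask so the division is always defined.
    return float((bce * keep).sum() / max(keep.sum(), 1.0))

def aml_style_loss(pixel_feats, text_emb, logits, target, threshold=0.0):
    """Hypothetical masked loss: ignore pixels whose alignment is below threshold."""
    align = alignment_map(pixel_feats, text_emb)
    keep = align >= threshold
    return masked_bce(logits, target, keep), keep
```

In this sketch, pixels whose features point away from the text embedding contribute nothing to the loss, so the gradient comes only from regions the alignment estimate considers trustworthy. How the alignment is actually estimated and how the mask is applied in AML is described in the paper itself.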

Original Abstract

Referring Image Segmentation (RIS) aims to segment an object in an image identified by a natural language expression. The paper introduces Alignment-Aware Masked Learning (AML), a training strategy to enhance RIS by explicitly estimating pixel-level vision-language alignment, filtering out poorly aligned regions during optimization, and focusing on trustworthy cues. This approach results in state-of-the-art performance on RefCOCO datasets and also enhances robustness to diverse descriptions and scenarios.

arXiv Categories

cs.CV cs.AI
