AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
arXiv: 2602.22740v1
Published: 2026-02-26
Updated: 2026-02-26
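AI Summary
AMLRIS improves referring image segmentation through alignment-aware masked learning, which focuses training on trustworthy cues.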
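Key Contributions
- Proposes the Alignment-Aware Masked Learning (AML) training strategy
- Explicitly estimates pixel-level vision-language alignment
- Filters out poorly aligned regions and focuses on trustworthy cues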
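Methodology
The method estimates vision-language alignment, masks out poorly aligned regions, and optimizes the referring image segmentation model on the regions that remain.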
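The idea above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the summary does not say how alignment is estimated or which loss is used, so the cosine-similarity score, the fixed `threshold`, the binary cross-entropy loss, and the function names `alignment_map` and `masked_bce_loss` are all assumptions made for illustration.

```python
import numpy as np


def alignment_map(pixel_feats, text_emb):
    """Assumed alignment score: cosine similarity per pixel.

    pixel_feats: (H, W, D) visual features; text_emb: (D,) sentence embedding.
    Returns an (H, W) array with values in [-1, 1].
    """
    p = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + 1e-8)
    t = text_emb / (np.linalg.norm(text_emb) + 1e-8)
    return p @ t


def masked_bce_loss(logits, target, align, threshold=0.0):
    """Binary cross-entropy averaged only over well-aligned pixels.

    Pixels whose alignment score falls below `threshold` (a hypothetical
    hyperparameter) are excluded from the loss.
    """
    keep = (align >= threshold).astype(np.float64)
    # Numerically stable BCE-with-logits, computed per pixel.
    per_pixel = (
        np.maximum(logits, 0) - logits * target + np.log1p(np.exp(-np.abs(logits)))
    )
    loss = (per_pixel * keep).sum() / max(keep.sum(), 1.0)
    return loss, keep
```

In a real training loop the features would come from the segmentation model's visual and text encoders, and the keep mask would normally be treated as a constant so that no gradient flows through the alignment estimate. Both points are design choices of this sketch rather than details confirmed by the source.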
Original Abstract
Referring Image Segmentation (RIS) aims to segment an object in an image identified by a natural language expression. The paper introduces Alignment-Aware Masked Learning (AML), a training strategy to enhance RIS by explicitly estimating pixel-level vision-language alignment, filtering out poorly aligned regions during optimization, and focusing on trustworthy cues. This approach results in state-of-the-art performance on RefCOCO datasets and also enhances robustness to diverse descriptions and scenarios.