Multimodal Learning (Relevance: 10/10)

Mitigating Object Hallucinations in LVLMs via Attention Imbalance Rectification

Han Sun, Qin Li, Peixin Wang, Min Zhang
arXiv: 2603.24058v1 · Published: 2026-03-25 · Updated: 2026-03-25

AI Summary

The paper mitigates object hallucination by rectifying attention imbalance in large vision-language models (LVLMs).

Main Contributions

  • Introduces the concept of attention imbalance, quantifying and visualizing attention disparities (a rough measurement sketch follows this list).
  • Proposes Attention Imbalance Rectification (AIR), which mitigates hallucination by reallocating attention weights.
  • Validates AIR's effectiveness on mainstream LVLMs and benchmarks, showing consistent performance gains.
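To make modality-wise attention imbalance concrete, below is a minimal sketch of one way such a disparity could be measured: compare the attention mass a generated token assigns to visual keys versus language keys. The metric, the function name modality_attention_imbalance, and the assumed key layout are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch: measure modality-wise attention imbalance as the gap
# between attention mass placed on visual keys and on language keys.
# The metric and the assumed key layout are stand-ins, not the paper's definition.
import torch

def modality_attention_imbalance(attn, vision_idx, text_idx):
    """attn: (batch, heads, q_len, k_len) softmax-normalized attention weights.

    Returns (vision_share, text_share, imbalance), averaged over batch, heads,
    and query positions; imbalance > 0 means language tokens dominate.
    """
    vision_share = attn[..., vision_idx].sum(dim=-1).mean()
    text_share = attn[..., text_idx].sum(dim=-1).mean()
    return vision_share.item(), text_share.item(), (text_share - vision_share).item()

# Toy example: assume 576 image-patch keys followed by 32 text keys.
attn = torch.softmax(torch.randn(1, 32, 1, 608), dim=-1)
vision_idx = torch.arange(0, 576)
text_idx = torch.arange(576, 608)
print(modality_attention_imbalance(attn, vision_idx, text_idx))
```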

Methodology

An empirical study identifies a causal link between attention imbalance and object hallucination; building on this finding, AIR reallocates attention weights during the decoding stage.
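As a rough illustration of what a decoding-time attention reallocation could look like, the sketch below shifts a fixed fraction of attention mass from language keys onto visual keys, spreading it proportionally to the existing visual attention so that salient regions gain the most. The fraction alpha, the proportional redistribution rule, and the tensor layout are assumptions made for this sketch; they are not the paper's exact AIR procedure.

```python
# Hedged sketch of a decoding-time attention reallocation (not the exact AIR rule).
import torch

def rebalance_attention(attn, vision_idx, text_idx, alpha=0.2):
    """Shift a fraction `alpha` of attention mass from language keys to visual keys.

    attn:       (batch, heads, q_len, k_len) softmax-normalized attention weights
    vision_idx: key positions holding visual tokens
    text_idx:   key positions holding language tokens
    """
    attn = attn.clone()
    # Attention mass currently assigned to language tokens, per query.
    text_mass = attn[..., text_idx].sum(dim=-1, keepdim=True)   # (B, H, Q, 1)
    moved = alpha * text_mass                                    # mass to reallocate

    # Scale down language-token attention uniformly.
    attn[..., text_idx] = attn[..., text_idx] * (1.0 - alpha)

    # Redistribute the moved mass over visual tokens in proportion to their
    # current attention, so already-salient visual features gain the most.
    vis = attn[..., vision_idx]
    vis_mass = vis.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    attn[..., vision_idx] = vis + moved * vis / vis_mass

    return attn  # total mass per query is preserved, so rows still sum to ~1

# Usage with the same toy layout as above (576 visual keys + 32 text keys).
attn = torch.softmax(torch.randn(1, 32, 1, 608), dim=-1)
rebalanced = rebalance_attention(attn, torch.arange(0, 576), torch.arange(576, 608))
print(rebalanced.sum(dim=-1))  # ≈ 1.0
```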

Original Abstract

Object hallucination in Large Vision-Language Models (LVLMs) severely compromises their reliability in real-world applications, posing a critical barrier to their deployment in high-stakes scenarios such as autonomous driving and medical image analysis. Through systematic empirical investigation, we identify that the imbalanced attention allocation, both across modalities (i.e., vision and language) and within modalities (among individual tokens), exhibits a strong causal correlation with the occurrence of object hallucination. Leveraging this insight, we introduce a novel concept termed attention imbalance, which not only quantifies the degree of attention disparity but also visually delineates the underlying patterns (e.g., over-attentiveness to irrelevant language tokens or under-attentiveness to discriminative visual features) that drive object hallucination. To mitigate object hallucination, we further propose Attention Imbalance Rectification (AIR), a lightweight decoding-time intervention method that reallocates attention weights and adjusts attention distributions to rectify modality-wise and token-wise imbalances. Extensive evaluations on four mainstream LVLMs and three benchmarks (CHAIR, POPE, and MM-Vet) with seven baselines demonstrate that AIR consistently reduces object hallucination rates, achieving up to a 35.1% reduction compared to the baselines, while improving LVLMs' general capability by up to 15.9% across diverse vision-language tasks.

Tags

LVLM, Object Hallucination, Attention Mechanism, Multimodal Learning

arXiv Categories

cs.CV cs.AI