Multimodal Learning Relevance: 9/10

AdaIAT: Adaptively Increasing Attention to Generated Text to Alleviate Hallucinations in LVLM

Li'an Zhong, Ziqiang He, Jibin Zheng, Jin Li, Z. Jane Wang, Xiangui Kang
arXiv: 2603.04908v1 Published: 2026-03-05 Updated: 2026-03-05

AI Summary

AdaIAT alleviates hallucinations in LVLMs by adaptively increasing attention to the generated text while preserving linguistic coherence.

Main Contributions

  • Proposes the Attention to Generated Text (IAT) method to mitigate hallucinations.
  • Proposes Adaptive IAT (AdaIAT), which adaptively controls the intervention timing and amplification magnitude.
  • Experiments show that AdaIAT effectively reduces hallucination rates while preserving linguistic performance and prediction capability.

Methodology

An analysis of attention patterns reveals that real object tokens assign higher attention to the generated text than hallucinated ones; based on this observation, IAT and AdaIAT are designed to adaptively amplify attention to the generated text.
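
This digest does not include the paper's exact formulation, but the core IAT operation can be illustrated with a minimal Python/PyTorch sketch. Everything below (the function name, tensor layout, and the factor `alpha`) is an assumption for illustration, not the authors' implementation: after the softmax, the attention weights that each query assigns to generated-text positions are scaled up, and each attention distribution is renormalized.

```python
import torch

def iat_rescale(attn: torch.Tensor, gen_start: int, gen_end: int,
                alpha: float = 1.2) -> torch.Tensor:
    """Hypothetical IAT-style rescaling of post-softmax attention weights.

    attn:      (batch, heads, q_len, kv_len) attention weights after softmax
    gen_start: first index of generated-text tokens in the KV sequence
    gen_end:   one past the last generated-text index
    alpha:     amplification factor (> 1 boosts attention to generated text)
    """
    attn = attn.clone()
    attn[..., gen_start:gen_end] *= alpha         # amplify generated-text columns
    return attn / attn.sum(dim=-1, keepdim=True)  # renormalize each row to sum to 1
```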

Original Abstract

Hallucination has been a significant impediment to the development and application of current Large Vision-Language Models (LVLMs). To mitigate hallucinations, one intuitive and effective way is to directly increase attention weights to image tokens during inference. Although this effectively reduces the hallucination rate, it often induces repetitive descriptions. To address this, we first conduct an analysis of attention patterns and reveal that real object tokens tend to assign higher attention to the generated text than hallucinated ones. This inspires us to leverage the generated text, which contains instruction-related visual information and contextual knowledge, to alleviate hallucinations while maintaining linguistic coherence. We therefore propose Attention to Generated Text (IAT) and demonstrate that it significantly reduces the hallucination rate while avoiding repetitive descriptions. To prevent naive amplification from impairing the inherent prediction capabilities of LVLMs, we further explore Adaptive IAT (AdaIAT) that employs a layer-wise threshold to control intervention time and fine-grained amplification magnitude tailored to the characteristics of each attention head. Both analysis and experiments demonstrate the effectiveness of AdaIAT. Results on several LVLMs show that AdaIAT effectively alleviates hallucination (reducing hallucination rates $C_S$ and $C_I$ on LLaVA-1.5 by 35.8% and 37.1%, respectively) while preserving linguistic performance and prediction capability, achieving an attractive trade-off.
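
The abstract specifies two controls for AdaIAT: a layer-wise threshold that decides when to intervene, and a fine-grained amplification magnitude set per attention head. One plausible reading is sketched below; the threshold rule, the magnitude formula, and all names here are assumptions rather than the paper's method. The idea shown is to gate the intervention on each head's current attention mass on the generated text and to amplify weaker heads more.

```python
import torch

def adaiat_rescale(attn: torch.Tensor, gen: slice, tau: float,
                   alpha_max: float = 1.5) -> torch.Tensor:
    """Hypothetical AdaIAT-style control for one layer's post-softmax weights.

    attn:      (batch, heads, q_len, kv_len) attention weights after softmax
    gen:       slice covering generated-text positions in the KV sequence
    tau:       this layer's intervention threshold (assumed scalar)
    alpha_max: upper bound on the per-head amplification factor
    """
    mass = attn[..., gen].sum(dim=-1, keepdim=True)  # per-head mass on generated text
    # intervene only where a head's mass falls below the layer threshold,
    # amplifying less the more the head already attends to the generated text
    alpha = torch.where(mass < tau,
                        1.0 + (alpha_max - 1.0) * (1.0 - mass),
                        torch.ones_like(mass))
    attn = attn.clone()
    attn[..., gen] *= alpha
    return attn / attn.sum(dim=-1, keepdim=True)
```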

Tags

LVLM · Hallucination · Attention Mechanism · Multimodal Learning

arXiv Category

cs.CV