Multimodal Learning Relevance: 9/10

Hallucination-aware intermediate representation edit in large vision-language models

Wei Suo, Hanzu Zhang, Lijun Zhang, Ji Ma, Peng Wang, Yanning Zhang
arXiv: 2603.29405v1 Published: 2026-03-31 Updated: 2026-03-31

AI Summary

Proposes a hallucination-aware intermediate representation editing framework that effectively and efficiently mitigates hallucinations in multimodal large models.

Key Contributions

  • Proposes a hallucination-aware framework for detecting and editing intermediate representations
  • Achieves SOTA performance on existing benchmarks
  • Low computational cost with strong controllability over hallucinations

Methodology

Dynamically detects hallucination representations and applies hallucination-eliminating edits to them, thereby reducing hallucinations in the model's output.
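The detect-then-edit idea can be sketched in a minimal form: score each token's hidden state against a direction associated with hallucination, and project that component out of the states that exceed a threshold. This is an illustrative assumption, not HIRE's actual method; the function name `edit_hidden_states`, the linear-probe-style scoring, the `threshold`, and the `alpha` scaling are all hypothetical stand-ins for whatever detector and edit the paper learns.

```python
import numpy as np

def edit_hidden_states(hidden, halluc_dir, threshold=0.5, alpha=1.0):
    """Hypothetical representation edit: remove the component of each
    suspect hidden state along an assumed 'hallucination direction'.

    hidden:     (seq_len, d) intermediate-layer activations
    halluc_dir: (d,) direction vector; in practice this would be learned,
                here it is just a placeholder for illustration
    """
    d = halluc_dir / np.linalg.norm(halluc_dir)
    scores = hidden @ d                 # per-token alignment with the direction
    mask = scores > threshold           # "detect" tokens likely to hallucinate
    edited = hidden.copy()
    # subtract the (scaled) projection onto the hallucination direction
    edited[mask] -= alpha * np.outer(scores[mask], d)
    return edited, mask
```

With `alpha=1.0` the edit fully zeroes the projection for flagged tokens while leaving unflagged tokens untouched, which is one simple way to get per-token control over how aggressively hallucination-associated features are suppressed.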

Original Abstract

Large Vision-Language Models have demonstrated exceptional performance in multimodal reasoning and complex scene understanding. However, these models still face significant hallucination issues, where outputs contradict visual facts. Recent research on hallucination mitigation has focused on retraining methods and Contrastive Decoding (CD) methods. While both methods perform well, retraining methods require substantial training resources, and CD methods introduce dual inference overhead. These factors hinder their practical applicability. To address the above issue, we propose a framework for dynamically detecting hallucination representations and performing hallucination-eliminating edits on these representations. With minimal additional computational cost, we achieve state-of-the-art performance on existing benchmarks. Extensive experiments demonstrate the effectiveness of our approach, highlighting its efficient and robust hallucination elimination capability and its powerful controllability over hallucinations. Code is available at https://github.com/ASGO-MM/HIRE

Tags

hallucination mitigation, multimodal large models, intermediate representations, vision-language

arXiv Categories

cs.CV cs.AI