Multimodal Learning Relevance: 10/10

HulluEdit: Single-Pass Evidence-Consistent Subspace Editing for Mitigating Hallucinations in Large Vision-Language Models

Yangguang Lin, Quan Fang, Yufei Li, Jiachen Sun, Junyu Gao, Jitao Sang
arXiv: 2602.22727v1 Published: 2026-02-26 Updated: 2026-02-26

AI Summary

HulluEdit uses orthogonal subspace editing to reduce object hallucinations in large vision-language models within a single inference pass, while preserving the model's general capabilities.

Key Contributions

  • Proposes HulluEdit, a single-pass, reference-free hallucination mitigation framework
  • Introduces orthogonal subspace editing, which decomposes hidden states into visual evidence, conflicting priors, and residual uncertainty
  • Mathematically guarantees that edits applied to the prior subspace do not affect the visual component (see the worked equation below)
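A one-line version of that guarantee, in our own notation (not taken from the digest): assume orthonormal bases V for the visual subspace and P for the prior subspace with V⊤P = 0, and an edit strength α.

```latex
% Edit: subtract the prior-subspace component of the hidden state h.
h' = h - \alpha \, P P^{\top} h
% The visual projection is unchanged, because V^{\top} P = 0:
V^{\top} h' = V^{\top} h - \alpha \,(V^{\top} P)\, P^{\top} h = V^{\top} h
```

The guarantee is thus a direct consequence of the orthogonality of the two subspaces, independent of the edit strength α.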

Methodology

The model's hidden states are decomposed into orthogonal subspaces so that hallucinatory patterns can be selectively suppressed while visual grounding is preserved, all within a single forward pass.
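A minimal NumPy sketch of this idea follows. It is not the authors' code: the basis names, the alpha parameter, and the assumption that the visual and prior bases are orthonormal and mutually orthogonal are ours.

```python
import numpy as np

def orthogonal_subspace_edit(h, V_vis, P_prior, alpha=0.5):
    """Suppress the prior-conflict component of a hidden state.

    h       : (d,) hidden state vector
    V_vis   : (d, k) orthonormal basis of the visual-evidence subspace
    P_prior : (d, m) orthonormal basis of the conflicting-prior subspace,
              assumed orthogonal to V_vis (V_vis.T @ P_prior == 0)
    alpha   : suppression strength in [0, 1]; alpha=1 removes the prior
              component entirely
    """
    prior_component = P_prior @ (P_prior.T @ h)  # projection onto prior subspace
    return h - alpha * prior_component

# Toy check that the visual component is untouched by the edit.
rng = np.random.default_rng(0)
d, k, m = 16, 3, 3
Q, _ = np.linalg.qr(rng.standard_normal((d, k + m)))  # orthonormal columns
V_vis, P_prior = Q[:, :k], Q[:, k:]                   # mutually orthogonal bases

h = rng.standard_normal(d)
h_edit = orthogonal_subspace_edit(h, V_vis, P_prior, alpha=1.0)
assert np.allclose(V_vis.T @ h, V_vis.T @ h_edit)     # visual projection preserved
```

Because the edit is a fixed linear projection applied to hidden states, it adds no reference model and no extra forward passes, which is consistent with the single-pass, reference-free claim.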

Original Abstract

Object hallucination in Large Vision-Language Models (LVLMs) significantly hinders their reliable deployment. Existing methods struggle to balance efficiency and accuracy: they often require expensive reference models and multiple forward passes, or apply static edits that risk suppressing genuine visual evidence. To address this, we introduce HulluEdit, a single-pass, reference-free intervention framework. Our core innovation is orthogonal subspace editing: we decompose the hidden states of the model into orthogonal subspaces - visual evidence, conflicting priors, and residual uncertainty - enabling selective suppression of hallucinatory patterns without interfering with visual grounding. This approach mathematically guarantees that edits applied to the prior subspace leave the visual component entirely unaffected. Extensive experiments show that HulluEdit achieves state-of-the-art hallucination reduction on benchmarks including POPE and CHAIR across diverse architectures, while preserving general capabilities on MME and maintaining efficient inference. Our method consistently outperforms contrastive decoding and static subspace editing baselines, offering a new pathway toward more trustworthy LVLMs.

Tags

Hallucination Mitigation, Vision-Language Models, Subspace Editing

arXiv Categories

cs.CV