Multimodal Learning Relevance: 9/10

Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models

Jianghao Yin, Qin Chen, Kedi Chen, Jie Zhou, Xingjiao Wu, Liang He
arXiv: 2602.21704v1 Published: 2026-02-25 Updated: 2026-02-25

AI Summary

Proposes a dynamic multimodal activation steering method that mitigates hallucination in large vision-language models through semantics-aware interventions.

Key Contributions

  • Reveals that truthfulness and visual perception capabilities exhibit distinct activation patterns in LVLMs
  • Proposes a dynamic multimodal activation steering method (Dynamic Multimodal Activation Steering)
  • Constructs a semantic-based truthfulness steering vector database

Methodology

Analyzes LVLM activation patterns, constructs a semantic truthfulness steering vector database along with visual perception steering vectors, and dynamically selects steering vectors to intervene in the model at inference time.
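The selection-and-intervention step above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the database class, the cosine-similarity lookup, and the `apply_steering` helper (including the `alpha` scaling and `top_heads` set) are all hypothetical names assumed for this example, standing in for the paper's semantic retrieval over stored steering vectors and its intervention on the most influential attention heads.

```python
import numpy as np

def cosine_sim(a, b):
    # cosine similarity between two embedding vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

class SteeringVectorDB:
    """Hypothetical semantic-keyed store of truthfulness steering vectors."""
    def __init__(self):
        self.keys = []      # semantic embeddings of the contexts
        self.vectors = []   # per-head steering vectors, shape (n_heads, head_dim)

    def add(self, key_embedding, steering_vectors):
        self.keys.append(key_embedding)
        self.vectors.append(steering_vectors)

    def select(self, query_embedding):
        # pick the stored steering vectors whose context embedding is
        # most similar to the current input (context-aware selection)
        sims = [cosine_sim(query_embedding, k) for k in self.keys]
        return self.vectors[int(np.argmax(sims))]

def apply_steering(head_outputs, steering, top_heads, alpha=1.0):
    """Add the selected steering vectors only to the most influential heads."""
    steered = head_outputs.copy()
    for h in top_heads:
        steered[h] += alpha * steering[h]
    return steered
```

In practice the database keys would be sentence embeddings of the inputs used to derive each steering vector, and `top_heads` would be the attention heads ranked most influential for truthfulness; here both are left abstract.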

Original Abstract

Large Vision-Language Models (LVLMs) exhibit outstanding performance on vision-language tasks but struggle with hallucination problems. Through in-depth analysis of LVLM activation patterns, we reveal two key findings: 1) truthfulness and visual perception capabilities predominantly engage different subsets of attention heads within the model architecture; and 2) truthfulness steering vectors vary significantly across different semantic contexts. Based on these observations, we propose Dynamic Multimodal Activation Steering, a training-free approach for hallucination mitigation. Our method constructs a semantic-based truthfulness steering vector database and computes visual perception steering vectors, enabling context-aware interventions during inference by dynamically selecting the most relevant steering vectors based on input semantic similarity and applying them to the most influential attention heads. We conduct comprehensive experiments across multiple models and datasets, demonstrating that our approach significantly enhances model performance, outperforming existing state-of-the-art methods.

Tags

LVLM · Hallucination Mitigation · Multimodal · Activation Steering · Vision-Language

arXiv Categories

cs.CV cs.AI