Multimodal Learning · Relevance: 9/10

EAGLE: Expert-Augmented Attention Guidance for Tuning-Free Industrial Anomaly Detection in Multimodal Large Language Models

Xiaomeng Peng, Xilang Huang, Seon Han Choi
arXiv: 2602.17419v1 · Published: 2026-02-19 · Updated: 2026-02-19

AI Summary

EAGLE uses expert models to guide MLLMs, improving the accuracy and interpretability of industrial anomaly detection without any fine-tuning.

Key Contributions

  • Proposes the EAGLE framework, which improves MLLM anomaly detection performance without fine-tuning
  • Uses expert models to guide MLLMs to attend to anomalous regions, improving interpretability
  • Analyzes how EAGLE affects the internal attention distribution of MLLMs

Methodology

EAGLE integrates the outputs of an expert model to steer the MLLM's attention mechanism toward anomalous regions, improving both detection accuracy and the interpretability of the generated descriptions.
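The core idea can be sketched as an additive bias on the attention logits over image-patch tokens, weighted by the expert model's anomaly map. This is a minimal illustrative sketch, not the paper's exact formulation; the function names, the additive form of the bias, and the `strength` parameter are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def guided_attention(attn_logits, anomaly_map, strength=2.0):
    """Hypothetical expert-guided attention: bias the raw attention
    scores over image-patch tokens toward regions the expert model
    flags as anomalous. No model parameters are updated.

    attn_logits: (num_queries, num_patches) raw attention scores
    anomaly_map: (num_patches,) expert anomaly scores in [0, 1]
    """
    biased = attn_logits + strength * anomaly_map  # additive, tuning-free bias
    return softmax(biased, axis=-1)

# Toy example: one query over 4 patch tokens; the expert flags patch 2.
logits = np.zeros((1, 4))
amap = np.array([0.0, 0.0, 1.0, 0.0])
weights = guided_attention(logits, amap, strength=2.0)
```

Because the bias is applied only at inference time, this kind of guidance leaves the MLLM's weights untouched, which is what makes the approach tuning-free.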

Original Abstract

Industrial anomaly detection is important for smart manufacturing, but many deep learning approaches produce only binary decisions and provide limited semantic explanations. Multimodal large language models (MLLMs) can potentially generate fine-grained, language-based analyses, yet existing methods often require costly fine-tuning and do not consistently improve anomaly detection accuracy compared to lightweight specialist detectors. We propose expert-augmented attention guidance for industrial anomaly detection in MLLMs (EAGLE), a tuning-free framework that integrates outputs from an expert model to guide MLLMs toward both accurate detection and interpretable anomaly descriptions. We further study how EAGLE affects MLLM internals by examining the attention distribution of MLLMs to the anomalous image regions in the intermediate layers. We observe that successful anomaly detection is associated with increased attention concentration on anomalous regions, and EAGLE tends to encourage this alignment. Experiments on MVTec-AD and VisA show that EAGLE improves anomaly detection performance across multiple MLLMs without any parameter updates, achieving results comparable to fine-tuning-based methods. Code is available at https://github.com/shengtun/Eagle
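The abstract's analysis relies on measuring how much attention mass falls on anomalous regions. A simple way to quantify this, assuming per-patch attention weights and a ground-truth anomaly mask (the metric below is an illustrative assumption, not necessarily the paper's exact definition):

```python
import numpy as np

def attention_concentration(attn_weights, anomaly_mask):
    """Fraction of a layer's attention mass landing on anomalous
    patch tokens (assumed metric for the analysis described above).

    attn_weights: (num_patches,) attention over image patches, sums to 1
    anomaly_mask: (num_patches,) boolean ground-truth anomaly mask
    """
    return float(attn_weights[anomaly_mask].sum())

# Toy example: patches 1 and 2 are anomalous; they receive 0.6 + 0.2
# of the total attention mass, so concentration is 0.8.
w = np.array([0.1, 0.6, 0.2, 0.1])
mask = np.array([False, True, True, False])
score = attention_concentration(w, mask)
```

Comparing this score between successful and failed detections across intermediate layers is one way to test the reported association between attention concentration and detection success.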

Tags

MLLM · Anomaly Detection · Industrial Applications · Attention Mechanism · Tuning-Free

arXiv Category

cs.CV