Multimodal Learning Relevance: 9/10

Emergent Morphing Attack Detection in Open Multi-modal Large Language Models

Marija Ivanovska, Vitomir Štruc
arXiv: 2602.15461v1 Published: 2026-02-17 Updated: 2026-02-17

AI Summary

First systematic evaluation of open-source multimodal large language models' zero-shot capability for face morphing attack detection, with strong results.

Key Contributions

  • First systematic evaluation of open-source MLLMs' zero-shot performance on face morphing attack detection
  • Demonstrates that MLLMs can detect face morphing attacks without any fine-tuning
  • LLaVA1.6-Mistral-7B outperforms task-specific baseline models on this task

Methodology

Zero-shot evaluation of single-image MAD across diverse morphing techniques, using publicly available weights and a standardized protocol.
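The evaluation protocol above queries an MLLM with a single face image and maps its free-form answer to a binary morph/bona-fide decision. A minimal sketch of that answer-mapping step is below; the prompt wording and the yes/no decision rule are illustrative assumptions, not the paper's exact protocol:

```python
# Hypothetical prompt; the paper's actual wording may differ.
PROMPT = ("Is this face image a morph blending two different identities? "
          "Answer 'yes' or 'no'.")

def mad_decision(vqa_answer: str) -> int:
    """Map a free-form MLLM answer to a binary MAD label.

    Returns 1 (morph/attack) if the answer starts with 'yes',
    otherwise 0 (bona fide). Assumed decision rule for illustration.
    """
    return 1 if vqa_answer.strip().lower().startswith("yes") else 0
```

In practice the image and `PROMPT` would be fed to an open-weights MLLM (e.g. LLaVA1.6-Mistral-7B) and the generated text passed through `mad_decision`; a continuous score can instead be taken from the model's yes/no token probabilities.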

Original Abstract

Face morphing attacks threaten biometric verification, yet most morphing attack detection (MAD) systems require task-specific training and generalize poorly to unseen attack types. Meanwhile, open-source multimodal large language models (MLLMs) have demonstrated strong visual-linguistic reasoning, but their potential in biometric forensics remains underexplored. In this paper, we present the first systematic zero-shot evaluation of open-source MLLMs for single-image MAD, using publicly available weights and a standardized, reproducible protocol. Across diverse morphing techniques, many MLLMs show non-trivial discriminative ability without any fine-tuning or domain adaptation, and LLaVA1.6-Mistral-7B achieves state-of-the-art performance, surpassing highly competitive task-specific MAD baselines by at least 23% in terms of equal error rate (EER). The results indicate that multimodal pretraining can implicitly encode fine-grained facial inconsistencies indicative of morphing artifacts, enabling zero-shot forensic sensitivity. Our findings position open-source MLLMs as reproducible, interpretable, and competitive foundations for biometric security and forensic image analysis. This emergent capability also highlights new opportunities to develop state-of-the-art MAD systems through targeted fine-tuning or lightweight adaptation, further improving accuracy and efficiency while preserving interpretability. To support future research, all code and evaluation protocols will be released upon publication.
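The abstract reports gains of at least 23% in equal error rate (EER), the operating point where the attack miss rate (APCER) equals the bona fide rejection rate (BPCER). A minimal sketch of computing EER from per-image morph scores, assuming higher score means more morph-like:

```python
import numpy as np

def compute_eer(bona_fide_scores, attack_scores):
    """EER: threshold sweep where APCER (missed morphs) meets BPCER
    (bona fide images wrongly flagged). Scores: higher = more morph-like."""
    thresholds = np.sort(np.concatenate([bona_fide_scores, attack_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        bpcer = np.mean(bona_fide_scores >= t)  # bona fide flagged as morph
        apcer = np.mean(attack_scores < t)      # morphs classified bona fide
        if abs(apcer - bpcer) < best_gap:
            best_gap, eer = abs(apcer - bpcer), (apcer + bpcer) / 2
    return float(eer)
```

For well-separated score distributions the EER approaches 0; chance-level detectors sit near 0.5.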

Tags

Multimodal Learning Face Morphing Attack Detection Zero-Shot Learning

arXiv Categories

cs.CV