MLLM-based Textual Explanations for Face Comparison
AI Summary
Analyzes the reliability of MLLM-generated explanations for face recognition, reveals hallucination problems in those explanations, and proposes an evaluation framework.
Main Contributions
- Systematically analyzes the reliability of MLLM-generated face recognition explanations
- Reveals hallucination issues present in MLLM explanations
- Proposes a likelihood-ratio-based framework for evaluating explanations
Methodology
Uses the IJB-S dataset to experimentally analyze MLLM-generated explanations, and evaluates them in combination with information (scores and decisions) from traditional face recognition systems.
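To make the likelihood-ratio idea concrete, here is a minimal, hypothetical sketch (not the paper's actual implementation): fit simple Gaussian models to an explanation-strength score under mated (same-identity) and non-mated pairs, then report LR = p(score | mated) / p(score | non-mated). All function names and the toy scores below are illustrative assumptions.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Probability density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_gaussian(scores):
    """Fit mean and (population) standard deviation to a list of scores."""
    mu = sum(scores) / len(scores)
    var = sum((s - mu) ** 2 for s in scores) / len(scores)
    return mu, math.sqrt(var)

def likelihood_ratio(score, mated_scores, nonmated_scores):
    """LR > 1 means the score is more consistent with the mated hypothesis."""
    mu1, s1 = fit_gaussian(mated_scores)
    mu0, s0 = fit_gaussian(nonmated_scores)
    return gaussian_pdf(score, mu1, s1) / gaussian_pdf(score, mu0, s0)

# Toy explanation-strength scores (hypothetical values for illustration only).
mated = [0.80, 0.90, 0.85, 0.75, 0.95]
nonmated = [0.20, 0.30, 0.25, 0.35, 0.15]
print(likelihood_ratio(0.70, mated, nonmated) > 1.0)  # favors mated hypothesis
```

An LR near 1 indicates the explanation carries little evidential weight either way, which is one way to quantify "faithfulness" beyond raw decision accuracy.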
Original Abstract
Multimodal Large Language Models (MLLMs) have recently been proposed as a means to generate natural-language explanations for face recognition decisions. While such explanations facilitate human interpretability, their reliability on unconstrained face images remains underexplored. In this work, we systematically analyze MLLM-generated explanations for the unconstrained face verification task on the challenging IJB-S dataset, with a particular focus on extreme pose variation and surveillance imagery. Our results show that even when MLLMs produce correct verification decisions, the accompanying explanations frequently rely on non-verifiable or hallucinated facial attributes that are not supported by visual evidence. We further study the effect of incorporating information from traditional face recognition systems, viz., scores and decisions, alongside the input images. Although such information improves categorical verification performance, it does not consistently lead to faithful explanations. To evaluate the explanations beyond decision accuracy, we introduce a likelihood-ratio-based framework that measures the evidential strength of textual explanations. Our findings highlight fundamental limitations of current MLLMs for explainable face recognition and underscore the need for a principled evaluation of reliable and trustworthy explanations in biometric applications. Code is available at https://github.com/redwankarimsony/LR-MLLMFR-Explainability.