Multimodal Learning Relevance: 9/10

Visual Distraction Undermines Moral Reasoning in Vision-Language Models

Xinyi Yang, Chenheng Xu, Weijun Hong, Ce Mo, Qian Wang, Fang Fang, Yixin Zhu
arXiv: 2603.16445v1 Published: 2026-03-17 Updated: 2026-03-17

AI Summary

Visual inputs interfere with the moral reasoning of vision-language models, bypassing text-based safety mechanisms and creating safety risks.

Key Contributions

  • Reveals the negative impact of visual inputs on moral reasoning in vision-language models
  • Proposes the Moral Dilemma Simulation (MDS) multimodal benchmark
  • Finds that the vision modality activates intuition-like pathways that override text-based safe reasoning patterns

Methodology

Constructs the multimodal MDS benchmark, orthogonally manipulating visual and textual variables to evaluate vision-language models on moral reasoning and to analyze their underlying reasoning mechanisms.
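The "orthogonal manipulation" above amounts to a full factorial crossing of the benchmark's variables, so each factor's effect on model behavior can be isolated. A minimal sketch of such a design follows; the factor names and levels are illustrative assumptions, not the paper's actual variables.

```python
# Hypothetical sketch of a full factorial (orthogonal) design over visual
# and contextual variables, in the spirit of the MDS benchmark.
# The factors and levels below are illustrative, not taken from the paper.
from itertools import product

factors = {
    "modality": ["text_only", "image_plus_text"],          # visual variable
    "moral_foundation": ["care", "fairness", "loyalty"],   # from MFT
    "severity": ["low", "high"],                           # contextual variable
}

def factorial_design(factors):
    """Enumerate every combination of factor levels (full crossing),
    so each variable's effect can be estimated independently."""
    names = list(factors)
    return [dict(zip(names, levels))
            for levels in product(*(factors[n] for n in names))]

conditions = factorial_design(factors)
print(len(conditions))  # 2 * 3 * 2 = 12 conditions
```

Each resulting condition dict would then parameterize one dilemma item, letting the evaluation compare, e.g., text-only against image-plus-text presentations of the same scenario.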

Original Abstract

Moral reasoning is fundamental to safe Artificial Intelligence (AI), yet ensuring its consistency across modalities becomes critical as AI systems evolve from text-based assistants to embodied agents. Current safety techniques demonstrate success in textual contexts, but concerns remain about generalization to visual inputs. Existing moral evaluation benchmarks rely on text-only formats and lack systematic control over variables that influence moral decision-making. Here we show that visual inputs fundamentally alter moral decision-making in state-of-the-art (SOTA) Vision-Language Models (VLMs), bypassing text-based safety mechanisms. We introduce Moral Dilemma Simulation (MDS), a multimodal benchmark grounded in Moral Foundation Theory (MFT) that enables mechanistic analysis through orthogonal manipulation of visual and contextual variables. The evaluation reveals that the vision modality activates intuition-like pathways that override the more deliberate and safer reasoning patterns observed in text-only contexts. These findings expose critical fragilities where language-tuned safety filters fail to constrain visual processing, demonstrating the urgent need for multimodal safety alignment.

Tags

Vision-Language Models · Moral Reasoning · Multimodal Learning · Safety Alignment

arXiv Categories

cs.AI