Membership Inference Attacks against Large Audio Language Models
AI Summary
The first systematic evaluation of Membership Inference Attacks (MIA) against Large Audio Language Models (LALMs), together with an evaluation methodology that avoids spurious correlations.
Key Contributions
- Reveals that distribution shifts in audio data lead to spurious MIA performance against LALMs.
- Proposes a multi-modal blind baseline built on textual, spectral, and prosodic features to quantify the impact of distribution shift.
- Shows that LALM memorization is cross-modal, arising solely from the binding of a speaker's vocal identity to the spoken text.
Methodology
Uses the multi-modal blind baseline to identify distribution-matched datasets, benchmarks existing MIA methods on them, and runs modality disentanglement experiments.
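The core of the blind-baseline check can be sketched without any model inference: score each clip with a feature computed purely from the audio (or its transcript), then measure how well that feature alone separates the train split from the test split. The sketch below is a minimal illustration with synthetic data, not the paper's actual feature set; `auc_from_scores` computes the Mann-Whitney AUC directly, and the feature values and distribution parameters are made up for demonstration.

```python
import numpy as np

def auc_from_scores(pos_scores, neg_scores):
    """Mann-Whitney AUC: probability a 'member' sample outscores a 'non-member'."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

# Hypothetical blind feature (e.g. a spectral or prosodic statistic per clip),
# computed with no access to the model. Values here are synthetic: the two
# splits are drawn from shifted distributions to mimic a confounded dataset.
rng = np.random.default_rng(0)
train_feat = rng.normal(1.0, 0.2, 500)   # "member" split
test_feat = rng.normal(0.2, 0.2, 500)    # "non-member" split, shifted

print(auc_from_scores(train_feat, test_feat))  # near 1.0 -> shift confound
```

If a blind feature like this already yields AUC close to 1.0, any MIA evaluated on that dataset may be reading the distribution shift rather than model memorization; the paper's distribution-matched datasets are those where such blind AUCs stay near 0.5.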
Original Abstract
We present the first systematic Membership Inference Attack (MIA) evaluation of Large Audio Language Models (LALMs). As audio encodes non-semantic information, it induces severe train and test distribution shifts and can lead to spurious MIA performance. Using a multi-modal blind baseline based on textual, spectral, and prosodic features, we demonstrate that common speech datasets exhibit near-perfect train/test separability (AUC approximately 1.0) even without model inference, and that standard MIA scores strongly correlate with these blind acoustic artifacts (correlation greater than 0.7). Using this blind baseline, we identify distribution-matched datasets that enable reliable MIA evaluation without distribution shift confounds. We benchmark multiple MIA methods and conduct modality disentanglement experiments on these datasets. The results reveal that LALM memorization is cross-modal, arising only from binding a speaker's vocal identity with their text. These findings establish a principled standard for auditing LALMs beyond spurious correlations.
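The abstract's correlation check (MIA scores vs. blind acoustic artifacts, correlation greater than 0.7) can also be sketched: if per-sample attack scores track a feature that requires no model access, the attack signal is suspect. The data below is synthetic and the coupling coefficient is invented purely to illustrate the diagnostic, not to reproduce the paper's numbers.

```python
import numpy as np

# Synthetic illustration of the confound diagnostic: per-sample MIA scores
# that are largely driven by a blind acoustic feature (e.g. a pitch or
# spectral statistic). The 0.8 / 0.2 mixing weights are made up.
rng = np.random.default_rng(1)
blind_feature = rng.normal(size=1000)
mia_score = 0.8 * blind_feature + 0.2 * rng.normal(size=1000)

r = np.corrcoef(mia_score, blind_feature)[0, 1]
print(round(r, 2))  # correlation well above 0.7 -> spurious signal suspected
```

A high correlation here does not prove the MIA is wrong, but it means the evaluation cannot distinguish memorization from distribution shift until the confound is removed.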