Multimodal Learning relevance: 9/10

No Caption, No Problem: Caption-Free Membership Inference via Model-Fitted Embeddings

Joonsung Jeon, Woo Jae Kim, Suhyeon Ha, Sooel Son, Sung-Eui Yoon
arXiv: 2602.22689v1 Published: 2026-02-26 Updated: 2026-02-26

AI Summary

Proposes MoFit, a membership inference attack that requires no ground-truth captions and effectively identifies members of a diffusion model's training set.

Key Contributions

  • Proposes the MoFit framework, enabling caption-free membership inference attacks
  • Optimizes an image perturbation to construct a surrogate overfitted to the model's generative manifold
  • Extracts a model-fitted embedding from the surrogate to amplify the loss response of member samples

Methodology

MoFit optimizes an image perturbation to produce a surrogate, extracts a model-fitted embedding from that surrogate, and then uses the embedding as a mismatched condition for the query image to amplify the loss response of member samples.
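The two-stage pipeline can be sketched in a toy, self-contained form. Everything here is an illustrative assumption, not the paper's implementation: `uncond_loss` is a quadratic stand-in for the diffusion model's unconditional denoising loss, `fit_surrogate` does plain gradient descent on the perturbation, and `embed` is an identity placeholder for the paper's embedding extraction.

```python
def uncond_loss(x, prior_mode):
    # Stand-in for the model's unconditional loss: low near regions the
    # model has fit (modeled here as a single "prior mode" vector).
    return sum((xi - mi) ** 2 for xi, mi in zip(x, prior_mode))

def fit_surrogate(x, prior_mode, steps=200, lr=0.05):
    # Stage (i), model-fitted surrogate optimization: optimize a
    # perturbation delta so that x + delta lands in a low-loss region
    # of the (stand-in) unconditional prior.
    delta = [0.0] * len(x)
    for _ in range(steps):
        # Analytic gradient of the quadratic stand-in loss w.r.t. delta.
        grad = [2 * (xi + di - mi) for xi, di, mi in zip(x, delta, prior_mode)]
        delta = [di - lr * gi for di, gi in zip(delta, grad)]
    return [xi + di for xi, di in zip(x, delta)]

def embed(surrogate):
    # Stage (ii), surrogate-driven embedding extraction: placeholder
    # identity encoder standing in for the model-fitted embedding.
    return list(surrogate)

def conditional_response(x, prior_mode):
    # Loss of the query image under the mismatched model-fitted
    # condition; the actual attack compares this response across
    # candidate member and hold-out samples.
    surrogate = fit_surrogate(x, prior_mode)
    return uncond_loss(x, embed(surrogate))
```

In this toy setup the surrogate optimization demonstrably lowers the unconditional loss of the perturbed image, which is the property stage (i) relies on; how the resulting conditional response separates members from hold-outs depends on the real model and is not reproduced by the stand-in loss.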

Original Abstract

Latent diffusion models have achieved remarkable success in high-fidelity text-to-image generation, but their tendency to memorize training data raises critical privacy and intellectual property concerns. Membership inference attacks (MIAs) provide a principled way to audit such memorization by determining whether a given sample was included in training. However, existing approaches assume access to ground-truth captions. This assumption fails in realistic scenarios where only images are available and their textual annotations remain undisclosed, rendering prior methods ineffective when substituted with vision-language model (VLM) captions. In this work, we propose MoFit, a caption-free MIA framework that constructs synthetic conditioning inputs that are explicitly overfitted to the target model's generative manifold. Given a query image, MoFit proceeds in two stages: (i) model-fitted surrogate optimization, where a perturbation applied to the image is optimized to construct a surrogate in regions of the model's unconditional prior learned from member samples, and (ii) surrogate-driven embedding extraction, where a model-fitted embedding is derived from the surrogate and then used as a mismatched condition for the query image. This embedding amplifies conditional loss responses for member samples while leaving hold-outs relatively less affected, thereby enhancing separability in the absence of ground-truth captions. Our comprehensive experiments across multiple datasets and diffusion models demonstrate that MoFit consistently outperforms prior VLM-conditioned baselines and achieves performance competitive with caption-dependent methods.

Tags

membership inference attack · diffusion model · privacy · caption-free · generative model

arXiv Categories

cs.CV cs.CR