LLM Memory & RAG relevance: 5/10

Explanations Leak: Membership Inference with Differential Privacy and Active Learning Defense

Fatima Ezzeddine, Osama Zammar, Silvia Giordano, Omran Ayoub
arXiv: 2602.03611v1 Published: 2026-02-03 Updated: 2026-02-03

AI Summary

Studies how counterfactual explanations strengthen membership inference attacks, and proposes a defense framework that combines differential privacy with active learning.

Main Contributions

  • Analyzes how leakage through explanations amplifies membership inference attacks
  • Proposes a defense framework based on differential privacy and active learning
  • Evaluates the trade-off between privacy leakage, predictive performance, and explanation quality

Methodology

Counterfactual explanations are exposed through a query-based API, shadow models are built to mount membership inference attacks, and DP and AL are applied to reduce model memorization.
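
To make the attack pipeline concrete, here is a minimal, self-contained Python (scikit-learn) sketch of a shadow-model MIA whose attack features include a crude counterfactual-distance signal. The CF heuristic, the feature set, and all names are illustrative assumptions, not the paper's implementation.

# Hedged sketch (not the paper's code): a shadow-model membership inference
# attack that augments the usual confidence features with a toy
# "counterfactual distance" feature derived from the explanations.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def cf_distance(model, x, step=0.05, max_iter=200):
    """Toy counterfactual search: nudge the input until the predicted label
    flips and return the L2 distance travelled. Real CF generators are far
    richer; this only stands in for the signal an attacker could extract."""
    x_cf = x.copy()
    y0 = model.predict(x.reshape(1, -1))[0]
    for _ in range(max_iter):
        x_cf = x_cf + step * rng.normal(size=x.shape)  # random walk as a stand-in
        if model.predict(x_cf.reshape(1, -1))[0] != y0:
            break
    return np.linalg.norm(x_cf - x)

# Shadow data drawn from the same (assumed known) distribution as the target's.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

# Shadow model mimicking the target MLaaS model.
shadow = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_in, y_in)

def attack_features(model, X_part, y_part):
    """Per-query features the attacker can observe: max confidence,
    confidence on the true label, and the counterfactual distance."""
    proba = model.predict_proba(X_part)
    conf_max = proba.max(axis=1)
    conf_true = proba[np.arange(len(y_part)), y_part]
    d_cf = np.array([cf_distance(model, x) for x in X_part])
    return np.column_stack([conf_max, conf_true, d_cf])

# Label shadow queries: 1 = member of the shadow training set, 0 = non-member.
F = np.vstack([attack_features(shadow, X_in, y_in),
               attack_features(shadow, X_out, y_out)])
m = np.concatenate([np.ones(len(X_in)), np.zeros(len(X_out))])

attack = LogisticRegression(max_iter=1000).fit(F, m)
print("attack accuracy on shadow data:", attack.score(F, m))

Against a real MLaaS target, the same features would be computed from the API's returned predictions and counterfactuals rather than from the shadow model.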

Original Abstract

Counterfactual explanations (CFs) are increasingly integrated into Machine Learning as a Service (MLaaS) systems to improve transparency; however, ML models deployed via APIs are already vulnerable to privacy attacks such as membership inference and model extraction, and the impact of explanations on this threat landscape remains insufficiently understood. In this work, we focus on the problem of how CFs expand the attack surface of MLaaS by strengthening membership inference attacks (MIAs), and on the need to design defense mechanisms that mitigate this emerging risk without undermining utility and explainability. First, we systematically analyze how exposing CFs through query-based APIs enables more effective shadow-based MIAs. Second, we propose a defense framework that integrates Differential Privacy (DP) with Active Learning (AL) to jointly reduce memorization and limit effective training data exposure. Finally, we conduct an extensive empirical evaluation to characterize the three-way trade-off between privacy leakage, predictive performance, and explanation quality. Our findings highlight the need to carefully balance transparency, utility, and privacy in the responsible deployment of explainable MLaaS systems.
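
As a rough illustration of the DP-plus-AL defense idea, the sketch below trains a differentially private learner (IBM's diffprivlib, used here as an assumed stand-in; the paper may instead rely on DP-SGD or another mechanism) inside a pool-based active learning loop with uncertainty sampling, so that only a small, actively selected subset of the data is ever exposed to training. All parameter values are illustrative.

# Hedged sketch (not the authors' framework): DP training combined with a
# pool-based active learning loop that limits effective training data exposure.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from diffprivlib.models import LogisticRegression as DPLogisticRegression

X, y = make_classification(n_samples=3000, n_features=10, random_state=1)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Start from a small labeled seed; the rest of the pool stays unused unless queried.
labeled = np.zeros(len(X_pool), dtype=bool)
labeled[np.random.default_rng(1).choice(len(X_pool), size=50, replace=False)] = True

EPS_PER_ROUND = 0.5   # per-round privacy budget (illustrative value)
QUERY_BATCH = 25      # points added to the labeled set per AL round
# Bound on the L2 norm of the features, required by diffprivlib; in practice
# this should be fixed a priori rather than computed from the private data.
DATA_NORM = float(np.linalg.norm(X_pool, axis=1).max())

for round_id in range(5):
    # Train a DP model on the currently labeled subset only.
    model = DPLogisticRegression(epsilon=EPS_PER_ROUND, data_norm=DATA_NORM)
    model.fit(X_pool[labeled], y_pool[labeled])

    # Uncertainty sampling: query the unlabeled points closest to the decision boundary.
    unlabeled_idx = np.where(~labeled)[0]
    proba = model.predict_proba(X_pool[unlabeled_idx])
    uncertainty = 1.0 - proba.max(axis=1)
    pick = unlabeled_idx[np.argsort(uncertainty)[-QUERY_BATCH:]]
    labeled[pick] = True

    print(f"round {round_id}: labeled={labeled.sum()}, "
          f"test acc={model.score(X_test, y_test):.3f}")

Note that the per-round epsilon values compose across rounds, so the total privacy budget grows with the number of AL iterations; accounting for this composition is part of the privacy-utility-explainability trade-off the paper evaluates.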

Tags

Membership Inference Attacks · Counterfactual Explanations · Differential Privacy · Active Learning · Privacy Protection

arXiv Categories

cs.LG