Rethinking Machine Unlearning: Models Designed to Forget via Key Deletion
AI Summary
Proposes "unlearning by design," a new machine-unlearning paradigm that achieves efficient, zero-shot forgetting through key deletion.
Main Contributions
- Proposes the "unlearning by design" paradigm for machine unlearning
- Designs MUNKEY, a model that performs unlearning via key deletion
- Shows experimentally that MUNKEY outperforms existing methods across multiple datasets
Methodology
Designs a memory-augmented Transformer that decouples instance-specific memorization from the model weights; unlearning is carried out by deleting the target instance's key.
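To make the idea concrete, here is a minimal, hypothetical sketch of the key-deletion mechanism: a key-value memory stores one key per training instance, inference reads from memory via attention over keys, and unlearning simply removes the instance's key-value row with no weight updates and no access to the original sample. The class and method names (`KeyValueMemory`, `write`, `read`, `unlearn`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class KeyValueMemory:
    """Hypothetical sketch of MUNKEY-style key deletion (not the paper's code)."""

    def __init__(self, dim):
        self.keys = np.empty((0, dim))    # one key per stored training instance
        self.values = np.empty((0, dim))  # instance-specific memory slots
        self.ids = []                     # instance identifiers

    def write(self, instance_id, key, value):
        # Store an instance-specific (key, value) pair during training.
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])
        self.ids.append(instance_id)

    def read(self, query):
        # Softmax attention over stored keys yields a memory readout.
        scores = self.keys @ query
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ self.values

    def unlearn(self, instance_id):
        # Zero-shot forgetting: drop the key/value row; model weights untouched.
        i = self.ids.index(instance_id)
        self.keys = np.delete(self.keys, i, axis=0)
        self.values = np.delete(self.values, i, axis=0)
        self.ids.pop(i)
```

The design point the sketch illustrates is that forgetting cost is a single row deletion, independent of training-set size, which is why no retraining data or labels are needed at unlearning time.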
Original Abstract
Machine unlearning is rapidly becoming a practical requirement, driven by privacy regulations, data errors, and the need to remove harmful or corrupted training samples. Despite this, most existing methods tackle the problem purely from a post-hoc perspective. They attempt to erase the influence of targeted training samples through parameter updates that typically require access to the full training data. This creates a mismatch with real deployment scenarios where unlearning requests can be anticipated, revealing a fundamental limitation of post-hoc approaches. We propose \textit{unlearning by design}, a novel paradigm in which models are directly trained to support forgetting as an inherent capability. We instantiate this idea with Machine UNlearning via KEY deletion (MUNKEY), a memory augmented transformer that decouples instance-specific memorization from model weights. Here, unlearning corresponds to removing the instance-identifying key, enabling direct zero-shot forgetting without weight updates or access to the original samples or labels. Across natural image benchmarks, fine-grained recognition, and medical datasets, MUNKEY outperforms all post-hoc baselines. Our results establish that unlearning by design enables fast, deployment-oriented unlearning while preserving predictive performance.