CounterFlowNet: From Minimal Changes to Meaningful Counterfactual Explanations
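AI Summary
CounterFlowNet uses GFlowNets to generate high-quality, constraint-satisfying counterfactual explanations, improving their validity, sparsity, and diversity.
Key Contributions
- Proposes CounterFlowNet, a GFlowNet-based method for generating counterfactual explanations
- Generates sparse explanations through sequential feature modification
- Uses a unified action space to support heterogeneous data and enforces constraints through action masking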
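To make the unified action space and action masking concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The `Feature` class, the `STOP` action, the binning of a continuous feature into candidate values, and the rule that each feature is edited at most once are all assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    values: tuple  # categories, or (assumed) bin centers for a continuous feature
    immutable: bool = False
    monotonic_increasing: bool = False


STOP = ("<stop>", None)  # hypothetical action that ends the edit sequence


def build_actions(features):
    """Flatten every (feature, value) pair into one shared action list, plus STOP."""
    actions = [(f.name, v) for f in features for v in f.values]
    actions.append(STOP)
    return actions


def action_mask(features, actions, current, already_edited):
    """Return True for each action that is allowed in the current state."""
    by_name = {f.name: f for f in features}
    mask = []
    for name, value in actions:
        if (name, value) == STOP:
            mask.append(True)  # stopping is always allowed
            continue
        f = by_name[name]
        allowed = (
            not f.immutable                      # immutability constraint
            and name not in already_edited       # assumed: one edit per feature
            and value != current[name]           # an edit must change the value
            and not (f.monotonic_increasing and value < current[name])
        )
        mask.append(allowed)
    return mask


def masked_sample(logits, mask, rng):
    """Sample an action index from a softmax restricted to allowed actions."""
    top = max(l for l, m in zip(logits, mask) if m)
    weights = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


features = [
    Feature("age", (25, 35, 45), monotonic_increasing=True),
    Feature("sex", ("F", "M"), immutable=True),
    Feature("job", ("clerk", "engineer")),
]
actions = build_actions(features)
current = {"age": 35, "sex": "F", "job": "clerk"}
mask = action_mask(features, actions, current, already_edited=set())
allowed = [a for a, m in zip(actions, mask) if m]
```

With this toy state, `allowed` contains only the age increase to 45, the job change to engineer, and the stop action. Because the mask is applied to the policy's logits at sampling time, changing a constraint only changes the mask, which is consistent with the claim that constraints can be enforced without retraining.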
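Methodology
Counterfactual explanation generation is formulated as sequential feature modification: a conditional Generative Flow Network (GFlowNet) learns to generate high-quality counterfactual explanations, with a reward function guiding generation.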
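As a rough illustration of how a reward function can guide generation, the sketch below scores a candidate counterfactual by combining validity with penalties for the number and size of edits. The functional form, the weights `w_sparsity` and `w_proximity`, the `scales` dictionary, and the toy classifier are all hypothetical; the paper's actual reward may differ, and plausibility is left out here for brevity.

```python
import math


def reward(original, candidate, predict_proba, target_class, scales,
           w_sparsity=0.5, w_proximity=0.5):
    """Hypothetical reward: target-class confidence, discounted for many or large edits."""
    changed = [k for k in original if original[k] != candidate[k]]
    validity = predict_proba(candidate)[target_class]
    sparsity_penalty = len(changed) / len(original)
    distance = 0.0
    for k in changed:
        if isinstance(original[k], (int, float)):
            # Numeric feature: absolute change, scaled per feature
            distance += abs(candidate[k] - original[k]) / scales[k]
        else:
            # Categorical feature: unit cost per change
            distance += 1.0
    proximity_penalty = distance / len(original)
    return validity * math.exp(
        -w_sparsity * sparsity_penalty - w_proximity * proximity_penalty
    )


def toy_predict_proba(x):
    """Stand-in classifier: approval probability rises with income."""
    p = 1.0 / (1.0 + math.exp(-(x["income"] - 50.0) / 5.0))
    return [1.0 - p, p]


original = {"income": 40.0, "age": 30.0, "job": "clerk"}
one_edit = {"income": 60.0, "age": 30.0, "job": "clerk"}
two_edits = {"income": 60.0, "age": 45.0, "job": "clerk"}
scales = {"income": 20.0, "age": 15.0}

r_none = reward(original, original, toy_predict_proba, 1, scales)
r_one = reward(original, one_edit, toy_predict_proba, 1, scales)
r_two = reward(original, two_edits, toy_predict_proba, 1, scales)
```

In this toy setup the single-feature edit scores highest: it flips the stand-in classifier just as well as the two-feature edit but pays a smaller sparsity and proximity penalty, while the unchanged input scores lowest because it is not valid.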
Original Abstract
Counterfactual explanations (CFs) provide human-interpretable insights into a model's predictions by identifying minimal changes to input features that would alter the model's output. However, existing methods struggle to generate multiple high-quality explanations that (1) affect only a small portion of the features, (2) can be applied to tabular data with heterogeneous features, and (3) are consistent with user-defined constraints. We propose CounterFlowNet, a generative approach that formulates CF generation as sequential feature modification using conditional Generative Flow Networks (GFlowNets). CounterFlowNet is trained to sample CFs proportionally to a user-specified reward function that can encode key CF desiderata: validity, sparsity, proximity, and plausibility, encouraging high-quality explanations. The sequential formulation yields highly sparse edits, while a unified action space seamlessly supports continuous and categorical features. Moreover, actionability constraints, such as immutability and monotonicity of features, can be enforced at inference time via action masking, without retraining. Experiments on eight datasets under two evaluation protocols demonstrate that CounterFlowNet achieves superior trade-offs between validity, sparsity, plausibility, and diversity with full satisfaction of the given constraints.
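The phrase "sample CFs proportionally to a reward" can be illustrated with a small sketch. The candidate names and reward values below are made up for illustration; the point is only that the sampling target is the normalized reward, so several good candidates keep non-zero probability instead of only the single best one being returned.

```python
def target_distribution(rewards):
    """Normalize positive rewards into the distribution a reward-proportional sampler targets."""
    total = sum(rewards.values())
    return {name: r / total for name, r in rewards.items()}


# Hypothetical candidate counterfactuals with made-up rewards
rewards = {"raise income": 6.0, "raise income and age": 3.0, "change job": 1.0}
probs = target_distribution(rewards)
best_only = max(rewards, key=rewards.get)
```

Here `best_only` would always return the same explanation, whereas drawing from `probs` returns the top candidate most often while still surfacing the alternatives, which is one way to read the diversity claim in the abstract.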