LLM Reasoning relevance: 6/10

Interpretability-by-Design with Accurate Locally Additive Models and Conditional Feature Effects

Vasilis Gkolemis, Loukas Kavouras, Dimitrios Kyriakopoulos, Konstantinos Tsopelas, Dimitrios Rontogiannis, Giuseppe Casalicchio, Theodore Dalamagas, Christos Diou
arXiv: 2602.16503v1 Published: 2026-02-18 Updated: 2026-02-18

AI Summary

CALMs (Conditionally Additive Local Models) strike a balance between the interpretability of GAMs and the predictive accuracy of GA^2Ms by making feature effects additive within local regions of the input space.

Main Contributions

  • Propose Conditionally Additive Local Models (CALMs), a new model class
  • Design a distillation-based training pipeline that identifies homogeneous regions and fits interpretable shape functions
  • Validate the effectiveness of CALMs on diverse classification and regression tasks

Methodology

Each feature is given multiple univariate shape functions, each active in a different region of the input space defined by simple logical conditions (thresholds) on interacting features. Effects therefore remain locally additive while varying across regions to capture interactions.
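To make the region-gated additivity concrete, here is a minimal sketch of how a CALM-style prediction could be assembled. All names, the `effects` data structure, and the toy shape functions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def calm_predict(x, intercept, effects):
    """Hypothetical CALM-style prediction.

    x: 1-D feature vector.
    effects: list of (feature_idx, condition, shape_fn) tuples, where
    `condition` is a simple logical test (e.g. a threshold on an
    interacting feature) deciding whether this regional univariate
    shape function is active for x.
    """
    y = intercept
    for feat, condition, shape_fn in effects:
        if condition(x):            # region gate: threshold on another feature
            y += shape_fn(x[feat])  # locally additive univariate effect
    return y

# Toy example: feature 0's effect switches depending on whether x1 > 0,
# which is how an x0-x1 interaction is expressed without a 2-D term.
effects = [
    (0, lambda x: x[1] > 0.0,  lambda v: 2.0 * v),   # region: x1 > 0
    (0, lambda x: x[1] <= 0.0, lambda v: -1.0 * v),  # region: x1 <= 0
    (1, lambda x: True,        lambda v: v ** 2),    # global effect of x1
]

print(calm_predict(np.array([1.0, 2.0]), 0.5, effects))  # 0.5 + 2.0 + 4.0 = 6.5
```

Within any fixed region the model reads like a plain GAM (one curve per feature), which is what preserves local interpretability.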

Original Abstract

Generalized additive models (GAMs) offer interpretability through independent univariate feature effects but underfit when interactions are present in data. GA$^2$Ms add selected pairwise interactions which improves accuracy, but sacrifices interpretability and limits model auditing. We propose \emph{Conditionally Additive Local Models} (CALMs), a new model class, that balances the interpretability of GAMs with the accuracy of GA$^2$Ms. CALMs allow multiple univariate shape functions per feature, each active in different regions of the input space. These regions are defined independently for each feature as simple logical conditions (thresholds) on the features it interacts with. As a result, effects remain locally additive while varying across subregions to capture interactions. We further propose a principled distillation-based training pipeline that identifies homogeneous regions with limited interactions and fits interpretable shape functions via region-aware backfitting. Experiments on diverse classification and regression tasks show that CALMs consistently outperform GAMs and achieve accuracy comparable with GA$^2$Ms. Overall, CALMs offer a compelling trade-off between predictive accuracy and interpretability.

Tags

Interpretability, Generalized Additive Models, Model Distillation

arXiv Categories

cs.LG cs.AI