Efficient Variance-reduced Estimation from Generative EHR Models: The SCOPE and REACH Estimators
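AI Summary
Proposes two new estimators for generative EHR models, SCOPE and REACH, that substantially reduce computational cost and sampling variance.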
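Main Contributions
- Proposes two new unbiased estimators, SCOPE and REACH
- Proves that REACH guarantees variance reduction over Monte Carlo sampling
- Validates the methods on the MIMIC-IV dataset, substantially reducing inference cost (about 10x for hospital mortality prediction, about 1.2x for ICU admission prediction)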
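Methodology
Constructs the SCOPE and REACH estimators from the next-token probability distributions that standard Monte Carlo sampling discards, and supports them with theoretical analysis and experimental validation.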
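To make the idea of reusing discarded next-token probabilities concrete, here is a minimal sketch on a toy model. This summary does not define SCOPE or REACH, so the two functions `summed_conditional` and `survival_product` are generic conditional-probability constructions chosen for illustration and should not be read as the authors' estimators. The transition table `NEXT`, the token names, and `HORIZON` are all hypothetical.

```python
import random
from statistics import fmean, pvariance

# Hypothetical toy "model": the next-token distribution depends only on the current state.
# "stable" and "sick" are transient states; "death" is the outcome; "discharge" ends the timeline.
NEXT = {
    "stable": {"stable": 0.60, "sick": 0.20, "death": 0.01, "discharge": 0.19},
    "sick":   {"stable": 0.30, "sick": 0.50, "death": 0.15, "discharge": 0.05},
}
OUTCOME, HORIZON = "death", 50

def sample(dist, rng):
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights)[0]  # weights are relative

def monte_carlo(start, rng):
    """Standard estimate from one trajectory: 1 if the outcome token is sampled, else 0."""
    state = start
    for _ in range(HORIZON):
        tok = sample(NEXT[state], rng)
        if tok not in NEXT:  # terminal token
            return 1.0 if tok == OUTCOME else 0.0
        state = tok
    return 0.0

def summed_conditional(start, rng):
    """Same sampling, but add up P(outcome | history) at every step the trajectory visits."""
    state, total = start, 0.0
    for _ in range(HORIZON):
        total += NEXT[state][OUTCOME]
        tok = sample(NEXT[state], rng)
        if tok not in NEXT:
            break
        state = tok
    return total

def survival_product(start, rng):
    """Sample with the outcome token masked out; return 1 - prod(1 - P(outcome | history))."""
    state, survive = start, 1.0
    for _ in range(HORIZON):
        survive *= 1.0 - NEXT[state][OUTCOME]
        masked = {t: p for t, p in NEXT[state].items() if t != OUTCOME}
        tok = sample(masked, rng)
        if tok not in NEXT:
            break
        state = tok
    return 1.0 - survive

def exact(start):
    """Ground truth by dynamic programming over the remaining horizon."""
    prob = {s: 0.0 for s in NEXT}
    for _ in range(HORIZON):
        prob = {
            s: NEXT[s][OUTCOME] + sum(p * prob[t] for t, p in NEXT[s].items() if t in NEXT)
            for s in NEXT
        }
    return prob[start]

def mean_and_variance(estimator, n, seed, start="stable"):
    rng = random.Random(seed)
    values = [estimator(start, rng) for _ in range(n)]
    return fmean(values), pvariance(values)

if __name__ == "__main__":
    print("exact", exact("stable"))
    for est in (monte_carlo, summed_conditional, survival_product):
        print(est.__name__, mean_and_variance(est, 20000, seed=0))
```

All three functions target the same probability, but the latter two use the outcome probability at every visited step rather than a single 0/1 result per trajectory. In this toy model both have lower per-sample variance than plain Monte Carlo. For `survival_product` that follows from a general fact: an estimator bounded in [0, 1] can never have more variance than a 0/1 estimator with the same mean.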
Original Abstract
Generative models trained using self-supervision of tokenized electronic health record (EHR) timelines show promise for clinical outcome prediction. This is typically done using Monte Carlo simulation for future patient trajectories. However, existing approaches suffer from three key limitations: sparse estimate distributions that poorly differentiate patient risk levels, extreme computational costs, and high sampling variance. We propose two new estimators: the Sum of Conditional Outcome Probability Estimator (SCOPE) and Risk Estimation from Anticipated Conditional Hazards (REACH), that leverage next-token probability distributions discarded by standard Monte Carlo. We prove both estimators are unbiased and that REACH guarantees variance reduction over Monte Carlo sampling for any model and outcome. Empirically, on hospital mortality prediction in MIMIC-IV using the ETHOS-ARES framework, SCOPE and REACH match 100-sample Monte Carlo performance using only 10-11 samples (95% CI: [9,11]), representing a ~10x reduction in inference cost without degrading calibration. For ICU admission prediction, efficiency gains are more modest (~1.2x), which we attribute to the outcome's lower "spontaneity," a property we characterize theoretically and empirically. These methods substantially improve the feasibility of deploying generative EHR models in resource-constrained clinical settings.