Retrieval-aligned Tabular Foundation Models Enable Robust Clinical Risk Prediction in Electronic Health Records Under Real-world Constraints
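AI Summary
Proposes the AWARE framework, which uses task-aligned retrieval to make tabular in-context learning more robust for clinical risk prediction from electronic health records.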
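Main Contributions
- Proposes AWARE, a framework that optimizes retrieval for tabular data
- Evaluates tabular in-context learning (TICL) models on a multi-cohort EHR benchmark
- Analyzes how retrieval quality and retrieval-inference alignment affect clinical prediction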
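Methodology
Builds AWARE, a task-aligned retrieval framework based on supervised embedding learning and lightweight adapters, and evaluates it experimentally on EHR datasets.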
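To make the contrast between naive distance-based retrieval and task-aligned retrieval concrete, here is a minimal sketch. It is not the AWARE implementation: the label-aware feature weighting in `fit_feature_weights` is a hypothetical stand-in for supervised embedding learning, the adapters are not modeled, and the toy rows are invented for illustration.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, rows, k, embed=lambda r: r):
    """Return indices of the k rows nearest to the query after applying embed."""
    q = embed(query)
    order = sorted(range(len(rows)), key=lambda i: euclidean(q, embed(rows[i])))
    return order[:k]

def fit_feature_weights(rows, labels):
    """Hypothetical stand-in for a supervised embedding: scale each feature by
    the gap between class means divided by the feature's variance, so features
    that separate the outcome dominate the distance."""
    weights = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        gap = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
        weights.append(gap / (var + 1e-9))
    return weights

# Invented toy cohort: feature 0 is large-scale but unrelated to the outcome,
# feature 1 is small-scale but informative.
rows = [(10, 1.0), (90, 0.9), (50, 1.1),
        (12, 0.0), (88, 0.1), (52, -0.1), (30, 0.05), (70, 0.0)]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
query = (11, 0.95)  # resembles the positive class on the informative feature

weights = fit_feature_weights(rows, labels)
embed = lambda r: tuple(w * v for w, v in zip(weights, r))

naive_idx = retrieve(query, rows, k=3)                 # raw feature space
aligned_idx = retrieve(query, rows, k=3, embed=embed)  # label-aware space
naive_labels = [labels[i] for i in naive_idx]
aligned_labels = [labels[i] for i in aligned_idx]
print("naive context labels:  ", naive_labels)
print("aligned context labels:", aligned_labels)
```

In this toy setup the raw-space neighbors are chosen mostly by the uninformative large-scale feature, so the retrieved context mixes in negatives, while the label-aware space retrieves only positives. A real system would learn the embedding from training data and would need to guard against label leakage at query time.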
Original Abstract
Clinical prediction from structured electronic health records (EHRs) is challenging due to high dimensionality, heterogeneity, class imbalance, and distribution shift. While tabular in-context learning (TICL) and retrieval-augmented methods perform well on generic benchmarks, their behavior in clinical settings remains unclear. We present a multi-cohort EHR benchmark comparing classical, deep tabular, and TICL models across varying data scale, feature dimensionality, outcome rarity, and cross-cohort generalization. PFN-based TICL models are sample-efficient in low-data regimes but degrade under naive distance-based retrieval as heterogeneity and imbalance increase. We propose AWARE, a task-aligned retrieval framework using supervised embedding learning and lightweight adapters. AWARE improves AUPRC by up to 12.2% under extreme imbalance, with gains increasing with data complexity. Our results identify retrieval quality and retrieval-inference alignment as key bottlenecks for deploying tabular in-context learning in clinical prediction.
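For readers less familiar with AUPRC, the metric reported above, the following sketch computes its common step-wise estimate (average precision) in plain Python. The function name, labels, and scores are illustrative assumptions and do not come from the paper; tied scores are not handled specially.

```python
def average_precision(labels, scores):
    """Step-wise AUPRC estimate: mean of precision at the rank of each positive.
    Assumes binary labels (0/1); ties in scores are broken by input order."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits if hits else 0.0

# Invented example: 2 positives among 10 patients (20% prevalence).
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

ap = average_precision(labels, scores)           # positives at ranks 1 and 4
ap_perfect = average_precision([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
print(ap, ap_perfect)
```

Because precision depends on how many negatives outrank each positive, the metric becomes harder to score well on as the outcome gets rarer, which is why it is a common choice for imbalanced clinical prediction tasks.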