End-to-End Compression for Tabular Foundation Models
AI Summary
Proposes TACO, a model that accelerates tabular foundation models and reduces their memory footprint by compressing the training dataset in a latent space.
Key Contributions
- Proposes TACO, an end-to-end compression model for tabular data
- Achieves faster inference and lower memory consumption
- Validates TACO's effectiveness on the TabArena benchmark
Methodology
The training dataset is compressed in a latent space, which reduces the effective context size the model must attend over, lowering memory requirements and speeding up inference.
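The paper does not spell out the compression mechanism here, but the idea of replacing a large in-context training set with a small set of latent tokens can be sketched with cross-attention (Perceiver-style). Everything below is a hypothetical illustration, not TACO's actual architecture: learned latent tokens attend over the embedded training rows once, and queries then attend only over the compressed context, so per-query cost drops from O(n_train) to O(n_latent).

```python
import numpy as np

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: each query row attends over keys/values.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

rng = np.random.default_rng(0)
n_train, n_latent, d = 1000, 16, 32  # illustrative sizes, not from the paper

train_embed = rng.normal(size=(n_train, d))  # embedded training rows (the context)
latents = rng.normal(size=(n_latent, d))     # latent tokens (learned in a real model)

# Compression step: latents summarize the full training set in one pass.
compressed = cross_attention(latents, train_embed, train_embed)

# Prediction step: query points attend over the small compressed context,
# so inference cost no longer scales with the training set size.
queries = rng.normal(size=(5, d))
out = cross_attention(queries, compressed, compressed)
print(compressed.shape, out.shape)  # (16, 32) (5, 32)
```

With a fixed latent budget, the compressed context stays the same size no matter how many training rows are supplied, which is consistent with the memory and speed gains the abstract reports.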
Original Abstract
The long-standing dominance of gradient-boosted decision trees for tabular data has recently been challenged by in-context learning tabular foundation models. In-context learning methods fit and predict in one forward pass without parameter updates by leveraging the training data as context for predicting on query test points. While recent tabular foundation models achieve state-of-the-art performance, their transformer architecture based on the attention mechanism has quadratic complexity with respect to dataset size, which increases training and inference time and limits the models' capacity to handle large-scale datasets. In this work, we propose TACO, an end-to-end tabular compression model that compresses the training dataset in a latent space. We test our method on the TabArena benchmark, where our proposed method is up to 94x faster in inference time while consuming up to 97% less memory compared to the state-of-the-art tabular transformer architecture, all while retaining performance without significant degradation. Lastly, our method not only scales better with increased dataset sizes, but also achieves better performance compared to other baselines.