Relevance to LLM Reasoning: 7/10

Breaking the Simplification Bottleneck in Amortized Neural Symbolic Regression

Paul Saegert, Ullrich Köthe
arXiv: 2602.08885v1 Published: 2026-02-09 Updated: 2026-02-09

AI Summary

The paper proposes SimpliPy to accelerate expression simplification in symbolic regression, improving both the efficiency and the accuracy of amortized SR.

Key Contributions

  • Designed SimpliPy, a fast rule-based expression simplification engine
  • Proposed the Flash-ANSR framework, substantially improving amortized SR performance
  • Implemented systematic training set decontamination, removing training expressions equivalent to test set expressions

Methodology

The rule-based engine SimpliPy accelerates expression simplification and is integrated into the Flash-ANSR framework, optimizing both the training and the inference of amortized SR.
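The core idea of rule-based simplification and its use for decontamination can be illustrated with a toy sketch (the prefix-tuple expression encoding and the tiny rule set below are illustrative assumptions, not SimpliPy's actual implementation):

```python
def simplify(expr):
    """One bottom-up pass of local rewrite rules over a prefix-tuple expression.

    Expressions are nested tuples, e.g. ("add", ("mul", 0, "x"), "x")
    encodes 0*x + x. Leaves are numbers or variable names.
    """
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]  # simplify children first
    if op == "add":
        a, b = args
        if a == 0: return b            # 0 + x -> x
        if b == 0: return a            # x + 0 -> x
    elif op == "mul":
        a, b = args
        if a == 0 or b == 0: return 0  # 0 * x -> 0
        if a == 1: return b            # 1 * x -> x
        if b == 1: return a            # x * 1 -> x
    elif op == "sub":
        a, b = args
        if b == 0: return a            # x - 0 -> x
        if a == b: return 0            # x - x -> 0
    return (op, *args)

def normalize(expr):
    """Iterate simplify() to a fixed point, yielding a normalized form."""
    prev = None
    while expr != prev:
        prev, expr = expr, simplify(expr)
    return expr

# Decontamination: drop training expressions whose normalized form
# matches that of any test expression (i.e. equivalent expressions).
train = [("add", ("mul", 0, "x"), "x"), ("sub", "x", 0), ("mul", "x", "y")]
test = ["x"]
test_forms = {normalize(e) for e in test}
clean_train = [e for e in train if normalize(e) not in test_forms]
```

Here `("add", ("mul", 0, "x"), "x")` and `("sub", "x", 0)` both normalize to `"x"` and are removed, leaving only `("mul", "x", "y")`. A real engine needs a far richer rule set and canonical argument ordering, but because each rule is a cheap local rewrite, this style of engine avoids the overhead of a general-purpose CAS like SymPy.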

Original Abstract

Symbolic regression (SR) aims to discover interpretable analytical expressions that accurately describe observed data. Amortized SR promises to be much more efficient than the predominant genetic programming SR methods, but currently struggles to scale to realistic scientific complexity. We find that a key obstacle is the lack of a fast reduction of equivalent expressions to a concise normalized form. Amortized SR has addressed this by general-purpose Computer Algebra Systems (CAS) like SymPy, but the high computational cost severely limits training and inference speed. We propose SimpliPy, a rule-based simplification engine achieving a 100-fold speed-up over SymPy at comparable quality. This enables substantial improvements in amortized SR, including scalability to much larger training sets, more efficient use of the per-expression token budget, and systematic training set decontamination with respect to equivalent test expressions. We demonstrate these advantages in our Flash-ANSR framework, which achieves much better accuracy than amortized baselines (NeSymReS, E2E) on the FastSRB benchmark. Moreover, it performs on par with state-of-the-art direct optimization (PySR) while recovering more concise instead of more complex expressions with increasing inference budget.

Tags

Symbolic Regression, Expression Simplification, Machine Learning, AI Acceleration, Neural Networks

arXiv Categories

cs.LG cs.AI cs.SC