LLM Reasoning relevance: 8/10

The Unreasonable Effectiveness of Scaling Laws in AI

Chien-Ping Lu
arXiv: 2603.28507v1 Published: 2026-03-30 Updated: 2026-03-30

AI Summary

Examines the effectiveness of AI scaling laws, analyzing the logic behind them and what they imply for future efficiency gains.

Main Contributions

  • Explains why scaling laws are effective
  • Introduces the concept of logical compute
  • Emphasizes the importance of efficiency improvements

Methodology

Explains the phenomenon of scaling laws and its deeper causes through empirical observation and theoretical analysis.

Original Abstract

Classical AI scaling laws, especially for pre-training, describe how training loss decreases with compute in a power-law form. Their effectiveness has a basic and very practical sense: they make progress predictable, albeit at a declining rate. Yet their effectiveness is also unreasonable in two further senses. First, these laws are largely empirical and observational, but they appear repeatedly across model families and increasingly across training-adjacent regimes. Second, despite the diminishing returns they predict, progress in practice has often continued through rapidly improving efficiency, visible for example in falling cost per token. This paper argues that both features arise from the same source: scaling laws are unusually effective because they abstract away from many realization details. The compute variable is best understood as logical compute, an implementation-agnostic notion of model-side work, while the practical burden of scaling depends on how efficiently real resources are converted into that compute. This abstraction helps explain both why the laws travel so well across settings and why they give rise to a persistent efficiency game in hardware, algorithms, and systems. Once efficiency is made explicit, the main practical question becomes how many efficiency doublings are required to keep scaling productive despite diminishing returns. Under that view, diminishing returns are not only a geometric flattening of the loss curve, but also rising pressure for cost reduction, system-level innovation, and the breakthroughs needed to sustain Moore-like efficiency doublings.
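
The abstract invokes the power-law form of pre-training scaling laws without writing it down. As a minimal sketch, assuming the common saturating power-law parametrization from the scaling-law literature (the symbols C_0, eta, and R below are illustrative, not taken from this paper):

    L(C) = L_inf + (C_0 / C)^alpha,    with logical compute  C = eta * R

where R is the raw resource spend (hardware, energy, dollars) and eta is the efficiency with which it is converted into model-side work. Halving the reducible loss L(C) - L_inf requires growing C by a factor of 2^(1/alpha); at a fixed resource budget R, that amounts to 1/alpha efficiency doublings. For an illustrative exponent alpha ~ 0.05, of the magnitude reported for pre-training compute scaling, each halving of reducible loss costs about 20 doublings of eta, which is one way to read the "persistent efficiency game" the abstract describes.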

Tags

scaling laws, efficiency, compute, AI

arXiv Categories

cs.LG cs.AI