LLM Reasoning 相关度: 9/10

CODA: Difficulty-Aware Compute Allocation for Adaptive Reasoning

Siye Wu, Jian Xie, Yikai Zhang, Yanghua Xiao

arXiv: 2603.08659v1 发布: 2026-03-09 更新: 2026-03-09

下载 PDF arXiv 页面

AI 摘要

CODA通过难度感知的计算分配，动态调整推理深度，提高推理效率。

主要贡献

提出了一种难度感知的计算分配方法CODA
利用策略内部的难度信号来分配计算资源
在不同规模的模型和基准测试中验证了CODA的有效性

方法论

CODA通过组rollout估计难度，并将其映射到门控，调节奖励函数，从而控制推理长度，实现自适应推理。

原文摘要

The emergence of large reasoning models demonstrates that scaling inference-time compute significantly enhances performance on complex tasks. However, it often falls into another trap: overthinking simple problems, where repetitive rationales yield minimal accuracy gains at a disproportionately high cost. This motivates adaptive reasoning: dynamically aligning reasoning depth with instance difficulty. In this paper, we study adaptive reasoning from an optimality perspective, formalizing it as a utility maximization problem where tokens are allocated until the marginal accuracy gain falls below the incremental cost. Based on this, we propose CODA (Compute Allocation by Difficulty Awareness), a method that operationalizes this principle by allocating tokens via a policy-internal difficulty signal. Specifically, CODA estimates difficulty via group-based rollouts and maps it to two non-negative gates that modulate a length-dependent shaping term on top of the binary base reward. The easy-side gate penalizes verbosity on simple instances, whereas the hard-side gate encourages more deliberative rollouts on challenging ones. Across model scales and benchmarks, CODA achieves adaptive reasoning without external annotations or user-provided budgets: on easy tasks, CODA reduces token costs by over 60% while maintaining strong accuracy, whereas on hard tasks it incentivizes more deliberative rollouts to maximize performance.

arXiv 分类

cs.CL

AI 摘要

主要贡献

方法论

原文摘要

标签

arXiv 分类