LLM Reasoning relevance: 8/10

Confidence-Driven Multi-Scale Model Selection for Cost-Efficient Inference

Bo-Wei Chen, Chung-Chi Chen, An-Zi Yen
arXiv: 2602.22090v1 Published: 2026-02-25 Updated: 2026-02-25

AI Summary

Proposes a confidence-driven multi-scale model selection strategy that reduces LLM inference cost while preserving accuracy.

Key Contributions

  • Proposes a confidence-driven model selection strategy
  • Assesses the model's likelihood of knowing the answer and the probability that its response is accurate
  • Validates the approach on MMLU, reducing computational cost while maintaining accuracy

Methodology

Models are selected dynamically based on confidence estimates: tasks the smaller model answers with high confidence are retained, while low-confidence tasks are delegated to a larger model, balancing cost against accuracy.
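The routing logic described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the confidence measure (softmax probability of the small model's top answer option) and the names `route_query` and `CONF_THRESHOLD` are assumptions for illustration; the paper's exact confidence estimation may differ.

```python
import math

# Assumed cutoff for illustration; in practice this threshold would be
# tuned empirically to trade off cost against accuracy.
CONF_THRESHOLD = 0.8

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_query(small_model_logits, small_answer, large_model_answer_fn):
    """Keep the small model's answer when its top-option probability is
    high; otherwise delegate the query to the larger model."""
    confidence = max(softmax(small_model_logits))
    if confidence >= CONF_THRESHOLD:
        return small_answer, "small"
    return large_model_answer_fn(), "large"

# Usage with stubbed model outputs over four answer options:
confident_logits = [5.0, 0.1, 0.0, -1.0]   # peaked -> small model kept
uncertain_logits = [1.0, 0.9, 0.8, 0.7]    # flat -> escalate to large model

print(route_query(confident_logits, "A", lambda: "B"))  # -> ('A', 'small')
print(route_query(uncertain_logits, "A", lambda: "B"))  # -> ('B', 'large')
```

Only the low-confidence fraction of queries ever invokes the larger model, which is the source of the cost savings the abstract reports.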

Original Abstract

Large Language Models (LLMs) have revolutionized inference across diverse natural language tasks, with larger models performing better but at higher computational costs. We propose a confidence-driven strategy that dynamically selects the most suitable model based on confidence estimates. By assessing a model's confidence in handling the task and response accuracy, tasks that are likely to be solved correctly are retained, while more uncertain or complex cases are delegated to a larger model, ensuring reliability while minimizing computation. Specifically, we evaluate a model's likelihood of knowing the correct answer and the probability that its response is accurate. Experiments on the Massive Multitask Language Understanding (MMLU) benchmark show that our approach achieves accuracy comparable to the largest model while reducing computational costs by 20% to 40%. When applied to GPT-4o API calls, it reduces token usage by approximately 60%, further improving cost efficiency. These findings indicate the potential of confidence-based model selection to enhance real-world LLM deployment, particularly in resource-constrained settings such as edge devices and commercial API applications.

Tags

LLM, Model Selection, Confidence, Inference Efficiency, Multi-Scale Models

arXiv Category

cs.CL