Reward-Based Online LLM Routing via NeuralUCB
AI Summary
Proposes a reward-driven online LLM routing method based on NeuralUCB that performs well in cost-sensitive settings.
Main Contributions
- Proposes a NeuralUCB-based LLM routing policy
- Validates on RouterBench that the method outperforms baseline approaches
- Reduces inference cost while maintaining competitive reward
Methodology
The NeuralUCB algorithm is used to simulate LLM routing in an online setting, selecting the best model for each query according to its estimated reward.
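To make the routing loop concrete, below is a minimal NeuralUCB sketch under assumed details not specified in the summary: each arm is a candidate LLM, the query is a small feature vector with the arm identity appended as a one-hot, a single small MLP estimates reward, and the UCB bonus uses the gradient of the network output with a regularized gradient-covariance matrix. The feature dimension, network width, reward simulation, and hyperparameters are all illustrative, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

D_CTX = 4    # assumed query-feature dimension
N_ARMS = 3   # candidate LLMs (arms)
H = 8        # hidden width of the reward network
GAMMA = 1.0  # exploration strength
D_IN = D_CTX + N_ARMS

# One small one-hidden-layer network shared across arms.
W1 = rng.normal(0.0, 0.5, (H, D_IN))
w2 = rng.normal(0.0, 0.5, H)
P = W1.size + w2.size
Z = np.eye(P)  # regularized gradient covariance for the UCB bonus

def forward(x):
    h = np.maximum(W1 @ x, 0.0)  # ReLU hidden layer
    return w2 @ h, h

def grad(x, h):
    # Gradient of the scalar output w.r.t. (W1, w2), flattened.
    dW1 = np.outer(w2 * (h > 0), x)
    return np.concatenate([dW1.ravel(), h])

def select_arm(ctx):
    # Score each arm by predicted reward plus a gradient-based UCB bonus.
    scores = []
    for a in range(N_ARMS):
        x = np.concatenate([ctx, np.eye(N_ARMS)[a]])
        mu, h = forward(x)
        g = grad(x, h)
        bonus = GAMMA * np.sqrt(g @ np.linalg.solve(Z, g))
        scores.append(mu + bonus)
    return int(np.argmax(scores))

def update(ctx, arm, reward, lr=0.05):
    # One SGD step on squared error, plus a rank-one covariance update.
    global W1, w2, Z
    x = np.concatenate([ctx, np.eye(N_ARMS)[arm]])
    mu, h = forward(x)
    Z += np.outer(grad(x, h), grad(x, h))
    err = mu - reward
    dW1 = np.outer(w2 * (h > 0), x)
    w2 -= lr * err * h
    W1 -= lr * err * dW1

# Simulated online loop with illustrative per-arm reward means
# (e.g., quality minus scaled cost); purely synthetic.
true_means = np.array([0.2, 0.5, 0.8])
for t in range(300):
    ctx = rng.normal(size=D_CTX)
    a = select_arm(ctx)
    r = true_means[a] + 0.1 * rng.normal()
    update(ctx, a, r)
```

In this sketch the exploration bonus shrinks as the gradient covariance `Z` accumulates observations, so the policy gradually shifts from exploring all models to exploiting the one with the best estimated utility reward.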
Original Abstract
This study investigates the use of NeuralUCB for cost-aware large language model (LLM) routing. Existing routing approaches can be broadly grouped into supervised routing methods and partial-feedback methods, each with different tradeoffs in efficiency and adaptivity. We implement a NeuralUCB-based routing policy and evaluate it on RouterBench under a simulated online setting. Experimental results show that the proposed method consistently outperforms random and min-cost baselines in utility reward. Compared with the max-quality reference, our method achieves substantially lower inference cost while maintaining competitive reward. These findings suggest that NeuralUCB is a promising approach for cost-aware LLM routing, while also highlighting remaining challenges in action discrimination and exploration.