Reward-Based Online LLM Routing via NeuralUCB
AI Summary
Proposes a reward-driven online LLM routing method based on NeuralUCB that performs well in cost-sensitive settings.
Main Contributions
- Proposes a NeuralUCB-based LLM routing policy
- Validates on RouterBench that the method outperforms baseline approaches
- Reduces inference cost while maintaining competitive reward
Methodology
The NeuralUCB algorithm is used to simulate LLM routing in an online setting, selecting the best model for each query according to its estimated reward.
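To make the routing loop concrete, below is a minimal NeuralUCB sketch under assumed details not specified in the summary: each arm is a candidate LLM, the query is a small feature vector with the arm identity appended as a one-hot, a single small MLP estimates reward, and the UCB bonus uses the gradient of the network output with a regularized gradient-covariance matrix. The feature dimension, network width, reward simulation, and hyperparameters are all illustrative, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

D_CTX = 4    # assumed query-feature dimension
N_ARMS = 3   # candidate LLMs (arms)
H = 8        # hidden width of the reward network
GAMMA = 1.0  # exploration strength
D_IN = D_CTX + N_ARMS

# One small one-hidden-layer network shared across arms.
W1 = rng.normal(0.0, 0.5, (H, D_IN))
w2 = rng.normal(0.0, 0.5, H)
P = W1.size + w2.size
Z = np.eye(P)  # regularized gradient covariance for the UCB bonus

def forward(x):
    h = np.maximum(W1 @ x, 0.0)  # ReLU hidden layer
    return w2 @ h, h

def grad(x, h):
    # Gradient of the scalar output w.r.t. (W1, w2), flattened.
    dW1 = np.outer(w2 * (h > 0), x)
    return np.concatenate([dW1.ravel(), h])

def select_arm(ctx):
    # Score each arm by predicted reward plus a gradient-based UCB bonus.
    scores = []
    for a in range(N_ARMS):
        x = np.concatenate([ctx, np.eye(N_ARMS)[a]])
        mu, h = forward(x)
        g = grad(x, h)
        bonus = GAMMA * np.sqrt(g @ np.linalg.solve(Z, g))
        scores.append(mu + bonus)
    return int(np.argmax(scores))

def update(ctx, arm, reward, lr=0.05):
    # One SGD step on squared error, plus a rank-one covariance update.
    global W1, w2, Z
    x = np.concatenate([ctx, np.eye(N_ARMS)[arm]])
    mu, h = forward(x)
    Z += np.outer(grad(x, h), grad(x, h))
    err = mu - reward
    dW1 = np.outer(w2 * (h > 0), x)
    w2 -= lr * err * h
    W1 -= lr * err * dW1

# Simulated online loop with illustrative per-arm reward means
# (e.g., quality minus scaled cost); purely synthetic.
true_means = np.array([0.2, 0.5, 0.8])
for t in range(300):
    ctx = rng.normal(size=D_CTX)
    a = select_arm(ctx)
    r = true_means[a] + 0.1 * rng.normal()
    update(ctx, a, r)
```

In this sketch the exploration bonus shrinks as the gradient covariance `Z` accumulates observations, so the policy gradually shifts from exploring all models to exploiting the one with the best estimated utility reward.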
Original Abstract
This study investigates the use of NeuralUCB for cost-aware large language model (LLM) routing. Existing routing approaches can be broadly grouped into supervised routing methods and partial-feedback methods, each with different tradeoffs in efficiency and adaptivity. We implement a NeuralUCB-based routing policy and evaluate it on RouterBench under a simulated online setting. Experimental results show that the proposed method consistently outperforms random and min-cost baselines in utility reward. Compared with the max-quality reference, our method achieves substantially lower inference cost while maintaining competitive reward. These findings suggest that NeuralUCB is a promising approach for cost-aware LLM routing, while also highlighting remaining challenges in action discrimination and exploration.