AI Agents Relevance: 8/10

Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems

Wanxing Wu, He Zhu, Yixia Li, Lei Yang, Jiehui Zhao, Hongru Wang, Jian Yang, Benyou Wang, Bingyi Jing, Guanhua Chen
arXiv: 2602.11877v1 Published: 2026-02-12 Updated: 2026-02-12

AI Summary

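Introduces RouterXBench, an evaluation framework for routers, and ProbeDirichlet, a routing method that improves router performance and robustness in collaborative LLM systems.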

Key Contributions

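  • Proposes RouterXBench, a multi-dimensional evaluation framework for routers
  • Proposes ProbeDirichlet, a lightweight router based on internal hidden states
  • Shows experimentally that ProbeDirichlet outperforms existing methods across a range of scenarios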

Methodology

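Uses the model's internal hidden states to capture its uncertainty, aggregates hidden states across layers via learnable Dirichlet distributions, and trains the router probabilistically.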
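
To make the aggregation step concrete, here is a minimal sketch of one way a Dirichlet-weighted cross-layer probe could work. It is not the authors' implementation. Every name in it (`dirichlet_sample`, `aggregate_layers`, `probe_score`, `route`), the linear probe with a sigmoid, the 0.5 threshold, and the choice to sample weights during training but use the Dirichlet mean at inference are illustrative assumptions; the summary only states that hidden states are aggregated across layers via learnable Dirichlet distributions.

```python
import math
import random


def dirichlet_sample(alpha, rng):
    """Draw layer weights from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_mean(alpha):
    """Expected layer weights under Dirichlet(alpha): alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


def aggregate_layers(hidden_states, weights):
    """Weighted sum over layers; hidden_states has shape [num_layers][hidden_dim]."""
    dim = len(hidden_states[0])
    return [
        sum(w * layer[i] for w, layer in zip(weights, hidden_states))
        for i in range(dim)
    ]


def probe_score(pooled, probe_w, probe_b):
    """Assumed linear probe + sigmoid: estimated chance the small model is right."""
    logit = sum(p * w for p, w in zip(pooled, probe_w)) + probe_b
    return 1.0 / (1.0 + math.exp(-logit))


def route(hidden_states, alpha, probe_w, probe_b, threshold=0.5, rng=None):
    """Return ("small" | "large", score).

    Assumption: with an rng (training-style) the layer weights are sampled;
    without one (inference-style) the Dirichlet mean is used.
    """
    if rng is not None:
        weights = dirichlet_sample(alpha, rng)
    else:
        weights = dirichlet_mean(alpha)
    score = probe_score(aggregate_layers(hidden_states, weights), probe_w, probe_b)
    return ("small" if score >= threshold else "large"), score
```

In this sketch, `alpha` (one positive concentration value per layer) and the probe parameters would be the learned quantities. A query whose score falls below the threshold is offloaded to the larger cloud model, and raising the threshold trades cost for accuracy.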

Original Abstract

Large language models (LLMs) have achieved success, but cost and privacy constraints necessitate deploying smaller models locally while offloading complex queries to cloud-based models. Existing router evaluations are unsystematic, overlooking scenario-specific requirements and out-of-distribution robustness. We propose RouterXBench, a principled evaluation framework with three dimensions: router ability, scenario alignment, and cross-domain robustness. Unlike prior work that relies on output probabilities or external embeddings, we utilize internal hidden states that capture model uncertainty before answer generation. We introduce ProbeDirichlet, a lightweight router that aggregates cross-layer hidden states via learnable Dirichlet distributions with probabilistic training. Trained on multi-domain data, it generalizes robustly across in-domain and out-of-distribution scenarios. Our results show ProbeDirichlet achieves 16.68% and 18.86% relative improvements over the best baselines in router ability and high-accuracy scenarios, with consistent performance across model families, model scales, heterogeneous tasks, and agentic workflows.

Tags

LLM Router Evaluation Multi-domain Hidden States

arXiv Categories

cs.CL cs.AI