LLM Reasoning Relevance: 9/10

STAR: Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction

Xiaoxiao Wang, Chunxiao Li, Junying Wang, Yijin Guo, Zijian Chen, Chunyi Li, Xiaohong Liu, Zicheng Zhang, Guangtao Zhai
arXiv: 2602.12143v1 Published: 2026-02-12 Updated: 2026-02-12

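AI Summary

The STAR framework combines statistical and agentic reasoning to make large model performance prediction more accurate and more interpretable when observed data is sparse.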

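Main Contributions

  • Proposes the STAR framework, which combines statistical and agentic reasoning
  • Introduces Constrained Probabilistic Matrix Factorization (CPMF) and external knowledge retrieval
  • Uses Expectation Violation Theory (EVT) to guide the reasoning step that refines predictions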

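Methodology

Combines statistical expectations (CPMF with knowledge retrieval) with agentic reasoning (guided by EVT) to predict and explain large model performance.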
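To give a rough idea of the statistical half of this pipeline, the sketch below fits a plain probabilistic matrix factorization (PMF) to a sparse model-by-benchmark score matrix and predicts the missing entries. It is not the paper's CPMF: the constraint terms, the semantic features from retrieval, and the uncertainty estimates are not specified in this summary and are left out. The function name `fit_pmf`, the toy scores, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


def fit_pmf(scores, mask, rank=2, lam=0.1, lr=0.05, epochs=2000, seed=0):
    """Plain PMF via gradient descent (illustrative; not the paper's CPMF).

    scores: (n_models, n_benchmarks) array; unobserved entries may hold any value.
    mask:   same shape, 1.0 where a score was observed and 0.0 elsewhere.
    """
    rng = np.random.default_rng(seed)
    n_models, n_benchmarks = scores.shape
    U = 0.1 * rng.standard_normal((n_models, rank))      # model factors
    V = 0.1 * rng.standard_normal((n_benchmarks, rank))  # benchmark factors
    for _ in range(epochs):
        # Residual on observed entries only; unobserved entries contribute zero.
        err = mask * (scores - U @ V.T)
        # Gradients of 0.5*||err||^2 + 0.5*lam*(||U||^2 + ||V||^2).
        grad_U = -err @ V + lam * U
        grad_V = -err.T @ U + lam * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V


# Hypothetical toy data: the third model has a single observed score,
# loosely mimicking the "1-2 observed scores per test model" setting.
scores = np.array([
    [0.80, 0.70, 0.60],
    [0.60, 0.50, 0.40],
    [0.90, 0.00, 0.00],
])
mask = np.array([
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
])

U, V = fit_pmf(scores, mask)
pred = U @ V.T  # statistical expectation for every (model, benchmark) pair
observed_mse = float(np.sum(mask * (scores - pred) ** 2) / mask.sum())
print(np.round(pred, 2))
print("MSE on observed entries:", round(observed_mse, 4))
```

In STAR as described, an expectation like `pred` would only be the starting point. The EVT-guided reasoning module then adjusts it using intra-family analysis, cross-model comparison, and credibility-aware aggregation, none of which this sketch attempts.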

Original Abstract

As comprehensive large model evaluation becomes prohibitively expensive, predicting model performance from limited observations has become essential. However, existing statistical methods struggle with pattern shifts, data sparsity, and lack of explanation, while pure LLM methods remain unreliable. We propose STAR, a framework that bridges data-driven STatistical expectations with knowledge-driven Agentic Reasoning. STAR leverages specialized retrievers to gather external knowledge and embeds semantic features into Constrained Probabilistic Matrix Factorization (CPMF) to generate statistical expectations with uncertainty. A reasoning module guided by Expectation Violation Theory (EVT) then refines predictions through intra-family analysis, cross-model comparison, and credibility-aware aggregation, producing adjustments with traceable explanations. Extensive experiments show that STAR consistently outperforms all baselines on both score-based and rank-based metrics, delivering a 14.46% gain in total score over the strongest statistical method under extreme sparsity, with only 1-2 observed scores per test model.

Tags

Large model performance prediction, Statistical methods, Agentic reasoning, Knowledge retrieval, Expectation Violation Theory

arXiv Categories

cs.AI cs.LG