LLM Reasoning relevance: 8/10

A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality

Arther Tian, Alex Ding, Frank Chen, Simon Wu, Aaron Chan
arXiv: 2603.04028v1 Published: 2026-03-04 Updated: 2026-03-04

AI Summary

Proposes a multi-dimensional quality scoring framework for LLM inference and applies it to quality assessment in decentralized inference networks.

Key Contributions

  • Proposes a multi-dimensional quality scoring framework that decomposes quality assessment into modular dimensions.
  • Audits the reliability of quality signals across dimensions, finding that dimension reliability is task-dependent.
  • Integrates the composite score into PoQ, improving robustness.

Methodology

Using logged outputs from QA and summarization tasks, the authors systematically audit the reliability of quality signals across dimensions, and validate the effectiveness of the composite score through ablation experiments.

Original Abstract

Decentralized large language model (LLM) inference networks can pool heterogeneous compute to scale serving, but they require lightweight and incentive-compatible mechanisms to assess output quality. Prior work introduced cost-aware Proof of Quality (PoQ) and adaptive robust PoQ to allocate rewards under evaluator heterogeneity and adversarial behavior. In this paper, we focus on the quality signal itself and propose a multi-dimensional quality scoring framework that decomposes output quality into modular dimensions, including model and cost priors, structure quality, semantic quality, query-output alignment, and agreement/uncertainty. Using logged outputs from QA and summarization tasks, we systematically audit dimension reliability and show that seemingly reasonable dimensions can be task-dependent and even negatively correlated with reference quality without calibration. While the default composite underperforms a strong single semantic evaluator, ablations reveal that removing unreliable dimensions and re-normalizing weights yields a calibrated composite that matches or exceeds the best single-evaluator and consensus baselines. Finally, we integrate the composite score as a drop-in quality signal in PoQ and demonstrate complementary benefits with robust aggregation and adaptive trust weighting under adversarial evaluator attacks.
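The calibration step in the abstract, dropping unreliable dimensions and re-normalizing the remaining weights, can be sketched as follows. Dimension names, weights, and scores are hypothetical placeholders, not values from the paper:

```python
# Hypothetical sketch of the calibrated composite: remove unreliable
# dimensions, then re-normalize the surviving weights so the composite
# remains a convex combination on the same scale.
def calibrated_composite(scores, weights, drop):
    """scores, weights: {dimension_name: float}; drop: set of names to remove."""
    kept = {d: w for d, w in weights.items() if d not in drop}
    total = sum(kept.values())  # re-normalization constant
    return sum(scores[d] * w / total for d, w in kept.items())

# Illustrative dimension weights and per-output scores.
weights = {"semantic": 0.4, "alignment": 0.3, "structure": 0.2, "cost_prior": 0.1}
scores  = {"semantic": 0.8, "alignment": 0.7, "structure": 0.1, "cost_prior": 0.9}

# Suppose the audit flagged "structure" as unreliable on this task.
composite = calibrated_composite(scores, weights, drop={"structure"})
```

Re-normalizing keeps the composite comparable across task types with different numbers of reliable dimensions, which is what lets it serve as a drop-in quality signal inside PoQ reward allocation.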

Tags

LLM · Decentralized Inference · Quality Assessment · Proof of Quality

arXiv Categories

cs.LG cs.AI cs.CR