AI Agents relevance: 6/10

Bridging the Evaluation Gap: Standardized Benchmarks for Multi-Objective Search

Hadar Peer, Carlos Hernandez, Sven Koenig, Ariel Felner, Oren Salzman
arXiv: 2603.24084v1 | Published: 2026-03-25 | Updated: 2026-03-25

AI Summary

The paper introduces a standardized benchmark suite for multi-objective search, addressing the fragmentation of existing empirical evaluation.

Key Contributions

  • Constructs the first comprehensive, standardized benchmark suite for multi-objective search
  • Spans datasets from four structurally diverse domains
  • Provides fixed graph instances, standardized start-goal queries, and reference Pareto-optimal solution sets

Methodology

A reproducible evaluation environment is built by selecting domains with distinct structural characteristics and supplying standardized data and queries for each.

Original Abstract

Empirical evaluation in multi-objective search (MOS) has historically suffered from fragmentation, relying on heterogeneous problem instances with incompatible objective definitions that make cross-study comparisons difficult. This standardization gap is further exacerbated by the realization that DIMACS road networks, a historical default benchmark for the field, exhibit highly correlated objectives that fail to capture diverse Pareto-front structures. To address this, we introduce the first comprehensive, standardized benchmark suite for exact and approximate MOS. Our suite spans four structurally diverse domains: real-world road networks, structured synthetic graphs, game-based grid environments, and high-dimensional robotic motion-planning roadmaps. By providing fixed graph instances, standardized start-goal queries, and both exact and approximate reference Pareto-optimal solution sets, this suite captures a full spectrum of objective interactions: from strongly correlated to strictly independent. Ultimately, this benchmark provides a common foundation to ensure future MOS evaluations are robust, reproducible, and structurally comprehensive.
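The reference Pareto-optimal solution sets the abstract mentions rest on the standard notion of Pareto dominance between cost vectors. As an illustration only (not code from the paper), the following minimal Python sketch filters a set of bi-objective path costs down to its non-dominated (Pareto-optimal) subset, assuming minimization in every objective:

```python
from typing import List, Tuple

def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """a dominates b: no worse in every objective, strictly better in at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(costs: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the non-dominated subset of the given cost vectors."""
    return [c for c in costs if not any(dominates(o, c) for o in costs if o != c)]

# Hypothetical bi-objective path costs, e.g. (distance, travel time)
costs = [(3, 5), (4, 4), (5, 3), (4, 6), (6, 6)]
print(pareto_front(costs))  # → [(3, 5), (4, 4), (5, 3)]
```

Strongly correlated objectives (as in the DIMACS road networks criticized above) tend to collapse this front to very few points, whereas independent objectives yield the rich Pareto-front structures the suite is designed to cover.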

Tags

Multi-Objective Search  Benchmarking  Performance Evaluation  Standardization  Graph Algorithms

arXiv Category

cs.AI