AI Agents 相关度: 6/10

Almost Asymptotically Optimal Active Clustering Through Pairwise Observations

Rachel S. Y. Teo, P. N. Karthik, Ramya Korlakai Vinayak, Vincent Y. F. Tan
arXiv: 2602.05690v1 发布: 2026-02-05 更新: 2026-02-05

AI 摘要

提出了一种通过成对观测进行主动聚类的新框架,并设计了渐近最优算法。

主要贡献

  • 提出了主动聚类分析的新框架
  • 建立了聚类准确性的查询下界
  • 设计了渐近最优的主动聚类算法

方法论

利用改变测度技术,建立下界,并设计基于广义似然比(GLR)统计量的算法,通过与阈值比较来停止查询。

原文摘要

We propose a new analysis framework for clustering $M$ items into an unknown number of $K$ distinct groups using noisy and actively collected responses. At each time step, an agent is allowed to query pairs of items and observe bandit binary feedback. If the pair of items belongs to the same (resp.\ different) cluster, the observed feedback is $1$ with probability $p>1/2$ (resp.\ $q<1/2$). Leveraging the ubiquitous change-of-measure technique, we establish a fundamental lower bound on the expected number of queries needed to achieve a desired confidence in the clustering accuracy, formulated as a sup-inf optimization problem. Building on this theoretical foundation, we design an asymptotically optimal algorithm in which the stopping criterion involves an empirical version of the inner infimum -- the Generalized Likelihood Ratio (GLR) statistic -- being compared to a threshold. We develop a computationally feasible variant of the GLR statistic and show that its performance gap to the lower bound can be accurately empirically estimated and remains within a constant multiple of the lower bound.

标签

主动学习 聚类 成对比较 bandit反馈 广义似然比

arXiv 分类

cs.LG cs.IT