Exact Recovery in the Data Block Model
AI 摘要
研究数据块模型中的精确恢复问题,提出了新的阈值刻画和算法。
主要贡献
- 提出了用于数据块模型精确恢复的Chernoff--TV散度
- 刻画了数据块模型精确恢复的尖锐阈值
- 设计了一种达到该阈值的高效算法,并证明了下界的不可达性
方法论
理论分析,利用Chernoff--TV散度建立精确恢复的阈值,并通过算法设计和反证法进行验证。
原文摘要
Community detection in networks is a fundamental problem in machine learning and statistical inference, with applications in social networks, biological systems, and communication networks. The stochastic block model (SBM) serves as a canonical framework for studying community structure, and exact recovery, identifying the true communities with high probability, is a central theoretical question. While classical results characterize the phase transition for exact recovery based solely on graph connectivity, many real-world networks contain additional data, such as node attributes or labels. In this work, we study exact recovery in the Data Block Model (DBM), an SBM augmented with node-associated data, as formalized by Asadi, Abbe, and Verdú (2017). We introduce the Chernoff--TV divergence and use it to characterize a sharp exact recovery threshold for the DBM. We further provide an efficient algorithm that achieves this threshold, along with a matching converse result showing impossibility below the threshold. Finally, simulations validate our findings and demonstrate the benefits of incorporating vertex data as side information in community detection.