Learning Hierarchical Knowledge in Text-Rich Networks with Taxonomy-Informed Representation Learning
AI Summary
TIER improves node representation learning by constructing and exploiting the hierarchical structure of text-rich networks.
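Main Contributions
- Proposes TIER, a model that learns hierarchical knowledge in text-rich networks
- Uses similarity-guided contrastive learning to build a clustering-friendly embedding space
- Introduces a cophenetic correlation coefficient-based regularization loss to align embeddings with the hierarchical structure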
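The cophenetic correlation coefficient is the Pearson correlation between pairwise distances in one space and the distances implied by a hierarchy. The summary does not give the exact form of TIER's regularizer, so the following is only a minimal sketch under stated assumptions: each node's position in the taxonomy is encoded as a root-to-leaf path of cluster indices, the tree distance is the number of levels below the deepest shared ancestor, and the loss is one minus the correlation. The names `tree_distance` and `cophenetic_regularizer` are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def tree_distance(path_a, path_b):
    # Assumption: taxonomy distance = levels below the deepest shared ancestor.
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return len(path_a) - shared


def pearson(xs, ys):
    # Plain Pearson correlation; assumes neither sequence is constant.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cophenetic_regularizer(embeddings, paths):
    # Hypothetical loss: 1 - correlation(embedding distances, tree distances).
    pairs = list(combinations(range(len(embeddings)), 2))
    emb_d = [euclidean(embeddings[i], embeddings[j]) for i, j in pairs]
    tree_d = [tree_distance(paths[i], paths[j]) for i, j in pairs]
    return 1.0 - pearson(emb_d, tree_d)


# Toy taxonomy: nodes 0 and 1 share a leaf cluster, node 2 is a sibling
# cluster, and node 3 sits under a different top-level cluster.
paths = [(0, 0), (0, 0), (0, 1), (1, 2)]
aligned = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [5.0, 0.0]]
misaligned = [[0.0, 0.0], [5.0, 0.0], [0.1, 0.0], [1.0, 0.0]]

loss_aligned = cophenetic_regularizer(aligned, paths)
loss_misaligned = cophenetic_regularizer(misaligned, paths)
```

Embeddings whose pairwise distances follow the taxonomy yield a loss near zero, while embeddings that place same-cluster nodes far apart yield a larger one. A trainable version would need the same computation written in an automatic differentiation framework.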
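Methodology
TIER builds the embedding space with similarity-guided contrastive learning, constructs an implicit taxonomy with hierarchical K-Means clustering, refines that taxonomy with an LLM, and regularizes the embeddings with a cophenetic correlation coefficient-based loss.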
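The hierarchical K-Means step can be illustrated by recursively splitting each cluster into sub-clusters. The sketch below is an assumption-laden toy, not the authors' implementation: the branching factor `k`, the depth, and the deterministic farthest-point initialization are illustrative choices, and the LLM-based refinement of the resulting clusters is omitted entirely. The function `hierarchical_kmeans` is a hypothetical name.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def init_centers(points, k):
    # Assumption: deterministic farthest-point initialization for reproducibility.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iters=20):
    # Plain Lloyd iterations; returns one cluster label per point.
    centers = init_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return labels


def hierarchical_kmeans(points, ids, k=2, depth=2):
    # Returns {node id: tuple of cluster indices from root to leaf}.
    if depth == 0 or len(points) <= k:
        return {i: () for i in ids}
    labels = kmeans(points, k)
    paths = {}
    for c in range(k):
        sub = [(p, i) for p, i, lab in zip(points, ids, labels) if lab == c]
        if not sub:
            continue
        sub_points, sub_ids = zip(*sub)
        child = hierarchical_kmeans(list(sub_points), list(sub_ids), k, depth - 1)
        for i, path in child.items():
            paths[i] = (c,) + path
    return paths


# Toy embeddings: two coarse groups, each containing two fine groups.
points = [
    (0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0),
    (100.0, 0.0), (100.0, 1.0), (103.0, 0.0), (103.0, 1.0),
]
taxonomy = hierarchical_kmeans(points, list(range(len(points))), k=2, depth=2)
```

Each node ends up with a root-to-leaf path, so nodes that share a longer path prefix are closer in the implicit taxonomy. Paths of this shape are one possible input to a hierarchy-aware regularizer.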
Original Abstract
Hierarchical knowledge structures are ubiquitous across real-world domains and play a vital role in organizing information from coarse to fine semantic levels. While such structures have been widely used in taxonomy systems, biomedical ontologies, and retrieval-augmented generation, their potential remains underexplored in the context of Text-Rich Networks (TRNs), where each node contains rich textual content and edges encode semantic relationships. Existing methods for learning on TRNs often focus on flat semantic modeling, overlooking the inherent hierarchical semantics embedded in textual documents. To this end, we propose TIER (Hierarchical Taxonomy-Informed Representation Learning on Text-Rich Networks), which first constructs an implicit hierarchical taxonomy and then integrates it into the learned node representations. Specifically, TIER employs similarity-guided contrastive learning to build a clustering-friendly embedding space, upon which it performs hierarchical K-Means followed by LLM-powered clustering refinement to enable semantically coherent taxonomy construction. Leveraging the resulting taxonomy, TIER introduces a cophenetic correlation coefficient-based regularization loss to align the learned embeddings with the hierarchical structure. By learning representations that respect both fine-grained and coarse-grained semantics, TIER enables more interpretable and structured modeling of real-world TRNs. We demonstrate that our approach significantly outperforms existing methods on multiple datasets across diverse domains, highlighting the importance of hierarchical knowledge learning for TRNs.