LLM Reasoning relevance: 8/10

Text summarization via global structure awareness

Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Yibei Liu, Chenghao Li, Qigan Sun, Shuai Yuan, Fachrina Dewi Puspitasari, Dongshen Han, Guoqing Wang, Sung-Ho Bae, Yang Yang
arXiv: 2602.09821v1 Published: 2026-02-10 Updated: 2026-02-10

AI Summary

GloSA-sum achieves global structure awareness via topological data analysis (TDA), improving both the accuracy and efficiency of text summarization.

Key Contributions

  • Propose GloSA-sum, the first TDA-based summarization method with global structure awareness
  • Design a topology-guided iterative strategy that balances accuracy and efficiency
  • Propose a hierarchical strategy that strengthens long-text processing

Methodology

Construct a semantic-weighted graph from sentence embeddings; apply persistent homology to identify core semantics and logical structure; approximate sentence importance with lightweight proxy metrics; and perform hierarchical summarization for long texts.
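The graph-plus-persistent-homology step can be illustrated with a minimal sketch. This is not the paper's code: it computes only 0-dimensional persistent homology (connected components) over a cosine-distance graph via a Kruskal-style union-find, using toy vectors in place of real sentence embeddings. Components that merge late in the filtration correspond to well-separated semantic clusters.

```python
import numpy as np

def h0_persistence(dist):
    """0-dim persistent homology of the complete graph on a distance
    matrix: process edges in increasing order; each edge that merges
    two components records a death time (the merge distance)."""
    n = dist.shape[0]
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted((dist[i, j], i, j)
                   for i in range(n) for j in range(i + 1, n))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)  # a component born at 0 dies at d
    return deaths  # n-1 finite deaths; one component lives forever

# Toy "sentence embeddings": two tight semantic clusters.
emb = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.95]])
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
dist = 1.0 - emb @ emb.T          # cosine distance
deaths = h0_persistence(dist)
# The largest death marks the late merge of the two clusters.
print(sorted(deaths))
```

A long-lived component in this filtration signals a distinct semantic core that a summarizer should retain; higher-dimensional features (loops) would require a full TDA library such as ripser.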

Original Abstract

Text summarization is a fundamental task in natural language processing (NLP), and the information explosion has made long-document processing increasingly demanding, making summarization essential. Existing research mainly focuses on model improvements and sentence-level pruning, but often overlooks global structure, leading to disrupted coherence and weakened downstream performance. Some studies employ large language models (LLMs), which achieve higher accuracy but incur substantial resource and time costs. To address these issues, we introduce GloSA-sum, the first summarization approach that achieves global structure awareness via topological data analysis (TDA). GloSA-sum summarizes text efficiently while preserving semantic cores and logical dependencies. Specifically, we construct a semantic-weighted graph from sentence embeddings, where persistent homology identifies core semantics and logical structures, preserved in a "protection pool" as the backbone for summarization. We design a topology-guided iterative strategy, where lightweight proxy metrics approximate sentence importance to avoid repeated high-cost computations, thus preserving structural integrity while improving efficiency. To further enhance long-text processing, we propose a hierarchical strategy that integrates segment-level and global summarization. Experiments on multiple datasets demonstrate that GloSA-sum reduces redundancy while preserving semantic and logical integrity, striking a balance between accuracy and efficiency, and further benefits LLM downstream tasks by shortening contexts while retaining essential reasoning chains.
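The "protection pool" plus iterative-pruning idea from the abstract can be sketched as follows. Everything here is hypothetical scaffolding: `prune_with_protection`, the mean-similarity centrality score, and the toy embeddings are stand-ins for the paper's persistent-homology-derived pool and lightweight proxy metrics, chosen only to show the control flow of protected iterative removal.

```python
import numpy as np

def prune_with_protection(emb, protected, budget):
    """Iteratively drop the least central unprotected sentence until
    only `budget` sentences remain. Centrality is a cheap proxy
    (mean cosine similarity to the other kept sentences); sentences
    in `protected` are never removed."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = emb @ emb.T  # fixed pairwise cosine similarities
    keep = set(range(len(emb)))
    while len(keep) > budget:
        candidates = [i for i in keep if i not in protected]
        if not candidates:
            break  # only protected sentences remain
        scores = {i: np.mean([sims[i, j] for j in keep if j != i])
                  for i in candidates}
        keep.discard(min(scores, key=scores.get))
    return sorted(keep)

# Toy run: sentence 2 sits in the protection pool and must survive.
emb = np.array([[1.0, 0.0], [0.9, 0.2], [0.0, 1.0],
                [0.2, 0.9], [0.5, 0.5]])
kept = prune_with_protection(emb, protected={2}, budget=3)
print(kept)
```

The design point mirrored here is that importance is re-scored only over the surviving set on each pass, while the protected backbone fixes which sentences are off-limits, so structural anchors cannot be traded away for local redundancy gains.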

Tags

Text Summarization · Topological Data Analysis · Global Structure Awareness · Long-Text Processing

arXiv Categories

cs.CL cs.AI