LLM Reasoning relevance: 7/10

Explicit Grammar Semantic Feature Fusion for Robust Text Classification

Azrin Sultana, Firoz Ahmed
arXiv: 2602.20749v1 Published: 2026-02-24 Updated: 2026-02-24

AI Summary

Proposes an explicit grammar-semantic feature fusion method for building a lightweight, robust text classification model.

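Key Contributions

  • Proposes a grammar vector that explicitly encodes sentence-level syntactic structure.
  • Fuses the grammar vector with frozen contextual embeddings.
  • Builds a lightweight text classification model that outperforms baseline models by 2%-15%.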
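
To make the first contribution concrete, here is a minimal sketch of what a compact grammar vector could look like. The word lists, the six features, and the function name `grammar_vector` are all assumptions made for illustration. The abstract names the feature families (syntactic composition, phrase patterns, complexity indicators) but does not list the actual rules, and a real system would likely rely on a POS tagger or parser rather than word lists.

```python
import re

# Hypothetical closed-class word lists, not taken from the paper.
SUBORDINATORS = {"because", "although", "while", "if", "since", "which", "when"}
COORDINATORS = {"and", "but", "or", "nor", "so", "yet"}
DETERMINERS = {"the", "a", "an", "this", "these", "those"}


def grammar_vector(sentence: str) -> list[float]:
    """Return a fixed-length vector of crude structural indicators."""
    tokens = re.findall(r"[A-Za-z']+", sentence.lower())
    n = len(tokens) or 1  # avoid division by zero on empty input
    return [
        float(len(tokens)),                                # sentence length
        sum(len(t) for t in tokens) / n,                   # mean word length
        sum(t in SUBORDINATORS for t in tokens) / n,       # subordination ratio
        sum(t in COORDINATORS for t in tokens) / n,        # coordination ratio
        sum(t in DETERMINERS for t in tokens) / n,         # determiner ratio (noun-phrase proxy)
        float(sentence.count(",") + sentence.count(";")),  # clause-boundary punctuation
    ]


vec = grammar_vector("The cat sat, and the dog slept because it was tired.")
```

The vector has the same length for every sentence, so it can be attached to any fixed-size embedding without additional trainable parameters.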

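Methodology

Sentence-level grammatical structure is explicitly encoded into a grammar vector, which is then fused with semantic information into a unified feature representation used to train text classification models.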
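
The fusion step can be sketched as follows. This summary does not say which fusion operator the authors use, so plain concatenation after per-part L2 normalisation is an assumption here, as are the function name `fuse` and the toy input values. In practice the embedding would come from a frozen encoder whose weights are never updated.

```python
from typing import Sequence


def fuse(embedding: Sequence[float], grammar: Sequence[float]) -> list[float]:
    """Concatenate a frozen sentence embedding with a grammar vector.

    Concatenation is an assumed fusion operator, not the paper's confirmed one.
    """

    def l2_normalise(vec: Sequence[float]) -> list[float]:
        # Scale to unit length so neither part dominates purely because of scale.
        norm = sum(x * x for x in vec) ** 0.5
        return [x / norm for x in vec] if norm > 0 else [0.0 for _ in vec]

    return l2_normalise(embedding) + l2_normalise(grammar)


# Stand-in values: a real embedding would be produced by a frozen encoder.
embedding = [3.0, 4.0]
grammar = [0.0, 2.0, 0.0]
fused = fuse(embedding, grammar)  # length = len(embedding) + len(grammar)
```

Because both inputs are fixed, only the downstream classifier has to be trained on the fused vector, which is what keeps this kind of pipeline lightweight.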

Original Abstract

Natural Language Processing enables computers to understand human language by analysing and classifying text efficiently with deep-level grammatical and semantic features. Existing models capture features by learning from large corpora with transformer models, which are computationally intensive and unsuitable for resource-constrained environments. Therefore, our proposed study incorporates comprehensive grammatical rules alongside semantic information to build a robust, lightweight classification model without resorting to fully parameterised transformer models or heavy deep learning architectures. The novelty of our approach lies in its explicit encoding of sentence-level grammatical structure, including syntactic composition, phrase patterns, and complexity indicators, into a compact grammar vector, which is then fused with frozen contextual embeddings. These heterogeneous elements are unified into a single representation that captures both the structural and semantic characteristics of the text. Deep learning models such as Deep Belief Networks (DBNs), Long Short-Term Memory networks (LSTMs), BiLSTMs, and transformer-based BERT and XLNet were used to train and evaluate the model, with the number of epochs varied. Based on experimental results, the unified feature representation model captures both the semantic and structural properties of text, outperforming baseline models by 2%-15%, enabling more effective learning across heterogeneous domains. Unlike prior syntax-aware transformer models that inject grammatical structure through additional attention layers, tree encoders, or full fine-tuning, the proposed framework treats grammar as an explicit inductive bias rather than a learnable module, resulting in a very lightweight model that delivers better performance on edge devices.

Tags

Text classification, grammar features, semantic features, feature fusion, lightweight model

arXiv Categories

cs.CL