LLM Reasoning Relevance: 9/10

An Exploration-Analysis-Disambiguation Reasoning Framework for Word Sense Disambiguation with Low-Parameter LLMs

Deshan Sumanathilaka, Nicholas Micallef, Julian Hough
arXiv: 2603.05400v1 Published: 2026-03-05 Updated: 2026-03-05

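AI Summary

This paper investigates whether low-parameter LLMs can achieve high-performance word sense disambiguation (WSD) through reasoning-driven fine-tuning strategies.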

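Key Contributions

  • Shows that low-parameter LLMs using CoT reasoning and neighbour-word analysis can reach WSD performance comparable to GPT-4-Turbo
  • Proposes a reasoning-driven fine-tuning strategy suited to low-parameter LLMs
  • Validates the models' cross-domain generalization on an unseen dataset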

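Methodology

Fine-tunes low-parameter LLMs on rationale-rich data, combining CoT reasoning with neighbour-word analysis, and evaluates them on the FEWS and “Fool Me If You Can” datasets.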
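As a rough illustration of how neighbour-word analysis and a rationale request might be combined in one training prompt, here is a minimal sketch. It is not the authors' pipeline: the function names (`neighbour_words`, `build_example`), the window size, the whitespace tokenization, and the prompt wording are all assumptions made for this example.

```python
from typing import List


def neighbour_words(tokens: List[str], target_index: int, window: int = 3) -> List[str]:
    """Return up to `window` tokens on each side of the target, excluding the target."""
    left = tokens[max(0, target_index - window):target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    return left + right


def build_example(sentence: str, target: str, senses: List[str], window: int = 3) -> str:
    """Format one WSD prompt that asks for a rationale before the final sense.

    Hypothetical prompt layout; assumes the target occurs as a whitespace-separated token.
    """
    tokens = sentence.split()
    target_index = tokens.index(target)  # first occurrence of the target word
    context = ", ".join(neighbour_words(tokens, target_index, window))
    options = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(senses))
    return (
        f"Sentence: {sentence}\n"
        f"Target word: {target}\n"
        f"Neighbouring words: {context}\n"
        f"Candidate senses:\n{options}\n"
        "Explain step by step which sense fits, then answer with the sense number."
    )


if __name__ == "__main__":
    print(build_example(
        "She sat on the bank of the river",
        "bank",
        ["financial institution", "sloping land beside water"],
    ))
```

In a fine-tuning setup, a prompt like this would be paired with a written rationale and the gold sense as the target output; how the paper actually structures those annotations is not specified in this summary.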

Original Abstract

Word Sense Disambiguation (WSD) remains a key challenge in Natural Language Processing (NLP), especially when dealing with rare or domain-specific senses that are often misinterpreted. While modern high-parameter Large Language Models (LLMs) such as GPT-4-Turbo have shown state-of-the-art WSD performance, their computational and energy demands limit scalability. This study investigates whether low-parameter LLMs (<4B parameters) can achieve comparable results through fine-tuning strategies that emphasize reasoning-driven sense identification. Using the FEWS dataset augmented with semi-automated, rationale-rich annotations, we fine-tune eight small-scale open-source LLMs (e.g. Gemma and Qwen). Our results reveal that Chain-of-Thought (CoT)-based reasoning combined with neighbour-word analysis achieves performance comparable to GPT-4-Turbo in zero-shot settings. Importantly, Gemma-3-4B and Qwen-3-4B models consistently outperform all medium-parameter baselines and state-of-the-art models on FEWS, with robust generalization to unseen senses. Furthermore, evaluation on the unseen "Fool Me If You Can" dataset confirms strong cross-domain adaptability without task-specific fine-tuning. This work demonstrates that with carefully crafted reasoning-centric fine-tuning, low-parameter LLMs can deliver accurate WSD while substantially reducing computational and energy demands.

Tags

Word Sense Disambiguation, Low-Parameter LLM, Reasoning, Chain-of-Thought, Fine-Tuning

arXiv Categories

cs.CL