NCL-UoR at SemEval-2026 Task 5: Embedding-Based Methods, Fine-Tuning, and LLMs for Word Sense Plausibility Rating
AI Summary
The paper compares three approaches (embedding-based methods, transformer fine-tuning, and LLM prompting) for predicting the plausibility of word senses in short stories.
Main Contributions
- Compares three approaches: embedding-based methods, transformer fine-tuning, and LLM prompting
- Proposes a method based on structured prompting and explicit decision rules
- Finds that structured prompting outperforms both fine-tuned and embedding-based methods
- Shows that prompt design matters more than model scale for this task
Methodology
The study employs embedding-based methods, transformer fine-tuning, and LLM prompting, focusing on how structured prompting and explicit decision rules affect word sense plausibility rating.
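The embedding-based baseline can be sketched as a standard regression over sentence embeddings. This is a minimal illustration using synthetic vectors and a closed-form ridge regression; the embedding model, dimensionality, and regressor are assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for sentence embeddings of (story, sense-gloss) pairs.
X = rng.normal(size=(200, 16))
w_true = rng.normal(size=16)
# Synthetic human plausibility ratings, clipped to the 1-5 scale.
y = np.clip(3.0 + (X @ w_true) * 0.3, 1.0, 5.0)

Xtr, ytr, Xte, yte = X[:150], y[:150], X[150:], y[150:]

# Ridge regression closed form: w = (X^T X + alpha I)^-1 X^T (y - mean)
alpha = 1.0
mu = ytr.mean()
w = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(16), Xtr.T @ (ytr - mu))

# Predict and keep outputs on the 1-5 rating scale.
pred = np.clip(Xte @ w + mu, 1.0, 5.0)
mse = float(np.mean((pred - yte) ** 2))
print(round(mse, 3))
```

In practice the synthetic `X` would be replaced by real sentence embeddings of the narrative plus the candidate sense gloss, and any off-the-shelf regressor could stand in for the closed-form ridge solver.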
Original Abstract
Word sense plausibility rating requires predicting the human-perceived plausibility of a given word sense on a 1--5 scale in the context of short narrative stories containing ambiguous homonyms. This paper systematically compares three approaches: (1) embedding-based methods pairing sentence embeddings with standard regressors, (2) transformer fine-tuning with parameter-efficient adaptation, and (3) large language model (LLM) prompting with structured reasoning and explicit decision rules. The best-performing system employs a structured prompting strategy that decomposes evaluation into narrative components (precontext, target sentence, ending) and applies explicit decision rules for rating calibration. The analysis reveals that structured prompting with decision rules substantially outperforms both fine-tuned models and embedding-based approaches, and that prompt design matters more than model scale for this task. The code is publicly available at https://github.com/tongwu17/SemEval-2026-Task5.
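The structured prompting strategy described above can be sketched as a template that decomposes the story into precontext, target sentence, and ending, then appends explicit decision rules for rating calibration. The rule wording and the `build_prompt` helper below are illustrative assumptions, not the paper's actual prompt.

```python
def build_prompt(precontext: str, target: str, ending: str, sense_gloss: str) -> str:
    """Assemble a structured plausibility-rating prompt.

    Decomposes the narrative into its components and appends explicit
    decision rules (hypothetical wording) to calibrate the 1-5 rating.
    """
    rules = (
        "Decision rules:\n"
        "- 5: the sense fits every part of the story.\n"
        "- 3: the sense fits the target sentence but clashes with the ending.\n"
        "- 1: the sense is contradicted by the context.\n"
    )
    return (
        f"Precontext: {precontext}\n"
        f"Target sentence: {target}\n"
        f"Ending: {ending}\n"
        f"Candidate sense: {sense_gloss}\n\n"
        f"{rules}\n"
        "Rate the plausibility of the candidate sense on a scale from 1 to 5."
    )


prompt = build_prompt(
    "He walked along the river for an hour.",
    "Finally he reached the bank.",
    "He sat down on the grass at the water's edge.",
    "bank: the land alongside a river",
)
print(prompt)
```

The key design choice the abstract highlights is that the explicit rules anchor the model's numeric output, rather than leaving the 1-5 scale implicit.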