Relevance to Multimodal Learning: 7/10

N-gram Injection into Transformers for Dynamic Language Model Adaptation in Handwritten Text Recognition

Florent Meyer, Laurent Guichard, Denis Coquenet, Guillaume Gravier, Yann Soullard, Bertrand Coüasnon
arXiv: 2603.03930v1 Published: 2026-03-04 Updated: 2026-03-04

AI Summary

Proposes a method for injecting an n-gram language model into a Transformer decoder, enabling dynamic language model adaptation in handwritten text recognition and improving recognition accuracy across domains.

Key Contributions

  • Proposes a method for injecting an n-gram language model into the Transformer decoder
  • Achieves dynamic language model adaptation at inference time
  • Adapts to the target domain without any extra training

Methodology

The n-gram language model is injected early in the Transformer decoder, so that the network learns to leverage text-only data and thereby adapts to the language distribution of the target domain.
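The paper does not include code; the following is a minimal sketch of the early-injection idea, assuming (hypothetically) that the n-gram model's next-token distribution is projected into the embedding space and added to the decoder's token embeddings before the first layer. All names and shapes here are illustrative, not the authors' implementation.

```python
import numpy as np

def ngram_next_token_probs(counts, context, vocab_size, alpha=1.0):
    """Additive-smoothed next-token distribution of a bigram LM.

    counts: dict mapping context token -> {next token: count}.
    Returns a probability vector over the vocabulary.
    """
    c = counts.get(context, {})
    total = sum(c.values()) + alpha * vocab_size
    return np.array([(c.get(t, 0) + alpha) / total for t in range(vocab_size)])

def inject_ngram(token_embeddings, ngram_probs, projection):
    """Early injection: project the n-gram distribution into the
    embedding space and add it to the decoder's input embeddings,
    before any Transformer layer runs."""
    # (vocab,) @ (vocab, d_model) -> (d_model,), broadcast over positions
    return token_embeddings + ngram_probs @ projection
```

Because the injection happens at the input of the decoder, swapping in a different n-gram model at inference time changes only the added vector, not the trained Transformer weights, which is what makes the adaptation dynamic.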

Original Abstract

Transformer-based encoder-decoder networks have recently achieved impressive results in handwritten text recognition, partly thanks to their auto-regressive decoder which implicitly learns a language model. However, such networks suffer from a large performance drop when evaluated on a target corpus whose language distribution is shifted from the source text seen during training. To retain recognition accuracy despite this language shift, we propose an external n-gram injection (NGI) for dynamic adaptation of the network's language modeling at inference time. Our method allows switching to an n-gram language model estimated on a corpus close to the target distribution, therefore mitigating bias without any extra training on target image-text pairs. We opt for an early injection of the n-gram into the transformer decoder so that the network learns to fully leverage text-only data at the low additional cost of n-gram inference. Experiments on three handwritten datasets demonstrate that the proposed NGI significantly reduces the performance gap between source and target corpora.
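The abstract stresses that the swapped-in n-gram model is estimated on text only, with no target image-text pairs. A sketch of that estimation step for a bigram model, using plain counting (the authors' actual estimation toolchain is not specified):

```python
from collections import defaultdict

def estimate_bigram_counts(corpus_tokens):
    """Count bigrams from a text-only corpus close to the target
    distribution. corpus_tokens: iterable of token-ID sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus_tokens:
        for prev, nxt in zip(sent, sent[1:]):
            counts[prev][nxt] += 1
    return counts
```

These counts (after smoothing) define the language model that is injected at inference time, so adapting to a new domain only requires re-running this cheap counting step on in-domain text.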

Tags

Handwritten Text Recognition, Transformer, N-gram, Domain Adaptation, Language Model

arXiv Categories

cs.CV