AI Agents relevance: 7/10

An Extreme Multi-label Text Classification (XMTC) Library Dataset: What if we took "Use of Practical AI in Digital Libraries" seriously?

Jennifer D'Souza, Sameer Sadruddin, Maximilian Kähler, Andrea Salfinger, Luca Zaccagna, Francesca Incitti, Lauro Snidaro, Osma Suominen
arXiv: 2603.10876v1 Published: 2026-03-11 Updated: 2026-03-11

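AI Summary

Releases a large-scale bilingual (English/German) text classification dataset for knowledge-base indexing and assisted cataloging, with the aim of making cataloging work more efficient.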

Main Contributions

  • Releases a large-scale bilingual dataset annotated with GND terms
  • Provides a machine-readable GND taxonomy
  • Supports knowledge-base-grounded (ontology-aware) multi-label text classification

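Methodology

Catalog records are annotated with terms from the GND knowledge base to build a multi-label text classification dataset, and three systems are evaluated on it.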
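
To make the evaluation setup concrete, the following is a minimal sketch of how predictions for a multi-label dataset of this kind could be scored. It is not the paper's evaluation code. The Record schema, the micro_prf helper, and the label IDs ("S1", "S2", ...) are hypothetical placeholders, not real GND identifiers or fields from the released corpus.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One catalog record with its gold subject labels (hypothetical schema)."""
    title: str
    language: str   # "en" or "de"
    gold: set       # placeholder label IDs, not real GND identifiers


def micro_prf(gold_sets, pred_sets):
    """Micro-averaged precision, recall, and F1 over all records."""
    # A true positive is a predicted label that is also in the gold set.
    tp = sum(len(g & p) for g, p in zip(gold_sets, pred_sets))
    n_pred = sum(len(p) for p in pred_sets)
    n_gold = sum(len(g) for g in gold_sets)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data: two records, one per language, with made-up label IDs.
records = [
    Record("Machine learning for libraries", "en", {"S1", "S2"}),
    Record("Einführung in die Katalogisierung", "de", {"S3"}),
]
# Hypothetical system output: one set of predicted labels per record.
predictions = [{"S1", "S4"}, {"S3"}]

p, r, f = micro_prf([rec.gold for rec in records], predictions)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Micro-averaging pools label decisions across all records, so frequent labels dominate the score. With a very large label space such as an authority file, ranking metrics (for example precision at k) are a common alternative, and the abstract itself asks for assessment of usefulness and transparency beyond accuracy.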

Original Abstract

Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers' work.

Tags

Text classification, multi-label classification, knowledge base, GND, digital libraries

arXiv Categories

cs.CL cs.AI cs.DL cs.IR