LLM Memory & RAG relevance: 6/10

DEBISS: a Corpus of Individual, Semi-structured and Spoken Debates

Klaywert Danillo Ferreira de Souza, David Eduardo Pereira, Cláudio E. C. Campelo, Larissa Lucena Vasconcelos
arXiv: 2603.05459v1 Published: 2026-03-05 Updated: 2026-03-05

AI summary

DEBISS corpus: a corpus of spoken, individual debates with semi-structured features, richly annotated for NLP tasks.

Main contributions

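  • Introduces the DEBISS corpus, addressing the scarcity of debate corpora
  • Covers spoken, individual debates with semi-structured features
  • Provides annotations for several NLP tasks, including speech-to-text, speaker diarization, and argument mining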

Methodology

Individual debate data was collected and manually annotated to build a semi-structured debate corpus with annotations for multiple NLP tasks.

Original abstract

The process of debating is essential in our daily lives, whether in studying, work activities, simple everyday discussions, political debates on TV, or online discussions on social networks. The range of uses for debates is broad. Due to the diverse applications, structures, and formats of debates, developing corpora that account for these variations can be challenging, and the scarcity of debate corpora in the state of the art is notable. For this reason, the current research proposes the DEBISS corpus: a collection of spoken and individual debates with semi-structured features. With a broad range of NLP task annotations, such as speech-to-text, speaker diarization, argument mining, and debater quality assessment.

Tags

corpus, debate, natural language processing, speech recognition, argument mining

arXiv categories

cs.CL cs.DB