LLM Reasoning relevance: 7/10

LLM-based Triplet Extraction from Financial Reports

Dante Wesslund, Ville Stenström, Pontus Linde, Alexander Holmberg
arXiv: 2602.11886v1 Published: 2026-02-12 Updated: 2026-02-12

AI Summary

Proposes an LLM-based triplet extraction pipeline for financial reports and evaluates it with ontology-driven proxy metrics.

Main Contributions

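  • Proposes an LLM-based pipeline for extracting triplets from financial reports
  • Uses Ontology Conformance and Faithfulness as evaluation metrics
  • Identifies an asymmetry between subject and object hallucinations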
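
To make the Ontology Conformance metric concrete, here is a minimal Python sketch. The source does not give an exact formula, so this assumes conformance is the share of triplets whose predicate exists in the ontology and whose subject and object types match that predicate's signature. The names Triplet, Ontology, conforms, and ontology_conformance are hypothetical, as are the example types and predicates.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Triplet:
    subject: str
    subject_type: str
    predicate: str
    object: str
    object_type: str


# Assumed ontology shape: predicate -> (allowed subject type, allowed object type).
Ontology = Dict[str, Tuple[str, str]]


def conforms(triplet: Triplet, ontology: Ontology) -> bool:
    """True if the predicate is known and both entity types match its signature."""
    signature = ontology.get(triplet.predicate)
    return signature == (triplet.subject_type, triplet.object_type)


def ontology_conformance(triplets: List[Triplet], ontology: Ontology) -> float:
    """Fraction of triplets that conform to the ontology (0.0 for an empty list)."""
    if not triplets:
        return 0.0
    return sum(conforms(t, ontology) for t in triplets) / len(triplets)


# Illustrative data only; not taken from the paper.
example_ontology: Ontology = {
    "reports_revenue": ("Company", "MonetaryAmount"),
    "appoints": ("Company", "Person"),
}
example_triplets = [
    Triplet("Acme AB", "Company", "reports_revenue", "SEK 4.2 billion", "MonetaryAmount"),
    Triplet("Acme AB", "Company", "acquires_stake_in", "Beta Oy", "Company"),
]
score = ontology_conformance(example_triplets, example_ontology)
```

Under this assumed definition, a predicate that falls outside the ontology (the second triplet) lowers the score, which is one way the "ontology drift" described in the abstract would show up numerically.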

Methodology

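Uses LLMs for triplet extraction, compares a manually engineered ontology with an automatically induced one, and verifies extracted triplets by combining regex matching with an LLM-as-a-judge check.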
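
The hybrid verification step can be sketched roughly as follows, in Python. This is an illustration under stated assumptions, not the authors' implementation: an entity is first checked by a literal regex match against the source text, and only if that fails is a judge consulted. The judge here is a hypothetical stub (stub_judge, backed by a hand-written alias table) standing in for an LLM call; all function names and the example sentence are invented for illustration.

```python
import re
from typing import Callable


def regex_grounded(entity: str, source: str) -> bool:
    """Case-insensitive whole-phrase match of the entity in the source text."""
    entity = entity.strip()
    if not entity:
        return False
    pattern = r"(?<!\w)" + re.escape(entity) + r"(?!\w)"
    return re.search(pattern, source, flags=re.IGNORECASE) is not None


def verify_entity(entity: str, source: str, judge: Callable[[str, str], bool]) -> str:
    """Cheap regex check first; fall back to the judge only when the regex fails."""
    if regex_grounded(entity, source):
        return "grounded_regex"
    if judge(entity, source):
        return "grounded_judge"
    return "hallucinated"


# Hypothetical alias table; a real pipeline would ask an LLM instead.
ALIASES = {"Acme AB": ("the Group", "the Company")}


def stub_judge(entity: str, source: str) -> bool:
    """Stand-in for an LLM-as-a-judge call: accepts known coreferent mentions."""
    return any(regex_grounded(alias, source) for alias in ALIASES.get(entity, ()))


text = "The Group increased net sales by 12% during the year."
results = {e: verify_entity(e, text, stub_judge) for e in ("net sales", "Acme AB", "Volvo")}
```

In this sketch, "Acme AB" fails the regex check because the sentence only says "The Group", which is the kind of coreference-driven false positive the abstract says the judge step filters out.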

Original Abstract

Corporate financial reports are a valuable source of structured knowledge for Knowledge Graph construction, but the lack of annotated ground truth in this domain makes evaluation difficult. We present a semi-automated pipeline for Subject-Predicate-Object triplet extraction that uses ontology-driven proxy metrics, specifically Ontology Conformance and Faithfulness, instead of ground-truth-based evaluation. We compare a static, manually engineered ontology against a fully automated, document-specific ontology induction approach across different LLMs and two corporate annual reports. The automatically induced ontology achieves 100% schema conformance in all configurations, eliminating the ontology drift observed with the manual approach. We also propose a hybrid verification strategy that combines regex matching with an LLM-as-a-judge check, reducing apparent subject hallucination rates from 65.2% to 1.6% by filtering false positives caused by coreference resolution. Finally, we identify a systematic asymmetry between subject and object hallucinations, which we attribute to passive constructions and omitted agents in financial prose.

Tags

LLM, Triplet Extraction, Financial Reports, Knowledge Graph, Ontology

arXiv Categories

cs.CL