LLM Reasoning relevance: 7/10

LLM-based Triplet Extraction from Financial Reports

Dante Wesslund, Ville Stenström, Pontus Linde, Alexander Holmberg

arXiv: 2602.11886v1 Published: 2026-02-12 Updated: 2026-02-12


AI Summary

Proposes an LLM-based pipeline for extracting triplets from financial reports, evaluated with ontology-driven proxy metrics.

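Key Contributions

Proposes an LLM-based pipeline for triplet extraction from financial reports
Uses Ontology Conformance and Faithfulness as evaluation metrics
Identifies an asymmetry between subject and object hallucinations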
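
To make the conformance metric concrete, here is a minimal sketch of one plausible reading: the share of extracted triplets whose predicate and entity types match the ontology schema. The summary does not give the paper's exact definition, so the `TypedTriplet` structure, the `ONTOLOGY` table, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Mapping, NamedTuple, Sequence, Tuple


class TypedTriplet(NamedTuple):
    """A triplet reduced to the parts a schema check needs (hypothetical shape)."""
    subject_type: str
    predicate: str
    object_type: str


# Hypothetical ontology: predicate -> (allowed subject type, allowed object type).
ONTOLOGY: Mapping[str, Tuple[str, str]] = {
    "reports": ("Company", "FinancialMetric"),
    "acquires": ("Company", "Company"),
}


def conforms(triplet: TypedTriplet, ontology: Mapping[str, Tuple[str, str]]) -> bool:
    """True if the predicate is known and both entity types match its signature."""
    return ontology.get(triplet.predicate) == (triplet.subject_type, triplet.object_type)


def conformance_rate(triplets: Sequence[TypedTriplet],
                     ontology: Mapping[str, Tuple[str, str]]) -> float:
    """Fraction of triplets that conform to the ontology (0.0 for an empty input)."""
    if not triplets:
        return 0.0
    return sum(conforms(t, ontology) for t in triplets) / len(triplets)


SAMPLE = [
    TypedTriplet("Company", "reports", "FinancialMetric"),  # conforms
    TypedTriplet("Company", "acquires", "Company"),         # conforms
    TypedTriplet("Person", "owns", "Company"),              # unknown predicate
    TypedTriplet("Company", "reports", "Person"),           # wrong object type
]
```

Under this reading, a triplet using a predicate the ontology lacks counts as non-conforming, which is one way the "ontology drift" mentioned in the abstract could show up in the score.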

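Methodology

Uses LLMs to extract triplets, compares a manually built ontology against an automatically induced one, and verifies the output by combining regex matching with an LLM-as-a-judge check.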
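
The hybrid verification step can be sketched roughly as follows: a cheap regex check looks for the subject verbatim in the source passage, and only when that fails is a judge consulted to decide whether the subject is supported through coreference (for example, "the Company" standing in for the firm's name). This is an illustrative sketch, not the authors' implementation. The `Triplet` type, the label strings, and `stub_judge` are assumptions; in particular, `stub_judge` is a hand-written alias lookup standing in for a real LLM call.

```python
import re
from typing import Callable, NamedTuple


class Triplet(NamedTuple):
    subject: str
    predicate: str
    obj: str


def appears_in_text(entity: str, source: str) -> bool:
    """Regex check: case-insensitive, whole-word match of the entity in the source."""
    pattern = r"(?<!\w)" + re.escape(entity.strip()) + r"(?!\w)"
    return re.search(pattern, source, flags=re.IGNORECASE) is not None


Judge = Callable[[Triplet, str], bool]


def verify_subject(triplet: Triplet, source: str, judge: Judge) -> str:
    """Return a label for the subject: regex first, judge only as a fallback."""
    if appears_in_text(triplet.subject, source):
        return "supported_by_regex"
    if judge(triplet, source):
        return "supported_by_judge"
    return "hallucinated"


# Hypothetical alias table; a real pipeline would ask an LLM instead.
ALIASES = {"acme ab": ("the company", "the group")}


def stub_judge(triplet: Triplet, source: str) -> bool:
    """Stand-in for an LLM-as-a-judge call: accepts known aliases found in the source."""
    aliases = ALIASES.get(triplet.subject.lower(), ())
    return any(appears_in_text(alias, source) for alias in aliases)


SOURCE = "The Company increased its revenue by 12% during the year."
```

With this setup, `verify_subject(Triplet("Acme AB", "increased", "revenue"), SOURCE, stub_judge)` is resolved by the judge rather than flagged, which mirrors how a regex-only check would overcount subject hallucinations when the text refers to the entity indirectly.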

Original Abstract

Corporate financial reports are a valuable source of structured knowledge for Knowledge Graph construction, but the lack of annotated ground truth in this domain makes evaluation difficult. We present a semi-automated pipeline for Subject-Predicate-Object triplet extraction that uses ontology-driven proxy metrics, specifically Ontology Conformance and Faithfulness, instead of ground-truth-based evaluation. We compare a static, manually engineered ontology against a fully automated, document-specific ontology induction approach across different LLMs and two corporate annual reports. The automatically induced ontology achieves 100% schema conformance in all configurations, eliminating the ontology drift observed with the manual approach. We also propose a hybrid verification strategy that combines regex matching with an LLM-as-a-judge check, reducing apparent subject hallucination rates from 65.2% to 1.6% by filtering false positives caused by coreference resolution. Finally, we identify a systematic asymmetry between subject and object hallucinations, which we attribute to passive constructions and omitted agents in financial prose.

arXiv Categories

cs.CL
