Building evidence-based knowledge graphs from full-text literature for disease-specific biomedical reasoning
AI summary
EvidenceNet builds disease-specific knowledge graphs that strengthen biomedical reasoning and hypothesis generation.
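Main contributions
- Introduces the EvidenceNet framework and dataset
- Uses an LLM to extract and structure biomedical evidence
- Validates on downstream tasks: improves retrieval-augmented question answering and retains structural signal for link prediction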
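Methodology
Uses an LLM-assisted pipeline to extract evidence from full-text biomedical literature, normalize entities, score evidence quality, and connect evidence records through typed semantic relations.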
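The four stages named above (extraction, entity normalization, quality scoring, relation linking) can be pictured as a chain of functions. The sketch below is only an illustration of that ordering, not the EvidenceNet implementation: every function name, the sentence-splitting stand-in for the LLM step, the synonym lookup, the "%" quality heuristic, the `shares_entity` relation label, and the toy input text are assumptions made for this example.

```python
# Hypothetical sketch of the four pipeline stages; all names and rules are placeholders.

def extract_evidence(full_text: str) -> list[dict]:
    """Stand-in for LLM extraction: treat each non-empty sentence as one finding."""
    return [{"finding": s.strip()} for s in full_text.split(".") if s.strip()]


def normalize_entities(records: list[dict], synonyms: dict[str, str]) -> list[dict]:
    """Attach canonical entity names found in each finding (toy dictionary lookup)."""
    out = []
    for rec in records:
        text = rec["finding"].lower()
        entities = sorted({canon for alias, canon in synonyms.items() if alias in text})
        out.append({**rec, "entities": entities})
    return out


def score_quality(records: list[dict]) -> list[dict]:
    """Toy score: 1.0 if the finding reports a percentage, otherwise 0.5."""
    return [{**rec, "quality": 1.0 if "%" in rec["finding"] else 0.5} for rec in records]


def link_records(records: list[dict]) -> list[tuple[int, str, int]]:
    """Connect every pair of records that share an entity, using a placeholder relation type."""
    edges = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if set(records[i]["entities"]) & set(records[j]["entities"]):
                edges.append((i, "shares_entity", j))
    return edges


def build_graph(full_text: str, synonyms: dict[str, str]):
    """Run the stages in order: extract, normalize, score, then link."""
    records = score_quality(normalize_entities(extract_evidence(full_text), synonyms))
    return records, link_records(records)


# Invented toy input, used only to exercise the functions.
toy_text = (
    "GeneA knockdown reduced proliferation by 40%. "
    "Gene-A expression was higher in tumors. "
    "A sentence with no known entity"
)
toy_synonyms = {"genea": "GENE_A", "gene-a": "GENE_A"}
records, edges = build_graph(toy_text, toy_synonyms)
```

In this toy run the first two findings resolve to the same canonical entity and are linked, while the third stays isolated. A real pipeline would replace each stub with an LLM call, an ontology-backed entity linker, and a richer relation classifier.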
Original abstract
Biomedical knowledge resources often either preserve evidence as unstructured text or compress it into flat triples that omit study design, provenance, and quantitative support. Here we present EvidenceNet, a framework and dataset for building disease-specific knowledge graphs from full-text biomedical literature. EvidenceNet uses a large language model (LLM)-assisted pipeline to extract experimentally grounded findings as structured evidence nodes, normalize biomedical entities, score evidence quality, and connect evidence records through typed semantic relations. We release two resources: EvidenceNet-HCC with 7,872 evidence records, 10,328 graph nodes, and 49,756 edges, and EvidenceNet-CRC with 6,622 records, 8,795 nodes, and 39,361 edges. Technical validation shows high component fidelity, including 98.3% field-level extraction accuracy, 100.0% high-confidence entity-link accuracy, 87.5% fusion integrity, and 90.0% semantic relation-type accuracy. In downstream evaluation, EvidenceNet improves internal and external retrieval-augmented question answering and retains structural signal for future link prediction and target prioritization. These results establish EvidenceNet as a disease-specific resource for evidence-aware biomedical reasoning and hypothesis generation.
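To make the contrast between flat triples and structured evidence nodes concrete, here is a minimal sketch of the two representations. The field names, the relation labels, and the example values are assumptions chosen for illustration; the abstract does not specify the actual EvidenceNet schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlatTriple:
    """A bare (subject, predicate, object) statement with no supporting context."""
    subject: str
    predicate: str
    obj: str


@dataclass
class EvidenceRecord:
    """Hypothetical evidence node: the same statement plus design, provenance, and support."""
    record_id: str
    subject: str
    predicate: str
    obj: str
    study_design: str             # assumed field, e.g. "in vitro" or "patient cohort"
    source: str                   # assumed provenance field (placeholder identifier)
    effect_size: Optional[float]  # assumed quantitative support; None if not reported
    quality: float                # assumed evidence-quality score in [0, 1]

    def to_triple(self) -> FlatTriple:
        """Compress the record into a flat triple, discarding everything else."""
        return FlatTriple(self.subject, self.predicate, self.obj)


@dataclass
class TypedRelation:
    """Edge between two evidence records; the relation labels are illustrative."""
    source_id: str
    relation: str   # e.g. "supports" or "contradicts" (assumed labels)
    target_id: str


# Invented example values, used only to show what the compression loses.
r1 = EvidenceRecord("ev1", "GENE_A", "promotes", "proliferation",
                    "in vitro", "paper-1", 0.4, 0.9)
r2 = EvidenceRecord("ev2", "GENE_A", "promotes", "proliferation",
                    "patient cohort", "paper-2", None, 0.6)
edge = TypedRelation("ev2", "supports", "ev1")

# Both records collapse to the same triple, although they differ in design and support.
same_triple = r1.to_triple() == r2.to_triple()
distinct_records = r1 != r2
```

The two records remain distinguishable as evidence nodes, and the typed edge can express how one finding relates to the other. Once reduced to triples, they become a single indistinguishable statement.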