LLM Memory & RAG Relevance: 9/10

Retrieval Augmented Generation of Literature-derived Polymer Knowledge: The Example of a Biodegradable Polymer Expert System

Sonakshi Gupta, Akhlak Mahmood, Wei Xiong, Rampi Ramprasad

arXiv: 2602.16650v1 Published: 2026-02-18 Updated: 2026-02-18

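AI Summary

The paper proposes two retrieval-augmented generation (RAG) approaches for extracting knowledge from polymer literature and building trustworthy materials science assistants.

Key Contributions

Developed two retrieval pipelines: VectorRAG and GraphRAG
Built context-preserving paragraph embeddings and a canonicalized knowledge graph from polyhydroxyalkanoate (PHA) literature
Showed that GraphRAG achieves higher precision and interpretability, while VectorRAG provides broader recall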
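
The precision and recall trade-off between the two pipelines can be made concrete with a small sketch. The passage IDs and relevance judgments below are made up for illustration and are not results from the paper; the function name `precision_recall` is likewise hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for the passages retrieved for one query."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: fraction of retrieved passages that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant passages that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical relevance judgments for a single query.
relevant = ["p1", "p2", "p3", "p4"]

# A narrow retriever returns few passages, all of them relevant.
narrow = ["p1", "p2"]
# A broad retriever returns more passages and finds more of the relevant
# ones, at the cost of including irrelevant ones.
broad = ["p1", "p2", "p3", "p5", "p6", "p7"]

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 0.75)
```

In this toy case the narrow retriever scores higher on precision and the broad one on recall, which is the kind of complementary behavior the summary attributes to GraphRAG and VectorRAG respectively.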

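Methodology

The authors build two retrieval pipelines, VectorRAG and GraphRAG, each pairing a large language model with retrieval over polymer literature to generate evidence-grounded answers; the outputs are validated by a domain expert.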
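
As a rough illustration of how the two retrieval styles differ, the sketch below contrasts similarity-ranked passage lookup with multi-hop traversal over triples. It is not the authors' implementation: the passages, triples, entity names, and function names are all made-up placeholders, and a bag-of-words count vector stands in for a real dense embedding model.

```python
import math
from collections import Counter, deque

# Made-up placeholder passages; not taken from the paper's corpus.
PASSAGES = {
    "doc1": "polymer_A shows high tensile strength after annealing",
    "doc2": "polymer_B degrades in soil within weeks",
    "doc3": "annealing polymer_A raises its crystallinity",
}

# Made-up placeholder triples (subject, relation, object).
TRIPLES = [
    ("polymer_A", "made_from", "monomer_X"),
    ("monomer_X", "produced_by", "organism_Y"),
    ("polymer_B", "made_from", "monomer_Z"),
]


def embed(text):
    # Stand-in for a dense embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_retrieve(query, k=2):
    """Vector-style retrieval: rank passages by similarity to the query."""
    q = embed(query)
    ranked = sorted(
        PASSAGES, key=lambda d: cosine(q, embed(PASSAGES[d])), reverse=True
    )
    return ranked[:k]


def graph_retrieve(start, max_hops=2):
    """Graph-style retrieval: collect triples within max_hops of an entity."""
    adjacency = {}
    for s, r, o in TRIPLES:
        adjacency.setdefault(s, []).append((r, o))
    found, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for r, o in adjacency.get(node, []):
            found.append((node, r, o))
            if o not in seen:
                seen.add(o)
                queue.append((o, depth + 1))
    return found


print(vector_retrieve("annealing polymer_A"))
print(graph_retrieve("polymer_A"))
```

In either case the retrieved passages or triples would then be placed in the LLM prompt as evidence. The graph path returns an explicit chain of relations, which is one plausible reason a graph-based pipeline is easier to inspect, while the vector path can surface any passage that is textually similar to the query.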

Original Abstract

Polymer literature contains a large and growing body of experimental knowledge, yet much of it is buried in unstructured text and inconsistent terminology, making systematic retrieval and reasoning difficult. Existing tools typically extract narrow, study-specific facts in isolation, failing to preserve the cross-study context required to answer broader scientific questions. Retrieval-augmented generation (RAG) offers a promising way to overcome this limitation by combining large language models (LLMs) with external retrieval, but its effectiveness depends strongly on how domain knowledge is represented. In this work, we develop two retrieval pipelines: a dense semantic vector-based approach (VectorRAG) and a graph-based approach (GraphRAG). Using over 1,000 polyhydroxyalkanoate (PHA) papers, we construct context-preserving paragraph embeddings and a canonicalized structured knowledge graph supporting entity disambiguation and multi-hop reasoning. We evaluate these pipelines through standard retrieval metrics, comparisons with general state-of-the-art systems such as GPT and Gemini, and qualitative validation by a domain chemist. The results show that GraphRAG achieves higher precision and interpretability, while VectorRAG provides broader recall, highlighting complementary trade-offs. Expert validation further confirms that the tailored pipelines, particularly GraphRAG, produce well-grounded, citation-reliable responses with strong domain relevance. By grounding every statement in evidence, these systems enable researchers to navigate the literature, compare findings across studies, and uncover patterns that are difficult to extract manually. More broadly, this work establishes a practical framework for building materials science assistants using curated corpora and retrieval design, reducing reliance on proprietary models while enabling trustworthy literature analysis at scale.

arXiv Categories

cs.CE cs.AI
