LLM Memory & RAG Relevance: 8/10

Retrieval-Augmented Foundation Models for Matched Molecular Pair Transformations to Recapitulate Medicinal Chemistry Intuition

Bo Pan, Peter Zhiping Zhang, Hao-Wei Pang, Alex Zhu, Xiang Yu, Liying Zhang, Liang Zhao
arXiv: 2602.16684v1 Published: 2026-02-18 Updated: 2026-02-18

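AI Summary

The paper proposes a retrieval-augmented foundation model for matched molecular pair transformations (MMPTs) in medicinal chemistry, and reports improved diversity, novelty, and controllability in analog generation.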

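Key Contributions

  • Proposes a foundation model trained on large-scale MMPTs
  • Introduces prompting mechanisms for controllable generation
  • Develops MMPT-RAG, a retrieval-augmented framework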

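Methodology

Formulates analog generation as a variable-to-variable task, trains a foundation model on large-scale MMPT data, and uses a RAG framework to supply external reference analogs as context during generation.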
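
The retrieval step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Transformation` record, the character-bigram Jaccard similarity, the `source>>target` prompt layout, and the example fragments are all assumptions made for illustration. A real system would likely use chemical fingerprints for retrieval and pass the prompt to the trained model, which is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformation:
    """Hypothetical record for one MMP transformation (variable -> variable)."""
    source: str  # variable fragment before the edit (SMILES-like string)
    target: str  # variable fragment after the edit


def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Character n-grams: a crude stand-in for a real chemical fingerprint."""
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity over character bigrams (assumed metric, not from the paper)."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)


def retrieve(query: str, references: list[Transformation], k: int = 2) -> list[Transformation]:
    """Return the k reference transformations whose source is most similar to the query."""
    ranked = sorted(references, key=lambda t: similarity(query, t.source), reverse=True)
    return ranked[:k]


def build_prompt(query: str, retrieved: list[Transformation]) -> str:
    """Place retrieved edits before the query as context (assumed prompt layout)."""
    lines = [f"{t.source}>>{t.target}" for t in retrieved]
    lines.append(f"{query}>>")  # the generator would complete the target variable
    return "\n".join(lines)


# Illustrative project-specific reference edits (made-up examples).
references = [
    Transformation("[*]Cl", "[*]F"),
    Transformation("[*]OC", "[*]OCC"),
    Transformation("[*]c1ccccc1", "[*]c1ccncc1"),
]
query = "[*]Br"
retrieved = retrieve(query, references, k=2)
prompt = build_prompt(query, retrieved)
print(prompt)
```

The design point is that the references act only as in-context guidance: swapping the reference list for a different project series changes the prompt without retraining the generator.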

Original Abstract

Matched molecular pairs (MMPs) capture the local chemical edits that medicinal chemists routinely use to design analogs, but existing ML approaches either operate at the whole-molecule level with limited edit controllability or learn MMP-style edits from restricted settings and small models. We propose a variable-to-variable formulation of analog generation and train a foundation model on large-scale MMP transformations (MMPTs) to generate diverse variables conditioned on an input variable. To enable practical control, we develop prompting mechanisms that let the users specify preferred transformation patterns during generation. We further introduce MMPT-RAG, a retrieval-augmented framework that uses external reference analogs as contextual guidance to steer generation and generalize from project-specific series. Experiments on general chemical corpora and patent-specific datasets demonstrate improved diversity, novelty, and controllability, and show that our method recovers realistic analog structures in practical discovery scenarios.

Tags

Medicinal chemistry, Molecular design, Generative models, Retrieval augmentation, MMP

arXiv Categories

cs.LG