Agent Tuning & Optimization — Relevance: 6/10

Amortized Molecular Optimization via Group Relative Policy Optimization

Muhammad bin Javaid, Hasham Hussain, Ashima Khanna, Berke Kisin, Jonathan Pirnay, Alexander Mitsos, Dominik G. Grimm, Martin Grohe
arXiv: 2602.12162v1 Published: 2026-02-12 Updated: 2026-02-12

AI Summary

GRXForm uses Group Relative Policy Optimization to improve the generalization of molecular optimization models to unseen structures.

Key Contributions

  • Proposes GRXForm, a molecular optimization model
  • Introduces Group Relative Policy Optimization (GRPO) for this setting
  • Achieves multi-objective optimization performance competitive with leading instance optimizers

Methodology

A pre-trained Graph Transformer model optimizes molecules via sequential atom and bond additions, with GRPO used for goal-directed fine-tuning.

Original Abstract

Molecular design encompasses tasks ranging from de-novo design to structural alteration of given molecules or fragments. For the latter, state-of-the-art methods predominantly function as "Instance Optimizers", expending significant compute restarting the search for every input structure. While model-based approaches theoretically offer amortized efficiency by learning a policy transferable to unseen structures, existing methods struggle to generalize. We identify a key failure mode: the high variance arising from the heterogeneous difficulty of distinct starting structures. To address this, we introduce GRXForm, adapting a pre-trained Graph Transformer model that optimizes molecules via sequential atom-and-bond additions. We employ Group Relative Policy Optimization (GRPO) for goal-directed fine-tuning to mitigate variance by normalizing rewards relative to the starting structure. Empirically, GRXForm generalizes to out-of-distribution molecular scaffolds without inference-time oracle calls or refinement, achieving scores in multi-objective optimization competitive with leading instance optimizers.
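The variance-reduction idea behind GRPO can be sketched as follows: for each starting structure, a group of candidate optimizations is sampled, and each reward is normalized against its own group's statistics, so that easy and hard starting structures contribute comparable advantage signals. A minimal sketch (the function name, shapes, and `eps` parameter are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Compute group-normalized advantages, GRPO-style.

    rewards: array of shape (num_structures, group_size), one row per
    starting structure, one column per sampled optimization rollout.
    Normalizing within each row cancels the heterogeneous difficulty
    of distinct starting structures.
    """
    r = np.asarray(rewards, dtype=float)
    mean = r.mean(axis=1, keepdims=True)
    std = r.std(axis=1, keepdims=True)
    return (r - mean) / (std + eps)

# Example: one easy structure (high rewards) and one hard one (low rewards)
adv = group_relative_advantages([[0.8, 0.9, 1.0],
                                 [0.1, 0.2, 0.3]])
```

Both rows yield identically scaled advantages despite very different raw reward levels, which is the property the paper relies on to stabilize policy-gradient training across starting structures.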

Tags

Molecular design · Graph neural networks · Reinforcement learning · Generalization

arXiv Categories

cs.LG