Reinforced Reasoning for End-to-End Retrosynthetic Planning
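AI Summary
ReTriP is an end-to-end generative framework that reformulates retrosynthetic route planning as a direct Chain-of-Thought (CoT) reasoning task and performs strongly on long-horizon planning.
Key Contributions
- Proposes ReTriP, an end-to-end generative framework
- Uses a path-coherent molecular representation
- Adopts a progressive training curriculum that moves from reasoning distillation to reinforcement learning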
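Methodology
Retrosynthetic route planning is modeled as a CoT reasoning task. Routes are optimized with reinforcement learning with verifiable rewards, and the model is trained with a progressive curriculum.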
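To make the idea of a verifiable reward concrete, here is a minimal sketch of how a generated route could be checked automatically. The summary does not specify ReTriP's reward, so everything here is an assumption for illustration: the `Step` type, the `route_reward` function, the binary 0/1 scoring, and the use of plain string labels in place of real molecular representations are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One retrosynthetic disconnection (hypothetical structure)."""
    product: str      # label of the molecule being disconnected
    reactants: tuple  # labels of the proposed precursors


def route_reward(target, steps, building_blocks):
    """Return 1.0 if the route is coherent and fully solved, else 0.0.

    Assumed criteria (not taken from the paper):
    - the first step must disconnect the target;
    - every later step must disconnect a molecule that an earlier
      step left unresolved;
    - the route ends with no unresolved molecules, i.e. every leaf
      is an available building block.
    """
    if not steps or steps[0].product != target:
        return 0.0
    unresolved = {target}
    for step in steps:
        if step.product not in unresolved:
            return 0.0  # step does not follow from the route so far
        unresolved.remove(step.product)
        for reactant in step.reactants:
            if reactant not in building_blocks:
                unresolved.add(reactant)
    return 1.0 if not unresolved else 0.0


# Toy example with placeholder labels instead of real molecules.
stock = {"B", "C", "D"}
solved_route = [Step("T", ("A", "B")), Step("A", ("C", "D"))]
partial_route = [Step("T", ("A", "B"))]
broken_route = [Step("T", ("A", "B")), Step("X", ("C",))]
```

Under these assumptions, `solved_route` scores 1.0, while `partial_route` (intermediate `A` is never resolved) and `broken_route` (the second step disconnects a molecule that never appeared) both score 0.0. A practical reward would presumably also validate each reaction chemically, which this sketch does not attempt.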
Original Abstract
Retrosynthetic planning is a fundamental task in organic chemistry, yet remains challenging due to its combinatorial complexity. To address this, conventional approaches typically rely on hybrid frameworks that combine single-step predictions with external search heuristics, inevitably fracturing the logical coherence between local molecular transformations and global planning objectives. To bridge this gap and embed sophisticated strategic foresight directly into the model's chemical reasoning, we introduce ReTriP, an end-to-end generative framework that reformulates retrosynthesis as a direct Chain-of-Thought reasoning task. We establish a path-coherent molecular representation and employ a progressive training curriculum that transitions from reasoning distillation to reinforcement learning with verifiable rewards, effectively aligning stepwise generation with practical route utility. Empirical evaluation on RetroBench demonstrates that ReTriP achieves state-of-the-art performance, exhibiting superior robustness in long-horizon planning compared to hybrid baselines.