Beyond Static Pipelines: Learning Dynamic Workflows for Text-to-SQL
AI Summary
Proposes SquRL, a framework that uses reinforcement learning to dynamically construct Text-to-SQL workflows, improving performance on complex and out-of-distribution queries.
Key Contributions
- Proposes SquRL, a reinforcement-learning-based framework for dynamic workflow construction
- Designs a rule-based reward function plus dynamic actor masking and pseudo-reward mechanisms to improve training efficiency
- Demonstrates that dynamic workflows outperform static ones, especially on complex and out-of-distribution queries
Methodology
Uses reinforcement learning, with a rule-based reward function and techniques such as actor masking, to train an LLM to construct adaptive Text-to-SQL workflows.
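The paper does not spell out its reward here, but a common rule-based reward for Text-to-SQL compares the execution result of the predicted query against the gold query. A minimal sketch of such a reward, assuming a SQLite database and an execution-match criterion (the function name and scoring values are illustrative, not SquRL's actual implementation):

```python
import sqlite3

def rule_based_reward(pred_sql: str, gold_sql: str, db_path: str) -> float:
    """Illustrative rule-based reward for Text-to-SQL RL training.

    Returns 1.0 if the predicted SQL executes and its (order-insensitive)
    result set matches the gold query's, 0.0 on a mismatch, and -1.0 when
    the prediction fails to execute. SquRL's actual reward may differ.
    """
    def run(sql: str) -> list[tuple]:
        with sqlite3.connect(db_path) as conn:
            # Sort rows so reward ignores incidental result ordering.
            return sorted(tuple(row) for row in conn.execute(sql).fetchall())

    try:
        pred_rows = run(pred_sql)
    except sqlite3.Error:
        return -1.0  # invalid or non-executable SQL: negative reward
    return 1.0 if pred_rows == run(gold_sql) else 0.0
```

Dense execution-based signals like this keep the reward cheap to compute per rollout, which matters when many candidate workflows are sampled during RL training.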
Original Abstract
Text-to-SQL has recently achieved impressive progress, yet remains difficult to apply effectively in real-world scenarios. This gap stems from the reliance on single static workflows, fundamentally limiting scalability to out-of-distribution and long-tail scenarios. Instead of requiring users to select suitable methods through extensive experimentation, we attempt to enable systems to adaptively construct workflows at inference time. Through theoretical and empirical analysis, we demonstrate that optimal dynamic policies consistently outperform the best static workflow, with performance gains fundamentally driven by heterogeneity across candidate workflows. Motivated by this, we propose SquRL, a reinforcement learning framework that enhances LLMs' reasoning capability in adaptive workflow construction. We design a rule-based reward function and introduce two effective training mechanisms: dynamic actor masking to encourage broader exploration, and pseudo rewards to improve training efficiency. Experiments on widely-used Text-to-SQL benchmarks demonstrate that dynamic workflow construction consistently outperforms the best static workflow methods, with especially pronounced gains on complex and out-of-distribution queries. The code is available at https://github.com/Satissss/SquRL
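The abstract's "dynamic actor masking to encourage broader exploration" can be pictured as suppressing, during sampling, actions the policy has already tried in the current episode. A minimal sketch under that assumption (the action names and softmax-sampling details are hypothetical, not taken from SquRL):

```python
import math
import random

def masked_sample(logits: dict[str, float], tried: set[str]) -> str:
    """Illustrative actor masking: actions already explored in this episode
    are removed before softmax sampling, pushing the policy toward
    unexplored workflow actions. SquRL's actual mechanism may differ.
    """
    # Keep only actions not yet tried; if all were tried, fall back to all.
    masked = {a: l for a, l in logits.items() if a not in tried} or dict(logits)

    # Numerically stable softmax over the remaining logits.
    z = max(masked.values())
    weights = {a: math.exp(l - z) for a, l in masked.items()}

    # Sample one action proportionally to its softmax weight.
    r = random.random() * sum(weights.values())
    acc = 0.0
    for action, w in weights.items():
        acc += w
        if r <= acc:
            return action
    return action  # floating-point edge case: return the last action
```

With masking applied per episode, rollouts cover more of the candidate-workflow space, which is the exploration effect the abstract attributes to this mechanism.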