Towards a Neural Debugger for Python
AI Summary
We propose the neural debugger: through conditional execution modeling, it enables LLMs to emulate traditional debuggers for code debugging and program understanding.
Main Contributions
- Introduces the concept of a neural debugger, giving LLMs interactive code-debugging capabilities
- Realizes neural debuggers either by fine-tuning large LLMs or by pre-training smaller models from scratch
- Validates the effectiveness of neural debuggers at modeling both forward and inverse execution
Methodology
Large language models are trained on Python execution traces, via fine-tuning or pre-training from scratch, so that they emulate the functionality of a traditional debugger.
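Such execution traces can be collected with Python's own tracing hooks. Below is a minimal sketch (the helper names `collect_trace` and `tracer` are illustrative, not from the paper) that records the line number and local variables at every executed line of a target function via `sys.settrace`:

```python
# Sketch: collecting a line-by-line execution trace with sys.settrace.
# The function and variable names here are illustrative assumptions,
# not the paper's actual trace-collection pipeline.
import sys

def collect_trace(func, *args):
    """Run func(*args), recording (line number, locals) at each executed line."""
    trace = []

    def tracer(frame, event, arg):
        # Only record "line" events inside the target function's code object.
        if event == "line" and frame.f_code is func.__code__:
            trace.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the hook
    return result, trace

def example(n):
    total = 0
    for i in range(n):
        total += i
    return total

result, trace = collect_trace(example, 3)  # result is 3; trace lists each line step
```

Serialized as text, such (line, state) sequences are the kind of data a model can be trained on to predict execution step by step.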
Original Abstract
Training large language models (LLMs) on Python execution traces grounds them in code execution and enables the line-by-line execution prediction of whole Python programs, effectively turning them into neural interpreters (FAIR CodeGen Team et al., 2025). However, developers rarely execute programs step by step; instead, they use debuggers to stop execution at certain breakpoints and step through relevant portions only while inspecting or modifying program variables. Existing neural interpreter approaches lack such interactive control. To address this limitation, we introduce neural debuggers: language models that emulate traditional debuggers, supporting operations such as stepping into, over, or out of functions, as well as setting breakpoints at specific source lines. We show that neural debuggers -- obtained via fine-tuning large LLMs or pre-training smaller models from scratch -- can reliably model both forward execution (predicting future states and outputs) and inverse execution (inferring prior states or inputs) conditioned on debugger actions. Evaluated on CruxEval, our models achieve strong performance on both output and input prediction tasks, demonstrating robust conditional execution modeling. Our work takes first steps towards future agentic coding systems in which neural debuggers serve as a world model for simulated debugging environments, providing execution feedback or enabling agents to interact with real debugging tools. This capability lays the foundation for more powerful code generation, program understanding, and automated debugging.
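To make the abstract's notion of conditioning on debugger actions concrete, one can imagine serializing each debugger interaction (current line, variable state, and the action taken, e.g. step over or set breakpoint) into a text record for the model. The format below is a hypothetical illustration, not the paper's actual encoding:

```python
# Hypothetical serialization of one debugger step into a training-text record.
# The field layout and action vocabulary are assumptions for illustration only.
def serialize_step(source_line, lineno, local_vars, action):
    """Render a (state, debugger action) pair as a single text line."""
    vars_str = ", ".join(f"{k}={v!r}" for k, v in sorted(local_vars.items()))
    return f"L{lineno}: {source_line} | locals: {vars_str} | action: {action}"

record = serialize_step("total += i", 3, {"total": 1, "i": 1}, "step over")
# e.g. "L3: total += i | locals: i=1, total=1 | action: step over"
```

Given a corpus of such records, forward execution corresponds to predicting the next state after an action, and inverse execution to inferring an earlier state or the program input from later records.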