AI Agents relevance: 9/10

TDAD: Test-Driven Agentic Development - Reducing Code Regressions in AI Coding Agents via Graph-Based Impact Analysis

Pepe Alonso
arXiv: 2603.17973v1 Published: 2026-03-18 Updated: 2026-03-18

AI Summary

The TDAD tool uses graph-based analysis to reduce code regressions in AI coding agents and to improve issue-resolution success rates.

Key Contributions

  • Proposes TDAD, an open-source tool and benchmark methodology for evaluating and reducing code regressions in AI coding agents.
  • The GraphRAG workflow significantly reduced test-level regressions and improved the issue-resolution rate.
  • Finds that supplying contextual information (which tests to verify) is more effective than supplying procedural instructions, especially for smaller models.

Methodology

Builds an AST-based code-test graph and applies weighted impact analysis to surface the test cases most likely affected by a proposed code change.

Original Abstract

AI coding agents can resolve real-world software issues, yet they frequently introduce regressions, breaking tests that previously passed. Current benchmarks focus almost exclusively on resolution rate, leaving regression behavior under-studied. This paper presents TDAD (Test-Driven Agentic Development), an open-source tool and benchmark methodology that combines abstract-syntax-tree (AST) based code-test graph construction with weighted impact analysis to surface the tests most likely affected by a proposed change. Evaluated on SWE-bench Verified with two local models (Qwen3-Coder 30B on 100 instances and Qwen3.5-35B-A3B on 25 instances), TDAD's GraphRAG workflow reduced test-level regressions by 70% (6.08% to 1.82%) and improved resolution from 24% to 32% when deployed as an agent skill. A surprising finding is that TDD prompting alone increased regressions (9.94%), revealing that smaller models benefit more from contextual information (which tests to verify) than from procedural instructions (how to do TDD). An autonomous auto-improvement loop raised resolution from 12% to 60% on a 10-instance subset with 0% regression. These findings suggest that for AI agent tool design, surfacing contextual information outperforms prescribing procedural workflows. All code, data, and logs are publicly available at https://github.com/pepealonso95/TDAD.

Tags

AI coding agent, code regression, test-driven development, GraphRAG

arXiv Categories

cs.SE cs.AI