AI Agents relevance: 9/10

ISD-Agent-Bench: A Comprehensive Benchmark for Evaluating LLM-based Instructional Design Agents

YoungHoon Jeon, Suwan Kim, Haein Son, Sookbun Lee, Yeil Jeong, Unggi Lee

arXiv: 2602.10620v1 Published: 2026-02-11 Updated: 2026-02-11


AI Summary

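Builds a comprehensive benchmark for evaluating LLMs on Instructional Systems Design (ISD) and shows that ReAct-style agents grounded in classical ISD theory perform best.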

Main Contributions

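Proposes the ISD-Agent-Bench benchmark
Builds evaluation scenarios using the Context Matrix framework
Evaluates the performance of agents grounded in classical ISD theory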

Methodology

Builds ISD-Agent-Bench with 25,795 scenarios, uses multiple LLM judges to make the evaluation reliable, and compares the performance of different ISD agents.

Original Abstract

Large Language Model (LLM) agents have shown promising potential in automating Instructional Systems Design (ISD), a systematic approach to developing educational programs. However, evaluating these agents remains challenging due to the lack of standardized benchmarks and the risk of LLM-as-judge bias. We present ISD-Agent-Bench, a comprehensive benchmark comprising 25,795 scenarios generated via a Context Matrix framework that combines 51 contextual variables across 5 categories with 33 ISD sub-steps derived from the ADDIE model. To ensure evaluation reliability, we employ a multi-judge protocol using diverse LLMs from different providers, achieving high inter-judge reliability. We compare existing ISD agents with novel agents grounded in classical ISD theories such as ADDIE, Dick & Carey, and Rapid Prototyping ISD. Experiments on 1,017 test scenarios demonstrate that integrating classical ISD frameworks with modern ReAct-style reasoning achieves the highest performance, outperforming both pure theory-based agents and technique-only approaches. Further analysis reveals that theoretical quality strongly correlates with benchmark performance, with theory-based agents showing significant advantages in problem-centered design and objective-assessment alignment. Our work provides a foundation for systematic LLM-based ISD research.
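To make the Context Matrix idea concrete, here is a hypothetical Python sketch that crosses contextual variables with ADDIE sub-steps. The category names, values, and sub-steps are placeholders (the abstract gives only the counts: 51 variables in 5 categories and 33 sub-steps), and uniform random sampling is an assumption; the abstract does not describe how the 25,795 scenarios were selected from the possible combinations.

```python
import random
from dataclasses import dataclass

# Hypothetical context categories and values; the paper uses 51 variables in
# 5 categories, whose actual names are not listed in the abstract.
CONTEXT_CATEGORIES: dict[str, list[str]] = {
    "learner": ["novice adults", "K-12 students", "domain experts"],
    "organization": ["corporate", "university", "nonprofit"],
    "delivery": ["online", "in-person", "blended"],
    "content": ["procedural skill", "conceptual knowledge"],
    "constraints": ["low budget", "tight deadline"],
}

# Hypothetical subset of ISD sub-steps grouped by ADDIE phase
# (the paper derives 33 sub-steps from ADDIE).
ADDIE_SUBSTEPS: dict[str, list[str]] = {
    "Analysis": ["needs assessment", "learner analysis"],
    "Design": ["write objectives", "align assessments"],
    "Development": ["draft materials"],
    "Implementation": ["pilot delivery"],
    "Evaluation": ["formative evaluation"],
}


@dataclass
class Scenario:
    context: dict[str, str]  # one chosen value per context category
    phase: str               # ADDIE phase the task belongs to
    substep: str             # ISD sub-step the agent must perform


def generate_scenarios(n: int, seed: int = 0) -> list[Scenario]:
    """Sample n scenarios by crossing a random context with a random sub-step."""
    rng = random.Random(seed)  # seeded so the generated set is reproducible
    substeps = [(p, s) for p, steps in ADDIE_SUBSTEPS.items() for s in steps]
    scenarios = []
    for _ in range(n):
        context = {cat: rng.choice(vals) for cat, vals in CONTEXT_CATEGORIES.items()}
        phase, substep = rng.choice(substeps)
        scenarios.append(Scenario(context, phase, substep))
    return scenarios


for sc in generate_scenarios(3):
    print(sc.phase, "/", sc.substep, "/", sc.context)
```

Seeding `random.Random` keeps the generated set reproducible, which matters when a benchmark's test split must stay fixed across agent comparisons.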

arXiv Categories

cs.SE cs.CL
