Sandpiper: Orchestrated AI-Annotation for Educational Discourse at Scale
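AI Summary
Sandpiper is an AI-assisted system for analyzing educational conversation data, designed to improve research efficiency and the quality of data analysis.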
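Key Contributions
- Proposes Sandpiper, a system that bridges high-volume data and expert analysis
- Uses LLMs for automated annotation while safeguarding data privacy
- Integrates an evaluation engine to continuously improve AI performance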
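Methodology
A mixed-initiative system that combines interactive dashboards with LLM engines and achieves scalable qualitative analysis through schema constraints and enforced codebook adherence.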
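To make the idea of schema constraints and codebook enforcement concrete, here is a minimal sketch of a validation step that accepts an LLM annotation only if it matches a fixed structure and uses a label from the codebook. Everything named here (`CODEBOOK`, its three codes, `Annotation`, `parse_annotation`, and the JSON field names) is a hypothetical illustration, not Sandpiper's actual schema or API.

```python
import json
from dataclasses import dataclass

# Hypothetical codebook: label -> short definition (not from the paper).
CODEBOOK = {
    "QUESTION": "Speaker asks for information or clarification",
    "EXPLANATION": "Speaker explains a concept or procedure",
    "FEEDBACK": "Speaker evaluates or responds to a prior contribution",
}

REQUIRED_FIELDS = {"utterance_id", "code", "rationale"}


@dataclass(frozen=True)
class Annotation:
    utterance_id: str
    code: str
    rationale: str


def parse_annotation(raw: str, codebook: dict = CODEBOOK) -> Annotation:
    """Parse raw LLM output and reject anything outside the schema or codebook."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    if set(data) != REQUIRED_FIELDS:
        raise ValueError(f"expected exactly the fields {sorted(REQUIRED_FIELDS)}")
    for key in REQUIRED_FIELDS:
        if not isinstance(data[key], str):
            raise ValueError(f"field {key!r} must be a string")
    if data["code"] not in codebook:
        raise ValueError(f"code {data['code']!r} is not in the codebook")

    return Annotation(**data)


if __name__ == "__main__":
    ok = '{"utterance_id": "u1", "code": "QUESTION", "rationale": "asks why"}'
    print(parse_annotation(ok))
    try:
        parse_annotation('{"utterance_id": "u2", "code": "PRAISE", "rationale": "x"}')
    except ValueError as err:
        print("rejected:", err)
```

A check like this only guarantees that every accepted label exists in the codebook and that the output has the expected shape. It does not verify that the chosen label is the right one for the utterance, which is why comparison against human labels is still needed.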
Original Abstract
Digital educational environments are expanding toward complex AI and human discourse, providing researchers with an abundance of data that offers deep insights into learning and instructional processes. However, traditional qualitative analysis remains a labor-intensive bottleneck, severely limiting the scale at which this research can be conducted. We present Sandpiper, a mixed-initiative system designed to serve as a bridge between high-volume conversational data and human qualitative expertise. By tightly coupling interactive researcher dashboards with agentic Large Language Model (LLM) engines, the platform enables scalable analysis without sacrificing methodological rigor. Sandpiper addresses critical barriers to AI adoption in education by implementing context-aware, automated de-identification workflows supported by secure, university-housed infrastructure to ensure data privacy. Furthermore, the system employs schema-constrained orchestration to eliminate LLM hallucinations and enforces strict adherence to qualitative codebooks. An integrated evaluations engine allows for the continuous benchmarking of AI performance against human labels, fostering an iterative approach to model refinement and validation. We propose a user study to evaluate the system's efficacy in improving research efficiency, inter-rater reliability, and researcher trust in AI-assisted qualitative workflows.
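The abstract does not say which agreement statistic the evaluations engine uses when benchmarking AI labels against human labels. As one plausible illustration, the sketch below computes Cohen's kappa, a common chance-corrected measure of inter-rater reliability for two raters. The function name `cohens_kappa` and the example labels are assumptions made for this sketch.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(human: Sequence[str], ai: Sequence[str]) -> float:
    """Chance-corrected agreement between two label sequences of equal length."""
    if len(human) != len(ai) or len(human) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(human)
    # Observed agreement: fraction of items where both raters chose the same code.
    observed = sum(h == a for h, a in zip(human, ai)) / n

    # Expected agreement under independence, from each rater's label frequencies.
    human_counts = Counter(human)
    ai_counts = Counter(ai)
    expected = sum(human_counts[c] * ai_counts[c] for c in human_counts) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label everywhere; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    human_labels = ["Q", "Q", "E", "E"]
    ai_labels = ["Q", "Q", "E", "Q"]
    print(cohens_kappa(human_labels, ai_labels))
```

In this example the raters agree on 3 of 4 items (observed agreement 0.75) and the expected chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. Tracking a statistic like this across prompt or model revisions is one way to support the iterative refinement the abstract describes.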