Evaluating 5W3H Structured Prompting for Intent Alignment in Human-AI Interaction
AI Summary
The paper evaluates PPS, a 5W3H-based structured prompting method, for improving intent alignment in human-AI interaction, particularly in high-ambiguity tasks.
Key Contributions
- Introduces goal_alignment, a metric for evaluating how well AI outputs align with user intent
- Demonstrates that the structured prompt format PPS improves intent alignment, especially in high-ambiguity tasks
- Reveals a measurement asymmetry in standard LLM evaluation and highlights the practical value of structured prompting
Methodology
A controlled three-condition experiment tested different prompt formats across three domains and three LLMs, with all outputs scored by an LLM judge.
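To make the three prompt conditions concrete, the sketch below shows a hypothetical 5W3H PPS record (condition B, raw JSON-like structure) and a simple renderer that turns it into a natural-language prompt (condition C). The field names and templates are illustrative assumptions; the paper's actual PPS schema is not reproduced here.

```python
# Hypothetical 5W3H fields and rendering templates (assumed, not the
# paper's exact schema): who, what, when, where, why, how, how much, how well.
FIELD_TEMPLATES = {
    "who":      "Audience/actor: {}.",
    "what":     "Task: {}.",
    "when":     "Timeframe: {}.",
    "where":    "Context: {}.",
    "why":      "Goal: {}.",
    "how":      "Approach: {}.",
    "how_much": "Scope/budget: {}.",
    "how_well": "Quality criteria: {}.",
}

def render_pps(pps: dict) -> str:
    """Render a 5W3H PPS dict into a natural-language prompt,
    skipping any fields the user left unspecified."""
    parts = [FIELD_TEMPLATES[key].format(value)
             for key, value in pps.items()
             if key in FIELD_TEMPLATES and value]
    return " ".join(parts)

# Example: a partially filled PPS for a business-analysis task.
example = {
    "who": "a regional sales manager",
    "what": "summarize Q3 revenue trends",
    "why": "set next quarter's targets",
    "how_well": "concise, with concrete numbers",
}
prompt = render_pps(example)
```

The rendered string would then be sent to the model in place of a free-form request, making the otherwise implicit intent fields explicit.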
Original Abstract
Natural language prompts often suffer from intent transmission loss: the gap between what users actually need and what they communicate to AI systems. We evaluate PPS (Prompt Protocol Specification), a 5W3H-based framework for structured intent representation in human-AI interaction. In a controlled three-condition study across 60 tasks in three domains (business, technical, and travel), three large language models (DeepSeek-V3, Qwen-Max, and Kimi), and three prompt conditions ((A) simple prompts, (B) raw PPS JSON, and (C) natural-language-rendered PPS), we collect 540 AI-generated outputs evaluated by an LLM judge. We introduce goal_alignment, a user-intent-centered evaluation dimension, and find that rendered PPS outperforms both simple prompts and raw JSON on this metric. PPS gains are task-dependent: gains are large in high-ambiguity business analysis tasks but reverse in low-ambiguity travel planning. We also identify a measurement asymmetry in standard LLM evaluation, where unconstrained prompts can inflate constraint adherence scores and mask the practical value of structured prompting. A preliminary retrospective survey (N = 20) further suggests a 66.1% reduction in required follow-up prompts, from 3.33 to 1.13 rounds. These findings suggest that structured intent representations can improve alignment and usability in human-AI interaction, especially in tasks where user intent is inherently ambiguous.