Agent Tuning & Optimization Relevance: 8/10

Evaluating 5W3H Structured Prompting for Intent Alignment in Human-AI Interaction

Peng Gang
arXiv: 2603.18976v1 Published: 2026-03-19 Updated: 2026-03-19

AI Summary

The paper evaluates PPS, a 5W3H-based structured prompting method, for improving intent alignment in human-AI interaction, especially in high-ambiguity tasks.

Key Contributions

  • Proposes the goal_alignment metric for measuring how well AI outputs align with user intent
  • Validates the effectiveness of the structured prompt framework PPS in improving intent alignment, particularly in high-ambiguity tasks
  • Reveals a measurement asymmetry in standard LLM evaluation and highlights the practical value of structured prompting

Methodology

A controlled three-condition experiment tests different prompt forms across three domains and three LLMs, with all outputs scored by an LLM judge.
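The experimental grid can be sketched in a few lines: 60 tasks crossed with 3 LLMs and 3 prompt conditions yields the 540 outputs reported in the abstract. The task labels and condition names below are illustrative placeholders, not the paper's actual identifiers.

```python
# Sketch of the study's 3x3 controlled grid; names are illustrative.
from itertools import product

TASKS = [f"task_{i:02d}" for i in range(60)]        # 20 tasks per domain x 3 domains
MODELS = ["DeepSeek-V3", "Qwen-Max", "Kimi"]        # the three LLMs under test
CONDITIONS = ["A_simple", "B_raw_json", "C_rendered_pps"]  # the three prompt forms

# Each (task, model, condition) triple produces one AI-generated output,
# which the study then scores with an LLM judge (not sketched here).
runs = list(product(TASKS, MODELS, CONDITIONS))
assert len(runs) == 540  # matches the 540 outputs reported in the paper
```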

Original Abstract

Natural language prompts often suffer from intent transmission loss: the gap between what users actually need and what they communicate to AI systems. We evaluate PPS (Prompt Protocol Specification), a 5W3H-based framework for structured intent representation in human-AI interaction. In a controlled three-condition study across 60 tasks in three domains (business, technical, and travel), three large language models (DeepSeek-V3, Qwen-Max, and Kimi), and three prompt conditions - (A) simple prompts, (B) raw PPS JSON, and (C) natural-language-rendered PPS - we collect 540 AI-generated outputs evaluated by an LLM judge. We introduce goal_alignment, a user-intent-centered evaluation dimension, and find that rendered PPS outperforms both simple prompts and raw JSON on this metric. PPS gains are task-dependent: gains are large in high-ambiguity business analysis tasks but reverse in low-ambiguity travel planning. We also identify a measurement asymmetry in standard LLM evaluation, where unconstrained prompts can inflate constraint adherence scores and mask the practical value of structured prompting. A preliminary retrospective survey (N = 20) further suggests a 66.1% reduction in follow-up prompts required, from 3.33 to 1.13 rounds. These findings suggest that structured intent representations can improve alignment and usability in human-AI interaction, especially in tasks where user intent is inherently ambiguous.
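To make the three conditions concrete, the following sketch contrasts a raw 5W3H structure (condition B) with its natural-language rendering (condition C). The field names and the rendering template are assumptions based on the generic 5W3H checklist (who, what, when, where, why, how, how much, how many), not the actual PPS schema.

```python
# Condition (B): user intent captured as raw structured data.
# Field names are hypothetical 5W3H slots, not the real PPS spec.
pps = {
    "who":      "a regional sales manager",
    "what":     "summarize Q3 revenue trends",
    "when":     "before Friday's review meeting",
    "where":    "the EMEA market",
    "why":      "to decide next quarter's discount policy",
    "how":      "as a one-page bullet summary",
    "how_much": "top 5 product lines only",
    "how_many": "at most 300 words",
}

def render_pps(fields: dict) -> str:
    """Render 5W3H fields into a single natural-language prompt
    (condition C), the form the study found aligns best with intent."""
    template = (
        "Acting for {who}, {what} for {where}, {when}. "
        "Purpose: {why}. Format: {how}. "
        "Scope: {how_much}; length: {how_many}."
    )
    return template.format(**fields)

print(render_pps(pps))
```

The contrast illustrates the paper's finding: the same structured intent, once rendered into fluent natural language, outperforms both a simple unstructured prompt and the raw JSON form on goal_alignment.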

Tags

Prompt Engineering Intent Alignment Human-AI Interaction

arXiv Category

cs.AI