Agent Tuning & Optimization · Relevance: 9/10

To Write or to Automate Linguistic Prompts, That Is the Question

Marina Sánchez-Torrón, Daria Akselrod, Jason Rauchwerk
arXiv: 2603.25169v1 · Published: 2026-03-26 · Updated: 2026-03-26

AI Summary

The paper compares hand-crafted prompts, base DSPy signatures, and GEPA-optimized DSPy signatures on linguistic tasks; which approach wins depends on the specific task.

Main Contributions

  • First systematic comparison of hand-crafted prompts and automatic prompt optimization
  • Evaluation of prompt effectiveness across different model configurations
  • Evidence that the choice of prompting approach is task-dependent

Methodology

Using three tasks (translation, terminology insertion, and language quality assessment), the study compares hand-crafted prompts, base DSPy signatures, and GEPA-optimized DSPy signatures, analyzing results across five model configurations.
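As a concrete illustration of what a "base DSPy signature" looks like, here is a minimal sketch for the translation task. The field names (source_text, target_language, translation) and the usage example are illustrative assumptions, not the authors' actual signatures; they only show the kind of declarative program a prompt optimizer would later rewrite.

```python
import dspy

# Minimal sketch of a "base DSPy signature" for the translation task.
# Field names here are illustrative assumptions, not the paper's signatures.
class Translate(dspy.Signature):
    """Translate the source text into the target language."""

    source_text: str = dspy.InputField(desc="Text in the source language")
    target_language: str = dspy.InputField(desc="Name of the target language")
    translation: str = dspy.OutputField(desc="Fluent translation of the source text")

# A plain Predict module; prompt optimization rewrites the natural-language
# instructions attached to this signature rather than the Python code itself.
translate = dspy.Predict(Translate)

# Usage, assuming an LM has been configured, e.g.:
# dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
# result = translate(source_text="Bonjour le monde.", target_language="English")
# print(result.translation)
```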

Original Abstract

LLM performance is highly sensitive to prompt design, yet whether automatic prompt optimization can replace expert prompt engineering in linguistic tasks remains unexplored. We present the first systematic comparison of hand-crafted zero-shot expert prompts, base DSPy signatures, and GEPA-optimized DSPy signatures across translation, terminology insertion, and language quality assessment, evaluating five model configurations. Results are task-dependent. In terminology insertion, optimized and manual prompts produce mostly statistically indistinguishable quality. In translation, each approach wins on different models. In LQA, expert prompts achieve stronger error detection while optimization improves characterization. Across all tasks, GEPA elevates minimal DSPy signatures, and the majority of expert-optimized comparisons show no statistically significant difference. We note that the comparison is asymmetric: GEPA optimization searches programmatically over gold-standard splits, whereas expert prompts require in principle no labeled data, relying instead on domain expertise and iterative refinement.
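The abstract notes that GEPA optimization searches programmatically over gold-standard splits, unlike expert prompts, which in principle need no labeled data. Below is a hedged sketch of that optimization step, reusing the translate module from the earlier sketch and assuming a recent DSPy release that exposes dspy.GEPA; the metric, data splits, and optimizer arguments are placeholders and may differ from the paper's setup and across DSPy versions.

```python
import dspy

# Toy feedback metric for GEPA: score 1.0 on an exact match with the reference
# translation. The paper's actual metrics (e.g. for LQA) are not specified here.
def exact_match_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    return float(gold.translation.strip() == pred.translation.strip())

# Gold-standard splits would be lists of dspy.Example objects (hypothetical):
# trainset = [dspy.Example(source_text="...", target_language="English",
#                          translation="...").with_inputs("source_text",
#                                                         "target_language"), ...]
# valset = [...]

# optimizer = dspy.GEPA(metric=exact_match_metric, auto="light",
#                       reflection_lm=dspy.LM("openai/gpt-4o"))
# optimized_translate = optimizer.compile(translate, trainset=trainset, valset=valset)
```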

Tags

Prompt Engineering · Prompt Optimization · DSPy · LLM Evaluation · GEPA

arXiv Category

cs.CL