LLM Reasoning Relevance: 8/10

CREATE: Testing LLMs for Associative Creativity

Manya Wadhwa, Tiasa Singha Roy, Harvey Lederman, Junyi Jessy Li, Greg Durrett
arXiv: 2603.09970v1 Published: 2026-03-10 Updated: 2026-03-10

AI Summary

Introduces CREATE, a benchmark that evaluates the associative creativity of LLMs by having them generate paths that connect concepts.

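Key Contributions

  • Introduces the CREATE benchmark for evaluating associative creativity
  • Defines specificity and diversity metrics for paths
  • Evaluates frontier models on CREATE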

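Methodology

The task asks an LLM to generate a set of paths connecting concepts. Each path is scored on its specificity and its diversity relative to the other paths, and models are compared on the resulting scores.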
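
To make the specificity/diversity idea concrete, here is a minimal Python sketch. It is not the paper's scoring formula, which this summary does not give. Everything in it is an assumption made for illustration: the names `jaccard_distance`, `select_diverse_paths`, and `creative_utility`, the thresholds `min_spec` and `min_dist`, the use of Jaccard distance over intermediate concepts as a stand-in for diversity, and the hand-assigned specificity scores.

```python
# Hypothetical sketch: NOT the CREATE metric, only an illustration of
# rewarding a larger set of strong (specific) and mutually diverse paths.

def jaccard_distance(path_a, path_b):
    """Dissimilarity of two paths, compared on their intermediate concepts only."""
    inner_a, inner_b = set(path_a[1:-1]), set(path_b[1:-1])
    union = inner_a | inner_b
    if not union:
        return 0.0  # two direct paths with no intermediates count as identical
    return 1.0 - len(inner_a & inner_b) / len(union)


def select_diverse_paths(paths, specificity, min_spec=0.5, min_dist=0.5):
    """Greedily keep specific paths that are far enough from every kept path."""
    kept = []
    for path in sorted(paths, key=lambda p: specificity[p], reverse=True):
        if specificity[path] < min_spec:
            continue  # too generic to count (assumed threshold)
        if all(jaccard_distance(path, other) >= min_dist for other in kept):
            kept.append(path)
    return kept


def creative_utility(paths, specificity, min_spec=0.5, min_dist=0.5):
    """Sum of specificity over the kept paths, so more strong, diverse paths score higher."""
    kept = select_diverse_paths(paths, specificity, min_spec, min_dist)
    return sum(specificity[p] for p in kept)


# Toy data: four paths between the same two endpoints, with made-up scores.
p1 = ("src", "a", "b", "dst")
p2 = ("src", "a", "b", "c", "dst")  # near-duplicate of p1
p3 = ("src", "d", "dst")
p4 = ("src", "e", "dst")            # low specificity
paths = [p1, p2, p3, p4]
specificity = {p1: 0.9, p2: 0.8, p3: 0.7, p4: 0.2}

print(select_diverse_paths(paths, specificity))
print(creative_utility(paths, specificity))
```

In this toy run `p2` is dropped as too similar to `p1`, and `p4` is dropped for low specificity, so only `p1` and `p3` contribute to the score. In a real evaluation the specificity values would come from a grader rather than being assigned by hand.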

Original Abstract

A key component of creativity is associative reasoning: the ability to draw novel yet meaningful connections between concepts. We introduce CREATE, a benchmark designed to evaluate models' capacity for creative associative reasoning. CREATE requires models to generate sets of paths connecting concepts in a model's parametric knowledge. Paths should have high specificity (distinctiveness and closeness of the concept connection) and high diversity (dissimilarity from other paths), and models are scored more highly if they produce a larger set of strong, diverse paths. This task shares demands of real creativity tasks like hypothesis generation, including an extremely large search space, but enables collection of a sizable benchmark with objective answer grading. Evaluation of frontier models shows that the strongest models achieve higher creative utility than others, with the high multiplicity of answers and complexity of the search making benchmark saturation difficult to achieve. Furthermore, our results illustrate that thinking models are not always more effective on our task, even with high token budgets. Recent approaches for creative prompting give some but limited additional improvement. CREATE provides a sandbox for developing new methods to improve models' capacity for associative creativity.

Tags

Associative creativity, Benchmarking, LLM evaluation, Knowledge representation

arXiv Categories

cs.CL