Multimodal Learning Relevance: 9/10

ChartEditBench: Evaluating Grounded Multi-Turn Chart Editing in Multimodal Language Models

Manav Nitin Kapadnis, Lawanya Baghel, Atharva Naik, Carolyn Rosé
arXiv: 2602.15758v1 Published: 2026-02-17 Updated: 2026-02-17

AI Summary

Proposes ChartEditBench, a benchmark for evaluating the multi-turn chart-editing capabilities of multimodal large language models.

Key Contributions

  • Introduced the ChartEditBench benchmark dataset
  • Designed a framework for evaluating multi-turn chart-editing ability
  • Analyzed the performance bottlenecks of existing MLLMs on this task

Methodology

Constructs a dataset of 5,000 chart-editing chains and evaluates models by combining code execution, visual similarity, and code verification.
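A minimal sketch of how such a three-signal evaluation could be combined. All function names, the token-overlap proxy for code verification, and the equal weighting are assumptions for illustration; the paper's actual metrics are not reproduced here.

```python
# Hypothetical combination of the three evaluation signals described above:
# execution-based fidelity, pixel-level visual similarity, and code verification.
from difflib import SequenceMatcher

def execution_check(code: str) -> bool:
    # Execution-based fidelity: the edited chart script must run without raising.
    try:
        exec(compile(code, "<chart_edit>", "exec"), {})
        return True
    except Exception:
        return False

def visual_similarity(pixels_a: list, pixels_b: list) -> float:
    # Pixel-level similarity: fraction of positions where the two renders agree.
    matches = sum(a == b for a, b in zip(pixels_a, pixels_b))
    return matches / max(len(pixels_a), len(pixels_b), 1)

def code_verification(reference: str, candidate: str) -> float:
    # Logical code check, approximated here by token-sequence similarity.
    return SequenceMatcher(None, reference.split(), candidate.split()).ratio()

def score_edit(code: str, ref_code: str, ref_pixels: list, cand_pixels: list) -> float:
    # A failed execution zeroes the score; otherwise the remaining two
    # signals are averaged (equal weights are an assumption).
    if not execution_check(code):
        return 0.0
    return 0.5 * visual_similarity(ref_pixels, cand_pixels) + \
           0.5 * code_verification(ref_code, code)
```

In a real pipeline, `visual_similarity` would compare rasterized chart images and `code_verification` would reason about program semantics rather than surface tokens; this sketch only shows how the three gates compose.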

Original Abstract

While Multimodal Large Language Models (MLLMs) perform strongly on single-turn chart generation, their ability to support real-world exploratory data analysis remains underexplored. In practice, users iteratively refine visualizations through multi-turn interactions that require maintaining common ground, tracking prior edits, and adapting to evolving preferences. We introduce ChartEditBench, a benchmark for incremental, visually grounded chart editing via code, comprising 5,000 difficulty-controlled modification chains and a rigorously human-verified subset. Unlike prior one-shot benchmarks, ChartEditBench evaluates sustained, context-aware editing. We further propose a robust evaluation framework that mitigates limitations of LLM-as-a-Judge metrics by integrating execution-based fidelity checks, pixel-level visual similarity, and logical code verification. Experiments with state-of-the-art MLLMs reveal substantial degradation in multi-turn settings due to error accumulation and breakdowns in shared context, with strong performance on stylistic edits but frequent execution failures on data-centric transformations. ChartEditBench establishes a challenging testbed for grounded, intent-aware multimodal programming.

Tags

Multimodal · Chart Editing · Benchmark · Code Generation

arXiv Categories

cs.CL cs.AI