Multimodal Learning Relevance: 9/10

Chart Specification: Structural Representations for Incentivizing VLM Reasoning in Chart-to-Code Generation

Minggui He, Mingchen Dai, Jian Zhang, Yilun Liu, Shimin Tao, Pufan Zeng, Osamu Yoshie, Yuya Ieiri
arXiv: 2602.10880v1 Published: 2026-02-11 Updated: 2026-02-11

AI Summary

Proposes Chart Specification, which improves the structural fidelity of VLM chart-to-code generation through structured representations and fine-grained supervision.

Key Contributions

  • Proposes Chart Specification, a structured intermediate representation
  • Designs a Spec-Align Reward that provides verifiable feedback on structural correctness
  • Experiments demonstrate that the method outperforms existing approaches on chart-to-code generation
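To make the first contribution concrete, a minimal sketch of what a structured intermediate representation for a chart might look like is given below. The field names (`chart_type`, `x_label`, `series`, etc.) are illustrative assumptions; the paper's actual Chart Specification schema may differ.

```python
# Hypothetical sketch of a Chart-Specification-style intermediate representation.
# All field names here are assumptions for illustration, not the paper's schema.
from dataclasses import dataclass, field


@dataclass
class ChartSpec:
    chart_type: str                              # e.g. "bar", "line", "scatter"
    x_label: str = ""
    y_label: str = ""
    series: list = field(default_factory=list)   # one entry per plotted series

    def to_dict(self) -> dict:
        """Flatten the spec so it can be compared field-by-field."""
        return {
            "chart_type": self.chart_type,
            "x_label": self.x_label,
            "y_label": self.y_label,
            "n_series": len(self.series),
        }


spec = ChartSpec("bar", x_label="Year", y_label="Sales",
                 series=[{"label": "Product A", "color": "blue"}])
print(spec.to_dict()["chart_type"])  # bar
```

Representing the chart as typed fields rather than raw plotting code is what allows supervision to target structure (chart type, axes, series count) instead of surface token sequences.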

Methodology

Constructs a structurally balanced training set by filtering syntactic noise, then applies reinforcement learning with the Spec-Align Reward to enforce consistent plotting logic.
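A reward of this kind can be sketched as follows, assuming both the predicted and reference charts have been parsed into flat key/value specifications. The scoring rule here (fraction of reference fields reproduced exactly) is an illustrative assumption, not the paper's exact Spec-Align Reward definition.

```python
# Minimal sketch of a Spec-Align-style reward: compare a predicted chart
# specification against a reference specification field by field.
# The fraction-of-matching-fields rule is an assumption for illustration.
def spec_align_reward(pred_spec: dict, ref_spec: dict) -> float:
    """Fraction of reference fields the prediction reproduces exactly."""
    if not ref_spec:
        return 0.0
    matched = sum(1 for key, value in ref_spec.items()
                  if pred_spec.get(key) == value)
    return matched / len(ref_spec)


ref = {"chart_type": "bar", "x_label": "Year",
       "y_label": "Sales", "n_series": 2}
pred = {"chart_type": "bar", "x_label": "Year",
        "y_label": "Revenue", "n_series": 2}
print(spec_align_reward(pred, ref))  # 0.75
```

Because each field comparison is deterministic and verifiable, a reward like this gives fine-grained structural feedback suitable for reinforcement learning, rather than a single pass/fail signal on the generated code.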

Original Abstract

Vision-Language Models (VLMs) have shown promise in generating plotting code from chart images, yet achieving structural fidelity remains challenging. Existing approaches largely rely on supervised fine-tuning, encouraging surface-level token imitation rather than faithful modeling of underlying chart structure, which often leads to hallucinated or semantically inconsistent outputs. We propose Chart Specification, a structured intermediate representation that shifts training from text imitation to semantically grounded supervision. Chart Specification filters syntactic noise to construct a structurally balanced training set and supports a Spec-Align Reward that provides fine-grained, verifiable feedback on structural correctness, enabling reinforcement learning to enforce consistent plotting logic. Experiments on three public benchmarks show that our method consistently outperforms prior approaches. With only 3K training samples, we achieve strong data efficiency, surpassing leading baselines by up to 61.7% on complex benchmarks, and scaling to 4K samples establishes new state-of-the-art results across all evaluated metrics. Overall, our results demonstrate that precise structural supervision offers an efficient pathway to high-fidelity chart-to-code generation. Code and dataset are available at: https://github.com/Mighten/chart-specification-paper

Tags

VLM · Chart-to-Code · Structured Representation · Reinforcement Learning

arXiv Category

cs.CV