Multimodal Learning relevance: 10/10

Omni-I2C: A Holistic Benchmark for High-Fidelity Image-to-Code Generation

Jiawei Zhou, Chi Zhang, Xiang Feng, Qiming Zhang, Haibo Qiu, Lihuo He, Dengpan Ye, Xinbo Gao, Jing Zhang
arXiv: 2603.17508v1 Published: 2026-03-18 Updated: 2026-03-18

AI Summary

Omni-I2C is a comprehensive, high-fidelity benchmark for evaluating the ability of LMMs to convert images into code.

Key Contributions

  • Introduces the Omni-I2C benchmark with 1080 high-quality samples
  • Spans a wide range of subjects, image modalities, and programming languages
  • Provides an evaluation framework enabling fine-grained analysis of visual perception and code-generation precision

Methodology

A dataset covering diverse, authentic user cases is constructed, along with an evaluation framework that assesses LMM performance along two axes: perceptual fidelity and symbolic precision.
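The decoupled scoring idea can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the toy grid "renderer", the `draw(grid)` convention, and all function names here are assumptions. Real image-to-code evaluation would execute the generated plotting/markup code and compare rendered images.

```python
import re

def render_to_grid(code, size=8):
    """Toy stand-in for a renderer: executes code that must define
    draw(grid), which paints cells of a size x size grid."""
    grid = [[0] * size for _ in range(size)]
    env = {}
    exec(code, env)  # in a real harness, run sandboxed
    env["draw"](grid)
    return grid

def perceptual_fidelity(ref_grid, cand_grid):
    """Pixel-level agreement between the two rendered outputs."""
    total = sum(len(row) for row in ref_grid)
    matches = sum(
        1
        for r_row, c_row in zip(ref_grid, cand_grid)
        for r, c in zip(r_row, c_row)
        if r == c
    )
    return matches / total

def symbolic_precision(ref_code, cand_code):
    """Jaccard overlap of identifier tokens: a crude proxy for
    whether the generated code uses the right symbols/structure."""
    tokenize = lambda s: set(re.findall(r"[A-Za-z_]\w*", s))
    ref, cand = tokenize(ref_code), tokenize(cand_code)
    return len(ref & cand) / len(ref | cand) if ref | cand else 1.0

def evaluate(ref_code, cand_code):
    """Score a candidate against the executable reference along
    the two decoupled axes."""
    ref_img = render_to_grid(ref_code)
    cand_img = render_to_grid(cand_code)
    return {
        "perceptual": perceptual_fidelity(ref_img, cand_img),
        "symbolic": symbolic_precision(ref_code, cand_code),
    }
```

Separating the two scores makes failure modes legible: a high symbolic but low perceptual score points to a rendering/logic error, while the reverse suggests the model reproduced the image via structurally wrong code.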

Original Abstract

We present Omni-I2C, a comprehensive benchmark designed to evaluate the capability of Large Multimodal Models (LMMs) in converting complex, structured digital graphics into executable code. We argue that this task represents a non-trivial challenge for the current generation of LMMs: it demands an unprecedented synergy between high-fidelity visual perception -- to parse intricate spatial hierarchies and symbolic details -- and precise generative expression -- to synthesize syntactically sound and logically consistent code. Unlike traditional descriptive tasks, Omni-I2C requires a holistic understanding where any minor perceptual hallucination or coding error leads to a complete failure in visual reconstruction. Omni-I2C features 1080 meticulously curated samples, defined by its breadth across subjects, image modalities, and programming languages. By incorporating authentic user-sourced cases, the benchmark spans a vast spectrum of digital content -- from scientific visualizations to complex symbolic notations -- each paired with executable reference code. To complement this diversity, our evaluation framework provides necessary depth; by decoupling performance into perceptual fidelity and symbolic precision, it transcends surface-level accuracy to expose the granular structural failures and reasoning bottlenecks of current LMMs. Our evaluation reveals a substantial performance gap among leading LMMs; even state-of-the-art models struggle to preserve structural integrity in complex scenarios, underscoring that multimodal code generation remains a formidable challenge. Data and code are available at https://github.com/MiliLab/Omni-I2C.

Tags

LMM Image-to-Code Benchmark Multimodal Learning

arXiv Categories

cs.CV