Multimodal Learning Relevance: 9/10

Beyond Pixels: Vector-to-Graph Transformation for Reliable Schematic Auditing

Chengwei Ma, Zhen Tian, Zhou Zhou, Zhixian Xu, Xiaowei Zhu, Xia Hua, Si Shi, F. Richard Yu
arXiv: 2602.11678v1 Published: 2026-02-12 Updated: 2026-02-12

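AI Summary

Proposes a Vector-to-Graph method that addresses the structural blindness of MLLMs in engineering schematic auditing and improves audit accuracy.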

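Key Contributions

  • Proposes the Vector-to-Graph (V2G) conversion method, which turns CAD diagrams into property graphs
  • Demonstrates the limitations of pixel-based methods for understanding engineering schematics
  • Builds a diagnostic benchmark for electrical compliance checks and releases it as open source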

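Methodology

Converts CAD diagrams into property graphs in which nodes represent components and edges represent connectivity, making structural dependencies explicit and auditable.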
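
To make the property-graph idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `PropertyGraph` class, the component kinds (`breaker`, `load`, `contactor`), the attributes `rating_a` and `current_a`, and the two audit rules are all hypothetical, chosen only to illustrate how a compliance check becomes a graph query once components are nodes and connections are edges.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Nodes are components; undirected edges are electrical connections."""
    nodes: dict = field(default_factory=dict)      # component id -> attribute dict
    adjacency: dict = field(default_factory=dict)  # component id -> set of neighbour ids

    def add_component(self, comp_id, kind, **attrs):
        self.nodes[comp_id] = {"kind": kind, **attrs}
        self.adjacency.setdefault(comp_id, set())

    def connect(self, a, b):
        if a not in self.nodes or b not in self.nodes:
            raise KeyError(f"unknown component in edge ({a}, {b})")
        self.adjacency[a].add(b)
        self.adjacency[b].add(a)


def audit(graph):
    """Return a sorted list of (component id, message) findings.

    Both rules are illustrative assumptions, not rules from the paper:
    1. every component must have at least one connection;
    2. every load must touch a breaker rated at or above the load current.
    """
    findings = []
    for comp_id, attrs in graph.nodes.items():
        neighbours = graph.adjacency[comp_id]
        if not neighbours:
            findings.append((comp_id, "not connected to anything"))
            continue
        if attrs["kind"] == "load":
            ratings = [
                graph.nodes[n].get("rating_a", 0)
                for n in neighbours
                if graph.nodes[n]["kind"] == "breaker"
            ]
            if not ratings:
                findings.append((comp_id, "no breaker on this load"))
            elif max(ratings) < attrs["current_a"]:
                findings.append((comp_id, "breaker rating below load current"))
    return sorted(findings)


def demo_graph():
    """Build a tiny hypothetical schematic with deliberate faults."""
    g = PropertyGraph()
    g.add_component("QF1", "breaker", rating_a=16)
    g.add_component("M1", "load", current_a=10)
    g.add_component("M2", "load", current_a=25)   # exceeds the breaker rating
    g.add_component("M3", "load", current_a=5)    # never wired up
    g.add_component("K1", "contactor")            # never wired up
    g.connect("QF1", "M1")
    g.connect("QF1", "M2")
    return g


if __name__ == "__main__":
    for comp_id, message in audit(demo_graph()):
        print(comp_id, message)
```

Each finding names a specific component and the rule it violates, so a reviewer can trace the result back to the graph. That traceability is what "auditable" means in this sketch; a pixel rendering of the same schematic would not expose the connectivity needed to evaluate either rule directly.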

Original Abstract

Multimodal Large Language Models (MLLMs) have shown remarkable progress in visual understanding, yet they suffer from a critical limitation: structural blindness. Even state-of-the-art models fail to capture topology and symbolic logic in engineering schematics, as their pixel-driven paradigm discards the explicit vector-defined relations needed for reasoning. To overcome this, we propose a Vector-to-Graph (V2G) pipeline that converts CAD diagrams into property graphs where nodes represent components and edges encode connectivity, making structural dependencies explicit and machine-auditable. On a diagnostic benchmark of electrical compliance checks, V2G yields large accuracy gains across all error categories, while leading MLLMs remain near chance level. These results highlight the systemic inadequacy of pixel-based methods and demonstrate that structure-aware representations provide a reliable path toward practical deployment of multimodal AI in engineering domains. To facilitate further research, we release our benchmark and implementation at https://github.com/gm-embodied/V2G-Audit.

Tags

Multimodal Learning Graph Representation Engineering Schematics CAD Reasoning

arXiv Categories

cs.AI cs.CV