Multimodal Learning 相关度: 9/10

Synthetic Defect Image Generation for Power Line Insulator Inspection Using Multimodal Large Language Models

Xuesong Wang, Caisheng Wang

arXiv: 2603.08069v1 发布: 2026-03-09 更新: 2026-03-09

下载 PDF arXiv 页面

AI 摘要

利用多模态大语言模型生成缺陷图像，提升电力线绝缘子缺陷检测效果。

主要贡献

提出基于MLLM的缺陷图像生成方法
使用双参考条件和人工验证提高图像多样性和标签准确性
通过嵌入选择规则过滤合成图像以提升模型性能

方法论

利用MLLM生成合成缺陷图像，通过人工验证和嵌入选择进行优化，扩充训练集，提升分类模型效果。

原文摘要

Utility companies increasingly rely on drone imagery for post-event and routine inspection, but training accurate defect-type classifiers remains difficult because defect examples are rare and inspection datasets are often limited or proprietary. We address this data-scarcity setting by using an off-the-shelf multimodal large language model (MLLM) as a training-free image generator to synthesize defect images from visual references and text prompts. Our pipeline increases diversity via dual-reference conditioning, improves label fidelity with lightweight human verification and prompt refinement, and filters the resulting synthetic pool using an embedding-based selection rule based on distances to class centroids computed from the real training split. We evaluate on ceramic insulator defect-type classification (shell vs. glaze) using a public dataset with a realistic low training-data regime (104 real training images; 152 validation; 308 test). Augmenting the 10% real training set with embedding-selected synthetic images improves test F1 score (harmonic mean of precision and recall) from 0.615 to 0.739 (20% relative), corresponding to an estimated 4--5x data-efficiency gain, and the gains persist with stronger backbone models and frozen-feature linear-probe baselines. These results suggest a practical, low-barrier path for improving defect recognition when collecting additional real defects is slow or infeasible.

arXiv 分类

cs.CV

AI 摘要

主要贡献

方法论

原文摘要

标签

arXiv 分类