Annotation Free Spacecraft Detection and Segmentation using Vision Language Models
AI Summary
Proposes an annotation-free spacecraft detection and segmentation framework based on Vision Language Models (VLMs), significantly improving performance on spacecraft image processing tasks.
Main Contributions
- Proposes an annotation-free pipeline for spacecraft detection and segmentation
- Uses a pre-trained VLM to automatically generate pseudo-labels
- Applies a teacher-student label distillation framework
- Demonstrates experimentally consistent performance gains across multiple datasets
Methodology
A VLM generates pseudo-labels for unlabeled data; these are used in a teacher-student framework where lightweight models are trained via label distillation, achieving annotation-free spacecraft detection and segmentation.
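The pipeline above can be sketched minimally as follows. This is an illustrative assumption of how pseudo-label distillation is commonly set up, not the paper's actual implementation: the function names, the confidence-threshold filtering step, and the data layout are all hypothetical.

```python
# Hypothetical sketch of VLM pseudo-label distillation.
# A pre-trained "teacher" VLM scores unlabeled frames; confident
# predictions become pseudo-labels for training a lightweight student.
# CONF_THRESHOLD and all names are illustrative assumptions.

CONF_THRESHOLD = 0.5  # keep only reasonably confident teacher outputs


def build_pseudo_labels(teacher_outputs):
    """Filter teacher predictions into a pseudo-labeled training set.

    teacher_outputs: list of (image_id, mask, confidence) triples
    produced by zero-shot VLM inference on unlabeled real images.
    Returns (image_id, mask) pairs to train the student on.
    """
    return [
        (image_id, mask)
        for image_id, mask, confidence in teacher_outputs
        if confidence >= CONF_THRESHOLD
    ]


# Toy example: three teacher predictions, one too uncertain to keep.
outputs = [
    ("frame_001", "mask_a", 0.92),
    ("frame_002", "mask_b", 0.31),  # below threshold -> discarded
    ("frame_003", "mask_c", 0.77),
]
pseudo = build_pseudo_labels(outputs)
print(pseudo)  # [('frame_001', 'mask_a'), ('frame_003', 'mask_c')]
```

The student model would then be trained on `pseudo` with a standard supervised segmentation loss; per the abstract, this distillation step outperforms direct zero-shot VLM inference despite noise in the pseudo-labels.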
Original Abstract
Vision Language Models (VLMs) have demonstrated remarkable performance in open-world zero-shot visual recognition. However, their potential in space-related applications remains largely unexplored. In the space domain, accurate manual annotation is particularly challenging due to factors such as low visibility, illumination variations, and object blending with planetary backgrounds. Developing methods that can detect and segment spacecraft and orbital targets without requiring extensive manual labeling is therefore of critical importance. In this work, we propose an annotation-free detection and segmentation pipeline for space targets using VLMs. Our approach begins by automatically generating pseudo-labels for a small subset of unlabeled real data with a pre-trained VLM. These pseudo-labels are then leveraged in a teacher-student label distillation framework to train lightweight models. Despite the inherent noise in the pseudo-labels, the distillation process leads to substantial performance gains over direct zero-shot VLM inference. Experimental evaluations on the SPARK-2024, SPEED+, and TANGO datasets on segmentation tasks demonstrate consistent improvements in average precision (AP) by up to 10 points. Code and models are available at https://github.com/giddyyupp/annotation-free-spacecraft-segmentation.