Multimodal Learning Relevance: 9/10

Cross-modal learning for plankton recognition

Joona Kareinen, Veikka Immonen, Tuomas Eerola, Lumi Haraguchi, Lasse Lensu, Kaisa Kraft, Sanna Suikkanen, Heikki Kälviäinen

arXiv: 2603.16427v1 Published: 2026-03-17 Updated: 2026-03-17


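AI Summary

Proposes a plankton recognition method based on self-supervised cross-modal learning that makes effective use of both image and optical measurement data and reduces the need for labeled data.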

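Main Contributions

Proposes a cross-modal plankton recognition method based on contrastive learning
Uses optical measurement data to support image recognition, reducing manual labeling
Shows that the method achieves high recognition accuracy with few labeled images and outperforms an image-only self-supervised baseline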

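Methodology

Encoders for images and optical measurement profiles are pretrained with contrastive learning. Plankton are then recognized with a k-NN classifier that relies on a small amount of labeled data.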
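
The contrastive pretraining step can be illustrated with a CLIP-style symmetric loss. The following is a minimal sketch, not the authors' implementation: the function names, the temperature value of 0.07, and the use of plain Python lists in place of encoder outputs are all assumptions made for illustration. Row i of each input is assumed to come from the same particle, which is the only supervisory signal used.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    top = max(row)
    log_sum = top + math.log(sum(math.exp(x - top) for x in row))
    return [x - log_sum for x in row]


def symmetric_contrastive_loss(image_embs, profile_embs, temperature=0.07):
    """CLIP-style loss for a batch of paired (image, profile) embeddings.

    Assumption: image_embs[i] and profile_embs[i] belong to the same particle;
    every other pairing in the batch is treated as a negative.
    """
    images = [l2_normalize(v) for v in image_embs]
    profiles = [l2_normalize(v) for v in profile_embs]
    n = len(images)

    # logits[i][j] = cosine similarity of image i and profile j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(img, prof)) / temperature for prof in profiles]
        for img in images
    ]

    # Image-to-profile direction: the correct profile for image i is column i.
    loss_i2p = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Profile-to-image direction: transpose the logits and repeat.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_p2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return 0.5 * (loss_i2p + loss_p2i)
```

In this sketch the loss is small when each image embedding is closest to the profile embedding of its own particle and large when the pairs are mismatched, which is what drives the two encoders toward a shared embedding space.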

Original Abstract

This paper considers self-supervised cross-modal coordination as a strategy enabling utilization of multiple modalities and large volumes of unlabeled plankton data to build models for plankton recognition. Automated imaging instruments facilitate the continuous collection of plankton image data on a large scale. Current methods for automatic plankton image recognition rely primarily on supervised approaches, which require labeled training sets that are labor-intensive to collect. On the other hand, some modern plankton imaging instruments complement image information with optical measurement data, such as scatter and fluorescence profiles, which currently are not widely utilized in plankton recognition. In this work, we explore the possibility of using such measurement data to guide the learning process without requiring manual labeling. Inspired by the concepts behind Contrastive Language-Image Pre-training, we train encoders for both modalities using only binary supervisory information indicating whether a given image and profile originate from the same particle or from different particles. For plankton recognition, we employ a small labeled gallery of known plankton species combined with a $k$-NN classifier. This approach yields a recognition model that is inherently multimodal, i.e., capable of utilizing information extracted from both image and profile data. We demonstrate that the proposed method achieves high recognition accuracy while requiring only a minimal number of labeled images. Furthermore, we show that the approach outperforms an image-only self-supervised baseline. Code available at https://github.com/Jookare/cross-modal-plankton.
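
The recognition step described in the abstract, a small labeled gallery combined with a k-NN classifier, could look roughly like the sketch below. It is an illustration under stated assumptions rather than the paper's code: the abstract does not say how image and profile embeddings are combined, so the concatenation in `fuse` is a hypothetical choice, as are the function names, the cosine similarity metric, the majority vote, and the placeholder labels.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def fuse(image_emb, profile_emb):
    """Hypothetical fusion: concatenate the two unit-length embeddings.

    Assumption: the source does not specify the fusion rule.
    """
    return l2_normalize(l2_normalize(image_emb) + l2_normalize(profile_emb))


def knn_predict(query, gallery, k=3):
    """Classify a query embedding by majority vote among its k nearest gallery items.

    gallery: list of (embedding, label) pairs from the small labeled set.
    Similarity is cosine similarity (an assumption).
    """
    q = l2_normalize(query)
    scored = []
    for emb, label in gallery:
        g = l2_normalize(emb)
        similarity = sum(a * b for a, b in zip(q, g))
        scored.append((similarity, label))

    # Keep the k most similar gallery items, nearest first.
    scored.sort(key=lambda item: item[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])

    # On a tied vote, Counter keeps first-seen order, so the nearest label wins.
    return votes.most_common(1)[0][0]


# Usage with placeholder embeddings and labels.
gallery = [
    ([1.0, 0.0], "species_a"),
    ([0.9, 0.1], "species_a"),
    ([0.0, 1.0], "species_b"),
    ([0.1, 0.9], "species_b"),
]
prediction = knn_predict([0.8, 0.2], gallery, k=3)
```

Because the classifier only compares embeddings, adding a new class in this setup means adding labeled examples to the gallery, with no retraining of the encoders.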

arXiv Categories

cs.CV
