Multimodal Learning Relevance: 9/10

Continual Vision-Language Learning for Remote Sensing: Benchmarking and Analysis

Xingxing Weng, Ruifeng Ni, Chao Pang, XiangYu Hao, Yishan Wang, Xiaokang Zhang, Wei Xu, Gui-Song Xia
arXiv: 2604.00820v1 Published: 2026-04-01 Updated: 2026-04-01

AI Summary

Introduces the CLeaRS benchmark for evaluating catastrophic forgetting in remote sensing vision-language models under continual learning, and analyzes the limitations of existing methods.

Main Contributions

  • Proposed CLeaRS, a benchmark for continual vision-language learning in remote sensing
  • Defined three evaluation protocols: long-horizon, modality-incremental, and task-incremental
  • Benchmarked existing vision-language models and continual learning methods on CLeaRS

Methodology

Constructed a remote sensing dataset comprising 10 subsets (over 207k image-text pairs), defined three evaluation protocols, and benchmarked existing models against them.
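To make the benchmarking concrete, the sketch below computes two metrics that continual-learning evaluations like this commonly report: average final accuracy and average forgetting, derived from an accuracy matrix recorded after each training stage. This is a generic illustration of standard metrics, not the paper's exact evaluation code; the matrix values and function names are hypothetical.

```python
# Standard continual-learning metrics from an accuracy matrix, where
# acc[i][j] is accuracy on subset j measured after training stage i.
# Illustrative only -- CLeaRS's exact metric definitions may differ.

def average_accuracy(acc):
    """Mean accuracy over all subsets, measured after the final stage."""
    final = acc[-1]
    return sum(final) / len(final)

def average_forgetting(acc):
    """Mean drop from each subset's best pre-final accuracy to its final
    accuracy (the last-learned subset is excluded, since nothing has been
    trained after it)."""
    n = len(acc)
    drops = []
    for j in range(n - 1):
        best = max(acc[i][j] for i in range(j, n - 1))
        drops.append(best - acc[-1][j])
    return sum(drops) / len(drops)

# Hypothetical run over three sequential subsets: accuracy on earlier
# subsets degrades as later subsets are learned (catastrophic forgetting).
acc = [
    [0.80, 0.00, 0.00],
    [0.60, 0.75, 0.00],
    [0.50, 0.70, 0.85],
]
print(average_accuracy(acc))    # (0.50 + 0.70 + 0.85) / 3
print(average_forgetting(acc))  # ((0.80 - 0.50) + (0.75 - 0.70)) / 2
```

Under protocols such as long-horizon learning, each new subset adds a row to this matrix, so forgetting can be tracked as the task sequence grows.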

Original Abstract

Current remote sensing vision-language models (RS VLMs) demonstrate impressive performance in image interpretation but rely on static training data, limiting their ability to accommodate continuously emerging sensing modalities and downstream tasks. This exposes a fundamental challenge: enabling RS VLMs to continually adapt without catastrophic forgetting. Despite its practical importance, the continual learning capability of RS VLMs remains underexplored, and no dedicated benchmark currently exists. In this work, we present CLeaRS, a comprehensive benchmark for continual vision-language learning in remote sensing. CLeaRS comprises 10 curated subsets with over 207k image-text pairs, spanning diverse interpretation tasks, sensing modalities, and application scenarios. We further define three evaluation protocols: long-horizon, modality-incremental, and task-incremental settings, to systematically assess continual adaptation. Extensive benchmarking of diverse vision-language models reveals catastrophic forgetting across all settings. Moreover, representative continual learning methods, when adapted to RS VLMs, exhibit limited effectiveness in handling task, instruction, and modality transitions. Our findings underscore the need for developing continual learning methods tailored to RS VLMs.

Tags

Remote Sensing · Vision-Language Models · Continual Learning · Benchmarking

arXiv Categories

cs.CV