Multimodal Learning (Relevance: 9/10)

GeoHeight-Bench: Towards Height-Aware Multimodal Reasoning in Remote Sensing

Xuran Hu, Zhitong Xiong, Zhongcheng Hong, Yifang Ban, Xiaoxiang Zhu, Wufan Zhao
arXiv: 2603.25565v1 · Published: 2026-03-26 · Updated: 2026-03-26

AI Summary

Proposes an evaluation framework and a baseline model for assessing the height-awareness of large multimodal models on remote sensing imagery.

Key Contributions

  • Constructed GeoHeight-Bench, a benchmark for relative height analysis.
  • Constructed GeoHeight-Bench+, a more challenging benchmark for terrain-aware reasoning.
  • Proposed GeoHeightChat, the first height-aware remote sensing LMM baseline.

Methodology

The benchmarks are constructed with a VLM-driven data generation pipeline that combines systematic prompt engineering and metadata extraction; a height-aware LMM is then designed as a baseline.
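To make the pipeline concrete, here is a minimal Python sketch of one way such a VLM-driven generation step could be wired up. The vlm_generate callable, the prompt template, and the metadata fields are all illustrative assumptions; the paper's actual prompts and metadata schema are not given in this summary.

```python
# Hypothetical sketch of a VLM-driven, metadata-grounded QA generation step.
from dataclasses import dataclass

@dataclass
class Sample:
    image_path: str       # optical remote sensing tile
    height_map_path: str  # co-registered height raster (e.g., DSM), assumed

# Systematic prompt template (illustrative, not the paper's actual prompt).
PROMPT_TEMPLATE = (
    "You are given a remote sensing image and object height metadata: {metadata}. "
    "Write a question-answer pair about the RELATIVE heights of objects in the scene."
)

def extract_metadata(sample: Sample) -> dict:
    # Placeholder: in practice, per-object heights would be read from the
    # height raster and paired with object instances.
    return {"building_A_m": 42.0, "building_B_m": 17.5}

def build_qa(sample: Sample, vlm_generate) -> dict:
    """Produce one height-aware QA item by prompting a VLM with extracted metadata."""
    metadata = extract_metadata(sample)
    prompt = PROMPT_TEMPLATE.format(metadata=metadata)
    qa_text = vlm_generate(image=sample.image_path, prompt=prompt)
    return {"image": sample.image_path, "qa": qa_text, "metadata": metadata}
```

In this reading, the height metadata grounds the VLM's question-answer pairs, so the benchmark labels do not depend on the VLM estimating heights itself.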

Original Abstract

Current Large Multimodal Models (LMMs) in Earth Observation typically neglect the critical "vertical" dimension, limiting their reasoning capabilities in complex remote sensing geometries and disaster scenarios where physical spatial structures often outweigh planar visual textures. To bridge this gap, we introduce a comprehensive evaluation framework dedicated to height-aware remote sensing understanding. First, to overcome the severe scarcity of annotated data, we develop a scalable, VLM-driven data generation pipeline utilizing systematic prompt engineering and metadata extraction. This pipeline constructs two complementary benchmarks: GeoHeight-Bench for relative height analysis, and a more challenging GeoHeight-Bench+ for holistic, terrain-aware reasoning. Furthermore, to validate the necessity of height perception, we propose GeoHeightChat, the first height-aware remote sensing LMM baseline. Serving as a strong proof of concept, our baseline demonstrates that synergizing visual semantics with implicitly injected height geometric features effectively mitigates the "vertical blind spot", successfully unlocking a new paradigm of interactive height reasoning in existing optical models.
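The abstract does not spell out how the height geometric features are "implicitly injected". The sketch below shows one plausible instantiation under stated assumptions: height-map patch embeddings are blended into the optical visual tokens through a zero-initialized gated residual, so the model starts from pure optical semantics. All module names, shapes, and the fusion mechanism itself are hypothetical, not GeoHeightChat's confirmed design.

```python
# Hedged sketch of gated height-feature injection into a visual backbone.
import torch
import torch.nn as nn

class HeightAwareEncoder(nn.Module):
    def __init__(self, vision_encoder: nn.Module, dim: int = 768, patch: int = 16):
        super().__init__()
        # Pretrained optical encoder, assumed to return (B, N, dim) tokens.
        self.vision_encoder = vision_encoder
        # Lightweight height branch: patchify the 1-channel height map.
        self.height_embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        # Zero-initialized gate: the model starts as the unmodified optical encoder.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, image: torch.Tensor, height: torch.Tensor) -> torch.Tensor:
        # Assumes image and height map are co-registered at the same resolution,
        # so both branches yield the same token count N.
        vis_tokens = self.vision_encoder(image)                    # (B, N, dim)
        h = self.height_embed(height).flatten(2).transpose(1, 2)   # (B, N, dim)
        # Implicit injection: gated residual add blends height geometry into
        # the optical semantics during training.
        return vis_tokens + torch.tanh(self.gate) * h
```

A zero-initialized gate is a common trick for grafting a new modality onto a pretrained encoder without disturbing its initial behavior; whether GeoHeightChat uses this mechanism is an assumption here.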

Tags

Remote Sensing · Multimodal Learning · Height Awareness · Benchmarking

arXiv Categories

cs.CV