Multimodal Learning (relevance: 9/10)

HeatPrompt: Zero-Shot Vision-Language Modeling of Urban Heat Demand from Satellite Images

Kundan Thota, Xuanhao Mu, Thorsten Schlachter, Veit Hagenmeyer
arXiv: 2602.20066v1 Published: 2026-02-23 Updated: 2026-02-23

AI Summary

HeatPrompt uses vision-language models and satellite images to predict urban heat demand in a zero-shot setting, improving prediction accuracy.

Key Contributions

  • Proposes HeatPrompt, a zero-shot heat-demand prediction framework
  • Uses pretrained VLMs to extract semantic features for heat-demand modeling
  • Provides lightweight heat-planning support for data-scarce regions

Methodology

A VLM extracts semantic features from satellite images; these are combined with GIS and building-level features to train an MLP regressor that predicts heat demand.
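The pipeline above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the vocabulary, GIS features, demand values, and network size are all made up, and the VLM caption is stubbed in as a plain string. It shows only the overall shape of the method: caption text is encoded into a feature vector, concatenated with numeric GIS/building features, and regressed onto heat demand with a small MLP.

```python
import numpy as np

VOCAB = ["old", "roof", "dense", "detached", "flat", "new"]  # toy caption vocabulary

def caption_features(caption):
    """Bag-of-words encoding of a VLM caption over the toy vocabulary."""
    tokens = caption.lower().split()
    return np.array([tokens.count(w) for w in VOCAB], dtype=float)

def build_features(caption, gis):
    """Concatenate caption features with numeric GIS/building features."""
    return np.concatenate([caption_features(caption), gis])

class TinyMLP:
    """One-hidden-layer MLP regressor trained with full-batch gradient descent."""
    def __init__(self, d_in, d_hidden=16, lr=0.02, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (d_in, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0.0, 0.5, (d_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def forward(self, X):
        self.h = np.tanh(X @ self.W1 + self.b1)      # hidden activations
        return (self.h @ self.W2 + self.b2).ravel()

    def fit(self, X, y, epochs=3000):
        for _ in range(epochs):
            err = (self.forward(X) - y)[:, None] / len(y)  # d(MSE)/d(pred), up to a factor 2
            dh = (err @ self.W2.T) * (1.0 - self.h ** 2)   # backprop through tanh
            self.W2 -= self.lr * (self.h.T @ err)
            self.b2 -= self.lr * err.sum(0)
            self.W1 -= self.lr * (X.T @ dh)
            self.b1 -= self.lr * dh.sum(0)

# Two hypothetical tiles: captions a VLM might return for satellite images,
# plus scaled GIS features (floor area in 1000 m^2, construction year / 1000).
X = np.stack([
    build_features("old roof dense blocks", np.array([1.2, 1.955])),
    build_features("new flat detached houses", np.array([0.4, 2.010])),
])
y = np.array([1.8, 0.6])  # annual heat demand in 100 kWh/m^2 (made up)

model = TinyMLP(d_in=X.shape[1])
baseline_mse = np.mean((model.forward(X) - y) ** 2)
model.fit(X, y)
final_mse = np.mean((model.forward(X) - y) ** 2)
```

In the actual framework, the captions come from a pretrained VLM prompted to describe heat-relevant attributes (roof age, building density, etc.), and the caption encoding would be a learned text embedding rather than this toy bag-of-words.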

Original Abstract

Accurate heat-demand maps play a crucial role in decarbonizing space heating, yet most municipalities lack detailed building-level data needed to calculate them. We introduce HeatPrompt, a zero-shot vision-language energy modeling framework that estimates annual heat demand using semantic features extracted from satellite images, basic Geographic Information System (GIS), and building-level features. We feed pretrained Large Vision Language Models (VLMs) with a domain-specific prompt to act as an energy planner and extract the visual attributes such as roof age, building density, etc, from the RGB satellite image that correspond to the thermal load. A Multi-Layer Perceptron (MLP) regressor trained on these captions shows an $R^2$ uplift of 93.7% and shrinks the mean absolute error (MAE) by 30% compared to the baseline model. Qualitative analysis shows that high-impact tokens align with high-demand zones, offering lightweight support for heat planning in data-scarce regions.

Tags

Vision-language models · Heat-demand prediction · Satellite imagery · Zero-shot learning

arXiv Categories

cs.CV cs.AI