Multimodal Learning Relevance: 9/10

COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data -- Generation Stochastic by Design

Miguel Espinosa, Eva Gmelich Meijling, Valerio Marsocci, Elliot J. Crowley, Mikolaj Czerkawski
arXiv: 2603.03239v1 Published: 2026-03-03 Updated: 2026-03-03

AI Summary

COP-GEN uses a diffusion Transformer for conditional generative modeling of multimodal Earth observation data.

Key Contributions

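  • Proposes COP-GEN, a multimodal latent diffusion Transformer model.
  • Enables any-to-any conditional generation, including zero-shot modality translation.
  • Validates the model's effectiveness and physical consistency in multimodal data generation.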

Methodology

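Uses a diffusion Transformer to model the joint distribution of multimodal Earth observation data, which enables conditional generation and modality translation.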

Original Abstract

Earth observation applications increasingly rely on data from multiple sensors, including optical, radar, elevation, and land-cover products. Relationships between these modalities are fundamental for data integration but are inherently non-injective: identical conditioning information can correspond to multiple physically plausible observations. Thus, such conditional mappings should be parametrised as data distributions. As a result, deterministic models tend to collapse toward conditional means and fail to represent the uncertainty and variability required for tasks such as data completion and cross-sensor translation. We introduce COP-GEN, a multimodal latent diffusion transformer that models the joint distribution of heterogeneous Earth Observation modalities at their native spatial resolutions. By parameterising cross-modal mappings as conditional distributions, COP-GEN enables flexible any-to-any conditional generation, including zero-shot modality translation, spectral band infilling, and generation under partial or missing inputs, without task-specific retraining. Experiments on a large-scale global multimodal dataset show that COP-GEN generates diverse yet physically consistent realisations while maintaining strong peak fidelity across optical, radar, and elevation modalities. Qualitative and quantitative analyses demonstrate that the model captures meaningful cross-modal structure and systematically adapts its output uncertainty as conditioning information increases. These results highlight the practical importance of stochastic generative modeling for Earth observation and motivate evaluation protocols that move beyond single-reference, pointwise metrics. Website: https://miquel-espinosa.github.io/cop-gen
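
The abstract's claim that deterministic models collapse toward conditional means can be illustrated with a toy example. The sketch below is not COP-GEN and uses no Earth observation data: it assumes a hypothetical one-dimensional target that, for one fixed condition, is equally likely to lie near -1 or +1. The names `sample_targets` and `mse` are illustrative only. The constant that minimises squared error is the sample mean, which sits near 0, a value the data never takes, whereas draws from the conditional distribution stay near the two plausible modes.

```python
import random
import statistics


def sample_targets(n, seed=0):
    """Toy non-injective mapping: one fixed condition, two plausible outcomes."""
    rng = random.Random(seed)
    # Each target is near -1 or near +1 with equal probability.
    return [rng.choice([-1.0, 1.0]) + rng.gauss(0.0, 0.05) for _ in range(n)]


def mse(prediction, targets):
    """Mean squared error of a single constant prediction."""
    return sum((prediction - y) ** 2 for y in targets) / len(targets)


targets = sample_targets(10_000)

# Deterministic regression under squared error: the optimal constant
# prediction is the sample mean of the targets.
deterministic_prediction = statistics.fmean(targets)

# Stand-in for a stochastic model: draw samples from the empirical
# conditional distribution (an assumption made for illustration).
rng = random.Random(1)
stochastic_samples = [rng.choice(targets) for _ in range(1_000)]

print("deterministic prediction:", round(deterministic_prediction, 3))
print("MSE of the mean:", round(mse(deterministic_prediction, targets), 3))
print("MSE of predicting the +1 mode:", round(mse(1.0, targets), 3))
print("a few stochastic samples:", [round(s, 2) for s in stochastic_samples[:5]])
```

The mean scores better on pointwise error than either plausible mode, which is one way to read the abstract's call for evaluation protocols that go beyond single-reference, pointwise metrics.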

Tags

Earth Observation, Multimodal Learning, Diffusion Models, Transformer

arXiv Categories

cs.CV