Comparative analysis of dual-form networks for live land monitoring using multi-modal satellite image time series
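AI Summary
Studies dual-form attention mechanisms for land monitoring with multi-modal satellite image time series (SITS), with the aim of improving computational efficiency.
Main Contributions
- Proposes an efficient multi-modal SITS analysis approach based on dual-form attention mechanisms
- Designs temporally adaptive dual-form mechanisms that address temporal irregularity and misalignment in SITS
- Validates the approach on SITS forecasting and on solar panel construction monitoring
Methodology
Builds a multi-modal spectro-temporal encoder using linear attention and retention mechanisms, with temporal adaptation to irregular acquisition dates.
Original Abstract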
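"Dual-form" means the same computation can be run in two equivalent ways. A parallel form processes the whole sequence at once during training. A recurrent form carries a fixed-size state forward one token at a time during inference. Below is a minimal NumPy sketch of causal linear attention showing that both forms give the same output. It makes two assumptions that do not come from the paper: it uses the elu(x)+1 feature map from Katharopoulos et al. ("Transformers are RNNs", 2020), and it uses a single head with no learned projections. It is not the authors' encoder.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a positive feature map (assumed; the paper's choice may differ).
    # np.minimum avoids overflow in the branch that np.where discards.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention_parallel(q, k, v):
    """Training-time form: all T tokens at once with a causal mask."""
    fq, fk = feature_map(q), feature_map(k)         # (T, d)
    scores = fq @ fk.T                              # (T, T) similarities, no softmax
    scores = scores * np.tril(np.ones_like(scores)) # token i only sees tokens j <= i
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

def linear_attention_recurrent(q, k, v):
    """Inference-time form: one token per step, O(d * d_v) state, no reprocessing."""
    S = np.zeros((q.shape[1], v.shape[1]))  # running sum of phi(k_j) v_j^T
    z = np.zeros(q.shape[1])                # running sum of phi(k_j), the normaliser
    out = []
    for qt, kt, vt in zip(q, k, v):
        fq, fk = feature_map(qt), feature_map(kt)
        S += np.outer(fk, vt)
        z += fk
        out.append((fq @ S) / (fq @ z))
    return np.stack(out)

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 4)) for _ in range(3))
print(np.allclose(linear_attention_parallel(q, k, v),
                  linear_attention_recurrent(q, k, v)))  # True
```

The recurrent form is what makes incremental monitoring cheap. When a new acquisition arrives, only the state (S, z) is updated. The T x T score matrix is never rebuilt, so the cost per new image does not grow with the length of the history.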
Multi-modal Satellite Image Time Series (SITS) analysis faces significant computational challenges for live land monitoring applications. While Transformer architectures excel at capturing temporal dependencies and fusing multi-modal data, their quadratic computational complexity and the need to reprocess entire sequences for each new acquisition limit their deployment for regular, large-area monitoring. This paper studies several dual-form attention mechanisms for efficient multi-modal SITS analysis; these mechanisms enable parallel training while supporting recurrent inference for incremental processing. We compare linear attention and retention mechanisms within a multi-modal spectro-temporal encoder. To address the SITS-specific challenges of temporal irregularity and misalignment, we develop temporal adaptations of dual-form mechanisms that compute token distances based on actual acquisition dates rather than sequence indices. Our approach is evaluated on two tasks using Sentinel-1 and Sentinel-2 data: multi-modal SITS forecasting as a proxy task, and real-world solar panel construction monitoring. Experimental results demonstrate that dual-form mechanisms achieve performance comparable to standard Transformers while enabling efficient recurrent inference. The multi-modal framework consistently outperforms mono-modal approaches across both tasks, demonstrating the effectiveness of dual-form mechanisms for sensor fusion. The results presented in this work open new opportunities for operational land monitoring systems requiring regular updates over large geographic areas.
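The temporal adaptation described above can be illustrated with retention (Sun et al., RetNet, 2023). Standard retention decays past tokens by gamma ** (n - m), which depends on sequence indices. A date-aware variant replaces the index gap with the number of elapsed days, t_n - t_m. The recurrent form then decays its state by gamma ** (t_n - t_{n-1}) before adding each new token.

The sketch below is a minimal single-head NumPy version that checks the parallel and recurrent forms agree on irregular dates. It makes several assumptions: the per-day gamma, the function names, and the omission of RetNet's rotary terms, normalisation, and multi-scale heads are all illustrative. It is not the paper's exact formulation.

```python
import numpy as np

def decay_matrix(days, gamma):
    """D[n, m] = gamma ** (days[n] - days[m]) for m <= n, else 0."""
    dt = days[:, None] - days[None, :]              # elapsed days between tokens
    causal = np.tril(np.ones((len(days), len(days)), dtype=bool))
    # Zero the exponent where masked so future tokens cannot overflow.
    return np.where(causal, gamma ** np.where(causal, dt, 0.0), 0.0)

def retention_parallel(q, k, v, days, gamma):
    """Training-time form: (Q K^T * D) V with a date-based decay mask."""
    return ((q @ k.T) * decay_matrix(days, gamma)) @ v

def retention_step(S, prev_day, qt, kt, vt, day, gamma):
    """Ingest one new acquisition: decay the state by the elapsed days, then add it."""
    S = gamma ** (day - prev_day) * S + np.outer(kt, vt)
    return S, qt @ S

def retention_recurrent(q, k, v, days, gamma):
    """Inference-time form: replay retention_step over the sequence."""
    S, prev_day, out = np.zeros((q.shape[1], v.shape[1])), days[0], []
    for qt, kt, vt, day in zip(q, k, v, days):
        S, o = retention_step(S, prev_day, qt, kt, vt, day, gamma)
        prev_day = day
        out.append(o)
    return np.stack(out)

# Irregular acquisition dates; the repeated date mimics two sensors imaging on the same day.
dates = np.array(["2021-01-03", "2021-01-08", "2021-01-15", "2021-01-15",
                  "2021-02-02", "2021-02-19"], dtype="datetime64[D]")
days = (dates - dates[0]) / np.timedelta64(1, "D")  # float days since first acquisition

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(len(days), 4)) for _ in range(3))
print(np.allclose(retention_parallel(q, k, v, days, 0.95),
                  retention_recurrent(q, k, v, days, 0.95)))  # True
```

Because the decay depends only on dates, tokens from sensors with different revisit calendars (for example Sentinel-1 and Sentinel-2) can in principle share one time axis. Whether the paper merges the modalities this way or handles them separately inside its encoder is not stated in the abstract.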