YieldSAT: A Multimodal Benchmark Dataset for High-Resolution Crop Yield Prediction
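AI Summary
YieldSAT is a newly released multimodal dataset for high-resolution crop yield prediction. The paper also benchmarks deep learning models on this dataset.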
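Main Contributions
- Releases YieldSAT, a large-scale, high-quality dataset for crop yield prediction
- Frames yield prediction as a pixel regression task and benchmarks deep learning models and data fusion architectures on it
- Explores a domain-informed Deep Ensemble approach to mitigate distribution shifts in the ground truth data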
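In pixel regression, every 10 m pixel inside a field gets its own yield target, while pixels without a label contribute nothing to the loss. The following is a minimal sketch of a masked per-pixel loss. It assumes a boolean mask marks the labeled pixels, for example those inside the field boundary. The function name `masked_mse` and the mask convention are illustrative and not taken from the paper.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[bool]]


def masked_mse(pred: Grid, target: Grid, mask: Mask) -> float:
    """Mean squared error over labeled pixels only.

    pred, target: H x W grids of per-pixel yield in the same unit.
    mask: H x W grid, True where a ground-truth yield value exists
          (assumed convention, not from the paper).
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("grids must have the same number of rows")
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        if not (len(p_row) == len(t_row) == len(m_row)):
            raise ValueError("grids must have the same number of columns")
        for p, t, m in zip(p_row, t_row, m_row):
            if m:  # unlabeled pixels (e.g. outside the field) are skipped
                total += (p - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask contains no labeled pixels")
    return total / count


# 2 x 2 tile where only the top row lies inside the field
pred = [[5.0, 6.0], [0.0, 0.0]]
target = [[4.0, 6.0], [9.0, 9.0]]
mask = [[True, True], [False, False]]
print(masked_mse(pred, target, mask))  # 0.5
```

In this example, the bottom row would add large errors if it were counted. Masking keeps the loss focused on pixels that have real yield measurements.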
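Methodology
The paper combines multispectral satellite imagery with auxiliary environmental data and trains deep learning models to predict crop yield at the pixel level. A domain-informed Deep Ensemble is used to mitigate distribution shifts.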
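The paper compares several data fusion architectures, but this summary does not say which ones. The sketch below shows only one common variant, input-level (early) fusion. In this variant, field-level environmental features are broadcast to every pixel and concatenated with that pixel's spectral bands before the data enters a model. Other designs, such as late fusion, encode each modality separately and merge them later. The name `early_fuse` and the example feature values are hypothetical.

```python
from typing import List

Pixel = List[float]          # spectral band values for one pixel
Image = List[List[Pixel]]    # H x W grid of pixels


def early_fuse(image: Image, env_features: List[float]) -> Image:
    """Append the same field-level environmental features to every pixel.

    image: H x W grid, each cell a list of spectral band values.
    env_features: field-level auxiliary values (hypothetical, e.g. weather
                  aggregates), copied unchanged to every pixel.
    Returns a new grid; the input image is not modified.
    """
    return [[list(pixel) + list(env_features) for pixel in row] for row in image]


# 1 x 2 tile with 3 bands per pixel and 2 environmental features
tile = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]
fused = early_fuse(tile, [18.5, 42.0])
print(len(fused[0][0]))  # 5
```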
Original Abstract
Crop yield prediction requires substantial data to train scalable models. However, creating yield prediction datasets is constrained by high acquisition costs, heterogeneous data quality, and data privacy regulations. Consequently, existing datasets are scarce, low in quality, or limited to regional levels or single crop types, hindering the development of scalable data-driven solutions. In this work, we release YieldSAT, a large, high-quality, and multimodal dataset for high-resolution crop yield prediction. YieldSAT spans various climate zones across multiple countries, including Argentina, Brazil, Uruguay, and Germany, and includes major crop types, including corn, rapeseed, soybeans, and wheat, across 2,173 expert-curated fields. In total, over 12.2 million yield samples are available, each with a spatial resolution of 10 m. Each field is paired with multispectral satellite imagery, resulting in 113,555 labeled satellite images, complemented by auxiliary environmental data. We demonstrate the potential of large-scale and high-resolution crop yield prediction as a pixel regression task by comparing various deep learning models and data fusion architectures. Furthermore, we highlight open challenges arising from severe distribution shifts in the ground truth data under real-world conditions. To mitigate this, we explore a domain-informed Deep Ensemble approach that exhibits significant performance gains. The dataset is available at https://yieldsat.github.io/.
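The abstract does not explain how the Deep Ensemble is made domain-informed, so the sketch below covers only the generic aggregation step. In a Deep Ensemble, several independently trained networks each predict every pixel. Their mean serves as the prediction, and their spread gives an uncertainty signal. The optional per-member `weights` are a hypothetical hook, for instance to emphasize members trained on data closer to the target domain. They are not the authors' method, and `ensemble_predict` is an illustrative name.

```python
from typing import List, Optional, Sequence, Tuple


def ensemble_predict(
    member_preds: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Combine per-pixel predictions from several ensemble members.

    member_preds: one flat list of pixel predictions per member.
    weights: optional non-negative weight per member (hypothetical hook);
             None means uniform weights, i.e. a plain ensemble mean.
    Returns (weighted mean, weighted variance) for each pixel.
    """
    n_members = len(member_preds)
    if n_members == 0:
        raise ValueError("need at least one ensemble member")
    if weights is None:
        weights = [1.0] * n_members
    if len(weights) != n_members:
        raise ValueError("one weight per member is required")
    w_sum = sum(weights)
    if w_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    n_pixels = len(member_preds[0])
    if any(len(p) != n_pixels for p in member_preds):
        raise ValueError("all members must predict the same number of pixels")
    means, variances = [], []
    for i in range(n_pixels):
        values = [preds[i] for preds in member_preds]
        mean = sum(w * v for w, v in zip(weights, values)) / w_sum
        # spread of member predictions around the mean (disagreement)
        var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / w_sum
        means.append(mean)
        variances.append(var)
    return means, variances


# three members, two pixels
preds = [[4.0, 6.0], [5.0, 6.0], [6.0, 6.0]]
mean, var = ensemble_predict(preds)
print(mean)  # [5.0, 6.0]
```

With uniform weights, this reduces to the plain ensemble mean. A large per-pixel variance shows where members disagree, which can help flag pixels affected by distribution shift.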