Multimodal Learning (relevance: 8/10)

RSGen: Enhancing Layout-Driven Remote Sensing Image Generation with Diverse Edge Guidance

Xianbao Hou, Yonghao He, Zeyd Boukhers, John See, Hu Su, Wei Sui, Cong Yang
arXiv: 2603.15484v1 Published: 2026-03-16 Updated: 2026-03-16

AI Summary

RSGen uses diverse edge guidance to improve layout-driven remote sensing image generation and to boost downstream object detection.

Key Contributions

  • Proposes RSGen, a framework that leverages edge information to enhance remote sensing image generation
  • Enriches the diversity of edge maps via Image-to-Image generation
  • Validates the effectiveness of RSGen on multiple baseline models

Methodology

RSGen enriches the diversity of edge maps through Image-to-Image generation and then uses these edge maps to condition Layout-to-Image models, enforcing pixel-level control within the given bounding boxes.
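A minimal sketch of this two-stage idea is shown below, built on the Hugging Face diffusers library. It is an illustrative assumption, not the authors' released implementation: the model checkpoints, the Canny-based edge compositing, the prompt handling, and the `strength` value are all hypothetical stand-ins chosen to make the pipeline self-contained.

```python
# Hypothetical sketch of an RSGen-style two-stage pipeline (not the official code).
# Stage 1: diversify an edge map composited from retrieved instances via Image-to-Image.
# Stage 2: use the diversified edges as conditioning for a layout-to-image generator.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import (
    StableDiffusionImg2ImgPipeline,
    StableDiffusionControlNetPipeline,
    ControlNetModel,
)

def composite_edge_map(instance_crops, boxes, canvas_size=(512, 512)):
    """Paste Canny edges of retrieved instance crops into their bounding boxes."""
    canvas = np.zeros(canvas_size, dtype=np.uint8)
    for crop, (x, y, w, h) in zip(instance_crops, boxes):
        gray = cv2.resize(np.asarray(crop.convert("L")), (w, h))
        edges = cv2.Canny(gray, 100, 200)
        canvas[y:y + h, x:x + w] = np.maximum(canvas[y:y + h, x:x + w], edges)
    return Image.fromarray(canvas)

# Stage 1 model: Image-to-Image diversification of the composited edge map.
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Stage 2 model: edge-conditioned generation standing in for an L2I backbone.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
l2i = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

def rsgen_style_generate(instance_crops, boxes, prompt):
    edge_map = composite_edge_map(instance_crops, boxes)
    # Diversify the raw edge composite; `strength` controls how far it may deviate.
    diversified = img2img(
        prompt=prompt, image=edge_map.convert("RGB"), strength=0.4
    ).images[0]
    # Re-extract edges so the conditioning signal remains a clean edge map.
    diversified_edges = Image.fromarray(
        cv2.Canny(np.asarray(diversified.convert("L")), 100, 200)
    ).convert("RGB")
    # Edge conditioning enforces pixel-level control inside the bounding boxes.
    return l2i(prompt=prompt, image=diversified_edges).images[0]
```

In practice the paper plugs the diversified edge maps into existing L2I models (e.g., CC-Diff) rather than a generic ControlNet; the sketch only mirrors the overall data flow.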

Original Abstract

Diffusion models have significantly mitigated the impact of annotated data scarcity in remote sensing (RS). Although recent approaches have successfully harnessed these models to enable diverse and controllable Layout-to-Image (L2I) synthesis, they still suffer from limited fine-grained control and fail to strictly adhere to bounding box constraints. To address these limitations, we propose RSGen, a plug-and-play framework that leverages diverse edge guidance to enhance layout-driven RS image generation. Specifically, RSGen employs a progressive enhancement strategy: 1) it first enriches the diversity of edge maps composited from retrieved training instances via Image-to-Image generation; and 2) subsequently utilizes these diverse edge maps as conditioning for existing L2I models to enforce pixel-level control within bounding boxes, ensuring the generated instances strictly adhere to the layout. Extensive experiments across three baseline models demonstrate that RSGen significantly boosts the capabilities of existing L2I models. For instance, with CC-Diff on the DOTA dataset for oriented object detection, we achieve remarkable gains of +9.8/+12.0 in YOLOScore mAP50/mAP50-95 and +1.6 in mAP on the downstream detection task. Our code will be publicly available: https://github.com/D-Robotics-AI-Lab/RSGen

Tags

Remote Sensing Image Generation, Diffusion Models, Edge Guidance, Object Detection

arXiv Categories

cs.CV cs.AI