PReD: An LLM-based Foundation Multimodal Model for Electromagnetic Perception, Recognition, and Decision
AI Summary
PReD is the first multimodal large model for the electromagnetic (EM) domain, realizing an intelligent closed loop of perception, recognition, and decision-making.
Key Contributions
- Constructed PReD-1.3M, a high-quality multitask electromagnetic dataset
- Proposed PReD, a multimodal foundation model for the EM domain
- Designed PReD-Bench, an evaluation benchmark
Methodology
PReD adopts a multi-stage training strategy that unifies multiple EM-signal tasks, achieving closed-loop optimization from end-to-end signal understanding to language-driven reasoning and decision-making.
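The summary does not spell out the individual stages, so the sketch below is only a hypothetical illustration of how such a staged schedule could be expressed in code. The stage names, module names, task mixtures, and freezing policy are assumptions made for illustration; they are not taken from the PReD paper.

```python
# Hypothetical multi-stage training schedule for a signal-to-language model.
# Stage names, module names, tasks, and the freezing policy are illustrative
# assumptions, not PReD's actual configuration.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    trainable: list[str]   # module-name prefixes that receive gradients
    tasks: list[str]       # task mixture sampled in this stage

STAGES = [
    # Stage 1: align the signal encoder's outputs with the language model.
    Stage("alignment", ["projector"], ["signal_caption"]),
    # Stage 2: multitask instruction tuning on the EM tasks listed in the abstract.
    Stage("multitask_sft", ["projector", "llm"],
          ["detection", "modulation", "parameter", "protocol", "fingerprint"]),
    # Stage 3: decision-oriented tuning (e.g., anti-jamming actions).
    Stage("decision_tuning", ["projector", "llm"], ["anti_jamming"]),
]

def set_trainable(model, stage: Stage) -> None:
    """Freeze all parameters, then unfreeze only the modules selected for `stage`."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in stage.trainable)
```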
Original Abstract
Multimodal Large Language Models have demonstrated powerful cross-modal understanding and reasoning capabilities in general domains. However, in the electromagnetic (EM) domain, they still face challenges such as data scarcity and insufficient integration of domain knowledge. This paper proposes PReD, the first foundation model for the EM domain that covers the intelligent closed loop of "perception, recognition, decision-making." We constructed a high-quality multitask EM dataset, PReD-1.3M, and an evaluation benchmark, PReD-Bench. The dataset encompasses multi-perspective representations such as raw time-domain waveforms, frequency-domain spectrograms, and constellation diagrams, covering typical features of communication and radar signals. It supports a range of core tasks, including signal detection, modulation recognition, parameter estimation, protocol recognition, radio-frequency fingerprint recognition, and anti-jamming decision-making. PReD adopts a multi-stage training strategy that unifies multiple tasks for EM signals. It achieves closed-loop optimization from end-to-end signal understanding to language-driven reasoning and decision-making, significantly enhancing EM domain expertise while maintaining general multimodal capabilities. Experimental results show that PReD achieves state-of-the-art performance on PReD-Bench, which is constructed from both open-source and self-collected signal datasets. These results collectively validate the feasibility and potential of vision-aligned foundation models in advancing the understanding and reasoning of EM signals.
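For readers unfamiliar with the signal views the abstract lists, the sketch below shows one common way to derive a time-domain waveform tensor, a frequency-domain spectrogram, and a constellation-diagram histogram from a complex-baseband IQ recording. The QPSK test signal, sample rate, and all processing parameters are assumptions for illustration only and do not describe PReD's actual preprocessing.

```python
# Minimal sketch: building multi-perspective representations from IQ samples.
# The synthetic QPSK signal and every parameter below are illustrative assumptions.
import numpy as np
from scipy import signal as sp_signal

fs = 1_000_000                                           # sample rate (Hz), assumed
rng = np.random.default_rng(0)
symbols = rng.integers(0, 4, 4096)                        # random QPSK symbol stream
iq = np.exp(1j * (np.pi / 4 + np.pi / 2 * symbols))       # unit-energy QPSK points
iq += 0.05 * (rng.standard_normal(iq.size) + 1j * rng.standard_normal(iq.size))

# 1) Time-domain waveform: real and imaginary parts stacked as channels.
waveform = np.stack([iq.real, iq.imag])                   # shape (2, N)

# 2) Frequency-domain spectrogram: log-magnitude STFT.
f, t, stft = sp_signal.stft(iq, fs=fs, nperseg=256, return_onesided=False)
spectrogram = 20 * np.log10(np.abs(stft) + 1e-12)

# 3) Constellation diagram: 2-D histogram of the IQ samples in the complex plane.
constellation, _, _ = np.histogram2d(iq.real, iq.imag, bins=64,
                                     range=[[-2, 2], [-2, 2]])

print(waveform.shape, spectrogram.shape, constellation.shape)
```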