Multimodal Learning Relevance: 9/10

RF-GPT: Teaching AI to See the Wireless World

Hang Zou, Yu Tian, Bohao Wang, Lina Bariah, Samson Lasaulce, Chongwen Huang, Mérouane Debbah

arXiv: 2602.14833v1 Published: 2026-02-16 Updated: 2026-02-16

AI Summary

RF-GPT uses a visual encoder and an LLM to understand RF signals, enabling high-level reasoning in the wireless communications domain.

Key Contributions

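Proposes RF-GPT, a radio-frequency language model (RFLM)
Uses a multimodal LLM to process and understand RF spectrograms
Builds a large-scale synthetic RF dataset for training RF-GPT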

Methodology

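IQ waveforms are mapped to spectrograms, a pretrained visual encoder extracts features from them, and those features are injected into an LLM for instruction fine-tuning.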
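
To make the first stage of this pipeline concrete, the following is a minimal sketch of mapping a complex IQ waveform to a time-frequency spectrogram image using a plain short-time Fourier transform in NumPy. It is an illustration only, not the authors' preprocessing: the function names (iq_to_spectrogram, to_uint8_image), the FFT size, the hop length, the Hann window, and the 60 dB dynamic range are all assumptions, since the summary does not state the parameters used in RF-GPT. The visual encoder and LLM stages are not shown.

```python
import numpy as np

def iq_to_spectrogram(iq, n_fft=256, hop=128):
    """Map a complex IQ waveform to a log-power spectrogram in dB.

    n_fft and hop are illustrative assumptions, not values from the paper.
    """
    iq = np.asarray(iq, dtype=np.complex128)
    if iq.size < n_fft:
        raise ValueError("waveform is shorter than one FFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (iq.size - n_fft) // hop
    # Slice the waveform into overlapping frames: shape (n_frames, n_fft).
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = iq[idx] * window
    # Complex input has no conjugate symmetry, so keep every bin and
    # shift so that 0 Hz sits in the middle of the frequency axis.
    spectrum = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    power_db = 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-12)
    return power_db.T  # shape (frequency bins, time frames)

def to_uint8_image(spec_db, dynamic_range_db=60.0):
    """Clip to a fixed range below the peak and scale to 0..255."""
    top = spec_db.max()
    floor = top - dynamic_range_db
    scaled = (np.clip(spec_db, floor, top) - floor) / dynamic_range_db
    return np.round(scaled * 255).astype(np.uint8)

# Toy input: a complex tone at +0.25 cycles/sample plus weak noise.
rng = np.random.default_rng(0)
n = np.arange(4096)
noise = rng.standard_normal(n.size) + 1j * rng.standard_normal(n.size)
iq = np.exp(2j * np.pi * 0.25 * n) + 0.05 * noise

spec = iq_to_spectrogram(iq)   # log-power spectrogram
image = to_uint8_image(spec)   # grayscale image a vision encoder could consume
```

The resulting array can be saved or resized like any grayscale image before being passed to a pretrained visual encoder; how RF-GPT performs that step is not specified here.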

Original Abstract

Large language models (LLMs) and multimodal models have become powerful general-purpose reasoning systems. However, radio-frequency (RF) signals, which underpin wireless systems, are still not natively supported by these models. Existing LLM-based approaches for telecom focus mainly on text and structured data, while conventional RF deep-learning models are built separately for specific signal-processing tasks, highlighting a clear gap between RF perception and high-level reasoning. To bridge this gap, we introduce RF-GPT, a radio-frequency language model (RFLM) that utilizes the visual encoders of multimodal LLMs to process and understand RF spectrograms. In this framework, complex in-phase/quadrature (IQ) waveforms are mapped to time-frequency spectrograms and then passed to pretrained visual encoders. The resulting representations are injected as RF tokens into a decoder-only LLM, which generates RF-grounded answers, explanations, and structured outputs. To train RF-GPT, we perform supervised instruction fine-tuning of a pretrained multimodal LLM using a fully synthetic RF corpus. Standards-compliant waveform generators produce wideband scenes for six wireless technologies, from which we derive time-frequency spectrograms, exact configuration metadata, and dense captions. A text-only LLM then converts these captions into RF-grounded instruction-answer pairs, yielding roughly 12,000 RF scenes and 0.625 million instruction examples without any manual labeling. Across benchmarks for wideband modulation classification, overlap analysis, wireless-technology recognition, WLAN user counting, and 5G NR information extraction, RF-GPT achieves strong multi-task performance, whereas general-purpose VLMs with no RF grounding largely fail.

arXiv Categories

eess.SP cs.LG
