Multimodal Learning Relevance: 9/10

ExpressMind: A Multimodal Pretrained Large Language Model for Expressway Operation

Zihe Wang, Yihuan Wang, Haiyang Yu, Zhiyong Cui, Xiaojian Liao, Chengcheng Wang, Yonglin Tian, Yongxin Tong
arXiv: 2603.16495v1 Published: 2026-03-17 Updated: 2026-03-17

AI Summary

ExpressMind is a pretrained multimodal large language model designed for expressway operations, improving cognitive capabilities for intelligent transportation.

Key Contributions

  • Constructed the industry's first full-stack expressway dataset
  • Proposed a dual-layer LLM pre-training paradigm based on self-supervised training and unsupervised learning
  • Introduced a Graph-Augmented RAG framework that dynamically indexes the expressway knowledge base
  • Developed an RL-aligned CoT mechanism to strengthen reasoning for incident response strategies
  • Integrated a cross-modal encoder to align video and text features
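The Graph-Augmented RAG contribution can be illustrated with a minimal sketch. The paper does not publish its indexing scheme, so the node names, edge structure, and retrieval rule below are all hypothetical stand-ins: retrieval matches a query against graph nodes and expands each hit with its one-hop neighbors, handing the LLM connected regulations rather than isolated passages.

```python
# Hypothetical knowledge graph for expressway incident handling.
# Node keys, texts, and edges are illustrative, not from the paper.
KNOWLEDGE_GRAPH = {
    "lane_closure": {
        "text": "Close affected lanes and deploy warning signs upstream.",
        "edges": ["traffic_diversion", "incident_report"],
    },
    "traffic_diversion": {
        "text": "Divert traffic to adjacent lanes or the nearest exit.",
        "edges": ["lane_closure"],
    },
    "incident_report": {
        "text": "Report the incident to the expressway operations center.",
        "edges": [],
    },
}

def graph_retrieve(query: str, graph: dict) -> list:
    """Return passages for nodes whose key appears in the query,
    expanded with their one-hop graph neighbors."""
    hits = [k for k in graph if k.replace("_", " ") in query.lower()]
    expanded = []
    for key in hits:
        if key not in expanded:
            expanded.append(key)
        expanded.extend(n for n in graph[key]["edges"] if n not in expanded)
    return [graph[k]["text"] for k in expanded]

passages = graph_retrieve(
    "Procedure after a lane closure on segment K12?", KNOWLEDGE_GRAPH
)
```

Expanding along edges is what distinguishes this from flat vector retrieval: the diversion and reporting rules are returned even though the query never mentions them.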

Methodology

Constructs an expressway dataset, proposes a dual-layer pre-training paradigm and Graph-Augmented RAG, strengthens reasoning with an RL-aligned CoT mechanism, and integrates a cross-modal encoder.
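The RL-CoT idea (enforcing consistency between model reasoning and expert heuristics) can be sketched as a reward signal. The paper's actual reward is not specified in this summary, so the step-overlap score below is a crude illustrative proxy, and the expert steps are invented examples.

```python
# Hedged sketch of an RL-CoT style consistency reward.
# The overlap heuristic and example steps are assumptions, not the
# paper's method: reward = fraction of expert steps the model's
# chain-of-thought covers.

def cot_consistency_reward(model_steps: list, expert_steps: list) -> float:
    """Fraction of expert heuristic steps covered by the model's
    chain-of-thought, in any order (a crude consistency proxy)."""
    covered = sum(
        any(expert in step for step in model_steps)
        for expert in expert_steps
    )
    return covered / len(expert_steps)

expert = ["secure the scene", "divert traffic", "notify operations center"]
model = [
    "First, secure the scene and set cones.",
    "Then divert traffic to the open lane.",
]
reward = cot_consistency_reward(model, expert)  # 2 of 3 expert steps covered
```

In an RL loop, such a reward would be fed to a policy-gradient update so that reasoning chains matching expert incident-handling procedures are reinforced.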

Original Abstract

The current expressway operation relies on rule-based and isolated models, which limits the ability to jointly analyze knowledge across different systems. Meanwhile, Large Language Models (LLMs) are increasingly applied in intelligent transportation, advancing traffic models from algorithmic to cognitive intelligence. However, general LLMs are unable to effectively understand the regulations and causal relationships of events in unconventional scenarios in the expressway field. Therefore, this paper constructs a pre-trained multimodal large language model (MLLM) for expressways, ExpressMind, which serves as the cognitive core for intelligent expressway operations. This paper constructs the industry's first full-stack expressway dataset, encompassing traffic knowledge texts, emergency reasoning chains, and annotated video events to overcome data scarcity. This paper proposes a dual-layer LLM pre-training paradigm based on self-supervised training and unsupervised learning. Additionally, this study introduces a Graph-Augmented RAG framework to dynamically index the expressway knowledge base. To enhance reasoning for expressway incident response strategies, we develop a RL-aligned Chain-of-Thought (RL-CoT) mechanism that enforces consistency between model reasoning and expert problem-solving heuristics for incident handling. Finally, ExpressMind integrates a cross-modal encoder to align the dynamic feature sequences under the visual and textual channels, enabling it to understand traffic scenes in both video and image modalities. Extensive experiments on our newly released multi-modal expressway benchmark demonstrate that ExpressMind comprehensively outperforms existing baselines in event detection, safety response generation, and complex traffic analysis. The code and data are available at: https://wanderhee.github.io/ExpressMind/.
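The cross-modal encoder described in the abstract aligns dynamic feature sequences from the visual and textual channels. A minimal sketch of that alignment objective, assuming toy embeddings and a plain cosine-similarity score (the model's real encoders and loss are not given here):

```python
# Illustrative cross-modal alignment score: matched (video, text)
# feature pairs should have high cosine similarity. All vectors
# below are toy values, not outputs of the paper's encoders.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def align_score(video_feats, text_feats):
    """Mean cosine similarity over matched (video, text) pairs;
    higher means the two channels are better aligned."""
    sims = [cosine(v, t) for v, t in zip(video_feats, text_feats)]
    return sum(sims) / len(sims)

video = [[1.0, 0.0, 0.2], [0.1, 0.9, 0.0]]  # toy video-frame embeddings
text = [[0.9, 0.1, 0.1], [0.0, 1.0, 0.1]]   # toy caption embeddings
score = align_score(video, text)
```

A contrastive objective would additionally push mismatched pairs apart; this sketch shows only the positive-pair alignment term.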

Tags

LLM Multimodal Learning Intelligent Transportation Expressway Operation

arXiv Categories

cs.AI