ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative Policy Optimization
AI Summary
Proposes ShipTraj-R1, which uses GRPO to optimize an LLM for ship trajectory prediction, outperforming existing methods.
Main Contributions
- Designs a dynamic prompt that guides adaptive CoT reasoning
- Introduces a rule-based reward mechanism that incentivizes both reasoning format and prediction accuracy
- Optimizes an LLM (Qwen3) with GRPO to improve trajectory prediction performance
Methodology
Ship trajectory prediction is reformulated as a text-to-text generation problem: the LLM is trained with the GRPO mechanism, guided by a dynamic prompt and rule-based rewards.
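The core of GRPO is to sample a group of completions per prompt and normalize each completion's reward against its own group's statistics, replacing the learned value network of PPO. A minimal sketch of that group-relative advantage computation (the reward values below are illustrative, not from the paper):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled completion's reward
    by the mean and standard deviation of its own group, so no separate
    critic/value network is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Example: rewards for a group of 4 sampled trajectory predictions
advs = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

Completions above the group mean get positive advantages and are reinforced; those below are suppressed, which is what drives the policy toward better-formatted, more accurate predictions.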
Original Abstract
Recent advancements in reinforcement fine-tuning have significantly improved the reasoning ability of large language models (LLMs). In particular, methods such as group relative policy optimization (GRPO) have demonstrated strong capabilities across various fields. However, applying LLMs to ship trajectory prediction remains largely unexplored. In this paper, we propose ShipTraj-R1, a novel LLM-based framework that reformulates ship trajectory prediction as a text-to-text generation problem. (1) We design a dynamic prompt containing trajectory information about conflicting ships to guide the model toward adaptive chain-of-thought (CoT) reasoning. (2) We introduce a comprehensive rule-based reward mechanism to incentivize the reasoning format and prediction accuracy of the model. (3) Our ShipTraj-R1 is reinforced through the GRPO mechanism, guided by domain-specific prompts and rewards, and uses Qwen3 as the model backbone. Extensive experimental results on two complex, real-world maritime datasets show that the proposed ShipTraj-R1 achieves the lowest error compared with state-of-the-art deep learning and LLM-based baselines.
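The abstract names two reward components: one for the reasoning format and one for prediction accuracy. A hedged sketch of what such rule-based rewards could look like, assuming hypothetical `<think>`/`<answer>` CoT tags and an exponential decay of reward with positional error (the paper's exact reward rules are not given here):

```python
import math
import re

def format_reward(text):
    """1.0 if the response follows an assumed CoT layout:
    a <think>...</think> block followed by an <answer>...</answer> block."""
    pattern = r"(?s)<think>.*</think>\s*<answer>.*</answer>"
    return 1.0 if re.fullmatch(pattern, text.strip()) else 0.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def accuracy_reward(pred, true, scale_km=1.0):
    """Map the distance (km) between predicted and true positions
    to a reward in (0, 1]; a perfect prediction scores 1.0."""
    err = haversine_km(pred[0], pred[1], true[0], true[1])
    return math.exp(-err / scale_km)
```

A total reward could then be a weighted sum of the two terms, so the policy is pushed both to emit well-formed reasoning and to place the ship accurately.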