AI Agents relevance: 8/10

Fine-Tuning Large Language Models for Cooperative Tactical Deconfliction of Small Unmanned Aerial Systems

Iman Sharifi, Alex Zongo, Peng Wei
arXiv: 2603.28561v1 · Published: 2026-03-30 · Updated: 2026-03-30

AI Summary

This paper investigates fine-tuning large language models for tactical deconfliction of small unmanned aerial systems, improving the safety and efficiency of UAS airspace management.

Key Contributions

  • Proposes a simulation-to-language data generation pipeline based on the BlueSky simulator
  • Fine-tunes a Qwen-Math-7B model using LoRA and GRPO
  • Validates that the fine-tuning strategies improve decision accuracy and consistency

Methodology

Data are generated with the BlueSky simulator; a pretrained LLM is then optimized with supervised fine-tuning (LoRA) and preference-based fine-tuning (GRPO), and evaluated in closed-loop simulation.
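The two fine-tuning strategies above rest on simple core computations. As a minimal illustrative sketch (not the authors' implementation), the snippet below shows the LoRA low-rank weight update, W' = W + (alpha/r)·B·A, and GRPO's group-relative advantage, which normalizes each sampled response's reward against its group's mean and standard deviation; all shapes and values here are hypothetical:

```python
import math

def lora_update(W, A, B, alpha, r):
    """Apply a LoRA low-rank update: W' = W + (alpha / r) * (B @ A).

    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    d_out, d_in = len(W), len(W[0])
    scale = alpha / r
    # delta[i][j] = sum_k B[i][k] * A[k][j], scaled and added to the frozen W
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
         for j in range(d_in)]
        for i in range(d_out)
    ]

def grpo_advantages(rewards):
    """Group-relative advantages: (r_i - mean) / std over a sampled group."""
    mean = sum(rewards) / len(rewards)
    var = sum((x - mean) ** 2 for x in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard against a zero-variance group
    return [(x - mean) / std for x in rewards]
```

Rewards above ride on outcome signals (e.g. separation maintained vs. a loss of separation); better-than-group-average responses get positive advantages and are reinforced, with no learned value model needed.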

Original Abstract

The growing deployment of small Unmanned Aerial Systems (sUASs) in low-altitude airspaces has increased the need for reliable tactical deconfliction under safety-critical constraints. Tactical deconfliction involves short-horizon decision-making in dense, partially observable, and heterogeneous multi-agent environments, where both cooperative separation assurance and operational efficiency must be maintained. While Large Language Models (LLMs) exhibit strong reasoning capabilities, their direct application to air traffic control remains limited by insufficient domain grounding and unpredictable output inconsistency. This paper investigates LLMs as decision-makers in cooperative multi-agent tactical deconfliction using fine-tuning strategies that align model outputs to human operator heuristics. We propose a simulation-to-language data generation pipeline based on the BlueSky air traffic simulator that produces rule-consistent deconfliction datasets reflecting established safety practices. A pretrained Qwen-Math-7B model is fine-tuned using two parameter-efficient strategies: supervised fine-tuning with Low-Rank Adaptation (LoRA) and preference-based fine-tuning combining LoRA with Group-Relative Policy Optimization (GRPO). Experimental results on validation datasets and closed-loop simulations demonstrate that supervised LoRA fine-tuning substantially improves decision accuracy, consistency, and separation performance compared to the pretrained LLM, with significant reductions in near mid-air collisions. GRPO provides additional coordination benefits but exhibits reduced robustness when interacting with heterogeneous agent policies.

Tags

LLM Fine-tuning, Multi-agent Systems, Air Traffic Control, UAS

arXiv Categories

cs.RO cs.AI