TAU-R1: Visual Language Model for Traffic Anomaly Understanding
AI Summary
Proposes TAU-R1, a vision-language model for traffic anomaly understanding, and constructs the Roundabout-TAU dataset.
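Main Contributions
- Constructs the Roundabout-TAU dataset
- Proposes TAU-R1, a two-layer vision-language framework
- Introduces decomposed-QA-enhanced supervised fine-tuning and the TAU-GRPO post-training method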
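Methodology
TAU-R1 has a two-layer structure consisting of an anomaly classifier and an anomaly reasoner, and uses a two-stage training strategy to improve task performance.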
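The two-layer structure can be pictured as a simple cascade. The sketch below is only an illustration under stated assumptions, not the authors' implementation: `classify` and `reason` are hypothetical callables standing in for the lightweight classifier and the larger reasoner, the `"normal"` label is invented for the example, and skipping the reasoner on normal clips is an assumed way to save compute that the abstract does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TauResult:
    category: str            # coarse label from the first layer
    summary: Optional[str]   # detailed event summary from the second layer, if it ran


def run_two_layer(
    clip: str,
    classify: Callable[[str], str],      # hypothetical first-layer model
    reason: Callable[[str, str], str],   # hypothetical second-layer model
    normal_label: str = "normal",        # assumed label for non-anomalous clips
) -> TauResult:
    """Run the classifier first, then the reasoner only on flagged clips."""
    category = classify(clip)
    if category == normal_label:
        # Assumption: the larger reasoner is skipped when nothing is flagged.
        return TauResult(category=category, summary=None)
    return TauResult(category=category, summary=reason(clip, category))
```

Passing the coarse category to the reasoner is a design choice made for this sketch; it lets the second layer condition its summary on the first layer's output.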
Original Abstract
Traffic Anomaly Understanding (TAU) is important for traffic safety in Intelligent Transportation Systems. Recent vision-language models (VLMs) have shown strong capabilities in video understanding. However, progress on TAU remains limited due to the lack of benchmarks and task-specific methodologies. To address this limitation, we introduce Roundabout-TAU, a dataset constructed from real-world roundabout videos collected in collaboration with the City of Carmel, Indiana. The dataset contains 342 clips and is annotated with more than 2,000 question-answer pairs covering multiple aspects of traffic anomaly understanding. Building on this benchmark, we propose TAU-R1, a two-layer vision-language framework for TAU. The first layer is a lightweight anomaly classifier that performs coarse anomaly categorisation, while the second layer is a larger anomaly reasoner that generates detailed event summaries. To improve task-specific reasoning, we introduce a two-stage training strategy consisting of decomposed-QA-enhanced supervised fine-tuning followed by TAU-GRPO, a GRPO-based post-training method with TAU-specific reward functions. Experimental results show that TAU-R1 achieves strong performance on both anomaly classification and reasoning tasks while maintaining deployment efficiency. The dataset and code are available at: https://github.com/siri-rouser/TAU-R1
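As background on the GRPO-based post-training mentioned in the abstract: GRPO samples a group of responses for the same prompt and scores each one relative to the group, instead of using a learned value function. The sketch below shows only that generic group-relative normalization. It says nothing about the TAU-specific reward functions, which the abstract does not describe; the function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize the rewards of one sampled group to zero mean and unit scale.

    `rewards` holds one scalar reward per sampled response to the same prompt.
    How those rewards are computed (the TAU-specific part) is outside this sketch.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every response in a group receives the same reward, all advantages are zero, so that group contributes no learning signal.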