Interpretable Traffic Responsibility from Dashcam Video via Legal Multi-Agent Reasoning
AI Summary
Proposes the C-TRAIL dataset and a two-stage framework for inferring traffic responsibility from dashcam video.
Main Contributions
- Proposes C-TRAIL, a multimodal legal dataset pairing dashcam videos with the corresponding legal provisions (a record-level sketch follows this list)
- Proposes a two-stage framework consisting of a traffic accident understanding module and a legal multi-agent framework
- Experiments show the method outperforms general-purpose and legal LLMs, as well as existing agent-based approaches
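To make the dataset's alignment concrete, here is a minimal sketch of what a single C-TRAIL record might look like. The field names, values, and statute citation below are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record layout for one C-TRAIL sample; field names are
# assumptions for illustration, not the published schema.
@dataclass
class CTrailSample:
    video_path: str                  # dashcam clip of the accident
    description: str                 # textual description of what happened
    responsibility_mode: str         # one label from the closed set of responsibility modes
    statutes: List[str] = field(default_factory=list)  # cited Chinese traffic statutes

sample = CTrailSample(
    video_path="clips/accident_0001.mp4",
    description="Ego vehicle changes lanes without signaling and collides with an adjacent car.",
    responsibility_mode="ego-full-responsibility",   # illustrative label only
    statutes=["<relevant Chinese traffic statute>"],  # placeholder citation
)
```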
Methodology
Two-stage framework: a traffic accident understanding module first generates a textual description of the video, then a legal multi-agent framework outputs the responsibility mode and the supporting legal basis.
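The sketch below shows how such a two-stage flow could be wired together, assuming stage one is a video-to-text component and stage two is a set of LLM-based legal agents. The helper names (`describe_video`, `legal_agents`) and the returned values are hypothetical placeholders, not the paper's API.

```python
# Hypothetical two-stage pipeline sketch; the helpers below stand in for the
# paper's traffic accident understanding module and legal multi-agent framework.

def describe_video(video_path: str) -> str:
    """Stage 1: turn a dashcam clip into a textual accident description."""
    # A real implementation would run a video understanding model here; this
    # canned description just lets the sketch run end to end.
    return "Ego vehicle enters the intersection against a red light and hits a crossing car."

def legal_agents(description: str) -> dict:
    """Stage 2: multi-agent legal reasoning over the textual description."""
    # Placeholder logic: a real system would have agents retrieve statutes,
    # debate responsibility, and draft a complete judgment report.
    return {
        "responsibility_mode": "ego-full-responsibility",       # illustrative label
        "statutes": ["<relevant Chinese traffic statute>"],      # placeholder citation
        "report": f"Judgment draft based on the description: {description}",
    }

if __name__ == "__main__":
    desc = describe_video("clips/accident_0001.mp4")
    verdict = legal_agents(desc)
    print(verdict["responsibility_mode"], verdict["statutes"])
```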
Original Abstract
The widespread adoption of dashcams has made video evidence in traffic accidents increasingly abundant, yet transforming "what happened in the video" into "who is responsible under which legal provisions" still relies heavily on human experts. Existing ego-view traffic accident studies mainly focus on perception and semantic understanding, while LLM-based legal methods are mostly built on textual case descriptions and rarely incorporate video evidence, leaving a clear gap between the two. We first propose C-TRAIL, a multimodal legal dataset that, under the Chinese traffic regulation system, explicitly aligns dashcam videos and textual descriptions with a closed set of responsibility modes and their corresponding Chinese traffic statutes. On this basis, we introduce a two-stage framework: (1) a traffic accident understanding module that generates textual video descriptions; and (2) a legal multi-agent framework that outputs responsibility modes, statute sets, and complete judgment reports. Experimental results on C-TRAIL and MM-AU show that our method outperforms general and legal LLMs, as well as existing agent-based approaches, while providing a transparent and interpretable legal reasoning process.