BFMD: A Full-Match Badminton Dense Dataset for Dense Shot Captioning
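AI Summary
Introduces BFMD, a densely annotated dataset of full badminton matches, and builds a VideoMAE-based multimodal captioning framework.
Main Contributions
- Builds BFMD, the first densely annotated dataset of full badminton matches
- Proposes a VideoMAE-based multimodal captioning framework
- Introduces a semantic feedback mechanism that improves the semantic consistency of generated captions
Methodology
Uses VideoMAE as the base model, fuses multimodal information such as visual features, shuttle trajectories, and player poses, and introduces a semantic feedback mechanism to guide caption generation.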
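The summary does not say how the modalities are combined, so the following is only a minimal sketch of one common option, late fusion by concatenation, and not necessarily the operator the authors use. The function name `fuse_by_concatenation`, the modality keys, and the toy feature values are all hypothetical.

```python
from typing import Dict, List, Sequence


def fuse_by_concatenation(
    features: Dict[str, Sequence[float]],
    order: Sequence[str],
) -> List[float]:
    """Concatenate per-modality feature vectors in a fixed order.

    Assumption: each modality has already been encoded into a flat vector
    (for example, the visual vector could come from a video encoder).
    """
    fused: List[float] = []
    for name in order:
        # A fixed order keeps the fused layout stable across samples.
        fused.extend(features[name])
    return fused


# Toy per-shot features (made-up numbers, for illustration only).
shot_features = {
    "visual": [0.1, 0.2, 0.3],
    "trajectory": [0.5, 0.6],
    "pose": [0.9],
}
fused_vector = fuse_by_concatenation(shot_features, ["visual", "trajectory", "pose"])
```

Concatenation is the simplest choice because it needs no learned parameters; attention-based or gated fusion are alternatives when the modalities differ in reliability.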
Original Abstract
Understanding tactical dynamics in badminton requires analyzing entire matches rather than isolated clips. However, existing badminton datasets mainly focus on short clips or task-specific annotations and rarely provide full-match data with dense multimodal annotations. This limitation makes it difficult to generate accurate shot captions and perform match-level analysis. To address this limitation, we introduce the first Badminton Full Match Dense (BFMD) dataset, with 19 broadcast matches (including both singles and doubles) covering over 20 hours of play, comprising 1,687 rallies and 16,751 hit events, each annotated with a shot caption. The dataset provides hierarchical annotations including match segments, rally events, and dense rally-level multimodal annotations such as shot types, shuttle trajectories, player pose keypoints, and shot captions. We develop a VideoMAE-based multimodal captioning framework with a Semantic Feedback mechanism that leverages shot semantics to guide caption generation and improve semantic consistency. Experimental results demonstrate that multimodal modeling and semantic feedback improve shot caption quality over RGB-only baselines. We further showcase the potential of BFMD by analyzing the temporal evolution of tactical patterns across full matches.
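The hierarchy described in the abstract (matches, rallies, and per-hit annotations) can be pictured as nested records. The sketch below is an assumption about how such annotations might be held in memory; the class and field names (`Hit`, `Rally`, `Match`, `shuttle_trajectory`, and so on) are hypothetical and do not reflect the dataset's actual file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) image coordinates, an assumed convention


@dataclass
class Hit:
    """One hit event with its dense annotations (illustrative fields)."""
    frame: int          # frame index at which the hit occurs
    shot_type: str      # shot type label, e.g. "smash"
    caption: str        # shot caption for this hit
    shuttle_trajectory: List[Point] = field(default_factory=list)
    pose_keypoints: List[Point] = field(default_factory=list)


@dataclass
class Rally:
    """A rally event spanning a frame range and containing hits."""
    start_frame: int
    end_frame: int
    hits: List[Hit] = field(default_factory=list)


@dataclass
class Match:
    """A full broadcast match made up of rallies."""
    match_id: str
    discipline: str     # "singles" or "doubles"
    rallies: List[Rally] = field(default_factory=list)


def mean_hits_per_rally(matches: List[Match]) -> float:
    """Average number of hit events per rally across all matches."""
    rallies = [r for m in matches for r in m.rallies]
    if not rallies:
        return 0.0
    return sum(len(r.hits) for r in rallies) / len(rallies)


# Toy example with made-up annotations.
toy_match = Match(
    match_id="toy-001",
    discipline="singles",
    rallies=[
        Rally(0, 300, hits=[
            Hit(10, "serve", "Player A serves short."),
            Hit(60, "lift", "Player B lifts to the rear court."),
            Hit(120, "smash", "Player A smashes down the line."),
        ]),
        Rally(400, 450, hits=[Hit(410, "serve", "Player B serves long.")]),
    ],
)
toy_average = mean_hits_per_rally([toy_match])
```

Applied to the totals reported in the abstract, the same ratio is 16,751 hit events over 1,687 rallies, or roughly 9.9 hits per rally.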