Multimodal Learning 相关度: 8/10

BFMD: A Full-Match Badminton Dense Dataset for Dense Shot Captioning

Ning Ding, Keisuke Fujii, Toru Tamaki

arXiv: 2603.25533v1 发布: 2026-03-26 更新: 2026-03-26

下载 PDF arXiv 页面

AI 摘要

提出了一个羽毛球全场比赛密集标注数据集BFMD，并构建了基于VideoMAE的多模态字幕生成框架。

主要贡献

构建了首个羽毛球全场比赛密集标注数据集BFMD
提出了基于VideoMAE的多模态字幕生成框架
引入了语义反馈机制提升字幕语义一致性

方法论

使用VideoMAE作为基础模型，融合视觉、轨迹、姿态等多模态信息，并引入语义反馈机制指导字幕生成。

原文摘要

Understanding tactical dynamics in badminton requires analyzing entire matches rather than isolated clips. However, existing badminton datasets mainly focus on short clips or task-specific annotations and rarely provide full-match data with dense multimodal annotations. This limitation makes it difficult to generate accurate shot captions and perform match-level analysis. To address this limitation, we introduce the first Badminton Full Match Dense (BFMD) dataset, with 19 broadcast matches (including both singles and doubles) covering over 20 hours of play, comprising 1,687 rallies and 16,751 hit events, each annotated with a shot caption. The dataset provides hierarchical annotations including match segments, rally events, and dense rally-level multimodal annotations such as shot types, shuttle trajectories, player pose keypoints, and shot captions. We develop a VideoMAE-based multimodal captioning framework with a Semantic Feedback mechanism that leverages shot semantics to guide caption generation and improve semantic consistency. Experimental results demonstrate that multimodal modeling and semantic feedback improve shot caption quality over RGB-only baselines. We further showcase the potential of BFMD by analyzing the temporal evolution of tactical patterns across full matches.

arXiv 分类

cs.CV

AI 摘要

主要贡献

方法论

原文摘要

标签

arXiv 分类