Learning Quantised Structure-Preserving Motion Representations for Dance Fingerprinting
AI Summary
DANCEMATCH is an end-to-end dance retrieval framework that enables efficient dance fingerprinting through quantised motion representations.
Main Contributions
- Proposes the DANCEMATCH framework for motion-based dance retrieval.
- Introduces Skeleton Motion Quantisation (SMQ) and Spatio-Temporal Transformers (STT) to encode human poses.
- Designs the DANCE RETRIEVAL ENGINE (DRE) for sub-linear retrieval.
- Releases the DANCETYPESBENCHMARK dataset, pose-aligned and annotated with quantised motion tokens.
Methodology
SMQ and STT encode poses into a structured motion vocabulary; retrieval then uses a histogram-based index for coarse matching, followed by re-ranking for refinement.
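The pipeline above can be sketched in miniature: quantise per-frame pose features into discrete tokens against a codebook, summarise each clip as a token histogram for coarse filtering, then re-rank the shortlist at token level. This is a minimal illustration, not the paper's implementation — the codebook, feature layout, and the toy re-ranker are all assumptions.

```python
import numpy as np

def quantise(pose_feats, codebook):
    """Assign each pose feature vector to its nearest codebook entry.
    Hypothetical stand-in for the SMQ module."""
    # (T, K) squared distances between T frames and K codewords
    d = ((pose_feats[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)  # one discrete token per frame

def histogram(tokens, vocab_size):
    """Bag-of-tokens signature: order-free and cheap to compare."""
    h = np.bincount(tokens, minlength=vocab_size).astype(float)
    return h / max(h.sum(), 1.0)

def retrieve(query_tokens, index, vocab_size, top_k=5):
    """Stage 1: coarse filter by histogram intersection.
    Stage 2: re-rank survivors by framewise token agreement (toy)."""
    q_hist = histogram(query_tokens, vocab_size)
    coarse = sorted(index.items(),
                    key=lambda kv: -np.minimum(q_hist, kv[1]["hist"]).sum())
    def agreement(a, b):
        n = min(len(a), len(b))
        return (a[:n] == b[:n]).mean() if n else 0.0
    shortlist = coarse[:top_k]
    return sorted(shortlist,
                  key=lambda kv: -agreement(query_tokens, kv[1]["tokens"]))
```

The histogram stage discards temporal order, which is what makes it indexable; the re-ranking stage is where sequence structure would re-enter (here approximated by naive framewise agreement).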
Original Abstract
We present DANCEMATCH, an end-to-end framework for motion-based dance retrieval, the task of identifying semantically similar choreographies directly from raw video, defined as DANCE FINGERPRINTING. While existing motion analysis and retrieval methods can compare pose sequences, they rely on continuous embeddings that are difficult to index, interpret, or scale. In contrast, DANCEMATCH constructs compact, discrete motion signatures that capture the spatio-temporal structure of dance while enabling efficient large-scale retrieval. Our system integrates Skeleton Motion Quantisation (SMQ) with Spatio-Temporal Transformers (STT) to encode human poses, extracted via Apple CoMotion, into a structured motion vocabulary. We further design DANCE RETRIEVAL ENGINE (DRE), which performs sub-linear retrieval using a histogram-based index followed by re-ranking for refined matching. To facilitate reproducible research, we release DANCETYPESBENCHMARK, a pose-aligned dataset annotated with quantised motion tokens. Experiments demonstrate robust retrieval across diverse dance styles and strong generalisation to unseen choreographies, establishing a foundation for scalable motion fingerprinting and quantitative choreographic analysis.
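The abstract's claim of sub-linear retrieval can be made concrete with an inverted index over motion tokens: a query only touches clips that share at least one token with it, so cost scales with posting-list sizes rather than corpus size. This is a hypothetical sketch of how a DRE-style candidate stage could work; the class and its token-voting scheme are illustrative, not the paper's design.

```python
from collections import defaultdict

class InvertedTokenIndex:
    """Toy inverted index over quantised motion tokens
    (illustrative only, not the DANCEMATCH implementation)."""

    def __init__(self):
        self.postings = defaultdict(set)  # token id -> set of clip ids
        self.clips = {}                   # clip id -> token sequence

    def add(self, clip_id, tokens):
        """Index a clip: record it under every distinct token it contains."""
        self.clips[clip_id] = tokens
        for t in set(tokens):
            self.postings[t].add(clip_id)

    def candidates(self, query_tokens):
        """Vote for clips sharing tokens with the query; clips with no
        shared tokens are never visited, giving sub-linear behaviour
        when posting lists are short."""
        votes = defaultdict(int)
        for t in set(query_tokens):
            for cid in self.postings[t]:
                votes[cid] += 1
        return sorted(votes, key=votes.get, reverse=True)
```

The candidate list this produces would then feed the re-ranking stage the abstract describes for refined matching.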