Beyond Hungarian: Match-Free Supervision for End-to-End Object Detection
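AI Summary
Proposes a DETR training method that requires no Hungarian algorithm, using a cross-attention mechanism to match queries and targets implicitly.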
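Main Contributions
- Proposes a Cross-Attention-based Query Selection (CAQS) module
- Achieves end-to-end object detection without explicit matching
- Significantly improves training efficiency and reduces matching latency by over 50%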
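Methodology
Encoded ground-truth information probes the decoder queries through a cross-attention mechanism. Minimizing the weighted error between the probed results and the ground truths lets the model learn the implicit correspondence between queries and targets.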
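The probing step can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function name `probe_queries`, the scaled dot-product scoring, the L1 box error, and the choice to weight each per-pair error by the attention weight are all assumptions made for illustration. The summary only states that encoded ground truths attend over decoder queries and that a weighted error is minimized.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def probe_queries(gt_embed, query_embed, gt_boxes, query_boxes):
    """Hypothetical CAQS-style probing step (an assumption, not the paper's code).

    gt_embed:    (G, D) encoded ground truths, used as attention queries
    query_embed: (Q, D) decoder object queries, used as attention keys
    gt_boxes:    (G, 4) ground-truth boxes
    query_boxes: (Q, 4) boxes predicted from each decoder query
    """
    d = gt_embed.shape[-1]
    # Each ground truth scores every decoder query: shape (G, Q).
    scores = gt_embed @ query_embed.T / np.sqrt(d)
    # Soft correspondence: every row is a distribution over decoder queries.
    attn = softmax(scores, axis=-1)
    # L1 error between every ground truth and every predicted box: shape (G, Q).
    pair_err = np.abs(gt_boxes[:, None, :] - query_boxes[None, :, :]).sum(axis=-1)
    # Attention-weighted error, averaged over ground truths (assumed loss form).
    loss = (attn * pair_err).sum(axis=-1).mean()
    return attn, loss

# Toy inputs: 3 ground truths, 10 decoder queries, 16-dim embeddings.
rng = np.random.default_rng(0)
gt_embed = rng.normal(size=(3, 16))
query_embed = rng.normal(size=(10, 16))
gt_boxes = rng.uniform(size=(3, 4))
query_boxes = rng.uniform(size=(10, 4))

attn, loss = probe_queries(gt_embed, query_embed, gt_boxes, query_boxes)
```

Because the weights come from a softmax rather than a discrete assignment, the loss is differentiable with respect to both the embeddings and the predicted boxes, which is the property the matching-free scheme relies on.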
Original Abstract
Recent DEtection TRansformer (DETR) based frameworks have achieved remarkable success in end-to-end object detection. However, the reliance on the Hungarian algorithm for bipartite matching between queries and ground truths introduces computational overhead and complicates the training dynamics. In this paper, we propose a novel matching-free training scheme for DETR-based detectors that eliminates the need for explicit heuristic matching. At the core of our approach is a dedicated Cross-Attention-based Query Selection (CAQS) module. Instead of discrete assignment, we utilize encoded ground-truth information to probe the decoder queries through a cross-attention mechanism. By minimizing the weighted error between the queried results and the ground truths, the model autonomously learns the implicit correspondences between object queries and specific targets. This learned relationship further provides supervision signals for the learning of queries. Experimental results demonstrate that our proposed method bypasses the traditional matching process, significantly enhancing training efficiency, reducing the matching latency by over 50%, effectively eliminating the discrete matching bottleneck through differentiable correspondence learning, and also achieving superior performance compared to existing state-of-the-art methods.
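For contrast, the discrete bipartite matching that the abstract sets out to avoid can be sketched as follows. This is a brute-force search over one-to-one assignments, not the Hungarian algorithm itself, which solves the same problem in polynomial time. The function name `brute_force_match` and the box-only L1 cost are assumptions for illustration; matching costs in DETR-style detectors typically also include a classification term.

```python
from itertools import permutations

def l1(a, b):
    # L1 distance between two boxes given as equal-length tuples.
    return sum(abs(x - y) for x, y in zip(a, b))

def brute_force_match(query_boxes, gt_boxes):
    """Return (assignment, cost), where assignment[g] is the query index
    matched to ground truth g and each query is used at most once."""
    best, best_cost = None, float("inf")
    # Enumerate every ordered choice of distinct queries, one per ground truth.
    for perm in permutations(range(len(query_boxes)), len(gt_boxes)):
        cost = sum(l1(query_boxes[q], gt_boxes[g]) for g, q in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost

# Three predicted boxes and two ground truths in (cx, cy, w, h) form.
query_boxes = [(0.1, 0.1, 0.2, 0.2), (0.5, 0.5, 0.3, 0.3), (0.9, 0.9, 0.1, 0.1)]
gt_boxes = [(0.5, 0.5, 0.3, 0.3), (0.1, 0.1, 0.2, 0.2)]

assignment, cost = brute_force_match(query_boxes, gt_boxes)
print(assignment, cost)  # (1, 0) 0.0
```

The result is a hard, non-differentiable assignment recomputed for every training sample, which is the overhead and discreteness the matching-free scheme aims to remove.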