Multimodal Learning Relevance: 7/10

Beyond Hungarian: Match-Free Supervision for End-to-End Object Detection

Shoumeng Qiu, Xinrun Li, Yang Long

arXiv: 2603.08514v1 Published: 2026-03-09 Updated: 2026-03-09

AI Summary

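Proposes a training method for DETR that does not require the Hungarian algorithm; queries and targets are matched implicitly through a cross-attention mechanism.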

Main Contributions

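Proposes a Cross-Attention-based Query Selection (CAQS) module
Achieves end-to-end object detection without explicit matching
Significantly improves training efficiency and reduces matching latency by over 50%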

Methodology

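Encoded ground-truth information is used to probe the decoder queries through a cross-attention mechanism. By minimizing the weighted error between the queried results and the ground truths, the model learns the implicit correspondence between queries and targets.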
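The probing step described above can be illustrated with a minimal sketch. It is not the authors' CAQS implementation: the function names (`probe_queries`, `weighted_error`), the scaled dot-product scoring, and the L1 error are all assumptions chosen for illustration, and the listing does not specify how the ground truths are encoded or how the error is weighted. Each ground-truth embedding acts as the attention query, the decoder queries act as keys, and their box predictions act as values.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def probe_queries(gt_embeds, query_embeds, query_preds):
    """Attend from each ground truth over all decoder queries (hypothetical helper).

    gt_embeds:    one embedding per ground-truth object (the probe)
    query_embeds: one embedding per decoder query (the keys)
    query_preds:  one predicted box per decoder query (the values)
    Returns (attention weights, attention-weighted predictions).
    """
    scale = math.sqrt(len(query_embeds[0]))
    dims = len(query_preds[0])
    weights, readouts = [], []
    for g in gt_embeds:
        w = softmax([dot(g, q) / scale for q in query_embeds])
        readouts.append([sum(wi * p[d] for wi, p in zip(w, query_preds))
                         for d in range(dims)])
        weights.append(w)
    return weights, readouts

def weighted_error(readouts, gt_boxes):
    # Assumed error: mean L1 distance between each readout and its ground truth.
    total = sum(abs(r - t)
                for readout, gt in zip(readouts, gt_boxes)
                for r, t in zip(readout, gt))
    return total / len(gt_boxes)

# Toy data: two ground truths, three decoder queries, 2-D "boxes".
gt_embeds = [[4.0, 0.0], [0.0, 4.0]]
query_embeds = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
query_preds = [[0.1, 0.1], [0.9, 0.9], [0.5, 0.5]]
gt_boxes = [[0.1, 0.1], [0.9, 0.9]]

weights, readouts = probe_queries(gt_embeds, query_embeds, query_preds)
loss = weighted_error(readouts, gt_boxes)
```

Because every step is a smooth function of the embeddings and predictions, a loss of this kind can be minimized by gradient descent in an autodiff framework, and the attention weights then play the role of a soft query-to-target correspondence instead of a discrete assignment.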

Original Abstract

Recent DEtection TRansformer (DETR) based frameworks have achieved remarkable success in end-to-end object detection. However, the reliance on the Hungarian algorithm for bipartite matching between queries and ground truths introduces computational overhead and complicates the training dynamics. In this paper, we propose a novel matching-free training scheme for DETR-based detectors that eliminates the need for explicit heuristic matching. At the core of our approach is a dedicated Cross-Attention-based Query Selection (CAQS) module. Instead of discrete assignment, we utilize encoded ground-truth information to probe the decoder queries through a cross-attention mechanism. By minimizing the weighted error between the queried results and the ground truths, the model autonomously learns the implicit correspondences between object queries and specific targets. This learned relationship further provides supervision signals for the learning of queries. Experimental results demonstrate that our proposed method bypasses the traditional matching process, significantly enhancing training efficiency, reducing the matching latency by over 50%, effectively eliminating the discrete matching bottleneck through differentiable correspondence learning, and also achieving superior performance compared to existing state-of-the-art methods.
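For contrast, the discrete bipartite matching that the abstract sets out to replace can be sketched on a toy example. The code below is not the Hungarian algorithm: it is a brute-force search over all assignments, which returns the same minimum-cost matching on tiny inputs but scales factorially. The helper names and the plain L1 box cost are assumptions for illustration; DETR-style matchers typically combine classification and box terms in the cost.

```python
from itertools import permutations

def l1_cost(pred, gt):
    # Assumed matching cost: L1 distance between a predicted and a ground-truth box.
    return sum(abs(p - g) for p, g in zip(pred, gt))

def brute_force_match(preds, gts):
    """Return (assignment, cost), where assignment[i] is the query index given to gts[i].

    Tries every one-to-one assignment of queries to ground truths
    (hypothetical helper, feasible only for a handful of objects).
    """
    best_cost, best_assign = float("inf"), None
    for assign in permutations(range(len(preds)), len(gts)):
        cost = sum(l1_cost(preds[q], gts[i]) for i, q in enumerate(assign))
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_assign, best_cost

# Three query predictions, two ground-truth boxes (2-D toy "boxes").
preds = [[0.9, 0.9], [0.1, 0.1], [0.5, 0.5]]
gts = [[0.0, 0.0], [1.0, 1.0]]

assignment, cost = brute_force_match(preds, gts)
```

The result is a hard, one-to-one choice of query per ground truth. The selection itself is not differentiable, which is the discrete step the abstract describes replacing with learned, differentiable correspondences.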

arXiv Categories

cs.CV cs.AI
