Multimodal Learning Relevance: 6/10

SurgAtt-Tracker: Online Surgical Attention Tracking via Temporal Proposal Reranking and Motion-Aware Refinement

Rulin Zhou, Guankun Wang, An Wang, Yujie Ma, Lixin Ouyang, Bolin Cui, Junyan Li, Chaowei Zhu, Mingyang Li, Ming Chen, Xiaopin Zhong, Peng Lu, Jiankun Wang, Xianming Liu, Hongliang Ren
arXiv: 2602.20636v1 Published: 2026-02-24 Updated: 2026-02-24

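AI Summary

Proposes SurgAtt-Tracker, which uses temporal proposal reranking and motion-aware refinement to track the surgeon's point of attention within the surgical field of view stably and accurately.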

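Main Contributions

  • Proposes the SurgAtt-Tracker framework
  • Builds SurgAtt-1.16M, a large-scale surgical attention dataset
  • Delivers a frame-level field-of-view (FoV) guidance signal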

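Methodology

SurgAtt-Tracker exploits temporal coherence, tracking the surgical attention point through proposal-level reranking and motion-aware refinement rather than direct regression.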
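
To make the two stages concrete, here is a minimal toy sketch of reranking followed by refinement. It is not the paper's implementation: the Gaussian distance weighting, the constant-velocity prediction, the blending factor, and all names (`rerank_proposals`, `refine_with_motion`, `track_step`, `sigma`, `alpha`) are assumptions chosen only for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Proposal = Tuple[float, float, float]  # (x, y, confidence); hypothetical format


def rerank_proposals(proposals: List[Proposal], prev_center: Point,
                     sigma: float = 50.0) -> List[Proposal]:
    """Reorder proposals by confidence weighted by closeness to the previous center.

    Assumption: temporal coherence is modeled as a Gaussian penalty on the
    distance to the previous frame's attention center.
    """
    def temporal_score(p: Proposal) -> float:
        x, y, conf = p
        d2 = (x - prev_center[0]) ** 2 + (y - prev_center[1]) ** 2
        return conf * math.exp(-d2 / (2.0 * sigma ** 2))

    return sorted(proposals, key=temporal_score, reverse=True)


def refine_with_motion(selected: Point, prev_center: Point, velocity: Point,
                       alpha: float = 0.7) -> Point:
    """Blend the selected proposal with a constant-velocity prediction.

    Assumption: motion awareness is approximated by a simple linear motion model.
    """
    pred = (prev_center[0] + velocity[0], prev_center[1] + velocity[1])
    return (alpha * selected[0] + (1.0 - alpha) * pred[0],
            alpha * selected[1] + (1.0 - alpha) * pred[1])


def track_step(proposals: List[Proposal], prev_center: Point, velocity: Point,
               sigma: float = 50.0, alpha: float = 0.7) -> Tuple[Point, Point]:
    """Run one online step: rerank, pick the top proposal, refine, update velocity."""
    best = rerank_proposals(proposals, prev_center, sigma)[0]
    center = refine_with_motion((best[0], best[1]), prev_center, velocity, alpha)
    new_velocity = (center[0] - prev_center[0], center[1] - prev_center[1])
    return center, new_velocity


if __name__ == "__main__":
    # A distant high-confidence proposal loses to a nearby, temporally consistent one.
    candidates = [(300.0, 300.0, 0.9), (105.0, 100.0, 0.6)]
    print(track_step(candidates, prev_center=(100.0, 100.0), velocity=(4.0, 0.0)))
```

In this sketch the nearby proposal wins even though its raw confidence is lower, which is the intuition behind selecting among candidates with temporal context instead of regressing a position directly from a single frame.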

Original Abstract

Accurate and stable field-of-view (FoV) guidance is critical for safe and efficient minimally invasive surgery, yet existing approaches often conflate visual attention estimation with downstream camera control or rely on direct object-centric assumptions. In this work, we formulate surgical attention tracking as a spatio-temporal learning problem and model surgeon focus as a dense attention heatmap, enabling continuous and interpretable frame-wise FoV guidance. We propose SurgAtt-Tracker, a holistic framework that robustly tracks surgical attention by exploiting temporal coherence through proposal-level reranking and motion-aware refinement, rather than direct regression. To support systematic training and evaluation, we introduce SurgAtt-1.16M, a large-scale benchmark with a clinically grounded annotation protocol that enables comprehensive heatmap-based attention analysis across procedures and institutions. Extensive experiments on multiple surgical datasets demonstrate that SurgAtt-Tracker consistently achieves state-of-the-art performance and strong robustness under occlusion, multi-instrument interference, and cross-domain settings. Beyond attention tracking, our approach provides a frame-wise FoV guidance signal that can directly support downstream robotic FoV planning and automatic camera control.

Tags

Surgical robotics, attention tracking, FoV guidance, temporal modeling

arXiv Categories

cs.CV cs.AI