Multimodal Learning Relevance: 8/10

TopoOR: A Unified Topological Scene Representation for the Operating Room

Tony Danjun Wang, Ka Young Kim, Tolga Birdal, Nassir Navab, Lennart Bastian
arXiv: 2603.09466v1 Published: 2026-03-10 Updated: 2026-03-10

AI Summary

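TopoOR proposes a topological representation of operating room (OR) scenes that improves the understanding and prediction of surgical procedures.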

Key Contributions

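  • Proposes TopoOR, a new topological representation for operating room scenes
  • Designs a higher-order attention mechanism that preserves manifold structure and modality-specific features
  • Outperforms traditional graph-based and LLM-based baselines on OR tasks (sterility breach detection, robot phase prediction, and next-action anticipation)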

Methodology

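Models the operating room as a higher-order topological structure and applies a higher-order attention mechanism to multimodal data, preserving both the relational structure and modality-specific information.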
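To make "lifting" concrete, here is a minimal sketch of one way a pairwise scene graph could be turned into a higher-order structure. It assumes a combinatorial-complex-style representation: rank-0 cells are entities, rank-1 cells are pairwise relations, and rank-2 cells are group interactions. A rank function must be order-preserving, meaning that a cell never ranks below a cell it contains. This is not TopoOR's implementation, because the summary does not specify its cell types. The class, the function `lift_scene_graph`, and the entity names are all hypothetical. The rank-1 skeleton recovers the original scene graph, which illustrates the sense in which such a structure subsumes it.

```python
from dataclasses import dataclass, field


@dataclass
class CombinatorialComplex:
    """Toy higher-order structure: each cell is a frozenset of entity ids with a rank."""
    cells: dict = field(default_factory=dict)  # frozenset -> rank

    def add_cell(self, members, rank):
        cell = frozenset(members)
        for other, other_rank in self.cells.items():
            # Order-preserving rank: x strictly inside y requires rank(x) <= rank(y).
            if other < cell and other_rank > rank:
                raise ValueError(f"{sorted(cell)} contains a higher-rank cell")
            if cell < other and rank > other_rank:
                raise ValueError(f"{sorted(cell)} would outrank a cell containing it")
        self.cells[cell] = rank
        return cell

    def cells_of_rank(self, rank):
        # Deterministic order: sort cells by their sorted member lists.
        return sorted((c for c, r in self.cells.items() if r == rank), key=sorted)

    def incidence(self, low, high):
        """B[i][j] = 1 if the i-th rank-`low` cell lies inside the j-th rank-`high` cell."""
        rows, cols = self.cells_of_rank(low), self.cells_of_rank(high)
        return [[1 if r <= c else 0 for c in cols] for r in rows]

    def skeleton(self):
        """Rank-1 cells as sorted pairs: the ordinary (dyadic) scene graph."""
        return [tuple(sorted(c)) for c in self.cells_of_rank(1)]


def lift_scene_graph(entities, relations, groups):
    """Hypothetical lifting: entities -> rank 0, relations -> rank 1, groups -> rank 2."""
    cc = CombinatorialComplex()
    for e in entities:
        cc.add_cell([e], 0)
    for a, b in relations:
        cc.add_cell([a, b], 1)
    for g in groups:
        cc.add_cell(g, 2)
    return cc


# Hypothetical OR snapshot: the surgeon, robot, and patient form one group interaction.
cc = lift_scene_graph(
    entities=["surgeon", "nurse", "robot", "patient"],
    relations=[("surgeon", "patient"), ("nurse", "robot"), ("surgeon", "robot")],
    groups=[{"surgeon", "robot", "patient"}],
)
print(cc.incidence(0, 1))
print(cc.incidence(1, 2))
```

The incidence matrices are what a higher-order message-passing or attention layer would typically use to decide which cells exchange information.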

Original Abstract

Surgical Scene Graphs abstract the complexity of surgical operating rooms (OR) into a structure of entities and their relations, but existing paradigms suffer from strictly dyadic structural limitations. Frameworks that predominantly rely on pairwise message passing or tokenized sequences flatten the manifold geometry inherent to relational structures and lose structure in the process. We introduce TopoOR, a new paradigm that models multimodal operating rooms as a higher-order structure, innately preserving pairwise and group relationships. By lifting interactions between entities into higher-order topological cells, TopoOR natively models complex dynamics and multimodality present in the OR. This topological representation subsumes traditional scene graphs, thereby offering strictly greater expressivity. We also propose a higher-order attention mechanism that explicitly preserves manifold structure and modality-specific features throughout hierarchical relational attention. In this way, we circumvent combining 3D geometry, audio, and robot kinematics into a single joint latent representation, preserving the precise multimodal structure required for safety-critical reasoning, unlike existing methods. Extensive experiments demonstrate that our approach outperforms traditional graph and LLM-based baselines across sterility breach detection, robot phase prediction, and next-action anticipation.
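The abstract states that modalities are kept apart rather than merged into one joint latent. A toy per-modality attention step can illustrate that idea. The sketch below rests on several assumptions and is not TopoOR's mechanism. A higher-order cell attends only over its member entities, so the cell's membership acts as the attention mask. Scaled dot-product attention then runs separately for each modality, so 3D geometry, audio, and kinematics features keep their own dimensions. The names `cell_attention`, `cell_query`, and `node_feats` are hypothetical. There are no learned projections, and keys double as values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cell_attention(cell_query, members, node_feats):
    """Attend from one higher-order cell to its member entities, one modality at a time.

    cell_query: modality -> query vector for this cell (hypothetical, would be learned)
    members:    entity ids contained in the cell; entities outside it are ignored
    node_feats: entity -> modality -> feature vector (entities may lack some modalities)
    Returns modality -> aggregated vector; modalities are never concatenated or mixed.
    """
    out = {}
    for modality, q in cell_query.items():
        present = [m for m in members if modality in node_feats[m]]
        if not present:
            continue  # no member carries this modality
        keys = [node_feats[m][modality] for m in present]
        scale = math.sqrt(len(q))
        weights = softmax([dot(q, k) / scale for k in keys])
        dim = len(keys[0])
        # Keys double as values in this simplified sketch.
        out[modality] = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]
    return out


# Hypothetical features; the nurse is not a member of the cell and is masked out.
node_feats = {
    "surgeon": {"geometry": [1.0, 0.0], "audio": [0.5]},
    "robot": {"geometry": [0.0, 1.0], "kinematics": [2.0, 3.0, 1.0]},
    "patient": {"geometry": [1.0, 1.0]},
    "nurse": {"geometry": [100.0, 100.0]},
}
cell_query = {"geometry": [0.0, 0.0], "audio": [1.0], "kinematics": [1.0, 0.0, 0.0]}
out = cell_attention(cell_query, ["surgeon", "robot", "patient"], node_feats)
print(out)
```

Each modality yields its own output vector with its own dimensionality. A downstream head could still combine them, but the sketch does not fuse them at the representation level.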

Tags

Operating Room Scene Understanding, Topological Representation, Multimodal Learning, Higher-Order Attention

arXiv Category

cs.CV