Modulate-and-Map: Crossmodal Feature Mapping with Cross-View Modulation for 3D Anomaly Detection
AI Summary
ModMap achieves state-of-the-art performance in 3D anomaly detection through crossmodal feature mapping and cross-view modulation.
Key Contributions
- Proposes ModMap, a framework for multiview, multimodal 3D anomaly detection
- Introduces a feature-mapping learning mechanism that spans both modalities and views
- Designs a cross-view training strategy that leverages all possible view combinations
- Publicly releases a foundational depth encoder tailored to industrial datasets
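The cross-view training strategy is described as using all possible view combinations. A minimal sketch of what such an enumeration could look like, assuming hypothetical view identifiers and a source-subset-to-target-view formulation (the paper's actual pairing scheme may differ):

```python
from itertools import combinations

def cross_view_pairs(views):
    """Enumerate every (source subset -> target view) combination:
    for each target view, take all non-empty subsets of the remaining views."""
    pairs = []
    for target in views:
        sources = [v for v in views if v != target]
        for r in range(1, len(sources) + 1):
            for subset in combinations(sources, r):
                pairs.append((subset, target))
    return pairs

# Hypothetical three-view setup: each target view can be predicted
# from 3 source subsets, giving 3 * 3 = 9 training combinations.
for subset, target in cross_view_pairs(["v0", "v1", "v2"]):
    print(subset, "->", target)
```

Training over all such combinations would expose the mapping network to every viewpoint relationship, which is what enables ensembling over views at test time.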
Methodology
ModMap performs 3D anomaly detection and segmentation by learning crossmodal feature mappings with cross-view modulation, and combines multiview ensembling and aggregation for anomaly scoring.
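The "feature-wise modulation" used to model view-dependent relationships is a known conditioning technique (FiLM-style scale-and-shift). A minimal NumPy sketch, assuming hypothetical feature shapes and per-view parameters; in the actual model, gamma and beta would be predicted from a view embedding:

```python
import numpy as np

def film_modulate(features, gamma, beta):
    """Feature-wise modulation: scale and shift each feature channel
    by view-conditioned parameters (FiLM-style)."""
    # features: (C, H, W); gamma, beta: (C,)
    return gamma[:, None, None] * features + beta[:, None, None]

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))  # hypothetical feature map
gamma = rng.standard_normal(8)          # would come from a view embedding
beta = rng.standard_normal(8)
out = film_modulate(feats, gamma, beta)
print(out.shape)  # (8, 4, 4)
```

The appeal of this conditioning is that the same mapping network can be reused across all view pairs, with the modulation parameters carrying the view-specific information.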
Original Abstract
We present ModMap, a natively multiview and multimodal framework for 3D anomaly detection and segmentation. Unlike existing methods that process views independently, our method draws inspiration from the crossmodal feature mapping paradigm to learn to map features across both modalities and views, while explicitly modelling view-dependent relationships through feature-wise modulation. We introduce a cross-view training strategy that leverages all possible view combinations, enabling effective anomaly scoring through multiview ensembling and aggregation. To process high-resolution 3D data, we train and publicly release a foundational depth encoder tailored to industrial datasets. Experiments on SiM3D, a recent benchmark that introduces the first multiview and multimodal setup for 3D anomaly detection and segmentation, demonstrate that ModMap attains state-of-the-art performance by surpassing previous methods by wide margins.
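The abstract's "anomaly scoring through multiview ensembling and aggregation" could be sketched as follows, assuming per-view pixel-level anomaly maps (e.g., discrepancies between mapped and observed features) and a simple mean/max aggregation; the paper's actual aggregation may differ:

```python
import numpy as np

def aggregate_anomaly_maps(maps):
    """Ensemble per-view anomaly maps into one pixel-level score map
    (mean over views) and one image-level score (max over pixels)."""
    stacked = np.stack(maps)           # (V, H, W)
    pixel_scores = stacked.mean(axis=0)
    image_score = float(pixel_scores.max())
    return pixel_scores, image_score
```

Averaging over views suppresses view-specific noise in the per-view maps, while the max over pixels keeps the image-level score sensitive to small localized defects.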