Multimodal Learning Relevance: 7/10

FlowTouch: View-Invariant Visuo-Tactile Prediction

Seongjin Bien, Carlo Kneissl, Tobias Jülg, Frank Fundel, Thomas Ressler-Antal, Florian Walter, Björn Ommer, Gitta Kutyniok, Wolfram Burgard
arXiv: 2603.08255v1 Published: 2026-03-09 Updated: 2026-03-09

AI Summary

FlowTouch proposes a view-invariant visuo-tactile prediction model that uses an object's local 3D mesh to achieve cross-domain generalization.

Main Contributions

  • Proposes FlowTouch, a model for view-invariant visuo-tactile prediction
  • Encodes rich contact information with an object's local 3D mesh, improving the model's ability to generalize
  • Validates the model's effectiveness in sim-to-real transfer and in downstream grasp stability prediction

Methodology

Integrates scene reconstruction with a Flow Matching-based image generation model to predict tactile feedback from visual input, achieving view invariance.
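To make the Flow Matching component concrete, below is a minimal training-step sketch, not the authors' code: it assumes a hypothetical `velocity_net` that predicts a velocity field over tactile images and a hypothetical `mesh_encoder` that turns the local 3D mesh patch into a view-invariant conditioning code. It uses the standard linear (rectified-flow) probability path between Gaussian noise and the target tactile image.

```python
import torch

def flow_matching_step(velocity_net, mesh_encoder, tactile_img, mesh_patch):
    """One conditional Flow Matching training step on the linear path."""
    b = tactile_img.shape[0]
    cond = mesh_encoder(mesh_patch)               # view-invariant condition code
    t = torch.rand(b, device=tactile_img.device)  # t ~ U(0, 1)
    x0 = torch.randn_like(tactile_img)            # Gaussian noise endpoint
    t_ = t.view(b, 1, 1, 1)                       # broadcast t over C, H, W
    x_t = (1.0 - t_) * x0 + t_ * tactile_img      # interpolant on the path
    target_v = tactile_img - x0                   # constant velocity of the path
    pred_v = velocity_net(x_t, t, cond)           # network's velocity estimate
    return ((pred_v - target_v) ** 2).mean()      # flow matching regression loss
```

The key design point this illustrates is that the generator is conditioned only on the mesh encoding, so scene-dependent camera details are abstracted away.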

Original Abstract

Tactile sensation is essential for contact-rich manipulation tasks. It provides direct feedback on object geometry, surface properties, and interaction forces, enhancing perception and enabling fine-grained control. An inherent limitation of tactile sensors is that readings are available only when an object is touched. This precludes their use during planning and the initial execution phase of a task. Predicting tactile information from visual information can bridge this gap. A common approach is to learn a direct mapping from camera images to the output of vision-based tactile sensors. However, the resulting model will depend strongly on the specific setup and on how well the camera can capture the area where an object is touched. In this work, we introduce FlowTouch, a novel model for view-invariant visuo-tactile prediction. Our key idea is to use an object's local 3D mesh to encode rich information for predicting tactile patterns while abstracting away from scene-dependent details. FlowTouch integrates scene reconstruction and Flow Matching-based models for image generation. Our results show that FlowTouch is able to bridge the sim-to-real gap and generalize to new sensor instances. We further show that the resulting tactile images can be used for downstream grasp stability prediction. Our code, datasets and videos are available at https://flowtouch.github.io/
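At inference time, a trained flow model generates a tactile image by integrating the learned velocity field from noise. The sketch below shows this with a fixed-step Euler solver; the solver choice and step count are illustrative assumptions, not the paper's reported settings, and `velocity_net` and `cond` are the hypothetical names from the training sketch above.

```python
import torch

@torch.no_grad()
def sample_tactile(velocity_net, cond, shape, steps=50, device="cpu"):
    """Generate a tactile image by Euler-integrating the velocity field."""
    x = torch.randn(shape, device=device)          # start from pure noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt, device=device)
        x = x + dt * velocity_net(x, t, cond)      # Euler step along the flow
    return x                                       # predicted tactile image
```

The generated image can then be fed to a downstream grasp stability predictor, as the abstract describes.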

Tags

Visuo-Tactile sim-to-real 3D Reconstruction Flow Matching

arXiv Categories

cs.RO cs.LG