DECO: Decoupled Multimodal Diffusion Transformer for Bimanual Dexterous Manipulation with a Plugin Tactile Adapter
AI Summary
DECO proposes a decoupled multimodal diffusion Transformer for bimanual dexterous manipulation.
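Key Contributions
- Proposes the DECO framework, which decouples multimodal conditioning
- Introduces a tactile adapter that enhances perception
- Builds DECO-50, a tactile-sensing bimanual manipulation dataset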
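Methodology
DECO builds on the DiT architecture and decouples modalities such as images, actions, and states, fusing their information through self-attention, cross-attention, and adaptive normalization.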
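The routing of modalities can be illustrated with a minimal NumPy sketch of one block. It follows the assignment given in the original abstract (image and action tokens in joint self-attention, proprioceptive state through adaptive layer normalization, tactile tokens through cross-attention). Everything else is an assumption made for illustration: the function names, the single attention head, the absence of an MLP sub-layer, the parameter dictionary `p`, and all tensor sizes are hypothetical and are not taken from the DECO implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each token over its feature dimension (no learned affine).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention(q_in, kv_in, wq, wk, wv):
    # Single-head scaled dot-product attention (assumed simplification).
    q, k, v = q_in @ wq, kv_in @ wk, kv_in @ wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def init_params(d, rng):
    # Hypothetical parameter layout: sa_* for self-attention, ca_* for cross-attention.
    names = ["sa_q", "sa_k", "sa_v", "ca_q", "ca_k", "ca_v"]
    p = {n: rng.normal(scale=d ** -0.5, size=(d, d)) for n in names}
    p["w_mod"] = rng.normal(scale=d ** -0.5, size=(d, 2 * d))
    return p

def deco_style_block(image_tok, action_tok, state_emb, tactile_tok, p):
    # 1) Image and action tokens share one sequence for joint self-attention.
    x = np.concatenate([image_tok, action_tok], axis=0)
    # 2) adaLN: the state embedding predicts a per-feature shift and scale.
    shift, scale = np.split(state_emb @ p["w_mod"], 2)
    h = layer_norm(x) * (1.0 + scale) + shift
    x = x + attention(h, h, p["sa_q"], p["sa_k"], p["sa_v"])
    # 3) Tactile tokens are optional and enter only as cross-attention keys/values.
    if tactile_tok is not None:
        x = x + attention(layer_norm(x), tactile_tok, p["ca_q"], p["ca_k"], p["ca_v"])
    # Return only the updated action tokens.
    return x[image_tok.shape[0]:]

rng = np.random.default_rng(0)
d = 8
params = init_params(d, rng)
image_tok = rng.normal(size=(6, d))    # 6 image tokens (illustrative size)
action_tok = rng.normal(size=(4, d))   # 4 noisy action tokens
state_emb = rng.normal(size=(d,))      # proprioceptive state embedding
tactile_tok = rng.normal(size=(3, d))  # 3 tactile tokens
out = deco_style_block(image_tok, action_tok, state_emb, tactile_tok, params)
```

Passing `tactile_tok=None` skips the cross-attention branch entirely, which is one way a policy could run without tactile input; whether DECO handles the tactile-free case this way is not stated in the source.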
Original Abstract
Overview of the Proposed DECO Framework. DECO is a DiT-based policy that decouples multimodal conditioning. Image and action tokens interact via joint self-attention, while proprioceptive states and optional conditions are injected through adaptive layer normalization. Tactile signals are injected via cross-attention, while a lightweight LoRA-based adapter is used to efficiently fine-tune the pretrained policy. DECO is also accompanied by DECO-50, a bimanual dexterous manipulation dataset with tactile sensing, consisting of 4 scenarios and 28 sub-tasks, covering more than 50 hours of data, approximately 5 million frames, and 8,000 successful trajectories.
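To make the LoRA-based fine-tuning idea concrete, here is a minimal NumPy sketch of a generic low-rank adapter around a frozen linear weight. It shows standard LoRA, not DECO's specific adapter: the abstract does not say which weights are adapted, so the class name `LoRALinear`, the rank, the `alpha` scaling, and the layer size are all assumptions chosen for illustration.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * A @ B."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_in, d_out = weight.shape
        self.weight = weight                          # pretrained, kept frozen
        self.A = rng.normal(scale=0.01, size=(d_in, rank))  # trainable, random init
        self.B = np.zeros((rank, d_out))              # trainable, zero init
        self.scaling = alpha / rank

    def __call__(self, x):
        # Base projection plus the low-rank correction.
        return x @ self.weight + self.scaling * (x @ self.A @ self.B)

    def num_trainable(self):
        # Only A and B would receive gradients during fine-tuning.
        return self.A.size + self.B.size

rng = np.random.default_rng(1)
w_pretrained = rng.normal(size=(64, 64))  # hypothetical pretrained projection
layer = LoRALinear(w_pretrained, rank=4)
x = rng.normal(size=(5, 64))
y = layer(x)
```

Because `B` starts at zero, the adapted layer reproduces the pretrained output exactly before any fine-tuning step, and in this example only 512 adapter values would be trained against 4,096 frozen ones.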