An Attention Mechanism for Robust Multimodal Integration in a Global Workspace Architecture
AI Summary
The paper proposes a top-down attention mechanism that improves the noise robustness and generalization of a Global Workspace architecture on multimodal tasks.
Main Contributions
- Proposed a top-down attention mechanism for the Global Workspace
- Demonstrated that the mechanism improves the noise robustness of multimodal systems
- Showed that the mechanism generalizes well across tasks and modalities
Methodology
An attention mechanism is introduced into the Global Workspace architecture to select the relevant modalities for integration, and the approach is validated experimentally on multimodal datasets.
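The core idea of top-down modality selection can be sketched as a query-keyed softmax over per-modality encodings. This is only a minimal illustration, not the paper's actual implementation: the function name, the dot-product scoring, and the single fused workspace vector are all assumptions made for clarity.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def top_down_attention(modality_reprs, query, temperature=1.0):
    """Sketch of top-down modality selection (assumed form, not the
    paper's exact mechanism): score each pre-encoded modality against
    a task-driven query, then fuse them into one workspace vector.

    modality_reprs: dict name -> (d,) array of modality encodings
    query: (d,) array representing the top-down (task) signal
    """
    names = list(modality_reprs)
    keys = np.stack([modality_reprs[n] for n in names])        # (M, d)
    scores = keys @ query / (temperature * np.sqrt(keys.shape[1]))
    weights = softmax(scores)        # soft selection over modalities
    fused = weights @ keys           # (d,) content entering the workspace
    return fused, dict(zip(names, weights))

# Toy usage: with the query aligned to the "vision" encoding, that
# modality should dominate the attention weights.
rng = np.random.default_rng(0)
vision, text = rng.normal(size=8), rng.normal(size=8)
fused, w = top_down_attention({"vision": vision, "text": text}, query=vision)
```

Because the weights form a softmax, a noisy or irrelevant modality can be suppressed before integration, which is the intuition behind the noise-robustness results reported below.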
原文摘要
Global Workspace Theory (GWT), inspired by cognitive neuroscience, posits that flexible cognition could arise via the attentional selection of a relevant subset of modalities within a multimodal integration system. This cognitive framework can inspire novel computational architectures for multimodal integration. Indeed, recent implementations of GWT have explored its multimodal representation capabilities, but the related attention mechanisms remain understudied. Here, we propose and evaluate a top-down attention mechanism to select modalities inside a global workspace. First, we demonstrate that our attention mechanism improves noise robustness of a global workspace system on two multimodal datasets of increasing complexity: Simple Shapes and MM-IMDb 1.0. Second, we highlight various cross-task and cross-modality generalization capabilities that are not shared by multimodal attention models from the literature. Comparing against existing baselines on the MM-IMDb 1.0 benchmark, we find our attention mechanism makes the global workspace competitive with the state of the art.