Mind the Gap: A Framework for Assessing Pitfalls in Multimodal Active Learning
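AI Summary
This paper proposes a framework for assessing pitfalls in multimodal active learning and shows that existing methods fall short on modality balance.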
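Key Contributions
- Proposes a benchmarking framework for multimodal active learning
- Analyzes the modality imbalance problem in multimodal active learning
- Reveals the limitations of existing query strategies in multimodal settings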
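Methodology
Builds synthetic datasets that simulate pitfalls in multimodal learning, systematically evaluates the performance of different query strategies, and validates the results on real-world datasets.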
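To make the idea of a synthetic pitfall dataset concrete, here is a minimal sketch of a toy two-modality generator. It is not the paper's benchmark: the function name `make_synthetic_multimodal`, its parameters, and the Gaussian construction are all assumptions chosen for illustration. It covers two of the pitfalls named in the abstract, differing modality difficulty (via per-modality noise) and missing modalities (via a mask on modality B).

```python
import numpy as np

def make_synthetic_multimodal(n_samples=1000, dim=8, noise_a=0.1, noise_b=1.0,
                              missing_rate_b=0.3, seed=0):
    """Hypothetical toy generator: two modalities carrying the same binary label.

    noise_a / noise_b set modality difficulty (more noise = harder modality).
    missing_rate_b is the approximate fraction of samples whose modality B
    is zeroed out to mimic a missing modality.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)
    # Map labels {0, 1} to a shared signal {-1, +1}, broadcast over features.
    signal = (2 * y - 1)[:, None].astype(float)
    x_a = signal + noise_a * rng.standard_normal((n_samples, dim))
    x_b = signal + noise_b * rng.standard_normal((n_samples, dim))
    # Randomly drop modality B for some samples.
    missing_b = rng.random(n_samples) < missing_rate_b
    x_b[missing_b] = 0.0
    return x_a, x_b, missing_b, y
```

Because each pitfall is controlled by its own parameter, one can vary a single factor (for example `noise_b`) while holding the others fixed, which is the kind of isolation a synthetic benchmark is meant to provide.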
Original Abstract
Multimodal learning enables neural networks to integrate information from heterogeneous sources, but active learning in this setting faces distinct challenges. These include missing modalities, differences in modality difficulty, and varying interaction structures. These are issues absent in the unimodal case. While the behavior of active learning strategies in unimodal settings is well characterized, their behavior under such multimodal conditions remains poorly understood. We introduce a new framework for benchmarking multimodal active learning that isolates these pitfalls using synthetic datasets, allowing systematic evaluation without confounding noise. Using this framework, we compare unimodal and multimodal query strategies and validate our findings on two real-world datasets. Our results show that models consistently develop imbalanced representations, relying primarily on one modality while neglecting others. Existing query methods do not mitigate this effect, and multimodal strategies do not consistently outperform unimodal ones. These findings highlight limitations of current active learning methods and underline the need for modality-aware query strategies that explicitly address these pitfalls. Code and benchmark resources will be made publicly available.
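For readers unfamiliar with query strategies, the sketch below shows entropy-based uncertainty sampling, a standard active learning baseline. The abstract does not say which strategies the paper evaluates, so this is an illustrative assumption, and the names `entropy`, `select_queries`, and the `fused` array are hypothetical.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy of each row of a (n_samples, n_classes) probability array."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_queries(probs, k):
    """Return indices of the k most uncertain pool samples, highest entropy first."""
    scores = entropy(probs)
    return np.argsort(-scores)[:k]

# Hypothetical fused (multimodal) predictions for a pool of four unlabeled samples.
fused = np.array([[0.99, 0.01],
                  [0.50, 0.50],
                  [0.80, 0.20],
                  [0.60, 0.40]])
queried = select_queries(fused, k=2)
print(queried)
```

Note that this score sees only the fused prediction; it carries no information about which modality the uncertainty comes from.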