MMA: Multimodal Memory Agent
AI Summary
MMA improves the performance of multimodal agents in complex environments by dynamically assessing the reliability of retrieved memories.
Main Contributions
- Proposes the Multimodal Memory Agent (MMA) model
- Introduces a dynamic reliability scoring mechanism
- Constructs the MMA-Bench benchmark
Methodology
Each retrieved memory item is assigned a dynamic reliability score that combines source credibility, temporal decay, and conflict-aware network consensus.
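The scoring idea can be sketched as a weighted combination of the three signals. This is a minimal illustration, not the paper's implementation: the weights, the exponential half-life decay, and the Laplace-smoothed consensus term are all assumed hyperparameters chosen for the sketch.

```python
import math
import time

def reliability_score(item, now=None,
                      w_source=0.4, w_recency=0.3, w_consensus=0.3,
                      half_life=86400.0):
    """Combine source credibility, temporal decay, and consensus into one score.

    `item` is assumed to be a dict with:
      - "credibility": prior trust in the source, in [0, 1]
      - "timestamp":   UNIX time when the memory was written
      - "agree"/"conflict": counts of retrieved items supporting / contradicting it
    Weights and half-life are illustrative, not the paper's values.
    """
    now = time.time() if now is None else now
    # Exponential temporal decay: the recency term halves every `half_life` seconds.
    age = max(0.0, now - item["timestamp"])
    recency = math.exp(-math.log(2) * age / half_life)
    # Conflict-aware consensus: Laplace-smoothed fraction of agreeing neighbors.
    agree, conflict = item["agree"], item["conflict"]
    consensus = (agree + 1) / (agree + conflict + 2)
    return (w_source * item["credibility"]
            + w_recency * recency
            + w_consensus * consensus)
```

Because each term lies in [0, 1] and the weights sum to 1, the combined score also stays in [0, 1], which makes it usable directly as an evidence weight downstream.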
Original Abstract
Long-horizon multimodal agents depend on external memory; however, similarity-based retrieval often surfaces stale, low-credibility, or conflicting items, which can trigger overconfident errors. We propose Multimodal Memory Agent (MMA), which assigns each retrieved memory item a dynamic reliability score by combining source credibility, temporal decay, and conflict-aware network consensus, and uses this signal to reweight evidence and abstain when support is insufficient. We also introduce MMA-Bench, a programmatically generated benchmark for belief dynamics with controlled speaker reliability and structured text-vision contradictions. Using this framework, we uncover the "Visual Placebo Effect", revealing how RAG-based agents inherit latent visual biases from foundation models. On FEVER, MMA matches baseline accuracy while reducing variance by 35.2% and improving selective utility; on LoCoMo, a safety-oriented configuration improves actionable accuracy and reduces wrong answers; on MMA-Bench, MMA reaches 41.18% Type-B accuracy in Vision mode, while the baseline collapses to 0.0% under the same protocol. Code: https://github.com/AIGeeksGroup/MMA.
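The abstract's "reweight evidence and abstain when support is insufficient" step can be illustrated with a reliability-weighted vote. This is a hedged sketch, not the paper's mechanism: the dictionary-based voting scheme and the `threshold` parameter are assumptions made for illustration.

```python
def answer_or_abstain(candidates, scores, threshold=0.5):
    """Reweight retrieved evidence by reliability and abstain when support is weak.

    `candidates` maps each retrieved memory item to the answer it supports;
    `scores` maps each item to its reliability score. The agent answers only
    when the best answer holds at least `threshold` of the reliability mass.
    """
    votes = {}
    for item, answer in candidates.items():
        votes[answer] = votes.get(answer, 0.0) + scores[item]
    if not votes:
        return None  # nothing retrieved: abstain
    best, support = max(votes.items(), key=lambda kv: kv[1])
    total = sum(votes.values())
    # Abstain unless the winning answer's share of reliability mass is sufficient.
    return best if support / total >= threshold else None
```

Raising `threshold` trades coverage for safety, which mirrors the abstract's "safety-oriented configuration" that reduces wrong answers at the cost of answering less often.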