LLM Memory & RAG relevance: 9/10

MMA: Multimodal Memory Agent

Yihao Lu, Wanru Cheng, Zeyu Zhang, Hao Tang
arXiv: 2602.16493v1 Published: 2026-02-18 Updated: 2026-02-18

AI Summary

MMA improves the performance of multimodal agents in complex environments by dynamically assessing the reliability of retrieved memories.

Key Contributions

  • Proposes the Multimodal Memory Agent (MMA) model
  • Introduces a dynamic reliability scoring mechanism
  • Constructs the MMA-Bench benchmark

Methodology

Assigns each retrieved memory item a dynamic reliability score that combines source credibility, temporal decay, and conflict-aware network consensus.
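The scoring idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the multiplicative combination, the exponential half-life decay, the Laplace-smoothed consensus ratio, and the abstention threshold `tau` are all assumptions chosen for clarity.

```python
import math
from dataclasses import dataclass

@dataclass
class MemoryItem:
    content: str
    source_credibility: float  # prior trust in the source, in [0, 1]
    age_days: float            # time since the item was stored
    agreement: int             # retrieved items that agree with this one
    conflict: int              # retrieved items that contradict it

def reliability(item: MemoryItem, half_life_days: float = 30.0) -> float:
    """Toy reliability score: source credibility x temporal decay x consensus.
    The multiplicative form and 30-day half-life are illustrative assumptions."""
    decay = 0.5 ** (item.age_days / half_life_days)          # temporal decay
    total = item.agreement + item.conflict
    consensus = (item.agreement + 1) / (total + 2)           # smoothed agreement ratio
    return item.source_credibility * decay * consensus

def reweight_or_abstain(items, tau: float = 0.3):
    """Reweight retrieved evidence by reliability; abstain (return None)
    when no item clears the support threshold tau."""
    scores = [reliability(it) for it in items]
    if not scores or max(scores) < tau:
        return None, scores  # insufficient support: abstain
    z = sum(scores)
    return [s / z for s in scores], scores
```

A fresh, well-corroborated item from a credible source dominates the weighting, while a stale, contradicted item alone falls below `tau` and triggers abstention.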

Original Abstract

Long-horizon multimodal agents depend on external memory; however, similarity-based retrieval often surfaces stale, low-credibility, or conflicting items, which can trigger overconfident errors. We propose Multimodal Memory Agent (MMA), which assigns each retrieved memory item a dynamic reliability score by combining source credibility, temporal decay, and conflict-aware network consensus, and uses this signal to reweight evidence and abstain when support is insufficient. We also introduce MMA-Bench, a programmatically generated benchmark for belief dynamics with controlled speaker reliability and structured text-vision contradictions. Using this framework, we uncover the "Visual Placebo Effect", revealing how RAG-based agents inherit latent visual biases from foundation models. On FEVER, MMA matches baseline accuracy while reducing variance by 35.2% and improving selective utility; on LoCoMo, a safety-oriented configuration improves actionable accuracy and reduces wrong answers; on MMA-Bench, MMA reaches 41.18% Type-B accuracy in Vision mode, while the baseline collapses to 0.0% under the same protocol. Code: https://github.com/AIGeeksGroup/MMA.

Tags

multimodal agent memory RAG reliability

arXiv Categories

cs.CV