Agent Tuning & Optimization Relevance: 9/10

OmniMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory

Jiaqi Liu, Zipeng Ling, Shi Qiu, Yanqing Liu, Siwei Han, Peng Xia, Haoqin Tu, Zeyu Zheng, Cihang Xie, Charles Fleming, Mingyu Ding, Huaxiu Yao
arXiv: 2604.01007v1 Published: 2026-04-01 Updated: 2026-04-01

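AI Summary

The paper presents OmniMem, a unified lifelong multimodal memory framework for AI agents that was discovered by an autonomous research pipeline. Relative to the initial configurations, F1 rises from 0.117 to 0.598 on LoCoMo and from 0.254 to 0.797 on Mem-Gallery.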

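Key Contributions

  • Proposes OmniMem, a framework for lifelong multimodal memory
  • Builds an autonomous research pipeline that automates exploration of the design space spanning architecture, retrieval, prompts, and data pipelines
  • Shows that bug fixes, architectural changes, and prompt engineering each contribute more than all hyperparameter tuning combined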

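Methodology

An autonomous research pipeline runs experiments to diagnose failure modes, propose architectural modifications, and repair data pipeline bugs, iteratively improving the multimodal memory framework without human intervention in the inner loop.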
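The iterative loop described above can be pictured as a greedy propose, evaluate, keep-if-better cycle. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: `autoresearch_loop`, `toy_evaluate`, the config keys, and the point values are all hypothetical, and the scores are arbitrary toy units rather than results from the paper.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, object]
Proposal = Tuple[str, Config]  # (description, config overrides)


def autoresearch_loop(
    baseline: Config,
    proposals: List[Proposal],
    evaluate: Callable[[Config], float],
) -> Tuple[Config, float, List[Tuple[str, float, bool]]]:
    """Greedy loop: apply each proposed change, keep it only if the score improves."""
    best_config = dict(baseline)
    best_score = evaluate(best_config)
    history: List[Tuple[str, float, bool]] = []
    for description, overrides in proposals:
        candidate = {**best_config, **overrides}
        score = evaluate(candidate)
        accepted = score > best_score
        history.append((description, score, accepted))
        if accepted:
            best_config, best_score = candidate, score
    return best_config, best_score, history


def toy_evaluate(config: Config) -> float:
    """Hypothetical stand-in for a benchmark run; values are arbitrary toy units."""
    points = 1.0
    if config.get("bug_fixed"):
        points += 2.0  # assumed: a repaired data pipeline helps the most
    if config.get("retriever") == "hybrid":
        points += 1.0  # assumed: an architectural change helps
    if int(config.get("top_k", 5)) < 3:
        points -= 1.0  # assumed: an overly small top_k hurts
    return points


# Hypothetical baseline and proposals, for illustration only.
baseline = {"retriever": "dense", "top_k": 5, "bug_fixed": False}
proposals = [
    ("repair data pipeline bug", {"bug_fixed": True}),
    ("shrink top_k", {"top_k": 1}),
    ("switch to hybrid retrieval", {"retriever": "hybrid"}),
]

best_config, best_score, history = autoresearch_loop(baseline, proposals, toy_evaluate)
for description, score, accepted in history:
    print(f"{description}: score={score}, accepted={accepted}")
```

In this toy run the bug fix and the retriever change are kept while the `top_k` change is rejected. A real pipeline would also need to generate the proposals itself, for example from diagnosed failure modes, which this sketch takes as a fixed input list.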

Original Abstract

AI agents increasingly operate over extended time horizons, yet their ability to retain, organize, and recall multimodal experiences remains a critical bottleneck. Building effective lifelong memory requires navigating a vast design space spanning architecture, retrieval strategies, prompt engineering, and data pipelines; this space is too large and interconnected for manual exploration or traditional AutoML to explore effectively. We deploy an autonomous research pipeline to discover OmniMem, a unified multimodal memory framework for lifelong AI agents. Starting from a naïve baseline (F1=0.117 on LoCoMo), the pipeline autonomously executes ~50 experiments across two benchmarks, diagnosing failure modes, proposing architectural modifications, and repairing data pipeline bugs, all without human intervention in the inner loop. The resulting system achieves state-of-the-art on both benchmarks, improving F1 by +411% on LoCoMo (0.117→0.598) and +214% on Mem-Gallery (0.254→0.797) relative to the initial configurations. Critically, the most impactful discoveries are not hyperparameter adjustments: bug fixes (+175%), architectural changes (+44%), and prompt engineering (+188% on specific categories) each individually exceed the cumulative contribution of all hyperparameter tuning, demonstrating capabilities fundamentally beyond the reach of traditional AutoML. We provide a taxonomy of six discovery types and identify four properties that make multimodal memory particularly suited for autoresearch, offering guidance for applying autonomous research pipelines to other AI system domains. Code is available at https://github.com/aiming-lab/OmniMem.
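The headline percentages in the abstract are relative improvements over the initial F1 scores. As a quick check, the short sketch below recomputes them from the reported start and end values; the helper name `relative_improvement_pct` is an illustrative assumption, not something from the paper or its code.

```python
def relative_improvement_pct(baseline: float, final: float) -> float:
    """Relative change from baseline to final, expressed in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (final - baseline) / baseline * 100.0


# F1 scores as reported in the abstract: (initial, final).
reported = {
    "LoCoMo": (0.117, 0.598),
    "Mem-Gallery": (0.254, 0.797),
}

for benchmark, (start, end) in reported.items():
    gain = relative_improvement_pct(start, end)
    print(f"{benchmark}: {start} -> {end} = +{gain:.0f}%")
# LoCoMo: 0.117 -> 0.598 = +411%
# Mem-Gallery: 0.254 -> 0.797 = +214%
```

Both values match the abstract, which confirms that the figures are relative gains over each benchmark's starting configuration rather than absolute F1 differences (+0.481 and +0.543 respectively).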

Tags

multimodal memory, AI agent, autonomous research, lifelong learning, prompt engineering

arXiv Categories

cs.AI