Multimodal Learning Relevance: 9/10

PathMem: Toward Cognition-Aligned Memory Transformation for Pathology MLLMs

Jinyue Li, Yuci Liang, Qiankun Li, Xinheng Lyu, Jiayu Qian, Huabao Chen, Kun Wang, Zhigang Zeng, Anil Anthony Bharath, Yang Liu
arXiv: 2603.09943v1 Published: 2026-03-10 Updated: 2026-03-10

AI Summary

PathMem proposes a memory-augmented framework for pathology MLLMs that integrates structured domain knowledge and improves diagnostic reasoning.

Key Contributions

  • Proposes PathMem, a framework that combines long-term memory and working memory
  • Introduces a Memory Transformer for dynamic knowledge transformation
  • Achieves SOTA performance on pathology benchmarks

Methodology

Structured pathology knowledge is organized as a long-term memory (LTM); a Memory Transformer models the dynamic transition from LTM to working memory (WM), enabling context-aware memory refinement for downstream reasoning.
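The paper does not release its implementation, so the exact mechanics of the Memory Transformer are not shown here. As a rough, hypothetical illustration of the LTM-to-WM idea described above, a two-stage sketch (similarity-based memory activation over stored knowledge embeddings, followed by attention-weighted refinement against the multimodal query) might look like:

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def activate_memory(query, ltm_keys, ltm_values, top_k=2):
    """Stage 1 (memory activation): select the top-k LTM entries
    whose keys are most cosine-similar to the multimodal query."""
    sims = ltm_keys @ query / (
        np.linalg.norm(ltm_keys, axis=1) * np.linalg.norm(query) + 1e-8
    )
    idx = np.argsort(sims)[::-1][:top_k]
    return ltm_values[idx], sims[idx]


def refine_to_wm(query, activated_values):
    """Stage 2 (context-aware refinement): attention-weighted pooling of
    the activated knowledge, conditioned on the query, yields a WM vector."""
    scores = softmax(activated_values @ query / np.sqrt(query.shape[0]))
    return scores @ activated_values


# Toy example with random embeddings standing in for real
# knowledge-entry and image+text query representations.
rng = np.random.default_rng(0)
ltm_keys = rng.normal(size=(5, 8))    # index embeddings of 5 knowledge entries
ltm_values = rng.normal(size=(5, 8))  # content embeddings of those entries
query = rng.normal(size=8)            # fused multimodal (image+text) query

vals, _ = activate_memory(query, ltm_keys, ltm_values)
wm = refine_to_wm(query, vals)        # context-refined working-memory vector
```

All names here (`activate_memory`, `refine_to_wm`, the embedding dimensions) are illustrative assumptions, not the paper's API; the real model presumably learns both stages end to end inside a transformer rather than using fixed cosine retrieval.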

Original Abstract

Computational pathology demands both visual pattern recognition and dynamic integration of structured domain knowledge, including taxonomy, grading criteria, and clinical evidence. In practice, diagnostic reasoning requires linking morphological evidence with formal diagnostic and grading criteria. Although multimodal large language models (MLLMs) demonstrate strong vision language reasoning capabilities, they lack explicit mechanisms for structured knowledge integration and interpretable memory control. As a result, existing models struggle to consistently incorporate pathology-specific diagnostic standards during reasoning. Inspired by the hierarchical memory process of human pathologists, we propose PathMem, a memory-centric multimodal framework for pathology MLLMs. PathMem organizes structured pathology knowledge as a long-term memory (LTM) and introduces a Memory Transformer that models the dynamic transition from LTM to working memory (WM) through multimodal memory activation and context-aware knowledge grounding, enabling context-aware memory refinement for downstream reasoning. PathMem achieves SOTA performance across benchmarks, improving WSI-Bench report generation (12.8% WSI-Precision, 10.1% WSI-Relevance) and open-ended diagnosis by 9.7% and 8.9% over prior WSI-based models.

Tags

MLLM · Pathology · Memory Networks · Knowledge Integration

arXiv Categories

cs.AI