Multimodal Learning — Relevance: 9/10

From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

Hongrui Jia, Chaoya Jiang, Shikun Zhang, Wei Ye
arXiv: 2602.22859v1 · Published: 2026-02-26 · Updated: 2026-02-26

AI Summary

DPE is a diagnostic-driven iterative training method that continually improves LMM performance by diagnosing capability blind spots and dynamically adjusting the training data accordingly.

Key Contributions

  • Proposes the Diagnostic-driven Progressive Evolution (DPE) training framework
  • Uses multiple agents to annotate and quality-control massive unlabeled multimodal data
  • Dynamically adjusts the data mixture, generating weakness-focused data for targeted reinforcement

Methodology

DPE diagnoses model weaknesses to steer data generation and reinforcement learning, then re-diagnoses the updated model, iteratively improving LMM performance under open task distributions.
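The spiral loop described above can be sketched as follows. This is an illustrative toy, not the authors' implementation: the function names (`diagnose`, `generate_targeted_data`, `reinforce`) and the representation of a "model" as per-skill accuracies are assumptions made for clarity.

```python
# Toy sketch of the DPE spiral loop (hypothetical names, not the paper's code).
# A "model" is simplified to a dict mapping skill -> accuracy on that skill.

def diagnose(model, threshold=0.7):
    """Attribute failures to specific weaknesses: skills scoring below threshold."""
    return [skill for skill, acc in model.items() if acc < threshold]

def generate_targeted_data(weaknesses, n_per_skill=100):
    """Stand-in for agent-based synthesis of weakness-focused samples."""
    return {skill: n_per_skill for skill in weaknesses}

def reinforce(model, data_mixture, gain_per_100=0.1):
    """Stand-in for targeted RL: weakness-focused data lifts that skill."""
    updated = dict(model)
    for skill, n in data_mixture.items():
        updated[skill] = min(1.0, updated[skill] + gain_per_100 * n / 100)
    return updated

def dpe_loop(model, rounds=3):
    """Spiral loop: diagnose -> generate targeted data -> reinforce -> re-diagnose."""
    for _ in range(rounds):
        weaknesses = diagnose(model)
        if not weaknesses:          # no blind spots left below threshold
            break
        mixture = generate_targeted_data(weaknesses)
        model = reinforce(model, mixture)
    return model

model = {"ocr": 0.55, "chart_qa": 0.62, "captioning": 0.85}
final = dpe_loop(model)
```

The key property the sketch captures is that each round's data mixture is driven by the *current* model's diagnosed weaknesses, so already-strong skills stop receiving targeted data once they clear the threshold.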

Original Abstract

As Large Multimodal Models (LMMs) scale up and reinforcement learning (RL) methods mature, LMMs have made notable progress in complex reasoning and decision-making. Yet training still relies on static data and fixed recipes, making it difficult to diagnose capability blind spots or provide dynamic, targeted reinforcement. Motivated by findings that test-driven error exposure and feedback-based correction outperform repetitive practice, we propose Diagnostic-driven Progressive Evolution (DPE), a spiral loop where diagnosis steers data generation and reinforcement, and each iteration re-diagnoses the updated model to drive the next round of targeted improvement. DPE has two key components. First, multiple agents annotate and quality-control massive unlabeled multimodal data, using tools such as web search and image editing to produce diverse, realistic samples. Second, DPE attributes failures to specific weaknesses, dynamically adjusts the data mixture, and guides agents to generate weakness-focused data for targeted reinforcement. Experiments on Qwen3-VL-8B-Instruct and Qwen2.5-VL-7B-Instruct show stable, continual gains across eleven benchmarks, indicating that DPE is a scalable paradigm for continual LMM training under open task distributions. Our code, models, and data are publicly available at https://github.com/hongruijia/DPE.

Tags

LMM Multimodal Learning Reinforcement Learning

arXiv Category

cs.CV