Multimodal Learning · Relevance: 9/10

Reinforced Attention Learning

Bangzheng Li, Jianmo Ni, Chen Qu, Ian Miao, Liu Yang, Xingyu Fu, Muhao Chen, Derek Zhiyuan Cheng
arXiv: 2602.04884v1 · Published: 2026-02-04 · Updated: 2026-02-04

AI Summary

RAL uses reinforcement learning to directly optimize the internal attention distributions of multimodal LLMs, improving perceptual ability and cross-modal alignment.

Main Contributions

  • Proposes the Reinforced Attention Learning (RAL) framework
  • Applies reinforcement learning to optimize attention distributions in multimodal LLMs
  • Introduces an On-Policy Attention Distillation method

Methodology

RAL optimizes attention distributions with policy-gradient methods, and transfers attention behaviors via On-Policy Attention Distillation.
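The core idea of optimizing "where to attend" with a policy gradient can be illustrated with a toy sketch. The snippet below is not the paper's implementation: it treats a single softmax attention distribution over a few input tokens as the policy, samples an attended position, and applies a REINFORCE-style update with a running baseline so that attention mass shifts toward the token that yields reward. All names (`train`, `relevant`, the reward definition) are hypothetical simplifications for illustration.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(relevant=2, n=5, steps=2000, lr=0.5, seed=0):
    """Toy policy-gradient loop: the 'policy' is an attention
    distribution over n input tokens, parameterized by logits."""
    rng = random.Random(seed)
    logits = [0.0] * n
    baseline = 0.0  # running reward baseline to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample where to attend: the "action" of the attention policy.
        a = rng.choices(range(n), weights=probs)[0]
        # Hypothetical reward: 1 if we attended to the relevant token.
        reward = 1.0 if a == relevant else 0.0
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # REINFORCE: grad of log pi(a) w.r.t. logits = onehot(a) - probs
        for i in range(n):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)

final_probs = train()
```

After training, the attention distribution concentrates on the rewarded position, mirroring (in miniature) how RAL rewards attention allocations rather than output token sequences.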

Original Abstract

Post-training with Reinforcement Learning (RL) has substantially improved reasoning in Large Language Models (LLMs) via test-time scaling. However, extending this paradigm to Multimodal LLMs (MLLMs) through verbose rationales yields limited gains for perception and can even degrade performance. We propose Reinforced Attention Learning (RAL), a policy-gradient framework that directly optimizes internal attention distributions rather than output token sequences. By shifting optimization from what to generate to where to attend, RAL promotes effective information allocation and improved grounding in complex multimodal inputs. Experiments across diverse image and video benchmarks show consistent gains over GRPO and other baselines. We further introduce On-Policy Attention Distillation, demonstrating that transferring latent attention behaviors yields stronger cross-modal alignment than standard knowledge distillation. Our results position attention policies as a principled and general alternative for multimodal post-training.

Tags

Reinforcement Learning · Multimodal Learning · Attention Mechanisms · LLM

arXiv Categories

cs.CL cs.CV cs.LG