Multimodal Learning Relevance: 9/10

Seg-ReSearch: Segmentation with Interleaved Reasoning and External Search

Tianming Liang, Qirui Du, Jian-Fang Hu, Haichao Jiang, Zicheng Lin, Wei-Shi Zheng
arXiv: 2602.04454v1 Published: 2026-02-04 Updated: 2026-02-04

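AI Summary

Seg-ReSearch interleaves reasoning with external search to overcome the knowledge bottleneck of MLLMs and improve segmentation performance.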

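Key Contributions

  • Proposes Seg-ReSearch, a segmentation paradigm that combines reasoning with external search
  • Designs a hierarchical reward mechanism to improve training
  • Constructs the OK-VOS benchmark to evaluate open-world segmentation ability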

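Methodology

An MLLM performs the reasoning and retrieves the knowledge it lacks through external search; the model is trained with a hierarchical reward to improve segmentation performance.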
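
The interleaving of reasoning and search can be sketched roughly as the loop below. This is a minimal illustration, not the authors' implementation: the names `interleaved_loop`, `Step`, `Trace`, `toy_reason`, `toy_search`, and `TOY_INDEX` are hypothetical, the MLLM and the search engine are replaced by toy stand-ins, and the segmentation step and the hierarchical reward are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    kind: str     # "search" (issue a query) or "answer" (final target description)
    content: str  # the search query, or the resolved description of the target


@dataclass
class Trace:
    steps: List[Step] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)


def interleaved_loop(
    query: str,
    reason: Callable[[str, List[str]], Step],
    search: Callable[[str], str],
    max_turns: int = 4,
) -> Tuple[str, Trace]:
    """Alternate reasoning and search until the reasoner commits to an answer."""
    trace = Trace()
    for _ in range(max_turns):
        step = reason(query, trace.evidence)
        trace.steps.append(step)
        if step.kind == "answer":
            return step.content, trace
        # The reasoner asked for outside knowledge: fetch it and continue.
        trace.evidence.append(search(step.content))
    # Turn budget exhausted: fall back to the original query text.
    return query, trace


# Toy stand-ins (assumptions for illustration only).
TOY_INDEX = {"who won the race": "the red car"}


def toy_reason(query: str, evidence: List[str]) -> Step:
    # Search once if nothing is known yet, then answer from the latest evidence.
    if not evidence:
        return Step("search", query)
    return Step("answer", f"object described by: {evidence[-1]}")


def toy_search(q: str) -> str:
    return TOY_INDEX.get(q, "no result")
```

In a full system, the returned description would presumably be handed to a segmentation model together with the image or video; here it is just a string so the control flow stays visible.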

Original Abstract

Segmentation based on language has been a popular topic in computer vision. While recent advances in multimodal large language models (MLLMs) have endowed segmentation systems with reasoning capabilities, these efforts remain confined by the frozen internal knowledge of MLLMs, which limits their potential for real-world scenarios that involve up-to-date information or domain-specific concepts. In this work, we propose Seg-ReSearch, a novel segmentation paradigm that overcomes the knowledge bottleneck of existing approaches. By enabling interleaved reasoning and external search, Seg-ReSearch empowers segmentation systems to handle dynamic, open-world queries that extend beyond the frozen knowledge of MLLMs. To effectively train this capability, we introduce a hierarchical reward design that harmonizes initial guidance with progressive incentives, mitigating the dilemma between sparse outcome signals and rigid step-wise supervision. For evaluation, we construct OK-VOS, a challenging benchmark that explicitly requires outside knowledge for video object segmentation. Experiments on OK-VOS and two existing reasoning segmentation benchmarks demonstrate that our Seg-ReSearch improves state-of-the-art approaches by a substantial margin. Code and data will be released at https://github.com/iSEE-Laboratory/Seg-ReSearch.

Tags

Segmentation, Multimodal, Reasoning, External Search, Video Object Segmentation

arXiv Categories

cs.CV