ReThinker: Scientific Reasoning by Rethinking with Guided Reflection and Confidence Control
AI Summary
ReThinker significantly improves LLM performance on complex scientific reasoning tasks through confidence-guided reflection and tool use.
Key Contributions
- Proposes ReThinker, a confidence-aware agentic framework built on a Solver-Critic-Selector architecture
- Designs a reverse data synthesis pipeline and an adaptive trajectory recycling strategy for training without human annotation
- Achieves state-of-the-art results on benchmarks including HLE, GAIA, and XBench
Methodology
ReThinker uses a Solver to propose solutions, a Critic to assess their confidence, and a Selector to dynamically allocate computation and tool invocations based on that confidence, combined with a reflection mechanism to further improve performance.
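The stage-wise loop described above can be sketched roughly as follows. This is a minimal illustration only: the function names, the confidence threshold, the textual feedback, and the final argmax selection are all assumptions for exposition, not the paper's actual implementation (which also covers tool invocation and multi-dimensional reflection).

```python
def solve(question, solver, critic, max_rounds=3, threshold=0.8):
    """Confidence-gated Solver-Critic loop with a simple Selector at the end.

    solver(question, feedback) -> candidate answer (str)
    critic(question, answer)   -> confidence score in [0, 1]
    """
    candidates = []   # (answer, confidence) pairs collected across rounds
    feedback = None   # reflection hint passed back to the Solver
    for _ in range(max_rounds):
        answer = solver(question, feedback)       # Solver proposes an answer
        confidence = critic(question, answer)     # Critic scores its confidence
        candidates.append((answer, confidence))
        if confidence >= threshold:               # confident enough: stop early,
            break                                 # saving further computation
        feedback = f"low confidence ({confidence:.2f}); reflect and retry"
    # Selector role (simplified): confidence-weighted pick, here a plain argmax
    return max(candidates, key=lambda c: c[1])[0]
```

The key point the sketch captures is adaptive allocation: computation stops as soon as the Critic is sufficiently confident, and otherwise later rounds are guided by reflection feedback rather than blind resampling.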
原文摘要
Expert-level scientific reasoning remains challenging for large language models, particularly on benchmarks such as Humanity's Last Exam (HLE), where rigid tool pipelines, brittle multi-agent coordination, and inefficient test-time scaling often limit performance. We introduce ReThinker, a confidence-aware agentic framework that orchestrates retrieval, tool use, and multi-agent reasoning through a stage-wise Solver-Critic-Selector architecture. Rather than following a fixed pipeline, ReThinker dynamically allocates computation based on model confidence, enabling adaptive tool invocation, guided multi-dimensional reflection, and robust confidence-weighted selection. To support scalable training without human annotation, we further propose a reverse data synthesis pipeline and an adaptive trajectory recycling strategy that transform successful reasoning traces into high-quality supervision. Experiments on HLE, GAIA, and XBench demonstrate that ReThinker consistently outperforms state-of-the-art foundation models with tools and existing deep research systems, achieving state-of-the-art results on expert-level reasoning tasks.