AI Agents relevance: 9/10

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

Zhuo Li, Yupeng Zhang, Pengyu Cheng, Jiajun Song, Mengyu Zhou, Hao Li, Shujie Hu, Yu Qin, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang
arXiv: 2603.24579v1 Published: 2026-03-25 Updated: 2026-03-25

AI Summary

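MARCH combines multi-agent reinforcement learning with deliberate information asymmetry to substantially reduce LLM hallucination and make RAG systems more reliable.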

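Main Contributions

  • Proposes the MARCH framework, which uses information asymmetry to break self-verification bias
  • Designs three cooperating agents: a Solver, a Proposer, and a Checker
  • Trains the agents with MARL so that they co-evolve and are optimized jointly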

Methodology

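MARCH uses three agents: the Solver generates a RAG answer, the Proposer decomposes that answer into atomic propositions, and the Checker verifies each proposition against the retrieved evidence independently, without access to the Solver's original output. The whole pipeline is optimized with MARL.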
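
The data flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names run_pipeline, Verdict, support_rate, and the toy_* agents are hypothetical, the agents are simple string functions standing in for LLM calls, and support_rate is only an assumed example of a scalar that a training loop might use (the summary does not specify the actual MARL reward). What the sketch does show is the information asymmetry: the Checker is called with a proposition and the evidence only, never with the Solver's answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical agent signatures; in a real system each would wrap an LLM call.
Solver = Callable[[str, List[str]], str]     # (question, evidence) -> answer
Proposer = Callable[[str], List[str]]        # answer -> atomic propositions
Checker = Callable[[str, List[str]], bool]   # (proposition, evidence) -> supported?


@dataclass
class Verdict:
    proposition: str
    supported: bool


def run_pipeline(
    question: str,
    evidence: List[str],
    solver: Solver,
    proposer: Proposer,
    checker: Checker,
) -> Tuple[str, List[Verdict]]:
    """Run Solver -> Proposer -> Checker with the Checker kept blind to the answer."""
    answer = solver(question, evidence)
    propositions = proposer(answer)
    # Information asymmetry: the Checker receives one proposition plus the
    # evidence, and never the Solver's full answer.
    verdicts = [Verdict(p, checker(p, evidence)) for p in propositions]
    return answer, verdicts


def support_rate(verdicts: List[Verdict]) -> float:
    """Fraction of supported propositions (an assumed, illustrative signal)."""
    if not verdicts:
        return 0.0
    return sum(v.supported for v in verdicts) / len(verdicts)


# Toy stand-ins for the three agents, for illustration only.
def toy_solver(question: str, evidence: List[str]) -> str:
    # Deliberately includes one claim that the evidence does not support.
    return "Paris is the capital of France. Paris has 90 million residents."


def toy_proposer(answer: str) -> List[str]:
    # Naive decomposition: one proposition per sentence.
    return [s.strip() for s in answer.split(".") if s.strip()]


def toy_checker(proposition: str, evidence: List[str]) -> bool:
    # Naive verification: case-insensitive substring match against any passage.
    return any(proposition.lower() in passage.lower() for passage in evidence)
```

Calling run_pipeline with the toy agents and the single evidence passage "Paris is the capital of France." marks the first proposition as supported and the second as unsupported, so support_rate returns 0.5.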

Original Abstract

Hallucination remains a critical bottleneck for large language models (LLMs), undermining their reliability in real-world applications, especially in Retrieval-Augmented Generation (RAG) systems. While existing hallucination detection methods employ LLM-as-a-judge to verify LLM outputs against retrieved evidence, they suffer from inherent confirmation bias, where the verifier inadvertently reproduces the errors of the original generation. To address this, we introduce Multi-Agent Reinforced Self-Check for Hallucination (MARCH), a framework that enforces rigorous factual alignment by leveraging deliberate information asymmetry. MARCH orchestrates a collaborative pipeline of three specialized agents: a Solver, a Proposer, and a Checker. The Solver generates an initial RAG response, which the Proposer decomposes into claim-level verifiable atomic propositions. Crucially, the Checker validates these propositions against retrieved evidence in isolation, deprived of the Solver's original output. This well-crafted information asymmetry scheme breaks the cycle of self-confirmation bias. By training this pipeline with multi-agent reinforcement learning (MARL), we enable the agents to co-evolve and optimize factual adherence. Extensive experiments across hallucination benchmarks demonstrate that MARCH substantially reduces hallucination rates. Notably, an 8B-parameter LLM equipped with MARCH achieves performance competitive with powerful closed-source models. MARCH paves a scalable path for factual self-improvement of LLMs through co-evolution. The code is at https://github.com/Qwen-Applications/MARCH.

Tags

LLM Hallucination RAG Multi-Agent Reinforcement Learning

arXiv Categories

cs.CL