Courtroom-Style Multi-Agent Debate with Progressive RAG and Role-Switching for Controversial Claim Verification
AI 摘要
提出PROClaim框架,通过模拟法庭辩论和渐进式RAG提升LLM在争议性声明验证中的准确性和可靠性。
主要贡献
- 提出 courtroom-style 的多智能体辩论框架 PROClaim
- 引入 Progressive RAG (P-RAG) 动态扩展和优化证据
- 使用证据协商、自我反思和异构多法官聚合
方法论
构建模拟法庭环境,利用角色扮演、渐进式信息检索和多智能体辩论,增强LLM的推理能力。
原文摘要
Large language models (LLMs) remain unreliable for high-stakes claim verification due to hallucinations and shallow reasoning. While retrieval-augmented generation (RAG) and multi-agent debate (MAD) address this, they are limited by one-pass retrieval and unstructured debate dynamics. We propose a courtroom-style multi-agent framework, PROClaim, that reformulates verification as a structured, adversarial deliberation. Our approach integrates specialized roles (e.g., Plaintiff, Defense, Judge) with Progressive RAG (P-RAG) to dynamically expand and refine the evidence pool during the debate. Furthermore, we employ evidence negotiation, self-reflection, and heterogeneous multi-judge aggregation to enforce calibration, robustness, and diversity. In zero-shot evaluations on the Check-COVID benchmark, PROClaim achieves 81.7% accuracy, outperforming standard multi-agent debate by 10.0 percentage points, with P-RAG driving the primary performance gains (+7.5 pp). We ultimately demonstrate that structural deliberation and model heterogeneity effectively mitigate systematic biases, providing a robust foundation for reliable claim verification. Our code and data are publicly available at https://github.com/mnc13/PROClaim.