LLM Reasoning relevance: 9/10

Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

Zou Qiang
arXiv: 2603.19182v1 Published: 2026-03-19 Updated: 2026-03-19

AI Summary

The paper proposes the Box Maze framework, which uses explicit process control to improve the reliability of LLM reasoning and reduce boundary failures under adversarial conditions.

Key Contributions

  • Proposes the Box Maze framework, an explicit process-control architecture
  • Decomposes LLM reasoning into three layers: memory grounding, structured inference, and boundary enforcement
  • Preliminary simulation experiments suggest that process control can significantly reduce boundary failures under adversarial conditions
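The three-layer decomposition above can be sketched as a simple pipeline. This is an illustrative interpretation only; the class names (`MemoryGrounding`, `StructuredInference`, `BoundaryEnforcement`) and the filtering logic are assumptions for exposition, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryGrounding:
    """Layer 1: restrict reasoning to a vetted set of facts."""
    facts: set[str] = field(default_factory=set)

    def ground(self, claim: str) -> bool:
        # A claim is usable only if it appears in grounded memory.
        return claim in self.facts

@dataclass
class BoundaryEnforcement:
    """Layer 3: reject any output that crosses a declared boundary."""
    forbidden: set[str] = field(default_factory=set)

    def permits(self, output: str) -> bool:
        return not any(term in output for term in self.forbidden)

class StructuredInference:
    """Layer 2: chain only claims the grounding layer vouches for,
    then pass survivors through boundary enforcement."""

    def __init__(self, grounding: MemoryGrounding, boundary: BoundaryEnforcement):
        self.grounding = grounding
        self.boundary = boundary

    def answer(self, candidate_claims: list[str]) -> list[str]:
        grounded = [c for c in candidate_claims if self.grounding.ground(c)]
        return [c for c in grounded if self.boundary.permits(c)]
```

The point of the layering is that an adversarial prompt can only attack the candidate claims; it cannot rewrite the grounded memory or the boundary set, which live outside the model's generative loop.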

Methodology

Adversarial tests are run as simulation experiments across several heterogeneous LLM systems to evaluate how well the Box Maze framework maintains reasoning boundaries.
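A progressive boundary-erosion evaluation of this kind might be harnessed as below. The scenario generator, the pressure schedule, and the `holds_boundary` predicate are placeholders standing in for a real system under test; only the overall shape (n scenarios, escalating pressure, failure rate) follows the paper's description.

```python
def run_erosion_eval(holds_boundary, n_scenarios: int = 50, max_pressure: int = 5) -> float:
    """Apply escalating adversarial pressure per scenario and return
    the fraction of scenarios in which the boundary eventually broke.

    holds_boundary(prompt, pressure) -> bool is the system under test:
    True means the boundary held at that pressure level.
    """
    failures = 0
    for i in range(n_scenarios):
        # Progressive erosion: each scenario escalates pressure step by step.
        for pressure in range(1, max_pressure + 1):
            prompt = f"scenario-{i}-pressure-{pressure}"  # placeholder prompt
            if not holds_boundary(prompt, pressure):
                failures += 1
                break  # one break anywhere marks the whole scenario as failed
    return failures / n_scenarios
```

A system that never yields scores 0.0; one that always breaks at some pressure level scores 1.0, matching the failure-rate framing (≈40% baseline vs. below 1%) reported in the abstract.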

Original Abstract

Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at the behavioral level and may lack explicit architectural mechanisms for enforcing reasoning process integrity. This paper proposes the Box Maze framework, a conceptual process-control architecture that decomposes LLM reasoning into three explicit layers: memory grounding, structured inference, and boundary enforcement. We introduce preliminary simulation-based evaluation involving progressive boundary erosion scenarios across multiple heterogeneous LLM systems (DeepSeek-V3, Doubao, Qwen). Results from n=50 adversarial scenarios suggest that explicit cognitive control layers may improve consistency in boundary maintenance, with architectural constraints reducing boundary failure rates from approximately 40% (baseline RLHF) to below 1% under adversarial conditions. While current validation is simulation-based, these preliminary results indicate that process-level control may offer a promising direction for improving reliability in large language model reasoning.

Tags

LLM Reasoning · Process Control · Adversarial Robustness

arXiv Categories

cs.AI cs.CL