AI Agents relevance: 9/10

FormalJudge: A Neuro-Symbolic Paradigm for Agentic Oversight

Jiayi Zhou, Yang Sheng, Hantao Lou, Yaodong Yang, Jie Fu
arXiv: 2602.11136v1 Published: 2026-02-11 Updated: 2026-02-11

AI Summary

Proposes FormalJudge, a neuro-symbolic framework for verifying and improving the behavioral safety and constraint satisfaction of LLM agents.

Key Contributions

  • Proposes FormalJudge, a neuro-symbolic framework for overseeing LLM agents.
  • Uses a bidirectional Formal-of-Thought architecture to translate natural-language requirements into verifiable formal specifications.
  • Experimentally validates FormalJudge's effectiveness in behavioral safety, constraint satisfaction, and deception detection.

Methodology

Takes a neuro-symbolic approach: an LLM compiles high-level intent into atomic constraints, and constraint satisfaction is then verified with Dafny and Z3.
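To make the two-phase pipeline concrete, here is a minimal, self-contained sketch of the decompose-then-verify loop. The constraint names, the trace fields, and the hard-coded decomposition are all illustrative assumptions; in the actual framework the top-down step is performed by an LLM and the bottom-up step discharges each constraint as a Dafny specification or Z3 SMT query rather than a Python predicate.

```python
from dataclasses import dataclass
from typing import Callable

# An agent "trace" is modeled here as a dict of observed facts about the episode.
Trace = dict

@dataclass
class AtomicConstraint:
    """One verifiable constraint compiled from high-level intent."""
    name: str
    check: Callable[[Trace], bool]  # stand-in for a Dafny/Z3 proof obligation

def compile_intent(intent: str) -> list[AtomicConstraint]:
    """Top-down step: an LLM would decompose `intent` into atomic constraints;
    the decomposition below is hard-coded purely for illustration."""
    return [
        AtomicConstraint("budget_respected",
                         lambda t: t["spent"] <= t["budget"]),
        AtomicConstraint("no_deletion",
                         lambda t: not t["deleted_files"]),
        AtomicConstraint("report_truthful",
                         lambda t: t["reported_spent"] == t["spent"]),
    ]

def verify(trace: Trace, constraints: list[AtomicConstraint]) -> tuple[bool, list[str]]:
    """Bottom-up step: check every atomic constraint against the trace.
    A real verifier would return a proof or counterexample, not a boolean."""
    violated = [c.name for c in constraints if not c.check(trace)]
    return (not violated, violated)

# Example: an agent that overspent and under-reported its spending.
trace = {"spent": 120, "budget": 100, "deleted_files": [], "reported_spent": 95}
ok, violated = verify(trace, compile_intent("Complete the purchase safely"))
print(ok, violated)  # False ['budget_respected', 'report_truthful']
```

The point of the structure is that each atomic constraint is independently checkable, so a violation report names exactly which requirement failed rather than emitting a single probabilistic score.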

Original Abstract

As LLM-based agents increasingly operate in high-stakes domains with real-world consequences, ensuring their behavioral safety becomes paramount. The dominant oversight paradigm, LLM-as-a-Judge, faces a fundamental dilemma: how can probabilistic systems reliably supervise other probabilistic systems without inheriting their failure modes? We argue that formal verification offers a principled escape from this dilemma, yet its adoption has been hindered by a critical bottleneck: the translation from natural language requirements to formal specifications. This paper bridges this gap by proposing FormalJudge, a neuro-symbolic framework that employs a bidirectional Formal-of-Thought architecture: LLMs serve as specification compilers that top-down decompose high-level human intent into atomic, verifiable constraints, then bottom-up prove compliance using Dafny specifications and Z3 satisfiability modulo theories (SMT) solving, which produces mathematical guarantees rather than probabilistic scores. We validate FormalJudge across three benchmarks spanning behavioral safety, multi-domain constraint adherence, and agentic upward deception detection. Experiments on 7 agent models demonstrate that FormalJudge achieves an average improvement of 16.6% over LLM-as-a-Judge baselines, enables weak-to-strong generalization where a 7B judge achieves over 90% accuracy detecting deception from 72B agents, and provides near-linear safety improvement through iterative refinement.

Tags

LLM Agent · Formal Verification · Neuro-Symbolic · Behavioral Safety

arXiv Categories

cs.AI