AI Agents Relevance: 9/10

Towards Selection as Power: Bounding Decision Authority in Autonomous Agents

Jose Manuel de la Chica Rodriguez, Juan Manuel Vera Díaz
arXiv: 2602.14606v1 Published: 2026-02-16 Updated: 2026-02-16

AI Summary

Proposes a new governance architecture for autonomous agents that improves safety by bounding selection power.

Key Contributions

  • Proposes a governance architecture that separates cognition, selection, and action into distinct domains.
  • Introduces mechanisms such as external candidate generation (CEFL) and a governed reducer to bound selection power.
  • Evaluates the system in regulated financial scenarios, validating its feasibility and effectiveness.

Methodology

Designs a new autonomous agent architecture and evaluates it experimentally in regulated financial scenarios, using multiple metrics to assess performance.

Original Abstract

Autonomous agentic systems are increasingly deployed in regulated, high-stakes domains where decisions may be irreversible and institutionally constrained. Existing safety approaches emphasize alignment, interpretability, or action-level filtering. We argue that these mechanisms are necessary but insufficient because they do not directly govern selection power: the authority to determine which options are generated, surfaced, and framed for decision. We propose a governance architecture that separates cognition, selection, and action into distinct domains and models autonomy as a vector of sovereignty. Cognitive autonomy remains unconstrained, while selection and action autonomy are bounded through mechanically enforced primitives operating outside the agent's optimization space. The architecture integrates external candidate generation (CEFL), a governed reducer, commit-reveal entropy isolation, rationale validation, and fail-loud circuit breakers. We evaluate the system across multiple regulated financial scenarios under adversarial stress targeting variance manipulation, threshold gaming, framing skew, ordering effects, and entropy probing. Metrics quantify selection concentration, narrative diversity, governance activation cost, and failure visibility. Results show that mechanical selection governance is implementable, auditable, and prevents deterministic outcome capture while preserving reasoning capacity. Although probabilistic concentration remains, the architecture measurably bounds selection authority relative to conventional scalar pipelines. This work reframes governance as bounded causal power rather than internal intent alignment, offering a foundation for deploying autonomous agents where silent failure is unacceptable.
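The commit-reveal entropy isolation the abstract mentions can be illustrated with a generic commit-reveal scheme: the governor commits to a random seed before the agent proposes candidates, so the agent cannot probe or steer the entropy that drives selection. This is a minimal sketch of the standard technique; the function names, hashing choice, and selection rule here are illustrative assumptions, not the paper's actual implementation.

```python
import hashlib
import secrets

def commit(seed: bytes) -> tuple[bytes, bytes]:
    """Commit to a random seed before candidates are visible.

    Returns (commitment, nonce). Publishing only the commitment hides the
    seed, so an agent cannot adapt its candidate generation to the entropy
    that will later decide among candidates.
    """
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + seed).digest()
    return digest, nonce

def verify(commitment: bytes, nonce: bytes, seed: bytes) -> bool:
    """At reveal time, check the revealed seed against the commitment."""
    return hashlib.sha256(nonce + seed).digest() == commitment

# Phase 1: the governor commits to entropy before candidates exist.
seed = secrets.token_bytes(32)
commitment, nonce = commit(seed)

# Phase 2: the agent proposes candidates without knowing the seed.
candidates = ["option_a", "option_b", "option_c"]

# Phase 3: the seed is revealed, verified, and mechanically applied.
assert verify(commitment, nonce, seed)
chosen = candidates[int.from_bytes(seed, "big") % len(candidates)]
```

Because the selection index is a fixed function of pre-committed entropy, no deterministic outcome capture is possible through entropy probing, though probabilistic concentration (as the abstract notes) can remain.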

Tags

AI Agents Governance Safety Autonomous Systems Decision Making

arXiv Categories

cs.MA cs.AI cs.CE