AI Agents 相关度: 9/10

The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence

Biplab Pal, Santanu Bhattacharya

arXiv: 2603.24582v1 发布: 2026-03-25 更新: 2026-03-25

下载 PDF arXiv 页面

AI 摘要

论文提出马尔可夫框架，用于评估智能体AI的可靠性和监管成本，并应用于企业采购流程。

主要贡献

提出基于马尔可夫框架的智能体可靠性评估方法
定义了状态盲点质量和状态-动作盲质量等关键指标
通过实际企业流程数据验证了框架的有效性

方法论

构建基于马尔可夫链的数学模型，利用企业事件日志数据，模拟智能体行为，计算关键指标并评估可靠性和监管成本。

原文摘要

Agentic artificial intelligence (AI) in organizations is a sequential decision problem constrained by reliability and oversight cost. When deterministic workflows are replaced by stochastic policies over actions and tool calls, the key question is not whether a next step appears plausible, but whether the resulting trajectory remains statistically supported, locally unambiguous, and economically governable. We develop a measure-theoretic Markov framework for this setting. The core quantities are state blind-spot mass B_n(tau), state-action blind mass B^SA_{pi,n}(tau), an entropy-based human-in-the-loop escalation gate, and an expected oversight-cost identity over the workflow visitation measure. We instantiate the framework on the Business Process Intelligence Challenge 2019 purchase-to-pay log (251,734 cases, 1,595,923 events, 42 distinct workflow actions) and construct a log-driven simulated agent from a chronological 80/20 split of the same process. The main empirical finding is that a large workflow can appear well supported at the state level while retaining substantial blind mass over next-step decisions: refining the operational state to include case context, economic magnitude, and actor class expands the state space from 42 to 668 and raises state-action blind mass from 0.0165 at tau=50 to 0.1253 at tau=1000. On the held-out split, m(s) = max_a pi-hat(a|s) tracks realized autonomous step accuracy within 3.4 percentage points on average. The same quantities that delimit statistically credible autonomy also determine expected oversight burden. The framework is demonstrated on a large-scale enterprise procurement workflow and is designed for direct application to engineering processes for which operational event logs are available.

arXiv 分类

cs.AI

AI 摘要

主要贡献

方法论

原文摘要

标签

arXiv 分类