AI Agents relevance: 9/10

Security awareness in LLM agents: the NDAI zone case

Enrico Bottazzi, Pia Park
arXiv: 2603.19011v1 Published: 2026-03-19 Updated: 2026-03-19

AI Summary

Studies how LLM agents form awareness of the security of their execution environment, finding that their ability to verify safety is insufficient.

Key Contributions

  • Reveals an asymmetry in how LLMs judge the security of their environment: they can detect danger signals but cannot verify safety
  • Experimentally evaluates how different LLM models weight various forms of security evidence
  • Identifies insufficient LLM security awareness as a gap in privacy-preserving protocols and proposes directions for improvement

Methodology

Using an NDAI-style negotiation task, evaluates the behavior of 10 LLM models across scenarios with different forms of security evidence.
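The evaluation setup described above can be sketched as follows. This is a minimal illustration, not the paper's actual harness: the scenario names, the `Trial` record, and the toy data are assumptions chosen to mirror the reported asymmetry (a failing attestation suppresses disclosure; a passing one has a heterogeneous effect).

```python
from dataclasses import dataclass

# Hypothetical evidence scenarios; names are illustrative, not the paper's.
SCENARIOS = ["no_evidence", "attestation_pass", "attestation_fail"]

@dataclass
class Trial:
    model: str       # which LLM was the inventor's agent
    scenario: str    # which evidence it saw in its context window
    disclosed: bool  # did it fully disclose the IP in this run?

def disclosure_rate(trials, model, scenario):
    """Fraction of runs in which `model` fully disclosed under `scenario`."""
    runs = [t for t in trials if t.model == model and t.scenario == scenario]
    return sum(t.disclosed for t in runs) / len(runs) if runs else 0.0

# Toy data only: a failing attestation suppresses disclosure in every run,
# while a passing attestation yields mixed behavior.
trials = [
    Trial("model-a", "attestation_fail", False),
    Trial("model-a", "attestation_fail", False),
    Trial("model-a", "attestation_pass", True),
    Trial("model-a", "attestation_pass", False),
]

print(disclosure_rate(trials, "model-a", "attestation_fail"))  # 0.0
print(disclosure_rate(trials, "model-a", "attestation_pass"))  # 0.5
```

Comparing the per-scenario rates across models is what exposes the pass/fail asymmetry the paper reports.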

Original Abstract

NDAI zones let inventor and investor agents negotiate inside a Trusted Execution Environment (TEE) where any disclosed information is deleted if no deal is reached. This makes full IP disclosure the rational strategy for the inventor's agent. Leveraging this infrastructure, however, requires agents to distinguish a secure environment from an insecure one, a capability LLM agents lack natively, since they can rely only on evidence passed through the context window to form awareness of their execution environment. We ask: How do different LLM models weight various forms of evidence when forming awareness of the security of their execution environment? Using an NDAI-style negotiation task across 10 language models and various evidence scenarios, we find a clear asymmetry: a failing attestation universally suppresses disclosure across all models, whereas a passing attestation produces highly heterogeneous responses: some models increase disclosure, others are unaffected, and a few paradoxically reduce it. This reveals that current LLM models can reliably detect danger signals but cannot reliably verify safety, the very capability required for privacy-preserving agentic protocols such as NDAI zones. Bridging this gap, possibly through interpretability analysis, targeted fine-tuning, or improved evidence architectures, remains the central open challenge for deploying agents that calibrate information sharing to actual evidence quality.
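The core NDAI-zone guarantee the abstract describes, that anything disclosed is erased if no deal is reached, can be sketched in a few lines. This is a hedged illustration only: the function and parameter names (`negotiate_in_zone`, `offer`, `reserve_price`) are invented for exposition, and the deletion is modeled as dropping the disclosure rather than an actual TEE mechanism.

```python
def negotiate_in_zone(disclosure, offer, reserve_price):
    """Toy model of an NDAI zone: the inventor's full disclosure only
    leaves the zone if a deal is struck; otherwise it is deleted.
    (Illustrative sketch, not the paper's protocol.)"""
    deal = offer >= reserve_price
    if not deal:
        disclosure = None  # TEE guarantee: erase on failed negotiation
    return deal, disclosure

# Deal reached: the disclosure survives; no deal: nothing leaves the zone.
print(negotiate_in_zone("full IP spec", offer=120, reserve_price=100))
print(negotiate_in_zone("full IP spec", offer=80, reserve_price=100))
```

Because the downside of disclosure is removed by the deletion guarantee, full disclosure becomes the inventor agent's rational strategy, provided the agent can first verify it really is inside such a zone.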

Tags

LLM AI Agents Security Awareness Trusted Execution Environment

arXiv Categories

cs.CR cs.AI