SNEAK: Evaluating Strategic Communication and Information Leakage in Large Language Models
AI Summary
Proposes the SNEAK benchmark for evaluating LLMs' ability to share information selectively under asymmetric information, and finds that current models perform poorly at this task.
Main Contributions
- Proposed the SNEAK benchmark for evaluating the strategic communication capabilities of LLMs
- Evaluated how existing LLMs trade off sharing information against keeping it secret
- Found that humans significantly outperform current LLMs on this task
Methodology
Designed a game scenario in which an LLM must generate a message that an ally can understand while preventing an adversary from inferring the secret, evaluated with two metrics: utility and leakage.
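The scoring protocol can be sketched as follows. This is a minimal illustration, not the paper's implementation: the ally and chameleon are LLM-simulated agents in SNEAK, replaced here by precomputed outcomes, and the function names and toy data are hypothetical.

```python
def score_round(ally_identified: bool, chameleon_guess: str, secret: str):
    """Per-message metrics (toy version of the SNEAK setup).

    Utility: 1 if the informed ally succeeded at identification.
    Leakage: 1 if the uninformed chameleon inferred the secret from the message.
    """
    return float(ally_identified), float(chameleon_guess == secret)

def aggregate(rounds):
    """Average utility and leakage over (utility, leakage) pairs from many rounds."""
    n = len(rounds)
    utility = sum(u for u, _ in rounds) / n
    leakage = sum(l for _, l in rounds) / n
    return utility, leakage

# Toy example: three rounds where the secret word is "apple".
rounds = [
    score_round(True, "banana", "apple"),   # communicated to ally, not leaked
    score_round(True, "apple", "apple"),    # communicated, but leaked
    score_round(False, "cherry", "apple"),  # failed to communicate, not leaked
]
print(aggregate(rounds))  # → (0.666..., 0.333...)
```

A good strategic communicator maximizes the first number while minimizing the second; the paper's finding is that current models sit far from the human frontier on this trade-off.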
Original Abstract
Large language models (LLMs) are increasingly deployed in multi-agent settings where communication must balance informativeness and secrecy. In such settings, an agent may need to signal information to collaborators while preventing an adversary from inferring sensitive details. However, existing LLM benchmarks primarily evaluate capabilities such as reasoning, factual knowledge, or instruction following, and do not directly measure strategic communication under asymmetric information. We introduce SNEAK (Secret-aware Natural language Evaluation for Adversarial Knowledge), a benchmark for evaluating selective information sharing in language models. In SNEAK, a model is given a semantic category, a candidate set of words, and a secret word, and must generate a message that indicates knowledge of the secret without revealing it too clearly. We evaluate generated messages using two simulated agents with different information states: an ally, who knows the secret and must identify the intended message, and a chameleon, who does not know the secret and attempts to infer it from the message. This yields two complementary metrics: utility, measuring how well the message communicates to collaborators, and leakage, measuring how much information it reveals to an adversary. Using this framework, we analyze the trade-off between informativeness and secrecy in modern language models and show that strategic communication under asymmetric information remains a challenging capability for current systems. Notably, human participants outperform all evaluated models by a large margin, achieving up to four times higher scores.