AI Agents Relevance: 9/10

ValueFlow: Measuring the Propagation of Value Perturbations in Multi-Agent LLM Systems

Jinnuo Liu, Chuke Liu, Hua Shen
arXiv: 2602.08567v1 Published: 2026-02-09 Updated: 2026-02-09

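AI Summary

The ValueFlow framework evaluates how value perturbations propagate through multi-agent LLM systems and how they affect system outputs.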

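Key Contributions

  • Proposes ValueFlow, a framework for evaluating value drift in multi-agent systems.
  • Builds an evaluation dataset of 56 values, derived from the Schwartz Value Survey.
  • Defines two metrics, β-susceptibility and system susceptibility (SS), to measure sensitivity at the agent level and the system level respectively.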

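Methodology

The framework perturbs the values expressed by selected agents, scores each agent's value orientation with an LLM-as-a-judge protocol, and analyzes susceptibility at both the agent level and the system level.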
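
To make the two levels of analysis concrete, here is a minimal toy sketch in Python. It is not the paper's definition of β-susceptibility or SS, which this summary does not spell out. It assumes, purely for illustration, that an agent's sensitivity can be approximated by the least-squares slope of its score shift against a peer's score shift, and that a shift passes through a directed acyclic network as each node's slope times the mean shift of its parents. The names `beta_slope` and `propagate` are hypothetical.

```python
from statistics import mean


def beta_slope(peer_shifts, agent_shifts):
    """Least-squares slope of an agent's score shift on a peer's score shift.

    Illustrative stand-in for an agent-level sensitivity measure (assumption).
    """
    mx, my = mean(peer_shifts), mean(agent_shifts)
    var = sum((x - mx) ** 2 for x in peer_shifts)
    if var == 0:
        raise ValueError("peer_shifts must contain at least two distinct values")
    cov = sum((x - mx) * (y - my) for x, y in zip(peer_shifts, agent_shifts))
    return cov / var


def propagate(order, parents, beta, source, delta):
    """Push a score shift `delta` injected at `source` through a DAG.

    `order` is a topological ordering of the nodes. Each downstream node's
    shift is beta[node] times the mean shift of its parents (toy linear
    model, an assumption); nodes without parents other than `source`
    stay at zero.
    """
    shift = {}
    for node in order:
        if node == source:
            shift[node] = delta
        elif parents.get(node):
            shift[node] = beta[node] * mean([shift[p] for p in parents[node]])
        else:
            shift[node] = 0.0
    return shift


# Agent-level: hypothetical judge-score shifts for one peer and one agent.
agent_beta = beta_slope([0.0, 1.0, 2.0], [0.0, 0.5, 1.0])

# System-level: the same perturbation at node "a" under two topologies.
betas = {"a": 1.0, "b": agent_beta, "c": agent_beta}
direct = propagate(["a", "c"], {"c": ["a"]}, betas, "a", 1.0)
chain = propagate(["a", "b", "c"], {"b": ["a"], "c": ["b"]}, betas, "a", 1.0)
print(direct["c"], chain["c"])
```

In this toy model the shift reaching the final node `c` depends on the path from the perturbed node, which loosely mirrors the abstract's point that network structure shapes system-level susceptibility.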

Original Abstract

Multi-agent large language model (LLM) systems increasingly consist of agents that observe and respond to one another's outputs. While value alignment is typically evaluated for isolated models, how value perturbations propagate through agent interactions remains poorly understood. We present ValueFlow, a perturbation-based evaluation framework for measuring and analyzing value drift in multi-agent systems. ValueFlow introduces a 56-value evaluation dataset derived from the Schwartz Value Survey and quantifies agents' value orientations during interaction using an LLM-as-a-judge protocol. Building on this measurement layer, ValueFlow decomposes value drift into agent-level response behavior and system-level structural effects, operationalized by two metrics: beta-susceptibility, which measures an agent's sensitivity to perturbed peer signals, and system susceptibility (SS), which captures how node-level perturbations affect final system outputs. Experiments across multiple model backbones, prompt personas, value dimensions, and network structures show that susceptibility varies widely across values and is strongly shaped by structural topology.

Tags

Multi-agent systems, Value alignment, Value drift, Evaluation framework

arXiv Categories

cs.MA cs.CL