TrinityGuard: A Unified Framework for Safeguarding Multi-Agent Systems
AI 摘要
TrinityGuard是一个用于LLM多智能体系统安全评估和监控的综合框架。
主要贡献
- 提出三层细粒度风险分类,涵盖20种风险类型
- 设计TrinityGuard框架,包含MAS抽象层、评估层和运行时监控代理
- 形式化安全指标,并在案例研究中验证框架的有效性
方法论
基于OWASP标准,构建风险评估和监控框架,使用攻击探针生成漏洞报告,并使用监控代理进行实时警报。
原文摘要
With the rapid development of LLM-based multi-agent systems (MAS), their significant safety and security concerns have emerged, which introduce novel risks going beyond single agents or LLMs. Despite attempts to address these issues, the existing literature lacks a cohesive safeguarding system specialized for MAS risks. In this work, we introduce TrinityGuard, a comprehensive safety evaluation and monitoring framework for LLM-based MAS, grounded in the OWASP standards. Specifically, TrinityGuard encompasses a three-tier fine-grained risk taxonomy that identifies 20 risk types, covering single-agent vulnerabilities, inter-agent communication threats, and system-level emergent hazards. Designed for scalability across various MAS structures and platforms, TrinityGuard is organized in a trinity manner, involving an MAS abstraction layer that can be adapted to any MAS structures, an evaluation layer containing risk-specific test modules, alongside runtime monitor agents coordinated by a unified LLM Judge Factory. During Evaluation, TrinityGuard executes curated attack probes to generate detailed vulnerability reports for each risk type, where monitor agents analyze structured execution traces and issue real-time alerts, enabling both pre-development evaluation and runtime monitoring. We further formalize these safety metrics and present detailed case studies across various representative MAS examples, showcasing the versatility and reliability of TrinityGuard. Overall, TrinityGuard acts as a comprehensive framework for evaluating and monitoring various risks in MAS, paving the way for further research into their safety and security.