AI Agents relevance: 9/10

SKILLS: Structured Knowledge Injection for LLM-Driven Telecommunications Operations

Ivo Brett
arXiv: 2603.15372v1 Published: 2026-03-16 Updated: 2026-03-16

AI Summary

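The SKILLS framework measures how well LLM agents carry out telecom operations through API calls. It finds that injecting structured domain knowledge raises task success rates for every evaluated model.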

Key Contributions

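  • Proposes SKILLS, a benchmark framework for evaluating LLM agents on telecom operations workflows executed through real API interfaces.
  • Builds a benchmark of 37 telecom operations scenarios spanning 8 TM Forum Open API domains.
  • Shows that structured knowledge injection improves every evaluated open-weight model on telecom operations tasks, with gains of up to +18.9 percentage points.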
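The reported lift is the difference in pass rate between the with-skill and baseline conditions over the 37 scenarios, expressed in percentage points (pp). The sketch below assumes one run per scenario per condition. The pass counts (30/37 with-skill, 25/37 baseline) are back-calculated from the reported MiniMax M2.5 figures of 81.1% and +13.5pp; they are not taken from the paper. The function names are hypothetical.

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total


def skill_lift_pp(with_skill_passed: int, baseline_passed: int, total: int) -> float:
    """Return the with-skill pass rate minus the baseline pass rate, in percentage points."""
    return pass_rate(with_skill_passed, total) - pass_rate(baseline_passed, total)


SCENARIOS = 37  # scenario count stated in the paper

# Hypothetical pass counts, back-calculated from the reported MiniMax M2.5 results.
with_skill, baseline = 30, 25
print(f"with-skill: {pass_rate(with_skill, SCENARIOS):.1f}%")          # with-skill: 81.1%
print(f"lift: {skill_lift_pp(with_skill, baseline, SCENARIOS):+.1f}pp")  # lift: +13.5pp
```

Under the same assumption, each pass-rate figure in the abstract is a multiple of 1/37 (about 2.7pp).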

Methodology

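Builds a simulated environment on top of real API definitions and encodes domain knowledge in an injected SKILL.md document. Baseline models are then compared with skill-augmented models in every scenario.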
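The two conditions differ only in whether the SKILL.md text reaches the agent; tool access is the same in both. The sketch below assumes the skill document is appended to the agent's system prompt. The paper may inject it differently. `BASE_INSTRUCTIONS`, `build_system_prompt`, `load_skill`, and the prompt layout are all hypothetical.

```python
from pathlib import Path
from typing import Optional

# Hypothetical generic agent instructions, identical in both conditions.
BASE_INSTRUCTIONS = (
    "You are an operations agent. Use the available tools to complete the task."
)


def build_system_prompt(skill_text: Optional[str] = None) -> str:
    """Build the system prompt for either evaluation condition.

    baseline:   skill_text is None, so the agent gets tools but no domain guidance.
    with-skill: skill_text holds SKILL.md contents (workflow logic, API patterns,
                business rules), appended after the base instructions.
    """
    if skill_text is None:
        return BASE_INSTRUCTIONS
    return f"{BASE_INSTRUCTIONS}\n\n## Domain skill\n\n{skill_text.strip()}"


def load_skill(path: str) -> str:
    """Read a SKILL.md file from disk (path supplied by the caller)."""
    return Path(path).read_text(encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical SKILL.md excerpt, inlined so the sketch runs without files.
    skill = "# Order handling\nValidate the product offering before creating an order."
    print(build_system_prompt())       # baseline condition
    print("---")
    print(build_system_prompt(skill))  # with-skill condition
```

Keeping the injected document as a separate, portable file lets the same skill be reused with any model. The paper's "portable SKILL.md" framing points at this design choice.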

Original Abstract

As telecommunications operators accelerate adoption of AI-enabled automation, a practical question remains unresolved: can general-purpose large language model (LLM) agents reliably execute telecom operations workflows through real API interfaces, or do they require structured domain guidance? We introduce SKILLS (Structured Knowledge Injection for LLM-driven Service Lifecycle operations), a benchmark framework comprising 37 telecom operations scenarios spanning 8 TM Forum Open API domains (TMF620, TMF621, TMF622, TMF628, TMF629, TMF637, TMF639, TMF724). Each scenario is grounded in live mock API servers with seeded production-representative data, MCP tool interfaces, and deterministic evaluation rubrics combining response content checks, tool-call verification, and database state assertions. We evaluate open-weight models under two conditions: baseline (generic agent with tool access but no domain guidance) and with-skill (agent augmented with a portable SKILL.md document encoding workflow logic, API patterns, and business rules). Results across 5 open-weight model conditions and 185 scenario-runs show consistent skill lift across all models. MiniMax M2.5 leads (81.1% with-skill, +13.5pp), followed by Nemotron 120B (78.4%, +18.9pp), GLM-5 Turbo (78.4%, +5.4pp), and Seed 2.0 Lite (75.7%, +18.9pp).
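The abstract describes deterministic rubrics with three families of checks: response content, tool-call verification, and database state assertions. The sketch below assumes a scenario passes only if all three pass. The `AgentRun` and `Rubric` types, their field names, the example MCP tool name, and the all-or-nothing rule are hypothetical. The database is modeled as an in-memory dict instead of the paper's mock API servers.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """What the harness records after one agent run (hypothetical shape)."""
    response: str
    tool_calls: list[str]
    db_state: dict[str, str]


@dataclass
class Rubric:
    """Deterministic pass criteria for one scenario (hypothetical shape)."""
    required_phrases: list[str] = field(default_factory=list)
    required_tools: list[str] = field(default_factory=list)
    expected_db: dict[str, str] = field(default_factory=dict)


def evaluate(run: AgentRun, rubric: Rubric) -> dict[str, bool]:
    """Apply the three check families and report each result plus the overall verdict."""
    text = run.response.lower()
    # Response content: every required phrase appears (case-insensitive).
    content_ok = all(p.lower() in text for p in rubric.required_phrases)
    # Tool-call verification: every required tool was invoked at least once.
    tools_ok = all(t in run.tool_calls for t in rubric.required_tools)
    # Database state: every expected key holds the expected value after the run.
    db_ok = all(run.db_state.get(k) == v for k, v in rubric.expected_db.items())
    return {
        "content": content_ok,
        "tool_calls": tools_ok,
        "db_state": db_ok,
        "passed": content_ok and tools_ok and db_ok,
    }


rubric = Rubric(
    required_phrases=["order created"],
    required_tools=["create_product_order"],  # hypothetical MCP tool name
    expected_db={"order/42/state": "acknowledged"},
)
run = AgentRun(
    response="Order created and acknowledged.",
    tool_calls=["get_product_offering", "create_product_order"],
    db_state={"order/42/state": "acknowledged"},
)
print(evaluate(run, rubric))
# {'content': True, 'tool_calls': True, 'db_state': True, 'passed': True}
```

The database assertion catches a failure that the other two checks miss: an agent can claim success in its reply, and even call the right tool, while leaving the backend in the wrong state.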

Tags

LLM AI Agent Telecommunications API Benchmark

arXiv Categories

cs.SE cs.AI cs.CR