AI Agents Relevance: 9/10

OrgForge: A Multi-Agent Simulation Framework for Verifiable Synthetic Corporate Corpora

Jeffrey Flynt
arXiv: 2603.14997v1 Published: 2026-03-16 Updated: 2026-03-16

AI Summary

OrgForge is a multi-agent simulation framework that generates verifiable synthetic corporate corpora to improve the evaluation of RAG pipelines.

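Key Contributions

  • Proposes the OrgForge framework for generating structured, verifiable corporate data.
  • Designs a strict physics-cognition boundary in which a deterministic engine maintains event ground truth.
  • Implements a causal chain tracking system that accumulates cross-artifact evidence graphs.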
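
The physics-cognition boundary in the second contribution can be pictured with a small sketch. Everything below is an illustrative assumption rather than OrgForge's actual code: the names `SimEvent` fields, `EventBus`, `validate`, `render_prose`, `ALLOWED_KINDS`, and `fake_llm` are hypothetical, and the language model is replaced by a stub. The point is only the division of labor: the deterministic side validates proposals and commits immutable facts, and the text generator sees those facts but cannot change them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SimEvent:
    """One ground-truth fact. Frozen, so it cannot be edited after creation."""
    seq: int
    actor: str
    kind: str
    facts: tuple  # sorted (key, value) pairs


class EventBus:
    """Append-only log owned by the deterministic engine (hypothetical API)."""

    def __init__(self) -> None:
        self._log = []

    def commit(self, actor: str, kind: str, facts: dict) -> SimEvent:
        event = SimEvent(len(self._log), actor, kind, tuple(sorted(facts.items())))
        self._log.append(event)
        return event

    @property
    def log(self) -> tuple:
        return tuple(self._log)  # read-only copy for consumers


ALLOWED_KINDS = {"incident_opened", "ticket_closed"}  # illustrative only


def validate(proposal: dict, known_actors: set) -> bool:
    """Engine-side check: a proposal passes only if actor and kind are known."""
    return proposal.get("actor") in known_actors and proposal.get("kind") in ALLOWED_KINDS


def render_prose(event: SimEvent, llm: Callable[[str], str]) -> str:
    """The text generator receives committed facts and returns surface text only."""
    facts = ", ".join(f"{key}={value}" for key, value in event.facts)
    return llm(f"Write a Slack message by {event.actor} about {event.kind} ({facts}).")


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs offline."""
    return f"[generated text for: {prompt}]"


bus = EventBus()
proposal = {"actor": "alice", "kind": "incident_opened", "facts": {"service": "billing"}}
if validate(proposal, known_actors={"alice", "bob"}):
    event = bus.commit(proposal["actor"], proposal["kind"], proposal["facts"])
    message = render_prose(event, fake_llm)
```

Because `render_prose` only returns a string, nothing the generator writes can alter the log; a contradiction in the prose can then be detected by comparing it against the committed event.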

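Methodology

OrgForge uses a deterministic Python engine to drive events and LLMs to generate text, enforces timestamp correctness through an actor-local clock, and simulates organizational behavior.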
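
One way an actor-local clock could keep timestamps causally ordered is sketched below. The summary does not describe the mechanism, so this is an assumption: the class `ActorClock` and its methods `advance` and `sync_to` are hypothetical names, and the rule shown (an actor reacting to an artifact is first moved up to that artifact's timestamp) is one plausible design, not necessarily the paper's.

```python
from datetime import datetime, timedelta


class ActorClock:
    """Per-actor simulated clock (hypothetical; not OrgForge's actual class)."""

    def __init__(self, start: datetime) -> None:
        self._start = start
        self._now = {}  # actor name -> that actor's current simulated time

    def now(self, actor: str) -> datetime:
        return self._now.get(actor, self._start)

    def advance(self, actor: str, minutes: int) -> datetime:
        """Move one actor forward and return the new timestamp."""
        self._now[actor] = self.now(actor) + timedelta(minutes=minutes)
        return self._now[actor]

    def sync_to(self, actor: str, cause_time: datetime) -> None:
        """An effect cannot precede its cause: pull the actor up to cause_time."""
        self._now[actor] = max(self.now(actor), cause_time)


clock = ActorClock(datetime(2026, 3, 16, 9, 0))
clock.advance("bob", 5)                # bob is busy until 09:05
alert_at = clock.advance("alice", 30)  # alice posts an alert at 09:30
clock.sync_to("bob", alert_at)         # bob reads the alert, so he is now at 09:30
ticket_at = clock.advance("bob", 10)   # bob files the ticket at 09:40
```

Without the `sync_to` call, bob's ticket would be stamped 09:15, fifteen minutes before the alert that caused it, which is the kind of inconsistency that independent per-document timestamp sampling can produce.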

Original Abstract

Evaluating retrieval-augmented generation (RAG) pipelines requires corpora where ground truth is knowable, temporally structured, and cross-artifact, properties that real-world datasets rarely provide cleanly. Existing resources such as the Enron corpus carry legal ambiguity, demographic skew, and no structured ground truth. Purely LLM-generated synthetic data solves the legal problem but introduces a subtler one: the generating model cannot be prevented from hallucinating facts that contradict themselves across documents. We present OrgForge, an open-source multi-agent simulation framework that enforces a strict physics-cognition boundary: a deterministic Python engine maintains a SimEvent ground truth bus; large language models generate only surface prose, constrained by validated proposals. An actor-local clock enforces causal timestamp correctness across all artifact types, eliminating the class of timeline inconsistencies that arise when timestamps are sampled independently per document. We formalize three graph-dynamic subsystems (stress propagation via betweenness centrality, temporal edge-weight decay, and Dijkstra escalation routing) that govern organizational behavior independently of any LLM. Running a configurable N-day simulation, OrgForge produces interleaved Slack threads, JIRA tickets, Confluence pages, Git pull requests, and emails, all traceable to a shared, immutable event log. We additionally describe a causal chain tracking subsystem that accumulates cross-artifact evidence graphs per incident, a hybrid reciprocal-rank-fusion recurrence detector for identifying repeated failure classes, and an inbound/outbound email engine that routes vendor alerts, customer complaints, and HR correspondence through gated causal chains with probabilistic drop simulation. OrgForge is available under the MIT license.
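
Two of the graph-dynamic subsystems named in the abstract, temporal edge-weight decay and Dijkstra escalation routing, can be combined in a short sketch. The abstract gives no formulas, so the specifics here are assumptions: exponential decay with a 14-day half-life, an edge cost equal to the reciprocal of the decayed relationship strength, and the function names `decayed_strength` and `escalation_path`. The roles and numbers in the example graph are invented for illustration.

```python
import heapq
import math


def decayed_strength(strength: float, days_idle: float, half_life_days: float = 14.0) -> float:
    """Exponential decay: strength halves every half_life_days without contact."""
    return strength * math.exp(-math.log(2) * days_idle / half_life_days)


def escalation_path(edges: dict, source: str, target: str) -> list:
    """Dijkstra over costs of 1 / decayed strength; returns the cheapest path."""
    graph = {}
    for (a, b), (strength, days_idle) in edges.items():
        cost = 1.0 / decayed_strength(strength, days_idle)
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > best.get(node, math.inf):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, cost in graph.get(node, []):
            candidate = dist + cost
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))

    if target not in best:
        return []  # no route between the two people
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


# (person_a, person_b) -> (relationship strength, days since last interaction)
edges = {
    ("engineer", "team_lead"): (8.0, 0.0),
    ("team_lead", "director"): (4.0, 0.0),
    ("engineer", "director"): (8.0, 56.0),  # strong once, but idle for 8 weeks
}
route = escalation_path(edges, "engineer", "director")
```

In this example the direct engineer-to-director edge has decayed from 8.0 to 0.5 after four half-lives, so its cost (2.0) exceeds the two-hop route through the team lead (0.125 + 0.25), and the escalation goes through the team lead. The routing is fully deterministic, which matches the abstract's claim that these subsystems operate independently of any LLM.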

Tags

RAG Multi-Agent Simulation Synthetic Data Corporate Corpus

arXiv Categories

cs.CL cs.AI cs.IR