AI Agents Relevance: 7/10

DataJoint 2.0: A Computational Substrate for Agentic Scientific Workflows

Dimitri Yatsenko, Thinh T. Nguyen
arXiv: 2602.16585v1 Published: 2026-02-18 Updated: 2026-02-18

AI Summary

DataJoint 2.0 builds a computational substrate for scientific workflows, making SciOps queryable, enforceable, and machine-readable.

Key Contributions

  • Relational workflow model
  • Object-augmented schemas
  • Semantic matching
  • Extensible type system

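Methodology

Using a relational database and extensions to it, DataJoint unifies data structure, data, and computational transformations so that scientific workflows stay under control.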
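
The abstract states that tables represent workflow steps and foreign keys prescribe execution order. The sketch below illustrates that idea with Python's standard library only; it does not use DataJoint's API. The table names (session, recording, spike_sorting) and the helper execution_order are hypothetical, chosen purely for illustration.

```python
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical three-step pipeline; names are illustrative, not from the paper.
DDL = """
CREATE TABLE session (session_id INTEGER PRIMARY KEY);
CREATE TABLE recording (
    recording_id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES session(session_id)
);
CREATE TABLE spike_sorting (
    recording_id INTEGER PRIMARY KEY REFERENCES recording(recording_id)
);
"""


def execution_order(conn):
    """Derive a step order from the foreign keys declared in the schema."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    graph = {}
    for table in tables:
        # In PRAGMA foreign_key_list output, index 2 is the referenced table.
        rows = conn.execute(f"PRAGMA foreign_key_list({table})")
        graph[table] = {fk[2] for fk in rows}
    # Each table maps to its upstream tables, so parents are emitted first.
    return list(TopologicalSorter(graph).static_order())


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(DDL)

order = execution_order(conn)
print(order)  # ['session', 'recording', 'spike_sorting']

# The same constraints reject an artifact whose upstream row does not exist.
try:
    conn.execute("INSERT INTO recording VALUES (1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The schema does two jobs here: it yields the order in which steps can run, and it refuses rows that lack an upstream artifact. This is a small-scale analogue of the "queryable, enforceable" property the abstract claims.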

Original Abstract

Operational rigor determines whether human-agent collaboration succeeds or fails. Scientific data pipelines need the equivalent of DevOps -- SciOps -- yet common approaches fragment provenance across disconnected systems without transactional guarantees. DataJoint 2.0 addresses this gap through the relational workflow model: tables represent workflow steps, rows represent artifacts, foreign keys prescribe execution order. The schema specifies not only what data exists but how it is derived -- a single formal system where data structure, computational dependencies, and integrity constraints are all queryable, enforceable, and machine-readable. Four technical innovations extend this foundation: object-augmented schemas integrating relational metadata with scalable object storage, semantic matching using attribute lineage to prevent erroneous joins, an extensible type system for domain-specific formats, and distributed job coordination designed for composability with external orchestration. By unifying data structure, data, and computational transformations, DataJoint creates a substrate for SciOps where agents can participate in scientific workflows without risking data corruption.
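
Semantic matching, as the abstract puts it, uses attribute lineage to prevent erroneous joins. The sketch below is one plausible reading of that sentence, not DataJoint's implementation. It assumes each attribute records the table and column where it was first defined, and that a join is refused when two attributes share a name but not a lineage. Attr, LineageMismatch, and join_attributes are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    name: str
    lineage: str  # assumed form: "table.column" where the attribute originated


class LineageMismatch(ValueError):
    """Raised when same-named attributes trace back to different origins."""


def join_attributes(left, right):
    """Return the attribute names two relations may safely be joined on."""
    right_by_name = {attr.name: attr for attr in right}
    shared = []
    for attr in left:
        other = right_by_name.get(attr.name)
        if other is None:
            continue  # no name collision, nothing to match
        if attr.lineage != other.lineage:
            raise LineageMismatch(
                f"'{attr.name}' comes from {attr.lineage} and {other.lineage}")
        shared.append(attr.name)
    return shared


subject = [Attr("subject_id", "subject.subject_id"), Attr("id", "subject.id")]
session = [Attr("subject_id", "subject.subject_id"),
           Attr("session_idx", "session.session_idx")]
device = [Attr("id", "device.id")]

print(join_attributes(subject, session))  # ['subject_id']

try:
    join_attributes(subject, device)
except LineageMismatch as err:
    print("refused:", err)
```

A purely name-based natural join would have matched subject.id against device.id without complaint. Checking lineage turns that accidental match into an explicit error.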

Tags

SciOps, data pipelines, workflow management, data consistency

arXiv Categories

cs.DB cs.AI