DataJoint 2.0: A Computational Substrate for Agentic Scientific Workflows
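AI Summary
DataJoint 2.0 provides a computational foundation for scientific workflows, enabling SciOps in which data structure, computational dependencies, and integrity constraints are queryable, enforceable, and machine-readable.
Key Contributions
- Relational workflow model
- Object-augmented schemas
- Semantic matching
- Extensible type system
Methodology
Using a relational database together with a set of extensions, DataJoint 2.0 unifies data structure, data, and computational transformations so that scientific workflows remain under transactional control.
Original Abstract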
Operational rigor determines whether human-agent collaboration succeeds or fails. Scientific data pipelines need the equivalent of DevOps -- SciOps -- yet common approaches fragment provenance across disconnected systems without transactional guarantees. DataJoint 2.0 addresses this gap through the relational workflow model: tables represent workflow steps, rows represent artifacts, foreign keys prescribe execution order. The schema specifies not only what data exists but how it is derived -- a single formal system where data structure, computational dependencies, and integrity constraints are all queryable, enforceable, and machine-readable. Four technical innovations extend this foundation: object-augmented schemas integrating relational metadata with scalable object storage, semantic matching using attribute lineage to prevent erroneous joins, an extensible type system for domain-specific formats, and distributed job coordination designed for composability with external orchestration. By unifying data structure, data, and computational transformations, DataJoint creates a substrate for SciOps where agents can participate in scientific workflows without risking data corruption.
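The relational workflow model described in the abstract (tables as workflow steps, rows as artifacts, foreign keys prescribing execution order) can be illustrated with a minimal sketch. This is not DataJoint's API: it uses only Python's standard `sqlite3` and `graphlib` modules, and the table names (`session`, `recording`, `spike_sorting`), their columns, and the helper `execution_order` are hypothetical, chosen purely for illustration.

```python
import sqlite3
from graphlib import TopologicalSorter

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical three-step pipeline: each table is one workflow step,
# and each row is one artifact produced by that step.
conn.executescript("""
CREATE TABLE session (
    session_id INTEGER PRIMARY KEY
);
CREATE TABLE recording (
    session_id INTEGER PRIMARY KEY REFERENCES session(session_id),
    path TEXT NOT NULL
);
CREATE TABLE spike_sorting (
    session_id INTEGER PRIMARY KEY REFERENCES recording(session_id),
    n_units INTEGER NOT NULL
);
""")


def execution_order(conn):
    """Derive a valid step order by reading foreign keys from the schema."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    graph = {}
    for table in tables:
        # Each foreign_key_list row is (id, seq, parent_table, from, to, ...).
        fks = conn.execute(f"PRAGMA foreign_key_list({table})")
        graph[table] = {fk[2] for fk in fks}
    # Parents (upstream steps) are emitted before their dependents.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(conn)
print(order)  # ['session', 'recording', 'spike_sorting']

# Inserting a downstream artifact without its upstream row is rejected.
try:
    conn.execute("INSERT INTO spike_sorting VALUES (1, 12)")
    violated = False
except sqlite3.IntegrityError:
    violated = True

# Populating in dependency order succeeds.
conn.execute("INSERT INTO session VALUES (1)")
conn.execute("INSERT INTO recording VALUES (1, 'raw/s1.bin')")
conn.execute("INSERT INTO spike_sorting VALUES (1, 12)")
```

In this sketch the execution order is never stated separately from the schema. It is recovered by querying the foreign keys, and the same constraints reject an out-of-order insert, which is the sense in which a schema can be both queryable and enforceable.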