Towards Verifiable AI with Lightweight Cryptographic Proofs of Inference
AI Summary
Proposes a lightweight framework for verifiable AI model inference that reduces proving overhead by exploiting sampling and statistical properties of neural networks.
Key Contributions
- Proposes a lightweight, sampling-based cryptographic proof method for verifying the inference process of AI models.
- Formalizes the conditions under which trace separation between functionally dissimilar models can be leveraged to argue the security of verifiable inference protocols.
- Designs a protocol that trades soundness for efficiency, suited to auditing and large-scale deployment.
Methodology
Commits to the inference execution trace via Merkle-tree vector commitments and verifies it by random sampling, trading soundness for efficiency.
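The commit-and-sample idea can be sketched as follows: the prover Merkle-commits to the trace entries, and the verifier asks for openings at a few random indices, each checked against the root with an authentication path. This is a minimal illustrative sketch, not the paper's implementation; the function names and the toy trace are assumptions.

```python
import hashlib
import random

def H(data: bytes) -> bytes:
    """SHA-256 hash used for both leaves and internal nodes."""
    return hashlib.sha256(data).digest()

def build_merkle_tree(leaves):
    """Return the tree as a list of levels: levels[0] are hashed leaves,
    levels[-1] is the single root. Odd levels are padded by duplicating
    the last node."""
    level = [H(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def open_entry(levels, index):
    """Prover side: authentication path (sibling hashes) for one leaf."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        path.append(level[index ^ 1])  # sibling at this level
        index //= 2
    return path

def verify_opening(root, leaf, index, path):
    """Verifier side: recompute the root from the opened leaf and path."""
    node = H(leaf)
    for sibling in path:
        node = H(node + sibling) if index % 2 == 0 else H(sibling + node)
        index //= 2
    return node == root

# Toy "execution trace" of 8 entries (placeholder byte strings).
trace = [f"layer-{i}-activations".encode() for i in range(8)]
levels = build_merkle_tree(trace)
root = levels[-1][0]            # prover publishes only this commitment

idx = random.randrange(len(trace))   # verifier's random challenge
proof = open_entry(levels, idx)      # prover opens the sampled entry
assert verify_opening(root, trace[idx], idx, proof)
```

Each opening costs only a logarithmic number of hashes, which is why sampling a few entries is cheap for the prover; a cheating prover who altered some trace entries is caught with probability growing in the number of sampled indices.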
Original Abstract
When large AI models are deployed as cloud-based services, clients have no guarantee that responses are correct or were produced by the intended model. Rerunning inference locally is infeasible for large models, and existing cryptographic proof systems -- while providing strong correctness guarantees -- introduce prohibitive prover overhead (e.g., hundreds of seconds per query for billion-parameter models). We present a verification framework and protocol that replaces full cryptographic proofs with a lightweight, sampling-based approach grounded in statistical properties of neural networks. We formalize the conditions under which trace separation between functionally dissimilar models can be leveraged to argue the security of verifiable inference protocols. The prover commits to the execution trace of inference via Merkle-tree-based vector commitments and opens only a small number of entries along randomly sampled paths from output to input. This yields a protocol that trades soundness for efficiency, a tradeoff well-suited to auditing, large-scale deployment settings where repeated queries amplify detection probability, and scenarios with rationally incentivized provers who face penalties upon detection. Our approach reduces proving times by several orders of magnitude compared to state-of-the-art cryptographic proof systems, going from the order of minutes to the order of milliseconds, with moderately larger proofs. Experiments on ResNet-18 classifiers and Llama-2-7B confirm that common architectures exhibit the statistical properties our protocol requires, and that natural adversarial strategies (gradient-descent reconstruction, inverse transforms, logit swapping) fail to produce traces that evade detection. We additionally present a protocol in the refereed delegation model, where two competing servers enable correct output identification in a logarithmic number of rounds.