Multimodal Learning Relevance: 8/10

Visual Model Checking: Graph-Based Inference of Visual Routines for Image Retrieval

Adrià Molina, Oriol Ramos Terrades, Josep Lladós

arXiv: 2602.17386v1 Published: 2026-02-19 Updated: 2026-02-19

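AI Summary

Proposes an image retrieval framework that combines formal verification with deep learning, improving the trustworthiness and verifiability of results for queries involving complex relationships.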

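Main Contributions

Integrates formal verification into image retrieval
Proposes a graph-based visual reasoning method
Improves the trustworthiness and verifiability of results for complex queries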

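Methodology

Combines graph-based verification methods with neural code generation to reason formally over retrieved results and verify whether each condition in the query is satisfied.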
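
The idea of checking each query condition against a graph of the retrieved image can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the authors' implementation: `SceneGraph`, `verify`, the example query, and the hand-written predicates are all hypothetical stand-ins. In the paper the checks are said to come from neural code generation; here they are written by hand to show only the per-constraint verification step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class SceneGraph:
    """Toy graph for one image: labeled nodes plus (subject, relation, object) edges."""
    objects: Dict[str, str] = field(default_factory=dict)  # node id -> label
    relations: Set[Tuple[str, str, str]] = field(default_factory=set)

    def count(self, label: str) -> int:
        # Number of nodes carrying the given label.
        return sum(1 for l in self.objects.values() if l == label)

    def has_relation(self, subj_label: str, rel: str, obj_label: str) -> bool:
        # True if some edge links a node labeled subj_label to one labeled obj_label via rel.
        return any(
            r == rel
            and self.objects.get(s) == subj_label
            and self.objects.get(o) == obj_label
            for s, r, o in self.relations
        )


# An atomic constraint is a readable name plus a predicate over the graph.
Constraint = Tuple[str, Callable[[SceneGraph], bool]]


def verify(graph: SceneGraph, constraints: List[Constraint]) -> Dict[str, bool]:
    """Evaluate every atomic constraint and report which hold and which do not."""
    return {name: check(graph) for name, check in constraints}


# Hypothetical query: "two dogs, one of them on a sofa, and no cats".
constraints: List[Constraint] = [
    ("exactly two dogs", lambda g: g.count("dog") == 2),
    ("a dog on a sofa", lambda g: g.has_relation("dog", "on", "sofa")),
    ("no cats", lambda g: g.count("cat") == 0),
]

# Hypothetical graph for one retrieved image.
image = SceneGraph(
    objects={"n1": "dog", "n2": "dog", "n3": "sofa", "n4": "cat"},
    relations={("n1", "on", "n3")},
)

report = verify(image, constraints)
for name, ok in report.items():
    print(f"[{'x' if ok else ' '}] {name}")
```

For this image the report marks the first two constraints as satisfied and "no cats" as unmet, which mirrors the kind of per-constraint feedback the summary describes. How the graph is built from pixels and how the predicates are generated from natural language are outside this sketch.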

Original Abstract

Information retrieval lies at the foundation of the modern digital industry. While natural language search has seen dramatic progress in recent years largely driven by embedding-based models and large-scale pretraining, the field still faces significant challenges. Specifically, queries that involve complex relationships, object compositions, or precise constraints such as identities, counts and proportions often remain unresolved or unreliable within current frameworks. In this paper, we propose a novel framework that integrates formal verification into deep learning-based image retrieval through a synergistic combination of graph-based verification methods and neural code generation. Our approach aims to support open-vocabulary natural language queries while producing results that are both trustworthy and verifiable. By grounding retrieval results in a system of formal reasoning, we move beyond the ambiguity and approximation that often characterize vector representations. Instead of accepting uncertainty as a given, our framework explicitly verifies each atomic truth in the user query against the retrieved content. This allows us to not only return matching results, but also to identify and mark which specific constraints are satisfied and which remain unmet, thereby offering a more transparent and accountable retrieval process while boosting the results of the most popular embedding-based approaches.

arXiv Categories

cs.AI cs.IR
