DynamicGTR: Leveraging Graph Topology Representation Preferences to Boost VLM Capabilities on Graph QAs
AI Summary
DynamicGTR improves VLM performance on graph question-answering tasks by dynamically selecting the graph topology representation, balancing accuracy and brevity.
Key Contributions
- Proposes the DynamicGTR framework, which dynamically selects the optimal graph topology representation
- Improves VLM performance on graph-algorithm question-answering tasks
- Transfers successfully to real-world graph applications without additional training
Methodology
DynamicGTR dynamically selects the optimal graph topology representation (GTR) for each query at inference time, improving the zero-shot graph question-answering capabilities of VLMs.
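The summary does not spell out the selection mechanism, but the idea of choosing among candidate GTRs under an accuracy/brevity trade-off can be sketched as follows. All names here (`select_gtr`, the candidate GTR builders, the `predicted_accuracy` scores, and the `brevity_weight` penalty) are illustrative assumptions, not the paper's actual API:

```python
# Hypothetical sketch: pick the GTR that maximizes estimated accuracy
# minus a length penalty (the "customizable accuracy and brevity trade-off").

def edge_list_gtr(edges):
    """Render the graph as a flat edge-list string."""
    return "Edges: " + ", ".join(f"({u},{v})" for u, v in edges)

def adjacency_gtr(edges):
    """Render the graph as an adjacency-list string (undirected)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    lines = [f"{n}: {sorted(nbrs)}" for n, nbrs in sorted(adj.items())]
    return "Adjacency list:\n" + "\n".join(lines)

def select_gtr(edges, predicted_accuracy, brevity_weight=0.01):
    """Return (name, text) of the GTR with the best utility.

    predicted_accuracy: dict mapping GTR name -> estimated QA accuracy
    for this query; in practice these estimates would be learned on
    synthetic graph-algorithm tasks, as the abstract describes.
    """
    candidates = {
        "edge_list": edge_list_gtr(edges),
        "adjacency": adjacency_gtr(edges),
    }
    def utility(name):
        # Trade off predicted accuracy against representation length.
        return predicted_accuracy.get(name, 0.0) - brevity_weight * len(candidates[name])
    best = max(candidates, key=utility)
    return best, candidates[best]

edges = [(0, 1), (1, 2), (2, 0)]
# With the length penalty, the shorter edge list can win even when the
# adjacency list has a slightly higher predicted accuracy.
name, text = select_gtr(edges, {"edge_list": 0.7, "adjacency": 0.8})
```

Raising `brevity_weight` biases the choice toward shorter representations, while setting it to zero selects purely on predicted accuracy, which is one way to expose the customizable trade-off the abstract mentions.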
Original Abstract
Vision-Language Models (VLMs) have emerged as versatile solutions for zero-shot question answering (QA) across various domains. However, enabling VLMs to effectively comprehend structured graphs and perform accurate, efficient QA remains challenging. Existing approaches typically rely on one single graph topology representation (GTR), such as fixed-style visual images or unified text descriptions. This "one-size-fits-all" strategy often neglects model-specific and task-specific preferences, resulting in inaccurate or over-lengthy responses to graph-related queries. To address this, we propose the DynamicGTR framework, which dynamically selects the optimal GTR for each query during inference, thereby enhancing the zero-shot graph QA capabilities of VLMs with a customizable accuracy and brevity trade-off. Extensive experiments show that DynamicGTR not only improves VLM-based graph algorithm QA performance but also successfully transfers the experience trained from synthetic graph algorithm tasks to real-world applications like link prediction and node classification, without any additional training. Additionally, DynamicGTR demonstrates strong transferability across tasks, domains, and models, suggesting its potential as a flexible solution for broad graph scenarios.