FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use
AI Summary
Introduces FinToolBench, a new benchmark for evaluating LLM tool use in the financial domain, built on a large collection of real, executable financial tools.
Key Contributions
- Constructed FinToolBench, a realistic benchmark comprising 760 executable financial tools
- Proposed finance-critical dimensions for evaluating tool-use agents: timeliness, intent type, and regulatory domain alignment
- Presented FATR, a finance-aware tool retrieval and reasoning baseline
Methodology
Builds a runnable test environment coupling 760 real financial tools with 295 tool-required queries, designs a multi-dimensional evaluation framework that goes beyond binary execution success, and proposes a domain-aware baseline model (FATR).
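The evaluation idea can be sketched as scoring each agent episode not only on whether the tool call executed, but also on the three finance-critical dimensions the paper names. This is a minimal illustrative sketch: the class, field names, and the min-aggregation rule are assumptions for exposition, not the benchmark's actual metric.

```python
from dataclasses import dataclass

@dataclass
class ToolCallResult:
    """Outcome of one agent episode on a tool-required query (hypothetical schema)."""
    executed_ok: bool           # did the tool call run without error
    data_fresh: bool            # is the returned data timely enough for the query
    intent_matched: bool        # does the chosen tool match the query's intent type
    domain_aligned: bool        # is the tool within the permitted regulatory domain

def score_episode(r: ToolCallResult) -> dict:
    """Score execution success plus the three finance-critical dimensions
    (timeliness, intent type, regulatory domain alignment).
    The aggregation rule below is an illustrative assumption."""
    scores = {
        "execution": float(r.executed_ok),
        "timeliness": float(r.data_fresh),
        "intent": float(r.intent_matched),
        "regulatory": float(r.domain_aligned),
    }
    # A trustworthy episode must pass every dimension, not merely execute:
    # a call that runs but returns stale or non-compliant data still fails.
    scores["overall"] = min(scores.values())
    return scores

# Example: a call that executes but returns stale data fails overall.
print(score_episode(ToolCallResult(True, False, True, True))["overall"])  # 0.0
```

The strict min-aggregation reflects the paper's framing that finance is high-stakes: partial credit for a non-compliant but successful execution would be misleading in this domain.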
Original Abstract
The integration of Large Language Models (LLMs) into the financial domain is driving a paradigm shift from passive information retrieval to dynamic, agentic interaction. While general-purpose tool learning has witnessed a surge in benchmarks, the financial sector, characterized by high stakes, strict compliance, and rapid data volatility, remains critically underserved. Existing financial evaluations predominantly focus on static textual analysis or document-based QA, ignoring the complex reality of tool execution. Conversely, general tool benchmarks lack the domain-specific rigor required for finance, often relying on toy environments or a negligible number of financial APIs. To bridge this gap, we introduce FinToolBench, the first real-world, runnable benchmark dedicated to evaluating financial tool learning agents. Unlike prior works limited to a handful of mock tools, FinToolBench establishes a realistic ecosystem coupling 760 executable financial tools with 295 rigorous, tool-required queries. We propose a novel evaluation framework that goes beyond binary execution success, assessing agents on finance-critical dimensions: timeliness, intent type, and regulatory domain alignment. Furthermore, we present FATR, a finance-aware tool retrieval and reasoning baseline that enhances stability and compliance. By providing the first testbed for auditable, agentic financial execution, FinToolBench sets a new standard for trustworthy AI in finance. The tool manifest, execution environment, and evaluation code will be open-sourced to facilitate future research.