Are LLMs Overkill for Databases?: A Study on the Finiteness of SQL
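AI Summary
The study suggests that SQL queries are bounded in practical complexity, so LLMs may be overkill for database access, where templates could be a safer, cheaper, and more auditable alternative.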
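Main Contributions
- Shows empirically that the complexity of practical SQL queries is bounded
- Finds that SQL query templates follow a Power Law-like frequency distribution
- Raises the possibility of using templates instead of LLMs for database access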
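Methodology
The study analyzes SQL queries across 376 databases, measures the frequency distribution of query templates, and examines the relationship between database size (table count) and query complexity.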
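As a rough illustration of what reducing queries to templates and counting their frequencies might look like, here is a minimal Python sketch. The summary does not describe the paper's actual templating procedure, so everything here is an assumption: the hypothetical helpers `to_template` and `template_counts`, the small `KEYWORDS` set, and the choice to mask identifiers and literals with `<ID>`, `<NUM>`, and `<STR>` placeholders.

```python
import re
from collections import Counter

# Assumption: a small illustrative keyword list, not a full SQL grammar.
KEYWORDS = {
    "SELECT", "FROM", "WHERE", "GROUP", "BY", "ORDER", "HAVING", "LIMIT",
    "JOIN", "ON", "AS", "AND", "OR", "NOT", "IN", "LIKE", "BETWEEN",
    "DISTINCT", "COUNT", "SUM", "AVG", "MIN", "MAX", "ASC", "DESC",
    "UNION", "INTERSECT", "EXCEPT",
}

# Tokens: quoted strings, numbers, words, two-character operators, single symbols.
TOKEN_RE = re.compile(
    r"'(?:[^']|'')*'|\d+(?:\.\d+)?|[A-Za-z_][A-Za-z_0-9]*|<=|>=|<>|!=|[^\sA-Za-z_0-9]"
)


def to_template(sql: str) -> str:
    """Mask identifiers and literals so only the query's structure remains."""
    parts = []
    for tok in TOKEN_RE.findall(sql.strip().rstrip(";")):
        if tok.startswith("'"):
            parts.append("<STR>")        # string literal
        elif tok[0].isdigit():
            parts.append("<NUM>")        # numeric literal
        elif tok.upper() in KEYWORDS:
            parts.append(tok.upper())    # keep keywords, normalized to upper case
        elif tok[0].isalpha() or tok[0] == "_":
            parts.append("<ID>")         # table, column, or alias name
        else:
            parts.append(tok)            # operators and punctuation
    return " ".join(parts)


def template_counts(queries):
    """Count how often each template occurs in a collection of queries."""
    return Counter(to_template(q) for q in queries)
```

Under these assumptions, `SELECT name FROM singer WHERE age > 30` and `select title from movie where year > 1999;` collapse to the same template, which is the kind of grouping a frequency analysis needs. A real study would likely rely on a proper SQL parser rather than regular expressions.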
Original Abstract
Translating natural language to SQL for data retrieval has become more accessible thanks to code generation LLMs. But how hard is it to generate SQL code? While databases can become unbounded in complexity, the complexity of queries is bounded by real-life utility and human needs. With a sample of 376 databases, we show that SQL queries, as translations of natural language questions, are finite in practical complexity. There is no clear monotonic relationship between increases in database table count and increases in complexity of SQL queries. In their template forms, SQL queries follow a Power Law-like distribution of frequency where 70% of our tested queries can be covered with just 13% of all template types, indicating that the large majority of SQL queries are predictable. This suggests that while LLMs for code generation can be useful, in the domain of database access, they may be operating in a narrow, highly formulaic space where templates could be safer, cheaper, and auditable.
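A coverage figure such as "70% of queries covered by 13% of template types" can be computed from template frequencies by taking the most frequent templates first until the target share of queries is reached. The sketch below illustrates that arithmetic only. The function name `templates_needed` is hypothetical, and the counts in the usage example are invented toy numbers, not the paper's data.

```python
from collections import Counter


def templates_needed(counts: Counter, target: float = 0.70):
    """Return (fraction of template types used, coverage achieved) for the
    smallest set of most-frequent templates covering at least `target`
    of all queries."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no queries to analyze")
    covered = 0
    # Walk templates from most to least frequent, accumulating coverage.
    for k, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= target:
            return k / len(counts), covered / total
    return 1.0, 1.0


# Toy example (invented counts): 8 template types over 100 queries.
toy = Counter({"T1": 60, "T2": 15, "T3": 10, "T4": 5,
               "T5": 5, "T6": 3, "T7": 1, "T8": 1})
type_fraction, coverage = templates_needed(toy, target=0.70)
print(f"{type_fraction:.0%} of template types cover {coverage:.0%} of queries")
```

In this toy case the two most frequent templates account for 75 of 100 queries, so 25% of the template types are enough to pass the 70% target. A heavy-tailed distribution like the one the abstract describes would show the same pattern: a small head of templates covering most queries.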