LLM Reasoning relevance: 6/10

Small molecule retrieval from tandem mass spectrometry: what are we optimizing for?

Gaetan De Waele, Marek Wydmuch, Krzysztof Dembczyński, Wojciech Kotłowski, Willem Waegeman
arXiv: 2602.16507v1 Published: 2026-02-18 Updated: 2026-02-18

AI Summary

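The paper studies how the loss functions used by deep learning models for LC-MS/MS data analysis affect molecular fingerprint prediction and molecular retrieval, and reveals a trade-off between the two.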

Main Contributions

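  • Reveals a fundamental trade-off between fingerprint similarity and molecular retrieval
  • Derives novel regret bounds that characterize when the Bayes-optimal decisions for the two objectives must diverge
  • Provides guidance for choosing loss functions and fingerprints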

Methodology

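Theoretical analysis and mathematical derivation: the paper examines how commonly used loss functions affect model performance, and bases its guidance on the similarity structure of candidate sets.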

Original Abstract

One of the central challenges in the computational analysis of liquid chromatography-tandem mass spectrometry (LC-MS/MS) data is to identify the compounds underlying the output spectra. In recent years, this problem is increasingly tackled using deep learning methods. A common strategy involves predicting a molecular fingerprint vector from an input mass spectrum, which is then used to search for matches in a chemical compound database. While various loss functions are employed in training these predictive models, their impact on model performance remains poorly understood. In this study, we investigate commonly used loss functions, deriving novel regret bounds that characterize when Bayes-optimal decisions for these objectives must diverge. Our results reveal a fundamental trade-off between the two objectives of (1) fingerprint similarity and (2) molecular retrieval. Optimizing for more accurate fingerprint predictions typically worsens retrieval results, and vice versa. Our theoretical analysis shows this trade-off depends on the similarity structure of candidate sets, providing guidance for loss function and fingerprint selection.
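
The divergence between the two objectives can be made concrete with a small toy example. The sketch below is not taken from the paper: the three candidate fingerprints, the posterior probabilities, the choice of Hamming loss as the fingerprint objective, and nearest-candidate retrieval by Hamming distance are all assumptions made for illustration. It compares the fingerprint that minimizes expected Hamming loss with the candidate that maximizes top-1 retrieval accuracy.

```python
# Toy illustration with hypothetical data (not from the paper).
# Three candidate molecules, each with a 6-bit fingerprint, and an assumed
# posterior over which candidate produced the observed spectrum.
candidates = {
    "A": (1, 1, 1, 0, 0, 0),
    "B": (0, 0, 0, 1, 1, 0),
    "C": (0, 0, 0, 1, 1, 1),
}
posterior = {"A": 0.4, "B": 0.3, "C": 0.3}


def hamming(u, v):
    """Number of bit positions where two fingerprints differ."""
    return sum(a != b for a, b in zip(u, v))


def expected_hamming(pred):
    """Expected Hamming loss of a predicted fingerprint under the posterior."""
    return sum(posterior[k] * hamming(pred, fp) for k, fp in candidates.items())


n_bits = len(candidates["A"])

# Objective 1 (fingerprint similarity, here Hamming loss): the expected loss
# decomposes over bits, so the minimizer thresholds each marginal at 0.5.
marginals = [
    sum(posterior[k] * fp[i] for k, fp in candidates.items())
    for i in range(n_bits)
]
fp_optimal = tuple(int(m > 0.5) for m in marginals)

# Retrieval driven by that fingerprint: pick the closest candidate.
retrieved_via_fp = min(candidates, key=lambda k: hamming(fp_optimal, candidates[k]))

# Objective 2 (top-1 retrieval): the optimal choice is the posterior mode.
retrieval_optimal = max(posterior, key=posterior.get)

print("Hamming-optimal fingerprint:", fp_optimal)
print("Candidate retrieved from it:", retrieved_via_fp)
print("Retrieval-optimal candidate:", retrieval_optimal)
print("Expected Hamming loss (fingerprint-optimal):", expected_hamming(fp_optimal))
print("Expected Hamming loss (predicting A's fingerprint):",
      expected_hamming(candidates["A"]))
```

In this toy setting the Hamming-optimal fingerprint coincides with candidate B, which is correct with probability 0.3, while the posterior mode A is correct with probability 0.4 but has a higher expected Hamming loss. The gap arises because B and C are similar to each other and jointly outweigh A bit by bit, which is consistent with the abstract's point that the trade-off depends on the similarity structure of the candidate set.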

Tags

mass spectrometry, metabolomics, deep learning, molecular fingerprints, loss functions

arXiv Categories

cs.LG