Multimodal Learning · Relevance: 9/10

Hierarchy-Guided Multimodal Representation Learning for Taxonomic Inference

Sk Miraj Ahmed, Xi Yu, Yunqi Li, Yuewei Lin, Wei Xu
arXiv: 2603.25573v1 · Published: 2026-03-26 · Updated: 2026-03-26

AI Summary

Proposes a hierarchy-guided multimodal representation learning method for biological taxonomic inference that improves classification accuracy.

Key Contributions

  • Introduces Hierarchical Information Regularization (HiR) to encode taxonomic hierarchy into the representation space
  • Designs two variants: CLiBD-HiR and CLiBD-HiR-Fuse
  • Validates effectiveness on biodiversity datasets, especially under partial or corrupted DNA

Methodology

Shapes the embedding space through Hierarchical Information Regularization (HiR), and uses a lightweight fusion predictor to handle inputs from different modalities.
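The paper does not spell out HiR's exact form here, but the idea of "shaping embedding geometry across taxonomic levels" can be illustrated with a minimal sketch: a pairwise margin regularizer in which samples sharing deeper taxonomic levels (order → family → genus) are pushed to be closer in embedding space. All function names, margin values, and the three-level taxonomy tuple below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def shared_depth(a, b):
    # Number of leading taxonomic levels two samples share,
    # e.g. ("Diptera", "Culicidae", "Aedes") vs ("Diptera", "Culicidae", "Culex") -> 2
    d = 0
    for x, y in zip(a, b):
        if x != y:
            break
        d += 1
    return d

def hir_loss(emb, taxa, margins=(1.0, 0.6, 0.3, 0.1)):
    """Hedged sketch of a hierarchy-aware regularizer (assumed form, not the paper's).

    emb:     (N, D) array of embeddings.
    taxa:    list of (order, family, genus) tuples, one per sample.
    margins: allowed pairwise distance per shared depth 0..3 --
             the more levels two samples share, the closer they must sit.
    Penalizes any pairwise distance that exceeds its depth's margin.
    """
    n = emb.shape[0]
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(emb[i] - emb[j])
            allowed = margins[shared_depth(taxa[i], taxa[j])]
            total += max(0.0, dist - allowed)  # hinge: only over-margin pairs count
            count += 1
    return total / max(count, 1)
```

Under this toy formulation, two same-genus specimens at distance 0.05 incur zero loss (margin 0.1), while the same pair at distance 1.0 incurs loss 0.9 — the gradient of such a term is what would pull fine-grained taxa together while letting distant taxa spread out.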

Original Abstract

Accurate biodiversity identification from large-scale field data is a foundational problem with direct impact on ecology, conservation, and environmental monitoring. In practice, the core task is taxonomic prediction - inferring order, family, genus, or species from imperfect inputs such as specimen images, DNA barcodes, or both. Existing multimodal methods often treat taxonomy as a flat label space and therefore fail to encode the hierarchical structure of biological classification, which is critical for robustness under noise and missing modalities. We present two end-to-end variants for hierarchy-aware multimodal learning: CLiBD-HiR, which introduces Hierarchical Information Regularization (HiR) to shape embedding geometry across taxonomic levels, yielding structured and noise-robust representations; and CLiBD-HiR-Fuse, which additionally trains a lightweight fusion predictor that supports image-only, DNA-only, or joint inference and is resilient to modality corruption. Across large-scale biodiversity benchmarks, our approach improves taxonomic classification accuracy by over 14 percent compared to strong multimodal baselines, with particularly large gains under partial and corrupted DNA conditions. These results highlight that explicitly encoding biological hierarchy, together with flexible fusion, is key for practical biodiversity foundation models.
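The abstract's "lightweight fusion predictor that supports image-only, DNA-only, or joint inference" can be sketched as a head that pools whichever modality embeddings are present. The class name, average-pooling fusion, and linear head below are assumptions for illustration; the paper's actual predictor is not specified in this summary.

```python
import numpy as np

class FusionPredictor:
    """Hedged sketch of a modality-flexible classification head.

    Pools whatever modality embeddings are available (simple average),
    then applies one linear layer -- an assumed stand-in for the paper's
    lightweight fusion predictor.
    """

    def __init__(self, dim, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(dim, n_classes))

    def predict(self, img_emb=None, dna_emb=None):
        # Accept image-only, DNA-only, or joint inputs; missing
        # modalities are simply dropped from the pool.
        present = [e for e in (img_emb, dna_emb) if e is not None]
        if not present:
            raise ValueError("at least one modality embedding is required")
        fused = np.mean(present, axis=0)
        logits = fused @ self.W
        return int(np.argmax(logits))
```

Because the fused representation has the same shape regardless of which modalities survive, one trained head serves all three inference modes — the property that makes the approach resilient to modality corruption.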

Tags

Multimodal Learning · Hierarchical Learning · Biological Taxonomy · Representation Learning

arXiv Categories

cs.CV cs.LG