Multimodal Learning Relevance: 9/10

Assessing Multimodal Chronic Wound Embeddings with Expert Triplet Agreement

Fabian Kabus, Julia Hindel, Jelena Bratulić, Meropi Karakioulaki, Ayush Gupta, Cristina Has, Thomas Brox, Abhinav Valada, Harald Binder
arXiv: 2603.29376v1 Published: 2026-03-31 Updated: 2026-03-31

AI Summary

The paper proposes the TriDerm framework, which leverages expert knowledge to evaluate multimodal chronic wound embeddings and improves retrieval of similar cases for the rare skin disease RDEB.

Key Contributions

  • Proposes the TriDerm framework, which fuses wound images, boundary masks, and expert reports to learn wound representations
  • Evaluates the embedding space with expert triplet judgments, a fast way to collect implicit clinical similarity knowledge
  • Combines visual and textual modalities to improve the precision of similar-case retrieval for RDEB

Methodology

The TriDerm framework processes the two modalities separately: on the vision side it adapts foundation models with wound-level attention pooling and non-contrastive representation learning, while on the text side it recovers representations from language-model comparison queries via soft ordinal embeddings (SOE).
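Wound-level attention pooling can be sketched as a masked softmax over patch features: only patches overlapping the wound mask contribute, weighted by a learned attention query. This is a minimal numpy sketch under assumed shapes, not the paper's implementation.

```python
import numpy as np

def wound_attention_pool(patch_feats, wound_mask, w):
    """Attention-weighted pooling of backbone patch features within a wound mask.

    patch_feats : (n_patches, d) features from a visual foundation model
    wound_mask  : (n_patches,) boolean, True where a patch overlaps the wound
    w           : (d,) learned attention query (hypothetical parameter)
    """
    scores = patch_feats @ w                        # one attention logit per patch
    scores = np.where(wound_mask, scores, -np.inf)  # exclude non-wound patches
    attn = np.exp(scores - scores[wound_mask].max())  # stable softmax numerator
    attn = attn / attn.sum()                        # weights sum to 1 over wound
    return attn @ patch_feats                       # (d,) wound-level embedding
```

Masking before the softmax (rather than after pooling) keeps background skin from diluting the wound representation, which matters for a heterogeneous, long-tail condition like RDEB.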

Original Abstract

Recessive dystrophic epidermolysis bullosa (RDEB) is a rare genetic skin disorder for which clinicians greatly benefit from finding similar cases using images and clinical text. However, off-the-shelf foundation models do not reliably capture clinically meaningful features for this heterogeneous, long-tail disease, and structured measurement of agreement with experts is challenging. To address these gaps, we propose evaluating embedding spaces with expert ordinal comparisons (triplet judgments), which are fast to collect and encode implicit clinical similarity knowledge. We further introduce TriDerm, a multimodal framework that learns interpretable wound representations from small cohorts by integrating wound imagery, boundary masks, and expert reports. On the vision side, TriDerm adapts visual foundation models to RDEB using wound-level attention pooling and non-contrastive representation learning. For text, we prompt large language models with comparison queries and recover medically meaningful representations via soft ordinal embeddings (SOE). We show that visual and textual modalities capture complementary aspects of wound phenotype, and that fusing both modalities yields 73.5% agreement with experts, outperforming the best off-the-shelf single-modality foundation model by over 5.6 percentage points. We make the expert annotation tool, model code and representative dataset samples publicly available.

Tags

Multimodal Learning  Representation Learning  Medical Imaging  Natural Language Processing  Expert Knowledge

arXiv Category

cs.CV