Multimodal Learning Relevance: 8/10

Counting Without Numbers & Finding Without Words

Badri Narayana Patro
arXiv: 2603.24470v1 Published: 2026-03-25 Updated: 2026-03-25

AI Summary

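Proposes a multimodal pet reunification system that combines visual and acoustic biometrics, aimed at raising the rate of successful reunions.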

Main Contributions

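  • Proposes a multimodal reunification system that combines visual and acoustic biometrics
  • The system handles animal vocalizations across different frequency ranges
  • The system tolerates stress-induced changes in a pet's appearance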

Methodology

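The system combines visual and acoustic biometrics: a species-adaptive architecture processes animal vocalizations, and this is paired with probabilistic visual matching.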
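The summary does not say how the two modalities are combined, so the following is only a minimal sketch of one common option, weighted log-odds (late) fusion of a visual and an acoustic match probability. The function names, the fusion rule, the weight, and every band edge other than the 10 Hz and 4 kHz endpoints quoted in the abstract are assumptions made for illustration, not the paper's method.

```python
import math

# Hypothetical per-species analysis bands in Hz. Only the 10 Hz (elephant
# rumble) and 4 kHz (puppy whine) endpoints appear in the abstract; the
# other two band edges are illustrative placeholders.
SPECIES_BANDS_HZ = {
    "elephant": (10.0, 200.0),
    "dog": (200.0, 4000.0),
}


def band_for(species):
    """Return the assumed (low, high) frequency band for a species."""
    return SPECIES_BANDS_HZ[species]


def _logit(p, eps=1e-6):
    """Log-odds of p, clamped away from 0 and 1 to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fuse(p_visual, p_acoustic, w_visual=0.5):
    """Fuse two match probabilities by a weighted average of their log-odds.

    This is an assumed fusion rule, not one taken from the paper.
    """
    z = w_visual * _logit(p_visual) + (1.0 - w_visual) * _logit(p_acoustic)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    low, high = band_for("dog")
    print(f"dog band: {low}-{high} Hz")
    print(f"fused match probability: {fuse(0.6, 0.9):.3f}")
```

Averaging in log-odds space keeps the fused score between the two inputs and lets a confident acoustic match offset a weak visual one, which is one plausible way to tolerate appearance changes; a real system might learn the weight per species.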

Original Abstract

Every year, 10 million pets enter shelters, separated from their families. Despite desperate searches by both guardians and lost animals, 70% never reunite, not because matches do not exist, but because current systems look only at appearance, while animals recognize each other through sound. We ask, why does computer vision treat vocalizing species as silent visual objects? Drawing on five decades of cognitive science showing that animals perceive quantity approximately and communicate identity acoustically, we present the first multimodal reunification system integrating visual and acoustic biometrics. Our species-adaptive architecture processes vocalizations from 10Hz elephant rumbles to 4kHz puppy whines, paired with probabilistic visual matching that tolerates stress-induced appearance changes. This work demonstrates that AI grounded in biological communication principles can serve vulnerable populations that lack human language.

Tags

Multimodal Learning, Biometrics, Animal Behavior, Computer Vision, Audio Processing

arXiv Categories

cs.CV cs.AI cs.CL cs.SI