Multimodal Learning Relevance: 7/10

ActivityNarrated: An Open-Ended Narrative Paradigm for Wearable Human Activity Understanding

Lala Shakti Swarup Ray, Mengxi Liu, Alcina Pinto, Deepika Gurung, Daniel Geissler, Paul Lukowicz, Bo Zhou
arXiv: 2604.00767v1 Published: 2026-04-01 Updated: 2026-04-01

AI Summary

Proposes an open-ended approach to wearable human activity understanding that aligns sensor data with natural-language descriptions.

Key Contributions

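  • A narrative-based framework for open-ended human activity understanding
  • A naturalistic data collection and annotation pipeline
  • A retrieval-based evaluation framework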

Methodology

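Aligns wearable sensor data with natural-language descriptions and uses a language-conditioned learning architecture to perform sensor-to-text inference over variable-length sensor streams and heterogeneous sensor placements.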
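The summary does not say which objective the authors use to align sensor windows with narrations. A common choice for this kind of cross-modal alignment is a CLIP-style symmetric contrastive (InfoNCE) loss, so the sketch below assumes that objective. It also assumes that padded, variable-length sensor streams are reduced to one vector by masked mean pooling. All function names (`masked_mean_pool`, `symmetric_infonce`) and the temperature value are illustrative assumptions, not the paper's architecture.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def masked_mean_pool(frames: List[Vector], mask: List[bool]) -> Vector:
    # Average per-timestep features over valid (unpadded) steps only, so
    # streams of different lengths map to one fixed-size vector.
    # (Assumption: the same masking could skip absent sensor placements.)
    valid = [f for f, m in zip(frames, mask) if m]
    dim = len(frames[0])
    return [sum(f[d] for f in valid) / len(valid) for d in range(dim)]


def log_softmax_at(logits: Vector, idx: int) -> float:
    # Numerically stable log-softmax evaluated at a single index.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[idx] - lse


def symmetric_infonce(sensor: List[Vector], text: List[Vector],
                      temperature: float = 0.07) -> float:
    # Hypothetical CLIP-style loss: the i-th sensor window and the i-th
    # narration are a positive pair; every other pair in the batch is a negative.
    s = [l2_normalize(v) for v in sensor]
    t = [l2_normalize(v) for v in text]
    n = len(s)
    logits = [[sum(a * b for a, b in zip(s[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Sensor-to-text direction: softmax over each row.
    s2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-sensor direction: softmax over each column.
    t2s = -sum(log_softmax_at([logits[i][j] for i in range(n)], j)
               for j in range(n)) / n
    return (s2t + t2s) / 2
```

With two toy pairs, the loss is near zero when each sensor embedding matches its own narration and large when the narrations are swapped. That gap is the signal that pulls matching sensor and language embeddings together.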

Original Abstract

Wearable HAR has improved steadily, but most progress still relies on closed-set classification, which limits real-world use. In practice, human activity is open-ended, unscripted, personalized, and often compositional, unfolding as narratives rather than instances of fixed classes. We argue that addressing this gap does not require simply scaling datasets or models. It requires a fundamental shift in how wearable HAR is formulated, supervised, and evaluated. This work shows how to model open-ended activity narratives by aligning wearable sensor data with natural-language descriptions in an open-vocabulary setting. Our framework has three core components. First, we introduce a naturalistic data collection and annotation pipeline that combines multi-position wearable sensing with free-form, time-aligned narrative descriptions of ongoing behavior, allowing activity semantics to emerge without a predefined vocabulary. Second, we define a retrieval-based evaluation framework that measures semantic alignment between sensor data and language, enabling principled evaluation without fixed classes while also subsuming closed-set classification as a special case. Third, we present a language-conditioned learning architecture that supports sensor-to-text inference over variable-length sensor streams and heterogeneous sensor placements. Experiments show that models trained with fixed-label objectives degrade sharply under real-world variability, while open-vocabulary sensor-language alignment yields robust and semantically grounded representations. Once this alignment is learned, closed-set activity recognition becomes a simple downstream task. Under cross-participant evaluation, our method achieves 65.3% Macro-F1, compared with 31-34% for strong closed-set HAR baselines. These results establish open-ended narrative modeling as a practical and effective foundation for real-world wearable HAR.
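The abstract states two things about evaluation. First, it is retrieval-based and measures sensor-language alignment. Second, closed-set classification falls out as a special case. The sketch below shows one plausible reading of both claims, assuming embeddings have already been computed. It scores sensor-to-text retrieval with Recall@K. It then treats classification as retrieval over a gallery of embedded class names and scores the top-1 predictions with Macro-F1, the metric the abstract reports. The function names and toy vectors are hypothetical. This is not the authors' evaluation code.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def recall_at_k(sensor: List[Vector], text: List[Vector], k: int) -> float:
    # Sensor-to-text retrieval: query i is a hit if its paired
    # narration (index i) ranks among the top-k texts by cosine similarity.
    hits = 0
    for i, q in enumerate(sensor):
        ranked = sorted(range(len(text)),
                        key=lambda j: cosine(q, text[j]), reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sensor)


def classify_by_retrieval(sensor: List[Vector],
                          label_embs: Dict[str, Vector]) -> List[str]:
    # Closed-set HAR as a special case: the text gallery is just the
    # embedded class names, and the prediction is the top-1 retrieved label.
    return [max(label_embs, key=lambda name: cosine(q, label_embs[name]))
            for q in sensor]


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    # Unweighted mean of per-class F1 over classes seen in y_true or y_pred.
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with label embeddings `{"walking": [1.0, 0.0], "sitting": [0.0, 1.0]}`, the sensor vectors `[0.9, 0.1]`, `[0.2, 0.8]`, and `[0.6, 0.4]` are classified as walking, sitting, and walking. Against ground truth of walking, sitting, and sitting, both classes get an F1 of 2/3, so Macro-F1 is 2/3. Under this approach, adding a new activity class requires only embedding its name, with no retraining of a classifier head.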

Tags

wearable HAR activity recognition natural language processing sensor fusion

arXiv Category

cs.LG