Multimodal Learning Relevance: 7/10

From Skeletons to Semantics: Design and Deployment of a Hybrid Edge-Based Action Detection System for Public Safety

Ganen Sethupathy, Lalit Dumka, Jan Schagen
arXiv: 2603.29777v1 Published: 2026-03-31 Updated: 2026-03-31

AI Summary

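Proposes a hybrid edge-based action detection system for public safety that combines skeleton-based motion analysis with vision-language models.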

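Main Contributions

  • Designs and deploys a hybrid edge-based action detection system
  • Compares the performance of skeleton-based motion analysis and vision-language models
  • Evaluates the system in terms of latency and resource usage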
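A latency evaluation like the one listed above is typically done by timing many repeated inference calls after a warm-up phase and summarising the distribution, not a single run. The sketch below is a generic, hypothetical harness (`profile_latency` and its parameters are illustrative and do not come from the paper). For GPU inference, kernels run asynchronously, so the device must be synchronised inside `fn` (e.g. with `torch.cuda.synchronize()` in PyTorch) before the timer stops; otherwise the measured time undercounts the real latency.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def profile_latency(fn: Callable[[], object], runs: int = 50, warmup: int = 5) -> Dict[str, float]:
    """Call `fn` repeatedly and summarise its per-call latency in milliseconds.

    Hypothetical helper for illustration; `fn` wraps one inference step.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        fn()  # warm-up calls (lazy initialisation, caches) are not recorded
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 95th percentile: smallest sample with >= 95% of samples at or below it.
    p95 = samples[max(0, math.ceil(0.95 * len(samples)) - 1)]
    return {
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": p95,
        "max_ms": samples[-1],
    }
```

Running it once per pipeline (for example, a skeleton-only step versus a skeleton-plus-VLM step) gives directly comparable mean and tail latencies. Tail values such as p95 matter for real-time alerting, because occasional slow calls can be hidden by the mean.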

Methodology

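Combines skeleton-based motion analysis for continuous monitoring with vision-language models for contextual understanding; the system is implemented and evaluated on an edge device.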
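The hybrid design can be pictured as a two-stage cascade: a lightweight skeleton-based classifier scores every pose window, and only windows whose score crosses a threshold are escalated to a vision-language model for semantic confirmation. This is a minimal sketch under stated assumptions, not the authors' implementation: `hybrid_decide`, `Decision`, `skeleton_model`, `vlm_query`, `escalate_threshold`, and the prompt text are all hypothetical, and both models are passed in as plain callables so that any pose classifier or VLM client could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical pose representation: one (x, y, confidence) triple per keypoint.
Keypoint = Tuple[float, float, float]
Pose = List[Keypoint]


@dataclass
class Decision:
    skeleton_score: float     # score from the fast skeleton-based stage
    escalated: bool           # True if the VLM was consulted
    vlm_label: Optional[str]  # normalised VLM answer, None if not escalated
    alert: bool               # final alert decision


def hybrid_decide(
    poses: Sequence[Pose],
    frame: object,
    skeleton_model: Callable[[Sequence[Pose]], float],
    vlm_query: Callable[[object, str], str],
    escalate_threshold: float = 0.5,
    prompt: str = "Is anyone in this scene behaving violently? Answer yes or no.",
) -> Decision:
    """Score a pose window cheaply; consult the VLM only if the score is high enough."""
    score = skeleton_model(poses)
    if score < escalate_threshold:
        # Fast path: only keypoints were used; the frame is never passed to the VLM.
        return Decision(score, False, None, False)
    # Slow path: ask the VLM for a semantic yes/no judgement on the frame.
    label = vlm_query(frame, prompt).strip().lower()
    return Decision(score, True, label, label.startswith("yes"))
```

Because the VLM is invoked only on the escalation path, its cost is paid only for windows the skeleton stage flags, and in the non-escalated case the raw frame is never handed to the VLM. This mirrors the division of labour described above: cheap, privacy-aware continuous monitoring, with semantic reasoning reserved for ambiguous or suspicious moments.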

Original Abstract

Public spaces such as transport hubs, city centres, and event venues require timely and reliable detection of potentially violent behaviour to support public safety. While automated video analysis has made significant progress, practical deployment remains constrained by latency, privacy, and resource limitations, particularly under edge-computing conditions. This paper presents the design and demonstrator-based deployment of a hybrid edge-based action detection system that combines skeleton-based motion analysis with vision-language models for semantic scene interpretation. Skeleton-based processing enables continuous, privacy-aware monitoring with low computational overhead, while vision-language models provide contextual understanding and zero-shot reasoning capabilities for complex and previously unseen situations. Rather than proposing new recognition models, the contribution focuses on a system-level comparison of both paradigms under realistic edge constraints. The system is implemented on a GPU-enabled edge device and evaluated with respect to latency, resource usage, and operational trade-offs using a demonstrator-based setup. The results highlight the complementary strengths and limitations of motion-centric and semantic approaches and motivate a hybrid architecture that selectively augments fast skeleton-based detection with higher-level semantic reasoning. The presented system provides a practical foundation for privacy-aware, real-time video analysis in public safety applications.

Tags

Edge computing, Action detection, Public safety, Vision-language models, Skeleton-based action analysis

arXiv Categories

cs.CV cs.AI