LLM Memory & RAG relevance: 5/10

SoftDTW-CUDA-Torch: Memory-Efficient GPU-Accelerated Soft Dynamic Time Warping for PyTorch

Ron Shapira Weber, Oren Freifeld
arXiv: 2602.17206v1 · Published: 2026-02-19 · Updated: 2026-02-19

AI Summary

Presents a GPU-accelerated, memory-efficient SoftDTW library for PyTorch that addresses the sequence-length limits, numerical instability, and excessive memory consumption of existing implementations.

Main Contributions

  • Tiled anti-diagonal kernel execution removes the sequence-length limit
  • A log-space backward pass prevents floating-point overflow
  • A fused distance-computation mode reduces memory consumption
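The anti-diagonal idea behind the first contribution can be illustrated with a minimal pure-Python sketch (this is illustrative only, not the library's CUDA kernel): in the SoftDTW recurrence, every cell with the same index sum i + j depends only on earlier anti-diagonals, so all cells of one diagonal can be computed in parallel and split into tiles.

```python
import math

def softmin(vals, gamma):
    # Numerically stable soft minimum: min(v) - gamma * log(sum(exp(-(v - min)/gamma)))
    m = min(vals)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in vals))

def softdtw_antidiagonal(D, gamma=1.0):
    """SoftDTW over a precomputed cost matrix D (N x M), swept by anti-diagonals.

    Cells sharing i + j are mutually independent, which is what lets a GPU
    kernel process one diagonal (or one tile of it) in parallel.
    """
    N, M = len(D), len(D[0])
    INF = float("inf")
    R = [[INF] * (M + 1) for _ in range(N + 1)]
    R[0][0] = 0.0
    for k in range(2, N + M + 1):                      # k indexes the anti-diagonal i + j
        for i in range(max(1, k - M), min(N, k - 1) + 1):
            j = k - i
            R[i][j] = D[i - 1][j - 1] + softmin(
                (R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]), gamma
            )
    return R[N][M]
```

As gamma approaches 0, softmin approaches the hard minimum, so the result converges to the classic DTW cost; e.g. for D = [[1, 2], [3, 1]] and a small gamma, the diagonal path gives a cost near 2.0.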

Methodology

Optimizes SoftDTW's compute efficiency and memory usage on GPUs through tiled kernel execution, a log-space backward pass, and fused distance computation.
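The failure mode that the log-space backward pass addresses is easy to reproduce: for a small smoothing parameter gamma, exp(-x/gamma) underflows or overflows in floating point. A minimal illustration (not the library's actual backward kernel) contrasting the naive softmin with the log-sum-exp-shifted version:

```python
import math

def softmin_naive(vals, gamma):
    # Direct formula: -gamma * log(sum(exp(-v/gamma))) -- breaks for small gamma
    return -gamma * math.log(sum(math.exp(-v / gamma) for v in vals))

def softmin_stable(vals, gamma):
    # Log-sum-exp trick: shift by the minimum before exponentiating,
    # so the largest exponent is exactly 0 and never under/overflows.
    m = min(vals)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in vals))

vals, gamma = (100.0, 101.0, 102.0), 1e-3
try:
    naive = softmin_naive(vals, gamma)   # exp(-100000) underflows to 0 -> log(0) fails
except ValueError:
    naive = "math domain error"
stable = softmin_stable(vals, gamma)     # close to 100.0, the hard minimum
```

The same shift applies to the softmin gradients in the backward pass, which is why computing them in log space keeps small-gamma training stable.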

Original Abstract

We present softdtw-cuda-torch, an open-source PyTorch library for computing Soft Dynamic Time Warping (SoftDTW) on GPUs. Our implementation addresses three key limitations of existing GPU implementations of SoftDTW: a hard sequence-length cap of 1024, numerical instability in the backward pass for small smoothing parameters, and excessive GPU memory consumption from materializing pairwise distance tensors. We introduce (1) tiled anti-diagonal kernel execution that removes the sequence-length constraint, (2) a log-space backward pass that prevents floating-point overflow, and (3) a fused distance-computation mode that eliminates the O(BNM) intermediate distance tensor, achieving up to 98% memory reduction compared to prior work. The library supports arbitrary sequence lengths, full PyTorch autograd integration, and SoftDTW barycenter computation. Code is available at https://github.com/BGU-CS-VIL/sdtw-cuda-torch.
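The memory claim is easy to quantify with back-of-the-envelope arithmetic (illustrative numbers, not the paper's benchmarks): the O(BNM) pairwise distance tensor dominates memory for long sequences, while a fused kernel recomputes each distance on the fly and never stores that tensor.

```python
# Illustrative estimate of the intermediate tensor that a fused
# distance-computation mode avoids materializing (float32 = 4 bytes).
# Assumes batch B, lengths N and M, and 1-D features for simplicity.
B, N, M, bytes_per_float = 32, 4096, 4096, 4

dist_tensor_bytes = B * N * M * bytes_per_float   # O(BNM) pairwise distances
inputs_bytes = B * (N + M) * bytes_per_float      # the input sequences themselves

print(f"pairwise distance tensor: {dist_tensor_bytes / 2**30:.1f} GiB")
print(f"input sequences:          {inputs_bytes / 2**20:.1f} MiB")
```

At these sizes the intermediate tensor alone is 2 GiB, three orders of magnitude larger than the inputs, which is consistent with the large memory reductions the abstract reports for the fused mode.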

Tags

SoftDTW · GPU acceleration · PyTorch · Memory efficiency

arXiv Category

cs.LG