Multimodal Learning Relevance: 6/10

FastWave: Optimized Diffusion Model for Audio Super-Resolution

Nikita Kuznetsov, Maksim Kaledin

arXiv: 2603.04122v1 Published: 2026-03-04 Updated: 2026-03-04

AI Summary

FastWave is an optimized diffusion model for audio super-resolution that lowers computational cost and speeds up training.

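Main Contributions

Proposes FastWave, a model with a small parameter count (about 1.3 M) and low computational complexity (around 50 GFLOPs)
Outperforms NU-Wave 2 on the audio super-resolution task
Releases the code publicly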
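
To put the parameter count in perspective, the snippet below is a back-of-envelope estimate of weight storage for a model of roughly 1.3 M parameters, the figure given in the abstract. The numeric formats listed are ordinary dtype widths chosen for illustration; the source does not say which precision FastWave uses, so treat them as assumptions.

```python
# Rough weight-storage estimate for ~1.3 M parameters (figure from the abstract).
# The dtype choices below are illustrative assumptions, not details of FastWave.
PARAMS = 1_300_000

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(num_params: int, bytes_per_param: int) -> float:
    """Return weight storage in megabytes, with 1 MB = 10**6 bytes."""
    return num_params * bytes_per_param / 1e6


footprint = {
    dtype: weight_megabytes(PARAMS, width)
    for dtype, width in BYTES_PER_PARAM.items()
}

for dtype, megabytes in footprint.items():
    print(f"{dtype}: {megabytes:.1f} MB")
```

This counts weights only; activations, optimizer state, and the number of sampling steps at inference are not included.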

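Methodology

Uses a diffusion model optimized for audio super-resolution, reducing the parameter count and computational complexity and improving training efficiency.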

Original Abstract

Audio Super-Resolution is a set of techniques aimed at high-quality estimation of the given signal as if it would be sampled with higher sample rate. Among suggested methods there are diffusion and flow models (which are considered slower), generative adversarial networks (which are considered faster), however both approaches are currently presented by high-parametric networks, requiring high computational costs both for training and inference. We propose a solution to both these problems by re-considering the recent advances in the training of diffusion models and applying them to super-resolution from any to 48 kHz sample rate. Our approach shows better results than NU-Wave 2 and is comparable to state-of-the-art models. Our model called FastWave has around 50 GFLOPs of computational complexity and 1.3 M parameters and can be trained with less resources and significantly faster than the majority of recently proposed diffusion- and flow-based solutions. The code has been made publicly available.
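
The task in the abstract, taking a signal at an arbitrary sample rate up to 48 kHz, can be illustrated with a naive baseline. The sketch below is not FastWave: it is plain linear interpolation, and the helper names `nyquist` and `upsample_linear` are hypothetical. Interpolation raises the sample rate but cannot restore content above the input's Nyquist frequency, which is the part a super-resolution model has to estimate.

```python
import math

TARGET_RATE = 48_000  # Hz, the output rate named in the abstract


def nyquist(rate_hz: int) -> float:
    """Highest frequency representable at the given sample rate."""
    return rate_hz / 2


def upsample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Naive baseline (hypothetical helper): linear interpolation to dst_rate.

    This only raises the sample rate. It does not recreate frequencies
    above nyquist(src_rate).
    """
    if dst_rate < src_rate:
        raise ValueError("dst_rate must be at least src_rate")
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in input samples
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)     # clamp at the last sample
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out


# 10 ms of a 1 kHz tone sampled at 16 kHz, then upsampled to 48 kHz.
src_rate = 16_000
low = [math.sin(2 * math.pi * 1_000 * n / src_rate) for n in range(160)]
high = upsample_linear(low, src_rate)

print(len(low), "->", len(high), "samples")
print("band to be estimated:", nyquist(src_rate), "to", nyquist(TARGET_RATE), "Hz")
```

For a 16 kHz input, the band between 8 kHz and 24 kHz is empty after interpolation; filling it plausibly is what separates super-resolution from resampling.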

arXiv Categories

cs.SD cs.LG
