Multimodal Learning 相关度: 5/10

Music Genre Classification: A Comparative Analysis of Classical Machine Learning and Deep Learning Approaches

Sachin Prajuli, Abhishek Karna, OmPrakash Dhakl

arXiv: 2603.15440v1 发布: 2026-03-16 更新: 2026-03-16

下载 PDF arXiv 页面

AI 摘要

论文比较了经典机器学习和深度学习方法在尼泊尔音乐流派分类上的应用，并提出了新的数据集。

主要贡献

构建了一个包含8个尼泊尔音乐流派的新数据集
比较了9种分类模型在尼泊尔音乐流派分类上的性能
分析了模型误分类模式的文化原因

方法论

使用经典机器学习和深度学习模型，提取音频特征或直接使用Mel频谱图进行音乐流派分类。

原文摘要

Automatic music genre classification is a long-standing challenge in Music Information Retrieval (MIR); work on non-Western music traditions remains scarce. Nepali music encompasses culturally rich and acoustically diverse genres--from the call-and-response duets of Lok Dohori to the rhythmic poetry of Deuda and the distinctive melodies of Tamang Selo--that have not been addressed by existing classification systems. In this paper, we construct a novel dataset of approximately 8,000 labeled 30-second audio clips spanning eight Nepali music genres and conduct a systematic comparison of nine classification models across two paradigms. Five classical machine learning classifiers (Logistic Regression, SVM, KNN, Random Forest, and XGBoost) are trained on 51 hand-crafted audio features extracted via Librosa, while four deep learning architectures (CNN, RNN, parallel CNN-RNN, and sequential CNN followed by RNN) operate on Mel spectrograms of dimension 640 x 128. Our experiments reveal that the sequential Convolutional Recurrent Neural Network (CRNN)--in which convolutional layers feed into an LSTM--achieves the highest accuracy of 84%, substantially outperforming both the best classical models (Logistic Regression and XGBoost, both at 71%) and all other deep architectures. We provide per-class precision, recall, F1-score, confusion matrices, and ROC analysis for every model, and offer a culturally grounded interpretation of misclassification patterns that reflects genuine overlaps in Nepal's musical traditions.

arXiv 分类

cs.SD cs.AI eess.AS

AI 摘要

主要贡献

方法论

原文摘要

标签

arXiv 分类