Multimodal Learning relevance: 5/10

myMNIST: Benchmark of PETNN, KAN, and Classical Deep Learning Models for Burmese Handwritten Digit Recognition

Ye Kyaw Thu, Thazin Myint Oo, Thepchai Supnithi
arXiv: 2603.18597v1 Published: 2026-03-19 Updated: 2026-03-19

AI Summary

A benchmark comparing PETNN and related models against classical deep learning models on the myMNIST Burmese handwritten digit dataset.

Key Contributions

  • First systematic evaluation of multiple model architectures on the myMNIST dataset
  • Demonstrates the effectiveness of PETNN models for Burmese handwritten digit recognition
  • Establishes reproducible baselines to facilitate future research

Methodology

Multiple models (CNN, PETNN, KAN, and others) are evaluated on the myMNIST dataset using Precision, Recall, F1-Score, and Accuracy.
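The paper does not publish its evaluation code, so as a minimal sketch, the four reported metrics could be computed from predicted and true digit labels as below (macro-averaged over the 10 classes; the averaging scheme is an assumption, as the abstract does not specify it):

```python
from collections import Counter

def compute_metrics(y_true, y_pred, num_classes=10):
    """Macro-averaged Precision, Recall, F1-Score, and overall Accuracy
    for multi-class predictions (e.g. Burmese digit labels 0-9)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but the label was t
            fn[t] += 1          # t was missed
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        prec = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return {
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
    }
```

In practice a library routine such as scikit-learn's `precision_recall_fscore_support` would typically be used; the hand-rolled version above only illustrates what the reported numbers measure.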

Original Abstract

We present the first systematic benchmark on myMNIST (formerly BHDD), a publicly available Burmese handwritten digit dataset important for Myanmar NLP/AI research. We evaluate eleven architectures spanning classical deep learning models (Multi-Layer Perceptron, Convolutional Neural Network, Long Short-Term Memory, Gated Recurrent Unit, Transformer), recent alternatives (FastKAN, EfficientKAN), an energy-based model (JEM), and physics-inspired PETNN variants (Sigmoid, GELU, SiLU). Using Precision, Recall, F1-Score, and Accuracy as evaluation metrics, our results show that the CNN remains a strong baseline, achieving the best overall scores (F1 = 0.9959, Accuracy = 0.9970). The PETNN (GELU) model closely follows (F1 = 0.9955, Accuracy = 0.9966), outperforming LSTM, GRU, Transformer, and KAN variants. JEM, representing energy-based modeling, performs competitively (F1 = 0.9944, Accuracy = 0.9958). KAN-based models (FastKAN, EfficientKAN) trail the top performers but provide a meaningful alternative baseline (Accuracy ~0.992). These findings (i) establish reproducible baselines for myMNIST across diverse modeling paradigms, (ii) highlight PETNN's strong performance relative to classical and Transformer-based models, and (iii) quantify the gap between energy-inspired PETNNs and a true energy-based model (JEM). We release this benchmark to facilitate future research on Myanmar digit recognition and to encourage broader evaluation of emerging architectures on regional scripts.

Tags

Handwritten Digit Recognition Burmese Deep Learning PETNN Benchmarking

arXiv Categories

cs.CV cs.AI cs.CL