Agent Tuning & Optimization Relevance: 5/10

MCEL: Margin-Based Cross-Entropy Loss for Error-Tolerant Quantized Neural Networks

Mikail Yayla, Akash Kumar
arXiv: 2603.05048v1 Published: 2026-03-05 Updated: 2026-03-05

AI Summary

Proposes an error-tolerant training method for quantized neural networks based on a Margin Cross-Entropy Loss (MCEL) that requires no bit error injection.

Key Contributions

  • Proposes the Margin Cross-Entropy Loss (MCEL)
  • Establishes a direct connection between bit error tolerance and classification margins at the output layer
  • Shows that MCEL improves the robustness of quantized models under bit errors

Methodology

By analyzing the relationship between bit error tolerance and classification margins, the authors design a new loss function, MCEL, which explicitly enlarges logit-level margin separation to improve model robustness.
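The summary above does not give MCEL's exact formula, but the described idea (explicitly enlarging the logit-level margin while keeping cross-entropy's optimization behavior) matches the common additive-margin pattern. The sketch below is a hypothetical illustration of that pattern, not the paper's verified implementation: the target-class logit is reduced by a margin `m` before the standard softmax cross-entropy, so minimizing the loss forces the true logit at least `m` above its competitors.

```python
import numpy as np

def margin_cross_entropy_loss(logits, targets, margin=1.0):
    """Hypothetical MCEL-style sketch (additive-margin cross-entropy).

    Subtracts `margin` from each sample's target-class logit before
    computing softmax cross-entropy; with margin=0 this reduces exactly
    to the standard cross-entropy loss. The paper's actual MCEL
    formulation may differ.
    """
    adjusted = logits.copy()
    adjusted[np.arange(len(targets)), targets] -= margin
    # numerically stable log-softmax
    shifted = adjusted - adjusted.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # mean negative log-likelihood of the target classes
    return -log_probs[np.arange(len(targets)), targets].mean()
```

Because the loss stays a plain cross-entropy over shifted logits, it can serve as a drop-in replacement for the standard loss, and the interpretable `margin` parameter directly tunes how much logit separation (and hence, per the abstract, bit error tolerance) is demanded.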

Original Abstract

Robustness to bit errors is a key requirement for the reliable use of neural networks (NNs) on emerging approximate computing platforms and error-prone memory technologies. A common approach to achieve bit error tolerance in NNs is injecting bit flips during training according to a predefined error model. While effective in certain scenarios, training-time bit flip injection introduces substantial computational overhead, often degrades inference accuracy at high error rates, and scales poorly for larger NN architectures. These limitations make error injection an increasingly impractical solution for ensuring robustness on future approximate computing platforms and error-prone memory technologies. In this work, we investigate the mechanisms that enable NNs to tolerate bit errors without relying on error-aware training. We establish a direct connection between bit error tolerance and classification margins at the output layer. Building on this insight, we propose a novel loss function, the Margin Cross-Entropy Loss (MCEL), which explicitly promotes logit-level margin separation while preserving the favorable optimization properties of the standard cross-entropy loss. Furthermore, MCEL introduces an interpretable margin parameter that allows robustness to be tuned in a principled manner. Extensive experimental evaluations across multiple datasets of varying complexity, diverse NN architectures, and a range of quantization schemes demonstrate that MCEL substantially improves bit error tolerance, up to 15 % in accuracy for an error rate of 1 %. Our proposed MCEL method is simple to implement, efficient, and can be integrated as a drop-in replacement for standard CEL. It provides a scalable and principled alternative to training-time bit flip injection, offering new insights into the origins of NN robustness and enabling more efficient deployment on approximate computing and memory systems.

Tags

quantized neural networks · bit error tolerance · robustness · loss functions

arXiv Categories

cs.LG cs.AR