Agent Tuning & Optimization · Relevance: 5/10

Trainable Bitwise Soft Quantization for Input Feature Compression

Karsten Schrödter, Jan Stenkamp, Nina Herrmann, Fabian Gieseke
arXiv: 2603.05172v1 · Published: 2026-03-05 · Updated: 2026-03-05

AI Summary

Proposes a trainable bitwise soft quantization layer that compresses a neural network's input features to fit the resource constraints of IoT devices.

Key Contributions

  • Proposes a trainable bitwise soft quantization layer
  • Enables task-specific feature compression
  • Achieves high compression rates while maintaining accuracy

Methodology

Approximates step functions with sigmoid functions to obtain trainable quantization thresholds; concatenating the outputs of multiple sigmoids yields the bitwise soft quantization.
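The idea above can be sketched in a few lines of numpy: each bit of the quantized code approximates a step function step(x − t) with a steep sigmoid, and stacking one sigmoid per bit produces the soft bit vector. This is a minimal illustration, not the authors' implementation; the thresholds and steepness value are hypothetical (in the actual layer the thresholds would be trainable parameters).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_bits(x, thresholds, steepness=50.0):
    """Soft-quantize each scalar feature in x into len(thresholds) bits.

    Each bit approximates the step function step(x - t_b) with a sigmoid;
    a larger steepness makes the approximation closer to a hard threshold.
    """
    x = np.asarray(x, dtype=float)[..., None]   # shape (..., 1)
    t = np.asarray(thresholds, dtype=float)     # shape (B,)
    return sigmoid(steepness * (x - t))         # shape (..., B)

# Hypothetical thresholds for a 2-bit quantizer on features in [0, 1]
thresholds = [0.25, 0.75]
bits = soft_bits([0.1, 0.5, 0.9], thresholds)   # soft bits in (0, 1)
hard = (bits > 0.5).astype(int)                 # hard bits at inference
```

During training the soft bits stay differentiable, so gradients can flow back to the thresholds; at deployment the sigmoids collapse to hard thresholding, giving the on-device compression described in the abstract.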

Original Abstract

The growing demand for machine learning applications in the context of the Internet of Things calls for new approaches to optimize the use of limited compute and memory resources. Despite significant progress that has been made w.r.t. reducing model sizes and improving efficiency, many applications still require remote servers to provide the required resources. However, such approaches rely on transmitting data from edge devices to remote servers, which may not always be feasible due to bandwidth, latency, or energy constraints. We propose a task-specific, trainable feature quantization layer that compresses the input features of a neural network. This can significantly reduce the amount of data that needs to be transferred from the device to a remote server. In particular, the layer allows each input feature to be quantized to a user-defined number of bits, enabling a simple on-device compression at the time of data collection. The layer is designed to approximate step functions with sigmoids, enabling trainable quantization thresholds. By concatenating outputs from multiple sigmoids, introduced as bitwise soft quantization, it achieves trainable quantized values when integrated with a neural network. We compare our method to full-precision inference as well as to several quantization baselines. Experiments show that our approach outperforms standard quantization methods, while maintaining accuracy levels close to those of full-precision models. In particular, depending on the dataset, compression factors of $5\times$ to $16\times$ can be achieved compared to $32$-bit input without significant performance loss.
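The compression factors quoted in the abstract follow directly from the per-feature bit width: replacing a 32-bit float with a b-bit code compresses by 32/b, so 2 bits give the 16× figure and about 6 bits the 5× figure. A quick check (the bit widths here are illustrative, not taken from the paper):

```python
def compression_factor(bits_per_feature, input_bits=32):
    """Ratio of original to quantized storage per input feature."""
    return input_bits / bits_per_feature

# e.g. 2-bit codes yield 16x, 6-bit codes roughly 5.3x
factors = {b: compression_factor(b) for b in (2, 4, 6)}
```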

Tags

Quantization · Feature Compression · Neural Networks · IoT · Low Power

arXiv Categories

cs.LG