Yuejiao Wang, Zhong Ma*, Chaojie Yang, Yu Yang, Lu Wei
CMC-Computers, Materials & Continua, Vol.79, No.1, pp. 819-836, 2024, DOI:10.32604/cmc.2024.047108
- 25 April 2024
Abstract Quantization compresses a network by reducing the numerical bit width of the model, which improves computation speed. However, reducing the data bit width causes a loss of accuracy, and different layers differ in their redundancy and in their sensitivity to bit width. It is therefore difficult to determine the optimal bit width for each part of the network while guaranteeing accuracy. Mixed-precision quantization can effectively reduce the amount of computation while keeping model accuracy essentially unchanged. In this paper, a hardware-aware mixed-precision quantization strategy optimal assignment algorithm adapted to…
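The trade-off between bit width and accuracy described in the abstract can be illustrated with a minimal sketch. It is not the paper's assignment algorithm: it assumes plain symmetric uniform quantization, uses synthetic Gaussian "weights", and the helper names `quantize` and `mse` are hypothetical. It only shows that the reconstruction error grows as the bit width shrinks.

```python
import random


def quantize(values, bits):
    """Symmetric uniform quantization to a signed `bits`-bit grid, then dequantize.

    Assumption for illustration: one scale per tensor, derived from the
    largest absolute value. Real schemes may calibrate differently.
    """
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, clamp to the representable range,
    # then map back to floating point.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Synthetic stand-in for one layer's weights (an assumption, not real model data).
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Quantization error at several candidate bit widths.
errors = {bits: mse(weights, quantize(weights, bits)) for bits in (8, 4, 2)}

for bits, err in errors.items():
    print(f"{bits}-bit quantization MSE: {err:.6f}")
```

A mixed-precision strategy would pick a different `bits` value per layer. Measuring an error like the one above for each layer is one simple way to see which layers tolerate lower precision.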