Fangzhou He1,3, Ke Ding1,2, Dingjiang Yan3, Jie Li3,*, Jiajun Wang1,2, Mingzhe Chen1,2
CMC-Computers, Materials & Continua, Vol.80, No.2, pp. 3021-3045, 2024, DOI:10.32604/cmc.2024.053632
- 15 August 2024
Abstract Massive computational complexity and memory requirement of artificial intelligence models impede their deployability on edge computing devices of the Internet of Things (IoT). While Power-of-Two (PoT) quantization is proposed to improve the efficiency for edge inference of Deep Neural Networks (DNNs), existing PoT schemes require a huge amount of bit-wise manipulation and have large memory overhead, and their efficiency is bounded by the bottleneck of computation latency and memory footprint. To tackle this challenge, we present an efficient inference approach on the basis of PoT quantization and model compression. An integer-only scalar PoT quantization (IOS-PoT)… More >