Open Access

ARTICLE


FPGA Optimized Accelerator of DCNN with Fast Data Readout and Multiplier Sharing Strategy

by Tuo Ma, Zhiwei Li, Qingjiang Li*, Haijun Liu, Zhongjin Zhao, Yinan Wang

College of Electronic Science and Technology, National University of Defense Technology, Changsha, 410073, China

* Corresponding Author: Qingjiang Li.

Computers, Materials & Continua 2023, 77(3), 3237-3263. https://doi.org/10.32604/cmc.2023.045948

Abstract

With the continuous development of deep learning, the Deep Convolutional Neural Network (DCNN) has attracted wide attention in industry due to its high accuracy in image classification. Compared with other DCNN hardware deployment platforms, the Field Programmable Gate Array (FPGA) offers programmability, low power consumption, parallelism, and low cost. However, the enormous computational load of DCNNs and the limited logic capacity of FPGAs restrict the energy efficiency of DCNN accelerators. The traditional sequential sliding window method can improve the throughput of a DCNN accelerator through data multiplexing, but its data multiplexing rate is low because data shared between rows is read repeatedly. This paper proposes a fast data readout strategy based on a circular sliding window data reading method, which improves the multiplexing rate of data between rows by optimizing the memory access order of the input data. In addition, the multiplication bit width of the DCNN accelerator is much smaller than that of the Digital Signal Processing (DSP) blocks on the FPGA, so using a single DSP for each multiplication wastes resources. A multiplier sharing strategy is therefore proposed: the accelerator's multiplier is customized so that a single DSP block can complete multiple groups of 4-, 6-, and 8-bit signed multiplications in parallel. Finally, based on these two strategies, an FPGA-optimized accelerator is proposed. The accelerator is implemented in Verilog and deployed on a Xilinx VCU118. When recognizing the CIFAR-10 dataset, the accelerator achieves an energy efficiency of 39.98 GOPS/W, a 1.73× improvement over previous DCNN FPGA accelerators. When recognizing the ImageNet dataset, it achieves 41.12 GOPS/W, which is 1.28×-3.14× the energy efficiency of other accelerators.
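
As a rough illustration of the multiplier sharing idea described above, the behavioral Verilog sketch below packs two signed 8-bit weights that share a common activation into one wide multiplicand, so that a single DSP-class multiplier produces both products; a one-bit correction undoes the borrow caused by the lower product's sign extension. The module name, port names, and the 18-bit packing offset are illustrative assumptions, not details taken from the paper, whose customized multiplier also handles 4- and 6-bit operands.

    // Behavioral sketch: two signed 8-bit weights sharing one activation are
    // packed into a single wide multiplicand, so one DSP-class multiplier
    // yields both products in parallel.
    module shared_mult8_sketch (
        input  signed [7:0]  act,   // shared 8-bit activation
        input  signed [7:0]  w_hi,  // weight mapped to the upper lanes
        input  signed [7:0]  w_lo,  // weight mapped to the lower lanes
        output signed [15:0] p_hi,  // act * w_hi
        output signed [15:0] p_lo   // act * w_lo
    );
        wire signed [26:0] w_hi_ext = w_hi;                     // sign-extend to a 27-bit port
        wire signed [26:0] packed   = (w_hi_ext <<< 18) + w_lo; // (w_hi << 18) + w_lo
        wire signed [44:0] prod     = packed * act;             // one wide multiplication

        assign p_lo = prod[15:0];             // lower product: act * w_lo
        // The lower product's sign extension borrows one from the upper lanes;
        // adding back prod[17] recovers act * w_hi exactly.
        assign p_hi = prod[33:18] + prod[17];
    endmodule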

Keywords


Cite This Article

APA Style
Ma, T., Li, Z., Li, Q., Liu, H., Zhao, Z. et al. (2023). FPGA optimized accelerator of DCNN with fast data readout and multiplier sharing strategy. Computers, Materials & Continua, 77(3), 3237-3263. https://doi.org/10.32604/cmc.2023.045948
Vancouver Style
Ma T, Li Z, Li Q, Liu H, Zhao Z, Wang Y. FPGA optimized accelerator of DCNN with fast data readout and multiplier sharing strategy. Comput Mater Contin. 2023;77(3):3237-3263. https://doi.org/10.32604/cmc.2023.045948
IEEE Style
T. Ma, Z. Li, Q. Li, H. Liu, Z. Zhao, and Y. Wang, “FPGA Optimized Accelerator of DCNN with Fast Data Readout and Multiplier Sharing Strategy,” Comput. Mater. Contin., vol. 77, no. 3, pp. 3237-3263, 2023. https://doi.org/10.32604/cmc.2023.045948



Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.