FPGA-Based Flexible Implementation of Lightweight Inference on Deep Convolutional Neural Networks
Standard convolution (StdConv) is the main technique used in state-of-the-art Deep Convolutional Neural Networks (DCNNs). Depthwise separable convolution (SepConv) is an alternative that requires fewer computations. In many applications where low latency is essential, such as smart cameras and autonomous vehicles, a crucial issue is deploying a lightweight, low-cost inference model that keeps accuracy acceptable while the computational and memory-access loads remain tolerable. This paper proposes a flexible architecture that supports different DCNN convolution types and models. The flexibility comes from sharing one memory access unit across different layer types regardless of the selected kernel size, by multiplying each weight vector by local operators with variable aperture. Moreover, one depthwise computation unit serves both standard and pointwise layers. The learnable parameters are quantized to an 8-bit fixed-point representation, which causes a very limited loss of accuracy while considerably reducing the Field-Programmable Gate Array (FPGA) resources. To reduce processing time, inter-layer computations are performed in parallel. Experiments were conducted on the grey-scale ORL database with a shallow Convolutional Neural Network (CNN) and on the colored Canadian Institute for Advanced Research 10-class (CIFAR-10) database with a DCNN; comparable accuracies of 93% and 85.7% were achieved, respectively, on a very low-cost Spartan-3E and a moderate-cost Zynq FPGA platform.
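The computational saving of SepConv over StdConv can be sketched by counting multiply-accumulate (MAC) operations per layer. The layer shapes below are hypothetical, chosen only for illustration, not taken from the paper's networks:

```python
# Compare MAC counts of a standard convolution against a depthwise
# separable convolution for one layer. Illustrative sketch only;
# the shapes (h, w, k, c_in, c_out) are hypothetical.

def std_conv_macs(h, w, k, c_in, c_out):
    # Standard convolution: every output channel convolves
    # all input channels with a k x k kernel.
    return h * w * k * k * c_in * c_out

def sep_conv_macs(h, w, k, c_in, c_out):
    # Depthwise stage: one k x k filter per input channel,
    # followed by a 1 x 1 pointwise stage that mixes channels.
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

h = w = 32            # feature-map size (hypothetical)
k = 3                 # kernel size
c_in, c_out = 64, 128 # channel counts (hypothetical)

std = std_conv_macs(h, w, k, c_in, c_out)
sep = sep_conv_macs(h, w, k, c_in, c_out)
print(std, sep, round(std / sep, 2))
```

For these shapes SepConv needs roughly an order of magnitude fewer MACs (the ratio approaches k² for large c_out), which is the saving that motivates using it in lightweight inference hardware.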