Efficient Fixed/Floating-Point Merged Mixed-Precision Multiply-Accumulate Unit for Deep Learning Processors




Course Duration
Approx 10

Course Price
₹ 13000

Course Level
Intermediate

Course Content

Efficient Fixed/Floating-Point Merged Mixed-Precision Multiply-Accumulate Unit for Deep Learning Processors

Abstract

Deep learning has attracted increasing attention in recent years, and many hardware architectures have been proposed for the efficient implementation of deep neural networks. The arithmetic unit, as the core processing element of such architectures, largely determines the functionality of the whole design. In this paper, an efficient fixed/floating-point merged multiply-accumulate unit for deep learning processors is proposed. The proposed architecture supports 16-bit half-precision floating-point multiplication with 32-bit single-precision accumulation for the training operations of deep learning algorithms. In addition, within the same hardware, it supports two parallel 8-bit fixed-point multiplications, accumulating the products into a 32-bit fixed-point number, which enables higher throughput for the inference operations of deep learning algorithms. Compared to a half-precision multiply-accumulate unit that accumulates to single precision, the proposed architecture incurs only 4.6% area overhead. With the proposed multiply-accumulate unit, the deep learning processor can support both training and high-throughput inference.
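The dual-mode behavior described in the abstract can be illustrated with a small numerical model. The sketch below is a hypothetical Python/NumPy behavioral model, not the paper's hardware design: one function mimics the floating-point mode (half-precision multiply with single-precision accumulation) and the other mimics the fixed-point mode (two parallel 8-bit multiplications accumulated into a 32-bit sum). The function names and structure are illustrative assumptions.

```python
import numpy as np

def mac_fp16_to_fp32(acc, a, b):
    """Floating-point mode: multiply two half-precision operands and
    accumulate the product into a single-precision sum (training use case)."""
    product = np.float32(np.float16(a)) * np.float32(np.float16(b))
    return np.float32(acc) + product

def mac_dual_int8_to_int32(acc, a0, b0, a1, b1):
    """Fixed-point mode: two parallel 8-bit multiplications whose products
    are both added to a 32-bit accumulator (high-throughput inference)."""
    p0 = np.int32(np.int8(a0)) * np.int32(np.int8(b0))
    p1 = np.int32(np.int8(a1)) * np.int32(np.int8(b1))
    return np.int32(acc) + p0 + p1

# Example: the same accumulate step is reused in both precision modes.
acc_fp = mac_fp16_to_fp32(0.0, 1.5, 2.25)           # -> 3.375 (float32)
acc_int = mac_dual_int8_to_int32(0, 100, 3, -7, 5)  # -> 265 (int32)
print(acc_fp, acc_int)
```

This only models the numerical behavior; the merged hardware unit in the paper realizes both modes with shared multiplier and adder resources, which is where the reported 4.6% area overhead comes from.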
