题目/Title:Hardware Implementation of Approximate Fixed-point Divider for Machine Learning Optimization Algorithm
作者/Author:
Gandong Han, Weiyi Zhang, Liting Niu, Chun Zhang, Zhihua Wang, Ziqiang Wang
会议/Conference:PrimeAsia 2022
地点/Location:Shenzhen, China
年份/Issue Date:11-13 Nov. 2022
页码/Pages:pp. 22-25
摘要/Abstract:
The division operation is necessary in many applications, especially optimization algorithms for machine learning. Usually, a certain degree of accuracy loss is acceptable when calculating insignificant intermediate variables, in exchange for a considerable speed improvement. This paper proposes a specialized divider to accelerate the hardware implementation of machine learning optimization algorithms. Inspired by the fast inverse square root algorithm, we design a hardware implementation that generates an approximate division result using conversions between floating-point and fixed-point representations together with multiplication. The paper presents three versions of the divider: fastDiv_accuracy, a conventional design with 35% less delay and minimal error compared to the delay-minimized standard divider from the Synopsys DesignWare library; fastDiv_area, an area-oriented design with 67% less delay and acceptable error compared to a standard divider constrained to the same area; and fastDiv_speed, the fastest design, with 54% less delay than the delay-minimized standard divider. All three versions can be applied, on demand, when deploying optimization algorithms in FPGA or ASIC designs.
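备注/Note:
For context, the fast inverse square root trick that the abstract cites as inspiration works by reinterpreting a float's bit pattern as an integer, forming an initial estimate with a magic constant, and refining it with one Newton-Raphson step. The C sketch below shows that classic algorithm and how an approximate quotient can be built from it via 1/b = (1/sqrt(b))^2. It is only an illustration of the underlying float/integer reinterpretation idea, not the paper's fixed-point divider circuit; the function names are hypothetical.

    #include <stdint.h>
    #include <string.h>

    /* Classic fast inverse square root (the well-known bit-level trick
     * cited as inspiration). NOT the paper's fixed-point divider. */
    static float fast_rsqrt(float x)
    {
        float half = 0.5f * x;
        uint32_t i;
        memcpy(&i, &x, sizeof i);          /* reinterpret float bits as an integer */
        i = 0x5f3759df - (i >> 1);         /* magic constant gives a rough 1/sqrt(x) */
        memcpy(&x, &i, sizeof x);
        return x * (1.5f - half * x * x);  /* one Newton-Raphson refinement step */
    }

    /* Hypothetical helper: approximate a / b (b > 0) using only the
     * estimate above and multiplications, since 1/b = (1/sqrt(b))^2. */
    static float approx_div(float a, float b)
    {
        float r = fast_rsqrt(b);
        return a * r * r;
    }

The appeal of this style of approximation for hardware is that the data path reduces to bit manipulation, additions, and multiplications, with no iterative divide; the paper's contribution is adapting the idea to a fixed-point context with the three delay/area/accuracy trade-off variants listed above.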