TY - GEN
T1 - A highly parallel FPGA based IEEE-754 compliant double-precision binary floating-point multiplication algorithm
AU - Venishetti, Sandeep K.
AU - Akoglu, Ali
PY - 2007
Y1 - 2007
N2 - There is increasing demand for fast floating-point arithmetic support to make Field Programmable Gate Arrays (FPGAs) a practical option for scientific applications. We propose a new IEEE-754 compliant double-precision floating-point multiplication algorithm that supports denormal numbers, NaN, and exception handling. The solution involves bit-level operations with minimal dependency between partial products, achieved through a specialized adder tree structure tailored to exploit the modular and parallel nature of FPGAs. We achieve a maximum operating frequency of 274 MHz for mantissa multiplication and 228 MHz for the overall system on the Xilinx Virtex-4 platform. Our design offers performance benefits similar to those of ASIC-based algorithms and routing benefits similar to those of ripple-carry array and carry-save multipliers. The proposed approach outperforms algorithm and IP-Core solutions from academia, as well as the Xilinx LogiCORE multiplier, when no embedded resources are used. The algorithm allows reaching double-double precision with much less performance degradation and pipelining demand than IP-Core based approaches.
UR - http://www.scopus.com/inward/record.url?scp=50149085733&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=50149085733&partnerID=8YFLogxK
U2 - 10.1109/FPT.2007.4439243
DO - 10.1109/FPT.2007.4439243
M3 - Conference contribution
AN - SCOPUS:50149085733
SN - 1424414725
SN - 9781424414727
T3 - ICFPT 2007 - International Conference on Field Programmable Technology
SP - 145
EP - 152
BT - ICFPT 2007 - International Conference on Field Programmable Technology
T2 - International Conference on Field Programmable Technology, ICFPT 2007
Y2 - 12 December 2007 through 14 December 2007
ER -