Efficient Floating-Point Logarithm Unit for FPGAs
Nikolaos Alachiotis, Alexandros Stamatakis
The Exelixis Lab, Dept. of Computer Science, TUM, Munich, Germany

PRESENTATION OVERVIEW
Introduction
Approximation Strategy
Reconfigurable
[Figure labels: Tip probability vector, Ancestral probability vector, Virtual Root]
[Figure: input and output floating-point words (bit 63: Sign, bits 62 downto 52: Exponent, bits 51 downto 0: Mantissa) and the datapath blocks CASE DETECT, SUB, EXP LUT, MAN LUT, FP VAL, log(2) FP VAL, MULT, ADD. Other labels: 51-q MSBs, 2046, P, R, 1, 0]
INPUT CASE DETECTION
log(Negative number) = NaN
log(NaN) = NaN
log(Inf) = Inf
log(-Inf) = NaN
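The input cases listed above can be checked directly on the bit fields of a double-precision word. The following is a minimal software sketch, not the hardware CASE DETECT block itself; the function name `classify_input` and its return labels are hypothetical. Zero and subnormal inputs (exponent field 0) are not listed on the slide, so the sketch only flags them as unspecified.

```python
import struct

def classify_input(x: float) -> str:
    """Classify x for log(x) using only the sign/exponent/mantissa fields.

    Return labels are illustrative (hypothetical), not taken from the slides.
    """
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63                    # bit 63
    exponent = (bits >> 52) & 0x7FF      # bits 62 downto 52
    mantissa = bits & ((1 << 52) - 1)    # bits 51 downto 0

    if exponent == 0x7FF and mantissa != 0:
        return "nan"           # log(NaN) = NaN
    if exponent == 0:
        return "unspecified"   # zero/subnormal: not covered by the slide
    if sign == 1:
        return "nan"           # log(negative) = NaN, including log(-Inf)
    if exponent == 0x7FF:
        return "inf"           # log(+Inf) = +Inf
    return "compute"           # positive normal number: use the datapath
```

For example, `classify_input(float("-inf"))` falls into the negative-sign branch, which matches the log(-Inf) = NaN case.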
CREATE THE EXPLUT INDEX
Decimal value   Exponent
1               -1022
…               …
1022            -1
1023            0
1024            1
…               …
2046            1023
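The decimal value of the exponent field and the exponent it encodes differ by the IEEE 754 double-precision bias of 1023 (for instance, 1024 encodes 1 and 2046 encodes 1023). A minimal sketch of that mapping in software follows; the helper names are hypothetical, and how the hardware SUB block turns this value into an EXPLUT address is not shown here.

```python
import struct

BIAS = 1023  # IEEE 754 double-precision exponent bias

def exponent_field(x: float) -> int:
    """Decimal value of bits 62 downto 52 of a double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits >> 52) & 0x7FF

def unbiased_exponent(x: float) -> int:
    """Exponent encoded by the field (valid for normal numbers only)."""
    return exponent_field(x) - BIAS
```

Normal doubles use field values 1 through 2046; 0 and 2047 are reserved for zero/subnormals and Inf/NaN respectively.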
FLOATING-POINT VALUE
Single-precision values
Single-precision MULT and ADD
For single-precision inputs, EXPLUT contains 128 entries to construct a single-precision value
For double-precision inputs, EXPLUT contains 1024 entries to construct a single-precision value
MANTISSA LUT
ICSILog 0.6 software
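The block names in the datapath (EXP LUT, log(2) FP VAL, MULT, MAN LUT, ADD) suggest the decomposition log(x) = e * log(2) + log(m), where e is the exponent and m the mantissa in [1, 2). The sketch below illustrates that general lookup-table technique in software; it is not the ICSILog code nor the FPGA core. Assumptions: the table is indexed by the top Q = 12 mantissa bits (4096 entries), arithmetic is done in Python doubles rather than single precision, and the names `Q`, `MAN_LUT`, and `lut_log` are hypothetical.

```python
import math
import struct

Q = 12                 # assumed number of mantissa MSBs used as the LUT index
LOG2 = math.log(2.0)   # constant playing the role of "log(2) FP VAL"

# Mantissa LUT: entry i holds log(1 + i / 2**Q), i.e. log of the truncated mantissa.
MAN_LUT = [math.log(1.0 + i / (1 << Q)) for i in range(1 << Q)]

def lut_log(x: float) -> float:
    """Approximate the natural log of a positive, normal double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = ((bits >> 52) & 0x7FF) - 1023        # unbiased exponent e
    index = (bits >> (52 - Q)) & ((1 << Q) - 1)     # top Q mantissa bits
    return exponent * LOG2 + MAN_LUT[index]         # MULT, then ADD
```

Because the mantissa is truncated to Q bits and the slope of log(m) is at most 1 on [1, 2), the absolute error of this sketch stays below about 2**-Q. A larger table reduces the error at the cost of more memory.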
[Plot: Average Error (x10^3) versus Resources (number of 18Kb block RAMs)]
6 block RAMs = 4096 LUT entries

Dataset (Organisms)   DP-GNU   DP-ICSILog
150
218
140 (Prot)
VIRTEX 5 SX95T for mapping and verification
XILINX ISE 10.1 and CHIPSCOPE Pro Analyzer
“Generating high-performance custom floating-point pipelines,” Proc. of FPL 2009.
[Bar charts, SP-FPLog vs. SP-LAU: (1) Slice Registers, Slice LUTs, Occupied Slices; (2) BRAMs 18k, BRAMs 36k, DSP48Es]

                FPLog   LAU
Clock Latency   20      22
Max Frequency   244.7   353.5
[Bar charts, DP-FPLog vs. DP-LAU: (1) Slice Registers, Slice LUTs, Occupied Slices; (2) BRAMs 18k, BRAMs 36k, DSP48Es]

                FPLog   LAU
Clock Latency   34      22
Max Frequency   192.3   320.6
[Bar charts, DP-FPLog vs. DP-LAU: (1) Slice Registers, Slice LUTs, Occupied Slices; (2) BRAMs 18k, BRAMs 36k, DSP48Es]

                FPLog   LAU
Clock Latency   20      22
Max Frequency   239.6   320.6
[Bar chart: time in milliseconds for 100,000,000 logarithm calculations, single and double precision. Series: GNU Log (gnu), MKL Log (icc), SP-ICSILog, DP-ICSILog, SP-LAU, DP-LAU]

SP-LAU vs. GNU-LOG: 11X, MKL-LOG: 1.6X
DP-LAU vs. GNU-LOG: 18X, MKL-LOG: 2.5X
Intel Core2 Duo T9600 @ 2.8GHz, 6MB L2 Cache
AVAILABILITY
DP-ICSILog C implementation and SP/DP LAU FPGA core for Virtex4 and Virtex5 FPGAs
http://wwwkrammer.in.tum.de/exelixis/nikos/ipcores.html
or OpenCores.org, project name: fp_log
http://www.opencores.org/project,fp_log
RELATED PROJECTS
Implementation of a UDP/IP core for Virtex 5 FPGAs (optimized for PC-FPGA communication)
http://wwwkrammer.in.tum.de/exelixis/nikos/ipcores.html
or OpenCores.org, project name: udp_ip__core
http://www.opencores.org/project,udp_ip__core

FUTURE WORK
Implementation of a resource-efficient exponential function
Integration of the LOG and EXP cores into the general Phylogenetic Architecture