Hybrid Dot-Product Design for FP-Enabled FPGAs
Bogdan Pasca
Intel
ARITH 2019, June 10-12, 2019
Context
FPGAs are interesting neural network training accelerators
training: mostly dot-products in forward + backward propagation
current industry standard: bfloat16 multiplications, SP reduction
bfloat16 vs SP: 2X bandwidth (16-bit vs 32-bit operands)
Goal → find a dot-product implementation that:
  maintains an accuracy comparable to bfloat16+SP
  maximizes the dot-product density for a given FPGA
Density
FPGA devices have a varied mix of resources
increasing compute density → make efficient use of the existing mix
Focus on "Core Logic Fabric" and variable-precision (VP) DSP Blocks
Density - some current devices
Background
Intel FPGAs: DSP blocks implement an SP multiply-add
N-element SP dot-product = N DSPs
[Figure: SP dot-product on 32-bit inputs A..H. Partial sums AB+CD and EF+GH are formed, then combined into AB+CD+EF+GH.]
soft-logic-only solution
bfloat16 multiplier → 2/DSP
SP FP adder → 1/DSP
N-element dot product: C_DSP = N/2 + N − 1 (N/2 DSPs for the multipliers, N − 1 DSPs for the adders)
adjust ratio: migrate SP FP adders to logic (300-400 ALMs/add)
solution is too large
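To make the "too large" conclusion concrete, the counts above can be evaluated for a given N. This is a minimal sketch that only restates the slide's formulas; the function names are hypothetical, and the 300-400 ALMs per adder range is the one quoted on the slide.

```python
def sp_dot_dsps(n):
    """N-element SP dot-product: one multiply-add DSP per element."""
    return n


def bf16_sp_dot_dsps(n):
    """bfloat16 multipliers at 2 per DSP plus (n - 1) SP adders at 1 per DSP."""
    return n / 2 + (n - 1)


def adders_in_logic_alms(n, alms_per_adder):
    """ALM cost when all (n - 1) SP adders are migrated to soft logic."""
    return (n - 1) * alms_per_adder


if __name__ == "__main__":
    n = 16
    print("SP dot-product DSPs:", sp_dot_dsps(n))
    print("bfloat16 + SP adders, all in DSPs:", bf16_sp_dot_dsps(n))
    print("adders in logic, ALMs:",
          adders_in_logic_alms(n, 300), "to", adders_in_logic_alms(n, 400))
```

For N = 16 the all-DSP variant needs more DSPs than the plain SP dot-product, and moving the 15 adders to logic costs several thousand ALMs while the multipliers still occupy 8 DSPs.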
How do we solve this?
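Our implementation
[Figure: the dot-product is split into a soft-logic part fed by the α element pairs A_1..α, B_1..α and a hard FP part fed by the β element pairs A_α+1..α+β, B_α+1..α+β, with accumulator ACC and output P. Labeled intermediate signals: Pl, Pg, Pb.]
N = α + β
C_ALM = f(α, w)
C_DSP = α/4 + β
Objective: C_ALM/C_DSP ≈ device ALM/DSP ratio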
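The DSP side of this cost model is simple enough to tabulate. The sketch below enumerates the possible splits of an N-element dot-product and their DSP cost α/4 + β. The function names and the step of 2 between candidate α values are assumptions for illustration; the ALM cost f(α, w) is not given in closed form on the slides and is left out.

```python
def hybrid_dsp_cost(alpha, beta):
    """C_DSP = alpha/4 + beta: four soft-part multipliers share one DSP,
    each hard FP element takes one DSP."""
    return alpha / 4 + beta


def enumerate_splits(n, step=2):
    """List (alpha, beta, C_DSP) for every split with alpha + beta = n."""
    return [(alpha, n - alpha, hybrid_dsp_cost(alpha, n - alpha))
            for alpha in range(0, n + 1, step)]


if __name__ == "__main__":
    for alpha, beta, dsps in enumerate_splits(16):
        print(f"alpha={alpha:2d} beta={beta:2d} C_DSP={dsps}")
```

Each element moved from the hard part to the soft part saves three quarters of a DSP and adds soft logic, which is the knob used to steer the design toward the device's ALM/DSP ratio.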
Hard FP part
[Figure: hard FP DSP datapath with 32-bit operands. Labeled expressions: A3B3+ACC, A2B2+(A3B3+ACC), A0B0+A1B1. Labeled signals: ACC, Pl, Pg, Pb, P.]
SP accumulation is integrated
Pg will merge with the logic-based dot product
Pl is recirculated and added with Pb using a spare adder
Soft FP part
[Figure: soft-logic dot-product. Six dot2 units take the A/B pairs 0-1, 2-3, 4-5, 6-7, 8-9, 10-11, with inputs annotated by widths 8 and w; three 18x18 multiplier blocks; adders (+); CONV and NORM stages. Two FPDSP blocks take A/B 12..15 and ACC and produce P.]
Soft FP part - fused
Multipliers:
  1 DSP = 2× 18x18 = 4× 8x8 mantissa multipliers (+ALMs)
  skip multiplier normalization
  rounding: RN → RZ_w (truncate to w bits instead of rounding to nearest)
  extend exponent to avoid overflow/underflow
Adders (except first-layer inputs):
  operate on 2's complement mantissas
  mantissa grows by 1 (+1 optional) bit(s) every stage
  mantissa format changes from (SM, 1, wF) → (2C, 1+1+L, w+L)
  after the final adder, normalization converts to SP
  intermediary normalization may be introduced for large α
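The fused scheme above can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the design from the slides: w is interpreted as the number of fraction bits kept after each product, all products are aligned to the largest exponent in a single step instead of through a staged adder tree, and Python integers stand in for the growing two's complement mantissas. The names `split_bfloat16` and `fused_dot` are hypothetical.

```python
import math

FRACTION_BITS = 14  # fraction bits of an exact 8x8-bit significand product in [1, 4)


def split_bfloat16(x):
    """Return (sign, exponent, significand) with |x| ~= significand * 2**exponent.

    The significand is truncated to 8 bits (hidden one plus 7 fraction bits).
    """
    if x == 0.0:
        return 1, 0, 0
    m, e = math.frexp(abs(x))  # abs(x) = m * 2**e with 0.5 <= m < 1
    return (-1 if x < 0 else 1), e - 8, int(m * 256)


def fused_dot(a, b, w=8):
    """Dot product with unnormalized products and one final conversion."""
    assert 0 <= w <= FRACTION_BITS
    terms = []
    for x, y in zip(a, b):
        sx, ex, mx = split_bfloat16(x)
        sy, ey, my = split_bfloat16(y)
        if mx == 0 or my == 0:
            continue
        prod = mx * my                 # exact 16-bit product, left unnormalized
        prod >>= FRACTION_BITS - w     # RZ: drop low bits of the magnitude
        terms.append((sx * sy * prod, ex + ey))  # signed mantissa, exponent
    if not terms:
        return 0.0
    e_max = max(e for _, e in terms)
    # Align every mantissa to the largest exponent (arithmetic right shift),
    # then add as plain integers; no rounding or normalization in between.
    total = sum(m >> (e_max - e) for m, e in terms)
    # Single conversion back to a floating-point value at the end.
    return math.ldexp(total, e_max + FRACTION_BITS - w)


if __name__ == "__main__":
    print(fused_dot([1.0, 2.0], [3.0, 4.0], w=8))  # 11.0
```

Inputs that are exactly representable with few bits come out exact; for general inputs, lowering w discards more product bits and increases the error, which is the accuracy knob referred to in the slides.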
Accuracy (Average)
w - knob to control the accuracy
ec - exponents centered, es - the exponent span
ec = 0, es = 10 - inputs generated in (2^−10 · 2, 2^10 · 2)
Table: Average relative error comparison between the proposed hybrid dot-product and a typical AI bfloat16+SP implementation for n = 16, α = 12, β = 4, βg = 2, βb = 2
Config            Param   Proposed       AI
ec = 0, es = 5    w = 7   1.287601e-02   4.570449e-03
                  w = 8   6.172194e-03
                  w = 9   2.935275e-03
ec = 0, es = 10   w = 7   7.934867e-03   3.402314e-03
                  w = 8   4.120781e-03
                  w = 9   1.864206e-03
ec = 0, es = 20   w = 7   6.672454e-03   2.996574e-03
                  w = 8   3.161355e-03
                  w = 9   1.588372e-03
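An average relative error of this kind can be measured with a short harness. The sketch below is illustrative only and makes several assumptions that the slides do not state: inputs are positive with a uniformly drawn exponent in [ec − es, ec + es], bfloat16 conversion is done by truncation, the SP reduction is sequential, and the reference is a double-precision dot-product of the unquantized inputs. All function names are hypothetical, and the numbers it produces are not expected to reproduce the table.

```python
import math
import random
import struct


def to_float32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bfloat16(x):
    """Truncate to bfloat16 by zeroing the low 16 bits of the float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def bf16_sp_dot(a, b):
    """bfloat16 multiplications followed by a sequential SP reduction."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_bfloat16(x) * to_bfloat16(y)  # 8x8-bit significands: exact
        acc = to_float32(acc + prod)            # round every partial sum to SP
    return acc


def random_inputs(n, ec, es, rng):
    """Positive values with mantissa in [1, 2) and exponent in [ec-es, ec+es]."""
    return [rng.uniform(1.0, 2.0) * 2.0 ** rng.randint(ec - es, ec + es)
            for _ in range(n)]


def avg_relative_error(dot_fn, n=16, ec=0, es=10, trials=1000, seed=0):
    """Mean of |dot_fn(a, b) - exact| / |exact| over random input vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = random_inputs(n, ec, es, rng)
        b = random_inputs(n, ec, es, rng)
        exact = math.fsum(x * y for x, y in zip(a, b))
        total += abs(dot_fn(a, b) - exact) / abs(exact)
    return total / trials


if __name__ == "__main__":
    for es in (5, 10, 20):
        print(es, avg_relative_error(bf16_sp_dot, es=es))
```

Any candidate implementation with the same `(a, b)` signature can be passed as `dot_fn`, so two designs can be compared on identical input vectors by reusing the seed.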
Density
r_dot = C_ALM/C_DSP (ALMs per DSP)
Config                                   Param   ALMs   DSPs   r_dot
n = 16, α = 12, β = 4, βg = 2, βb = 2    w = 7   1030   7      147
                                         w = 8   1075          153
                                         w = 9   1141          163
n = 16, α = 10, β = 6, βg = 4, βb = 2    w = 7   863    8.5    102
                                         w = 8   894           106
                                         w = 9   948           112
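The tabulated ratios can be compared against a target device ratio to choose a configuration. In the sketch below the rows are copied from the table, r_dot is taken as ALMs divided by DSPs (which is what the tabulated values correspond to, for example 1030/7 ≈ 147), and the device ratio of 150 ALMs per DSP is a made-up value for illustration, not a figure for any real device.

```python
# (alpha, beta, w, ALMs, DSPs) copied from the density table
ROWS = [
    (12, 4, 7, 1030, 7.0),
    (12, 4, 8, 1075, 7.0),
    (12, 4, 9, 1141, 7.0),
    (10, 6, 7, 863, 8.5),
    (10, 6, 8, 894, 8.5),
    (10, 6, 9, 948, 8.5),
]


def r_dot(alms, dsps):
    """ALMs consumed per DSP consumed by one dot-product instance."""
    return alms / dsps


def closest_config(device_ratio):
    """Row whose ALM/DSP ratio is nearest to the device's ALM/DSP ratio."""
    return min(ROWS, key=lambda row: abs(r_dot(row[3], row[4]) - device_ratio))


if __name__ == "__main__":
    hypothetical_device_ratio = 150.0  # assumption, not a real device figure
    alpha, beta, w, alms, dsps = closest_config(hypothetical_device_ratio)
    print(f"alpha={alpha} beta={beta} w={w} r_dot={r_dot(alms, dsps):.1f}")
```

A configuration whose ratio matches the device exhausts ALMs and DSPs at about the same time, so neither resource is left stranded when the dot-product is replicated across the chip.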
Open questions
reduction tree topology?
accuracy?
resource ratio - account for plumbing?
integration/synchronization?
design/portability?
routability?