Hybrid Dot-Product Design for FP-Enabled FPGAs, Bogdan Pasca (Intel)


  1. Hybrid Dot-Product Design for FP-Enabled FPGAs. Bogdan Pasca, Intel. ARITH 2019, June 10-12, 2019.

  4. Context
     - FPGAs are interesting neural network training accelerators
     - training: mostly dot-products in forward + backward propagation
     - current industry standard: bfloat16 multiplications, SP reduction
     - bfloat16 vs SP: 2X bandwidth
     Goal → find a dot-product implementation that:
     - maintains an accuracy comparable to bfloat16+SP
     - maximizes the dot-product density for a given FPGA

  6. Density
     - FPGA devices have a varied mix of resources
     - increasing compute density → make efficient use of the existing mix
     - focus on the "Core Logic Fabric" and variable-precision (VP) DSP blocks

  7. Density - some current devices [slide graphics not captured in this transcript]

  15. Background
      - Intel FPGAs: DSP blocks implement SP mult-add
      - N-element SP dot-product = N DSPs
      [Diagram: DSP blocks with 32-bit inputs A, B, C, D and E, F, G, H compute AB+CD and EF+GH, which are combined into AB+CD+EF+GH.]
      - soft-logic-only solution
      - bfloat16 multiplier → 2/DSP
      - SP FP adder → 1/DSP
      - N-element dot product: $C_{DSP} = N/2 + N - 1$
      - adjust ratio: migrate SP FP adders to logic (300-400 ALMs/add)
      - solution is too large
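The DSP counts on this slide can be worked through numerically. The following is a minimal sketch, not the author's tooling: the function names are hypothetical, and the ALM estimate assumes that all N - 1 adders of the reduction are moved to logic at the 300-400 ALMs per adder quoted above.

```python
def sp_dot_dsps(n: int) -> int:
    """DSP blocks for an n-element SP dot product (one SP mult-add per DSP)."""
    return n


def bf16_dot_dsps(n: int) -> float:
    """DSP blocks when bfloat16 multipliers pack 2 per DSP and every
    SP adder of the reduction occupies one DSP: N/2 + N - 1."""
    return n / 2 + (n - 1)


def adders_in_logic_alms(n: int, alms_per_adder: int) -> int:
    """ALMs needed if all n - 1 SP adders are migrated to logic
    (assumption: every adder moves, none stays in a DSP)."""
    return (n - 1) * alms_per_adder


if __name__ == "__main__":
    for n in (8, 16, 32):
        low = adders_in_logic_alms(n, 300)
        high = adders_in_logic_alms(n, 400)
        print(f"N={n}: SP={sp_dot_dsps(n)} DSPs, "
              f"bfloat16 mult + DSP adders={bf16_dot_dsps(n)} DSPs, "
              f"adders in logic={low}-{high} ALMs")
```

For N = 16 the formula gives 23 DSPs against 16 for the plain SP mapping, and moving the 15 adders to logic costs several thousand ALMs, which is consistent with the slide's remark that the solution is too large.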

  16. How do we solve this?

  18. Our implementation
      [Diagram: inputs A_{1..α}, B_{1..α} feed the soft-logic part of the dot-product (result P_l); inputs A_{α+1..α+β}, B_{α+1..α+β} and ACC feed the hard FP part of the dot-product (signals P_g, P_b); the output is P.]
      - $N = \alpha + \beta$
      - $C_{ALM} = f(\alpha, w)$
      - $C_{DSP} = \alpha/4 + \beta$
      - Objective: $C_{ALM} / C_{DSP} \approx$ device ALM/DSP ratio
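The DSP cost model on this slide is easy to tabulate for a given N. The sketch below is illustrative only: `hybrid_dsps`, `splits` and `alm_budget` are hypothetical helper names, and the device ratio of 150 ALMs per DSP is a made-up figure used to show how an ALM budget for the soft-logic part would follow from the stated objective. The function f(α, w) is not given in the deck, so it is not modelled.

```python
def hybrid_dsps(alpha: int, beta: int) -> float:
    """C_DSP = alpha / 4 + beta: four 8x8 mantissa multipliers per DSP for
    the soft part, one DSP per element for the hard FP part."""
    return alpha / 4 + beta


def splits(n: int, step: int = 2):
    """Enumerate (alpha, beta) pairs with alpha + beta = n."""
    for alpha in range(0, n + 1, step):
        yield alpha, n - alpha


def alm_budget(alpha: int, beta: int, device_ratio: float) -> float:
    """ALMs the design may use so that C_ALM / C_DSP matches the device
    ALM/DSP ratio (device_ratio is an assumed, illustrative value)."""
    return device_ratio * hybrid_dsps(alpha, beta)


if __name__ == "__main__":
    for alpha, beta in splits(16):
        print(f"alpha={alpha:2d} beta={beta:2d} "
              f"C_DSP={hybrid_dsps(alpha, beta):5.2f} "
              f"ALM budget at 150 ALMs/DSP={alm_budget(alpha, beta, 150.0):7.1f}")
```

The splits α = 12, β = 4 and α = 10, β = 6 evaluate to 7 and 8.5 DSPs, the DSP counts listed in the density results later in the deck.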

  19. Hard FP part
      [Diagram: DSP blocks with 32-bit inputs A0, B0 through A3, B3, ACC and P_l; intermediate results A0B0+A1B1, A3B3+ACC and A2B2+(A3B3+ACC); signals P_b and P_g; output P.]
      - SP accumulation integrated
      - P_g will merge with the logic-based dot product
      - P_l recirculated, added with P_b using spare adder

  22. Soft FP part
      [Pipeline diagram, stages 1 to 23: input pairs A/B 0-1 through 10-11 feed three DSP blocks, each used as two 18x18 multipliers, giving six dot2 units; three adder levels (three, two and one adder) reduce the results, with a CONV step after the first adder level and a NORM step after the last; A/B 12, A/B 13 and A/B 14, A/B 15 with ACC enter through FPDSP blocks; the output is P. Signal widths are annotated as 8 and w.]

  25. Soft FP part - fused
      Multipliers
      - 1 DSP = 2 × 18x18 = 4 × 8x8 mantissa multipliers (+ALMs)
      - skip multiplier normalization
      - RN → RZ_w
      - extend exponent to avoid overflow/underflow
      Adders (except first layer inputs)
      - operate on 2's complement mantissas
      - mantissa grows by 1 (+1 optional) bit(s) every stage
      - mantissa format changes from (SM, 1, wF) → (2C, 1+1+L, w+L)
      - after final adder, normalization converts to SP
      - intermediary normalization may be introduced for large α
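The fused datapath described on this slide can be approximated with a small behavioural model. This is a sketch under several assumptions that are not stated in the deck: inputs are truncated to 8-bit significands (bfloat16 precision), w is read as the number of product fraction bits kept after truncation (RZ), alignment at each adder is an arithmetic right shift toward the larger exponent, and Python's unbounded integers stand in for the growing two's complement mantissa. All function names are hypothetical.

```python
import math
import struct


def to_sig_exp(x: float, frac_bits: int = 7):
    """Return (sign, integer significand, exponent) with |x| ~ sig * 2**exp,
    truncating the significand to 1 + frac_bits bits."""
    if x == 0.0:
        return 1, 0, 0
    m, e = math.frexp(abs(x))               # |x| = m * 2**e, 0.5 <= m < 1
    sig = int(m * (1 << (frac_bits + 1)))   # truncation, no rounding
    return (-1 if x < 0 else 1), sig, e - (frac_bits + 1)


def fused_product(a: float, b: float, w: int):
    """Unnormalized product: the exact 8x8 significand product has
    14 fraction bits, of which w are kept by truncating the magnitude."""
    assert 0 <= w <= 14
    sa, ma, ea = to_sig_exp(a)
    sb, mb, eb = to_sig_exp(b)
    drop = 14 - w
    return sa * sb * ((ma * mb) >> drop), ea + eb + drop


def add_unnormalized(x, y):
    """Add two (signed mantissa, exponent) pairs without normalizing."""
    (mx, ex), (my, ey) = x, y
    if mx == 0:
        return y
    if my == 0:
        return x
    e = max(ex, ey)
    # '>>' on negative Python ints is an arithmetic (two's complement) shift.
    return (mx >> (e - ex)) + (my >> (e - ey)), e


def fused_dot(a, b, w: int = 8) -> float:
    """Pairwise adder tree over unnormalized products, normalized once."""
    terms = [fused_product(x, y, w) for x, y in zip(a, b)]
    if not terms:
        return 0.0
    while len(terms) > 1:
        nxt = [add_unnormalized(terms[i], terms[i + 1])
               for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:
            nxt.append(terms[-1])
        terms = nxt
    mant, exp = terms[0]
    value = math.ldexp(mant, exp)
    # Final normalization: round the result to single precision.
    return struct.unpack("<f", struct.pack("<f", value))[0]
```

Calling `fused_dot([1.0, 2.0, 3.0, 4.0], [1.0] * 4)` exercises the tree on values that need no truncation. Lowering `w` makes the truncation of products and aligned terms visible on less regular inputs.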

  26. Accuracy (Average)
      - w: knob to control the accuracy
      - e_c: where the exponents are centered; e_s: the exponent span
      - e_c = 0, e_s = 10: inputs generated in $(2^{-10} \cdot 2,\ 2^{10} \cdot 2)$

      Table: Average relative error comparison between the proposed hybrid dot-product and a typical AI bfloat16+SP implementation for n = 16, α = 12, β = 4, β_g = 2, β_b = 2

      | Config            | Proposed, w = 7 | Proposed, w = 8 | Proposed, w = 9 | AI (bfloat16+SP) |
      |-------------------|-----------------|-----------------|-----------------|------------------|
      | e_c = 0, e_s = 5  | 1.287601e-02    | 6.172194e-03    | 2.935275e-03    | 4.570449e-03     |
      | e_c = 0, e_s = 10 | 7.934867e-03    | 4.120781e-03    | 1.864206e-03    | 3.402314e-03     |
      | e_c = 0, e_s = 20 | 6.672454e-03    | 3.161355e-03    | 1.588372e-03    | 2.996574e-03     |
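An average relative error of this kind can be measured with a short harness. The sketch below emulates only the bfloat16+SP baseline and makes assumptions the deck does not confirm: inputs are positive, exponents are drawn uniformly in e_c ± e_s/2, products are accumulated sequentially in single precision, and the reference is a double-precision sum. It is not expected to reproduce the tabulated figures.

```python
import math
import random
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double) to single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to bfloat16 (round to nearest, ties to even); finite inputs only."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_sp_dot(a, b) -> float:
    """bfloat16 multiplications with single-precision accumulation."""
    acc = 0.0
    for x, y in zip(a, b):
        p = to_f32(to_bf16(x) * to_bf16(y))  # 8x8-bit product is exact in SP
        acc = to_f32(acc + p)
    return acc


def random_input(rng: random.Random, e_c: float, e_s: float) -> float:
    """Assumed generator: positive value with exponent in e_c +/- e_s / 2."""
    e = rng.uniform(e_c - e_s / 2, e_c + e_s / 2)
    return rng.uniform(1.0, 2.0) * 2.0 ** e


def average_relative_error(dot, n=16, e_c=0, e_s=10, trials=2000, seed=1):
    """Mean of |dot(a, b) - exact| / |exact| over random input vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = [random_input(rng, e_c, e_s) for _ in range(n)]
        b = [random_input(rng, e_c, e_s) for _ in range(n)]
        exact = math.fsum(x * y for x, y in zip(a, b))
        total += abs(dot(a, b) - exact) / abs(exact)
    return total / trials


if __name__ == "__main__":
    for span in (5, 10, 20):
        print(span, average_relative_error(bf16_sp_dot, e_s=span))
```

Any other dot-product model with the same `dot(a, b)` signature can be passed to `average_relative_error` for a side-by-side comparison.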

  28. Density
      r_dot = C_ALM / C_DSP (ALMs per DSP)

      | Config                                  | Param | ALMs | DSPs | r_dot |
      |-----------------------------------------|-------|------|------|-------|
      | n = 16, α = 12, β = 4, β_g = 2, β_b = 2 | w = 7 | 1030 | 7    | 147   |
      |                                         | w = 8 | 1075 | 7    | 153   |
      |                                         | w = 9 | 1141 | 7    | 163   |
      | n = 16, α = 10, β = 6, β_g = 4, β_b = 2 | w = 7 | 863  | 8.5  | 102   |
      |                                         | w = 8 | 894  | 8.5  | 106   |
      |                                         | w = 9 | 948  | 8.5  | 112   |
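The r_dot column can be cross-checked from the ALM and DSP counts. The sketch below takes the ratio as ALMs divided by DSPs, which is the orientation the tabulated values follow, and also compares the DSP counts with α/4 + β from the implementation slide. The row layout and names are illustrative choices, and the computed ratios agree with the listed ones only to within about one unit.

```python
# (alpha, beta, w, ALMs, DSPs, r_dot as listed on the slide)
ROWS = [
    (12, 4, 7, 1030, 7.0, 147),
    (12, 4, 8, 1075, 7.0, 153),
    (12, 4, 9, 1141, 7.0, 163),
    (10, 6, 7, 863, 8.5, 102),
    (10, 6, 8, 894, 8.5, 106),
    (10, 6, 9, 948, 8.5, 112),
]


def alms_per_dsp(alms: float, dsps: float) -> float:
    """Resource ratio of one dot-product instance, in ALMs per DSP."""
    return alms / dsps


if __name__ == "__main__":
    for alpha, beta, w, alms, dsps, listed in ROWS:
        model_dsps = alpha / 4 + beta  # C_DSP from the cost model
        print(f"alpha={alpha} beta={beta} w={w}: "
              f"C_DSP model={model_dsps}, listed DSPs={dsps}, "
              f"ALMs/DSP={alms_per_dsp(alms, dsps):.1f}, listed r_dot={listed}")
```

A device whose ALM/DSP ratio sits near one of these values would be the natural target for the corresponding (α, β, w) configuration.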

  34. Open questions
      - reduction tree topology?
      - accuracy?
      - resource ratio - account for plumbing?
      - integration/synchronization?
      - design/portability?
      - routability?
