Hybrid Dot-Product Design for FP-Enabled FPGAs, Bogdan Pasca, Intel

SLIDE 1

Hybrid Dot-Product Design for FP-Enabled FPGAs

Bogdan Pasca

Intel

ARITH 2019, June 10-12, 2019

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

SLIDE 4

Context

- FPGAs: interesting neural network training accelerators
- training: mostly dot-products in forward + backward propagation
- current industry standard: bfloat16 multiplications, SP reduction
- bfloat16 vs SP: 2X bandwidth

Goal → find a dot-product implementation that:
- maintains an accuracy comparable to bfloat16+SP
- maximizes the dot-product density for a given FPGA

SLIDE 6

Density

- FPGA devices have a varied mix of resources
- increasing compute density → make efficient use of the existing mix
- Focus on "Core Logic Fabric" and VP DSP Blocks

SLIDE 7

Density - some current devices

SLIDE 15

Background

- Intel FPGAs: DSP blocks implement SP mult-add
- N-element SP dot-product = N DSPs

[Figure: SP dot-product built from DSP mult-add blocks. Inputs A to H (32 bits each); partial results AB+CD and EF+GH; output AB+CD+EF+GH.]

soft-logic-only solution

- bfloat16 multiplier → 2/DSP
- SP FP adder 1 → 1/DSP
- N-element dot product: C_DSP = N/2 + N − 1
- adjust ratio: migrate SP FP adders to logic (300-400 ALMs/add)
- solution is too large
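
The resource counts on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, N is assumed even, and the reduction is assumed to be a tree of N − 1 two-input adders. It evaluates the DSP count N/2 + N − 1 and the ALM cost of moving every SP adder into logic at the quoted 300-400 ALMs per adder.

```python
def dsp_cost_bfloat16_sp(n: int) -> float:
    """DSP blocks for an n-element bfloat16-multiply / SP-add dot product."""
    multipliers = n / 2   # two bfloat16 multipliers fit in one DSP block
    adders = n - 1        # reducing n products takes n - 1 two-input adders
    return multipliers + adders


def alm_cost_adders_in_logic(n: int, alms_per_adder: int) -> int:
    """ALMs needed if all n - 1 SP adders are moved out of the DSP blocks."""
    return (n - 1) * alms_per_adder


for n in (8, 16, 32):
    low = alm_cost_adders_in_logic(n, 300)
    high = alm_cost_adders_in_logic(n, 400)
    print(f"N={n}: {dsp_cost_bfloat16_sp(n)} DSPs, "
          f"or {n / 2} DSPs plus {low}-{high} ALMs with adders in logic")
```

For N = 16 this gives 23 DSPs, more than the 16 DSPs of the native SP dot product, while moving the 15 adders to logic costs 4500 to 6000 ALMs.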

SLIDE 16

How do we solve this?

SLIDE 18

Our implementation

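[Figure: block diagram of the hybrid dot product. The soft-logic dot-product takes the α inputs A_1..α and B_1..α; the hard FP part of the dot-product takes the β inputs A_α+1..α+β and B_α+1..α+β. Signals shown: Pl, Pg, Pb, ACC, P.]

N = α + β
C_ALM = f(α, w)
C_DSP = α/4 + β

Objective: C_ALM / C_DSP ≈ device ALM/DSP ratio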
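
As a rough illustration of the objective, the sketch below computes the DSP cost α/4 + β for a split and picks the split whose ALM/DSP ratio is closest to a target device ratio. The function names are hypothetical, and the cost function f(α, w) is not given in the slides, so the ALM counts are simply the two values reported on the deck's Density slide for n = 16, w = 7. The device ratios in the usage lines are made-up inputs, not figures for any real device.

```python
from fractions import Fraction


def hybrid_dsp_cost(alpha: int, beta: int) -> Fraction:
    """DSP cost of the hybrid design: alpha/4 for the soft part, beta for the hard part."""
    return Fraction(alpha, 4) + beta


# ALM counts reported on the Density slide for n = 16, w = 7,
# keyed by the (alpha, beta) split. No other splits are reported.
REPORTED_ALMS = {(12, 4): 1030, (10, 6): 863}


def closest_split(device_ratio: float):
    """Return the reported split whose ALM/DSP ratio is nearest to device_ratio."""
    def gap(split):
        alpha, beta = split
        ratio = REPORTED_ALMS[split] / hybrid_dsp_cost(alpha, beta)
        return abs(ratio - device_ratio)
    return min(REPORTED_ALMS, key=gap)


# Hypothetical device ratios, used only to exercise the function.
print(closest_split(150))
print(closest_split(100))
```

Using exact fractions keeps the 8.5-DSP case (α = 10, β = 6) free of rounding noise.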

SLIDE 19

Hard FP part

[Figure: hard FP part built from DSP blocks with 32-bit datapaths. Inputs A0..A3, B0..B3 and ACC; labeled results A3B3+ACC, A2B2+(A3B3+ACC) and A0B0+A1B1; signals Pl, Pg, Pb and P.]

- SP accumulation integrated
- Pg will merge with the logic-based dot product
- Pl recirculated, added with Pb using spare adder

SLIDE 22

Soft FP part

[Figure: soft FP part. Six dot2 blocks on the A/B input pairs 0-1, 2-3, 4-5, 6-7, 8-9 and 10-11 (wire widths labeled 8 and w), three blocks labeled "two 18x18", an adder tree, CONV and NORM stages, and two FPDSP blocks taking A/B 12 to 15 and ACC and producing P.]

SLIDE 25

Soft FP part - fused

Multipliers
- 1 DSP = 2× 18x18 = 4× 8x8 mantissa multipliers (+ALMs)
- skip multiplier normalization
- RN → RZ_w
- extend exponent to avoid overflow/underflow

Adders
- (except first layer inputs) operate on 2's complement mantissas
- mantissa grows by 1 (+1 optional) bit(s) every stage
- mantissa format changes from (SM, 1, wF) → (2C, 1+1+L, w+L)
- after final adder, normalization converts to SP
- intermediary normalization may be introduced for large α
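
The fused datapath described above can be approximated with a small behavioral model. This is a sketch under stated assumptions, not the author's bit-accurate design: `to_bfloat16` and `fused_dot` are hypothetical names, inputs are truncated to bfloat16, each 8x8 mantissa product is kept unnormalized and truncated (RZ) to w fraction bits, products are aligned to the largest exponent by truncating right shifts, and the signed sum is normalized once at the end. Python integers are unbounded, which stands in for the extended exponent and the per-stage mantissa growth.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Truncate x to bfloat16 (the upper 16 bits of its float32 encoding)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def fused_dot(a, b, w: int = 8) -> float:
    """Behavioral model of a fused dot product with one final normalization."""
    terms = []
    for x, y in zip(a, b):
        fx, ex = math.frexp(to_bfloat16(x))   # x = fx * 2**ex, 0.5 <= |fx| < 1
        fy, ey = math.frexp(to_bfloat16(y))
        mx, my = int(abs(fx) * 256), int(abs(fy) * 256)  # 8-bit mantissas
        prod = mx * my                # 16-bit product, left unnormalized
        q = prod >> (14 - w)          # RZ: keep w fraction bits, drop the rest
        if q:
            sign = -1 if (fx < 0) != (fy < 0) else 1
            terms.append((sign, q, ex + ey))
    if not terms:
        return 0.0
    emax = max(e for _, _, e in terms)
    total = 0
    for sign, q, e in terms:
        total += sign * (q >> (emax - e))   # align to emax, add as signed integer
    result = math.ldexp(total, emax - 2 - w)
    # Single final conversion to SP (float32 rounding).
    return struct.unpack(">f", struct.pack(">f", result))[0]


print(fused_dot([1.0, 2.0], [3.0, 4.0], w=8))
```

Raising w keeps more product bits before alignment, which is the accuracy knob the deck evaluates next.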

SLIDE 26

Accuracy (Average)

- w: knob to control the accuracy
- ec: exponents centered, es: the exponent span
- ec = 0, es = 10: inputs generated in (2^-10 · 2, 2^10 · 2)

Table: Average relative error comparison between the proposed hybrid dot-product and a typical AI bfloat16+SP implementation for n = 16, α = 12, β = 4, βg = 2, βb = 2

Config             Param   Proposed       AI
ec = 0, es = 5     w = 7   1.287601e-02   4.570449e-03
                   w = 8   6.172194e-03
                   w = 9   2.935275e-03
ec = 0, es = 10    w = 7   7.934867e-03   3.402314e-03
                   w = 8   4.120781e-03
                   w = 9   1.864206e-03
ec = 0, es = 20    w = 7   6.672454e-03   2.996574e-03
                   w = 8   3.161355e-03
                   w = 9   1.588372e-03
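
One way such an average relative error could be measured for the bfloat16+SP baseline is sketched below. The slides do not describe the test harness, so everything here is an assumption: the helper names are hypothetical, inputs are positive with a uniform mantissa in [1, 2) and an integer exponent drawn from [ec − es, ec + es], bfloat16 conversion rounds to nearest even, products are accumulated sequentially in float32, and the reference is the exact rational dot product of the unrounded inputs. The numbers it produces are not expected to reproduce the table.

```python
import random
import struct
from fractions import Fraction


def f32(x: float) -> float:
    """Round a Python float to single precision."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def bf16(x: float) -> float:
    """Round x to bfloat16 (round to nearest, ties to even)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def baseline_dot(a, b) -> float:
    """bfloat16 multiplications with single-precision accumulation."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = f32(acc + f32(bf16(x) * bf16(y)))
    return acc


def random_input(rng, ec: int, es: int) -> float:
    """Positive value with exponent in [ec - es, ec + es] (assumed distribution)."""
    return rng.uniform(1.0, 2.0) * 2.0 ** rng.randint(ec - es, ec + es)


def average_relative_error(n=16, ec=0, es=10, trials=1000, seed=0) -> float:
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = [random_input(rng, ec, es) for _ in range(n)]
        b = [random_input(rng, ec, es) for _ in range(n)]
        exact = sum(Fraction(x) * Fraction(y) for x, y in zip(a, b))
        total += float(abs(Fraction(baseline_dot(a, b)) - exact) / exact)
    return total / trials


print(average_relative_error(trials=200))
```

The same harness could be pointed at a model of the proposed design by swapping `baseline_dot` for another dot-product function.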

SLIDE 27

Density

r_dot = C_ALM / C_DSP (ALMs per DSP)

Config                    Param   ALMs   DSPs   r_dot
n = 16, α = 12, β = 4,    w = 7   1030   7      147
βg = 2, βb = 2            w = 8   1075          153
                          w = 9   1141          163
n = 16, α = 10, β = 6,    w = 7    863   8.5    102
βg = 4, βb = 2            w = 8    894          106
                          w = 9    948          112

SLIDE 34

Open questions

- reduction tree topology?
- accuracy?
- resource ratio - account for plumbing?
- integration/synchronization?
- design/portability?
- routability?
