Chapter 7: Systolic Architecture Design (Keshab K. Parhi)

SLIDE 1

Chapter 7: Systolic Architecture Design

Keshab K. Parhi

SLIDE 2
  • Chap. 7

  • Systolic architectures are designed by applying linear mapping techniques to regular dependence graphs (DGs).

  • Regular dependence graph: the presence of an edge in a certain direction at any node in the DG implies the presence of an edge in the same direction at all nodes in the DG.

  • The DG corresponds to a space representation: no time instance is assigned to any computation ($t = 0$).

  • Systolic architectures have a space-time representation in which each node is mapped to a certain processing element (PE) and is scheduled at a particular time instance.

  • The systolic design methodology maps an N-dimensional DG to a lower-dimensional systolic architecture.

  • Here, the mapping of an N-dimensional DG to an (N-1)-dimensional systolic array is considered.

SLIDE 3
  • Definitions:

Projection vector (also called iteration vector), $d = (d_1, d_2)^T$. Two nodes that are displaced by $d$ or multiples of $d$ are executed by the same processor.

Processor space vector, $p^T = (p_1, p_2)$. Any node with index $I^T = (i, j)$ is executed by processor $p^T I = (p_1, p_2)(i, j)^T = p_1 i + p_2 j$.

Scheduling vector, $s^T = (s_1, s_2)$. Any node with index $I$ is executed at time $s^T I$.

Hardware utilization efficiency, $\mathrm{HUE} = 1/|s^T d|$. This is because two tasks executed by the same processor are spaced $|s^T d|$ time units apart.

The processor space vector and the projection vector must be orthogonal to each other: $p^T d = 0$.

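As a minimal sketch of these definitions, the following Python enumerates the nodes of a small 2-D DG, maps each node to a processor ($p^T I$) and a time ($s^T I$), and checks that tasks sharing a processor are spaced $|s^T d|$ time units apart. The particular vectors and the 4 x 4 grid are illustrative assumptions, not values taken from this slide, and the helper names `dot`, `schedule`, and `hue` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def schedule(nodes, p, s):
    """Group the execution times s^T I of all nodes by processor p^T I."""
    times = defaultdict(list)
    for node in nodes:
        times[dot(p, node)].append(dot(s, node))
    return {proc: sorted(ts) for proc, ts in times.items()}


def hue(s, d):
    """Hardware utilization efficiency, HUE = 1/|s^T d|."""
    return Fraction(1, abs(dot(s, d)))


# Illustrative vectors (an assumption made for this sketch).
d, p, s = (1, -1), (1, 1), (1, -1)
if dot(p, d) != 0:
    raise ValueError("p and d must be orthogonal")

nodes = [(i, j) for i in range(4) for j in range(4)]
table = schedule(nodes, p, s)
for proc in sorted(table):
    print("processor", proc, "runs at times", table[proc])
print("HUE =", hue(s, d))
```

With these vectors every processor is busy only every second time step, which is what a HUE of 1/2 expresses.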
SLIDE 4
If nodes A and B are mapped to the same processor, then they cannot be executed at the same time, i.e., $s^T I_A \neq s^T I_B$, i.e., $s^T d \neq 0$.

Edge mapping: if an edge $e$ exists in the space representation (the DG), then an edge $p^T e$ is introduced in the systolic array with $s^T e$ delays.

A DG can be transformed to a space-time representation by interpreting one of the spatial dimensions as the temporal dimension. For a 2-D DG, the general transformation is described by $i' = t = 0$, $j' = p^T I$, and $t' = s^T I$, i.e.,

$$\begin{pmatrix} i' \\ j' \\ t' \end{pmatrix} = T \begin{pmatrix} i \\ j \\ t \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ p_1 & p_2 & 0 \\ s_1 & s_2 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \\ t \end{pmatrix}$$

$j'$ is the processor axis and $t'$ is the scheduling time instance.

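The transformation and the edge-mapping rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the vectors $p^T = (1, 1)$ and $s^T = (1, 0)$ are example values, and the matrix follows the form $i' = t$, $j' = p^T I$, $t' = s^T I$ given above.

```python
def space_time_matrix(p, s):
    """Build T for a 2-D DG: rows give i' = t, j' = p^T I, t' = s^T I."""
    return [
        [0, 0, 1],
        [p[0], p[1], 0],
        [s[0], s[1], 0],
    ]


def transform(T, node):
    """Apply T to the column vector (i, j, t), with t = 0 for a DG node."""
    vec = (node[0], node[1], 0)
    return tuple(sum(row[k] * vec[k] for k in range(3)) for row in T)


def map_edge(e, p, s):
    """Return (p^T e, s^T e): displacement in the array and number of delays."""
    return (p[0] * e[0] + p[1] * e[1], s[0] * e[0] + s[1] * e[1])


# Example vectors (assumed for illustration only).
p, s = (1, 1), (1, 0)
T = space_time_matrix(p, s)
print(transform(T, (2, 3)))     # (i', j', t') for DG node (2, 3)
print(map_edge((1, -1), p, s))  # array edge and delay count for DG edge (1, -1)
```

Here node (2, 3) lands on processor 5 at time 2, and the DG edge (1, -1) becomes a self-loop on one processor with one delay.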
SLIDE 5
FIR Filter Design B1 (Broadcast Inputs, Move Results, Weights Stay)

$d^T = (1, 0)$, $p^T = (0, 1)$, $s^T = (1, 0)$

Any node with index $I^T = (i, j)$ is mapped to processor $p^T I = j$ and is executed at time $s^T I = i$. Since $s^T d = 1$, we have $\mathrm{HUE} = 1/|s^T d| = 1$.

Edge mapping: the three fundamental edges corresponding to weight, input, and result are mapped to edges in the systolic array as per the following table:

e                  p^T e    s^T e
wt (1, 0)            0        1
i/p (0, 1)           1        0
result (1, -1)      -1        1

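To make the B1 mapping concrete, here is a behavioral sketch in Python, assuming the edge table above: weights stay in their PEs ($p^T e = 0$), each input sample is broadcast to all PEs in the same cycle ($s^T e = 0$), and partial results move from PE $j+1$ to PE $j$ through one delay. The function names and the test data are hypothetical, and this is a cycle-level model rather than the low-level implementation of the design.

```python
def simulate_b1(x, w):
    """Cycle-by-cycle model of design B1 for an FIR filter with weights w."""
    taps = len(w)
    # partial[j] holds the result PE j produced in the previous cycle;
    # it reaches PE j-1 one cycle later (result edge: p^T e = -1, s^T e = 1).
    partial = [0] * taps
    outputs = []
    for sample in x:                      # time step i = s^T I
        nxt = []
        for j in range(taps):             # PE j = p^T I keeps weight w[j]
            incoming = partial[j + 1] if j + 1 < taps else 0
            nxt.append(incoming + w[j] * sample)  # sample is broadcast
        partial = nxt
        outputs.append(partial[0])        # PE 0 emits y(i)
    return outputs


def fir_direct(x, w):
    """Reference: y(n) = sum_k w[k] * x[n - k], with x[m] = 0 for m < 0."""
    return [
        sum(w[k] * x[n - k] for k in range(len(w)) if n - k >= 0)
        for n in range(len(x))
    ]


x = [1, 2, 3, 4]
w = [1, 10, 100]
print(simulate_b1(x, w))
print(fir_direct(x, w))
```

Both calls should agree, since the result path through the PEs accumulates the same products as the direct sum.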
SLIDE 6
Figure: Block diagram of B1 design

Figure: Low-level implementation of B1 design

SLIDE 7
Figure: Space-time representation of B1 design

SLIDE 8
Design B2 (Broadcast Inputs, Move Weights, Results Stay)

$d^T = (1, -1)$, $p^T = (1, 1)$, $s^T = (1, 0)$

Any node with index $I^T = (i, j)$ is mapped to processor $p^T I = i + j$ and is executed at time $s^T I = i$. Since $s^T d = 1$, we have $\mathrm{HUE} = 1/|s^T d| = 1$.

Edge mapping:

e                  p^T e    s^T e
wt (1, 0)            1        1
i/p (0, 1)           1        0
result (1, -1)       0        1

SLIDE 9
Figure: Block diagram of B2 design

Figure: Low-level implementation of B2 design

SLIDE 10
  • Applying the space-time transformation, we get:

$j' = p^T (i, j)^T = i + j$
$t' = s^T (i, j)^T = i$

Figure: Space-time representation of B2 design

SLIDE 11
Design F (Fan-In Results, Move Inputs, Weights Stay)

$d^T = (1, 0)$, $p^T = (0, 1)$, $s^T = (1, 1)$

Since $s^T d = 1$, we have $\mathrm{HUE} = 1/|s^T d| = 1$.

Edge mapping:

e                  p^T e    s^T e
wt (1, 0)            0        1
i/p (0, 1)           1        1
result (1, -1)      -1        0

Figure: Block diagram of F design

SLIDE 12
Figure: Low-level implementation of F design

Figure: Space-time representation of F design

SLIDE 13
Design R1 (Results Stay, Inputs and Weights Move in Opposite Directions)

$d^T = (1, -1)$, $p^T = (1, 1)$, $s^T = (1, -1)$

Since $s^T d = 2$, we have $\mathrm{HUE} = 1/|s^T d| = 1/2$.

Edge mapping:

e                  p^T e    s^T e
wt (1, 0)            1        1
i/p (0, -1)         -1        1
result (1, -1)       0        2

Figure: Block diagram of R1 design

SLIDE 14
Figure: Low-level implementation of R1 design

Note: R1 can be obtained from B2 by a 2-slow transformation followed by retiming, after changing the direction of signal x.

SLIDE 15

Design R2 and Dual R2 (Results Stay, Inputs and Weights Move in Same Direction but at Different Speeds)

$d^T = (1, -1)$, $p^T = (1, 1)$
R2: $s^T = (2, 1)$; Dual R2: $s^T = (1, 2)$

Since $|s^T d| = 1$ for both of them, we have $\mathrm{HUE} = 1/|s^T d| = 1$ for both.

Edge mapping:

R2
e                  p^T e    s^T e
wt (1, 0)            1        2
i/p (0, 1)           1        1
result (1, -1)       0        1

Dual R2
e                  p^T e    s^T e
wt (1, 0)            1        1
i/p (0, 1)           1        2
result (-1, 1)       0        1

Note: the result edge in design Dual R2 has been reversed to guarantee $s^T e \geq 0$.

SLIDE 16
Design W1 (Weights Stay, Inputs and Results Move in Opposite Directions)

$d^T = (1, 0)$, $p^T = (0, 1)$, $s^T = (2, 1)$

Since $s^T d = 2$, we have $\mathrm{HUE} = 1/|s^T d| = 1/2$.

Edge mapping:

e                  p^T e    s^T e
wt (1, 0)            0        2
i/p (0, 1)           1        1
result (1, -1)      -1        1

SLIDE 17
Design W2 and Dual W2 (Weights Stay, Inputs and Results Move in Same Direction but at Different Speeds)

$d^T = (1, 0)$, $p^T = (0, 1)$
W2: $s^T = (1, 2)$; Dual W2: $s^T = (1, -1)$

Since $s^T d = 1$ for both of them, we have $\mathrm{HUE} = 1/|s^T d| = 1$ for both.

Edge mapping:

W2
e                  p^T e    s^T e
wt (1, 0)            0        1
i/p (0, 1)           1        2
result (-1, 1)       1        1

Dual W2
e                  p^T e    s^T e
wt (1, 0)            0        1
i/p (0, -1)         -1        1
result (1, -1)      -1        2

Note: with $s^T = (1, 2)$ the result edge $(1, -1)$ would give $s^T e = -1$, so for W2 it is listed reversed as $(-1, 1)$, which matches the tabulated values $p^T e = 1$ and $s^T e = 1$.

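The FIR designs listed so far differ only in their choice of $d$, $p$, and $s$, so their feasibility conditions and HUE values can be checked mechanically. The sketch below does this in Python using the vectors quoted on the slides; the dictionary layout and the function names `dot` and `summarize` are hypothetical.

```python
from fractions import Fraction


def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


# (d, p, s) for each FIR design, as quoted on the slides.
DESIGNS = {
    "B1": ((1, 0), (0, 1), (1, 0)),
    "B2": ((1, -1), (1, 1), (1, 0)),
    "F": ((1, 0), (0, 1), (1, 1)),
    "R1": ((1, -1), (1, 1), (1, -1)),
    "R2": ((1, -1), (1, 1), (2, 1)),
    "Dual R2": ((1, -1), (1, 1), (1, 2)),
    "W1": ((1, 0), (0, 1), (2, 1)),
    "W2": ((1, 0), (0, 1), (1, 2)),
    "Dual W2": ((1, 0), (0, 1), (1, -1)),
}


def summarize(designs):
    """Return {name: HUE} after checking p^T d = 0 and s^T d != 0."""
    result = {}
    for name, (d, p, s) in designs.items():
        if dot(p, d) != 0:
            raise ValueError(f"{name}: p and d are not orthogonal")
        if dot(s, d) == 0:
            raise ValueError(f"{name}: s^T d = 0, two nodes would collide")
        result[name] = Fraction(1, abs(dot(s, d)))
    return result


summary = summarize(DESIGNS)
for name, value in summary.items():
    print(f"{name:8s} HUE = {value}")
```

Only R1 and W1 come out at 1/2; the remaining designs reach a HUE of 1.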
SLIDE 18
  • Relating systolic designs using transformations:

FIR systolic architectures obtained using the same projection vector and processor vector, but different scheduling vectors, can be derived from each other by using transformations such as edge reversal, associativity, slow-down, retiming, and pipelining.

  • Example 1: R1 can be obtained from B2 by slow-down, edge reversal, and retiming.

SLIDE 19
  • Example 2: Derivation of design F from B1 using cutset retiming.

SLIDE 20
Selection of $s^T$ based on scheduling inequalities:

For a dependence relation $X \rightarrow Y$, where $I_x = (i_x, j_x)^T$ and $I_y = (i_y, j_y)^T$ are respectively the indices of the nodes X and Y, the scheduling inequality is

$S_y \geq S_x + T_x$

where $T_x$ is the computation time of node X. The scheduling equations can be classified into the following two types:

Linear scheduling, where
$S_x = s^T I_x = (s_1, s_2)(i_x, j_x)^T$
$S_y = s^T I_y = (s_1, s_2)(i_y, j_y)^T$

Affine scheduling, where
$S_x = s^T I_x + \gamma_x = (s_1, s_2)(i_x, j_x)^T + \gamma_x$
$S_y = s^T I_y + \gamma_y = (s_1, s_2)(i_y, j_y)^T + \gamma_y$

So the scheduling inequality for affine scheduling is:

$s^T I_y + \gamma_y \geq s^T I_x + \gamma_x + T_x$

For an edge $e = I_y - I_x$, this is equivalent to $s^T e + \gamma_y - \gamma_x \geq T_x$.

SLIDE 21
Each edge of a DG leads to an inequality for the selection of the scheduling vector. The selection consists of two steps:

– Capture all fundamental edges. The reduced dependence graph (RDG) is used to capture the fundamental edges, and the regular iterative algorithm (RIA) description of the corresponding problem is used to construct the RDG.

– Construct the scheduling inequalities according to $s^T I_y + \gamma_y \geq s^T I_x + \gamma_x + T_x$ and solve them for a feasible $s^T$.

SLIDE 22
  • RIA description: the RIA has two forms.

⇒ The RIA is in standard input RIA form if the indices of the inputs are the same for all equations.
⇒ The RIA is in standard output RIA form if all the output indices are the same.

  • For the FIR filtering example we have

W(i+1, j) = W(i, j)
X(i, j+1) = X(i, j)
Y(i+1, j-1) = Y(i, j) + W(i+1, j-1) X(i+1, j-1)

The FIR filtering problem cannot be expressed in standard input RIA form. Expressing it in standard output RIA form, we get

W(i, j) = W(i-1, j)
X(i, j) = X(i, j-1)
Y(i, j) = Y(i-1, j+1) + W(i, j) X(i, j)

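The standard output RIA form can be evaluated directly, which gives a quick check that the recursion really computes an FIR filter. The Python sketch below is one way to do that; the boundary conditions (weights injected at i = 0, inputs injected at j = 0, missing partial sums taken as 0, and the output read from Y(n, 0)) are assumptions of this sketch, since the slide does not state them, and the function names are hypothetical.

```python
def fir_via_ria(x, w):
    """Evaluate the standard output RIA for FIR filtering on a finite grid."""
    W, X, Y = {}, {}, {}
    for i in range(len(x)):
        for j in range(len(w)):
            # W(i, j) = W(i-1, j): weights propagate along i (assumed start: w[j]).
            W[(i, j)] = W[(i - 1, j)] if i > 0 else w[j]
            # X(i, j) = X(i, j-1): inputs propagate along j (assumed start: x[i]).
            X[(i, j)] = X[(i, j - 1)] if j > 0 else x[i]
            # Y(i, j) = Y(i-1, j+1) + W(i, j) X(i, j); off-grid terms count as 0.
            Y[(i, j)] = Y.get((i - 1, j + 1), 0) + W[(i, j)] * X[(i, j)]
    # The partial sums travel along (1, -1) and are complete at j = 0.
    return [Y[(n, 0)] for n in range(len(x))]


def fir_direct(x, w):
    """Reference: y(n) = sum_k w[k] * x[n - k], with x[m] = 0 for m < 0."""
    return [
        sum(w[k] * x[n - k] for k in range(len(w)) if n - k >= 0)
        for n in range(len(x))
    ]


x = [1, 2, 3, 4]
w = [1, 10, 100]
print(fir_via_ria(x, w))
print(fir_direct(x, w))
```

The three dictionary updates correspond one-to-one to the weight, input, and result edges (1, 0), (0, 1), and (1, -1) of the DG.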
SLIDE 23
  • The reduced DG for FIR filtering is shown below.

Figure: Reduced DG for FIR filtering

Example: $T_{mult} = 5$, $T_{add} = 2$, $T_{com} = 1$. Applying the scheduling inequalities to the five edges of the above figure, we get:

W → Y: $e = (0, 0)^T$, $\gamma_y - \gamma_w \geq 0$
X → X: $e = (0, 1)^T$, $s_2 + \gamma_x - \gamma_x \geq 1$
W → W: $e = (1, 0)^T$, $s_1 + \gamma_w - \gamma_w \geq 1$
X → Y: $e = (0, 0)^T$, $\gamma_y - \gamma_x \geq 0$
Y → Y: $e = (1, -1)^T$, $s_1 - s_2 + \gamma_y - \gamma_y \geq 5 + 2 + 1$

For linear scheduling, $\gamma_x = \gamma_y = \gamma_w = 0$. Solving, we get $s_1 \geq 1$, $s_2 \geq 1$, and $s_1 - s_2 \geq 8$.

SLIDE 24
  • Taking $s^T = (9, 1)$, $d = (1, -1)^T$ such that $s^T d \neq 0$, and $p^T = (1, 1)$ such that $p^T d = 0$, we get $\mathrm{HUE} = 1/8$. The edge mapping is as follows:

e                  p^T e    s^T e
wt (1, 0)            1        9
i/p (0, 1)           1        1
result (1, -1)       0        8

Figure: Systolic architecture for the example

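The choice $s^T = (9, 1)$ can be reproduced with a small brute-force search over the linear-scheduling inequalities $s_1 \geq 1$, $s_2 \geq 1$, $s_1 - s_2 \geq 8$. The following Python is only a sketch: the search bound of 12 and the tie-breaking rule (smallest $s_1 + s_2$) are assumptions made here, not part of the slides.

```python
from fractions import Fraction

T_MULT, T_ADD, T_COM = 5, 2, 1


def feasible(s1, s2):
    """Linear-scheduling inequalities of the FIR reduced DG (all gammas = 0)."""
    return (
        s2 >= T_COM                            # X -> X, e = (0, 1)
        and s1 >= T_COM                        # W -> W, e = (1, 0)
        and s1 - s2 >= T_MULT + T_ADD + T_COM  # Y -> Y, e = (1, -1)
    )


def smallest_schedule(limit):
    """Feasible (s1, s2) with entries in 0..limit and the smallest s1 + s2."""
    candidates = [
        (s1, s2)
        for s1 in range(limit + 1)
        for s2 in range(limit + 1)
        if feasible(s1, s2)
    ]
    return min(candidates, key=lambda s: s[0] + s[1])


s = smallest_schedule(12)
d = (1, -1)
s_dot_d = s[0] * d[0] + s[1] * d[1]
print("s =", s, " s^T d =", s_dot_d, " HUE =", Fraction(1, abs(s_dot_d)))
```

Under these assumptions the search lands on the same scheduling vector as the slide, and the resulting HUE of 1/8 follows from $s^T d = 8$.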
SLIDE 25
Matrix-Matrix Multiplication and 2-D Systolic Array Design

$c_{11} = a_{11} b_{11} + a_{12} b_{21}$
$c_{12} = a_{11} b_{12} + a_{12} b_{22}$
$c_{21} = a_{21} b_{11} + a_{22} b_{21}$
$c_{22} = a_{21} b_{12} + a_{22} b_{22}$

The iteration in standard output RIA form is as follows:

a(i, j, k) = a(i, j-1, k)
b(i, j, k) = b(i-1, j, k)
c(i, j, k) = c(i, j, k-1) + a(i, j, k) b(i, j, k)

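As a rough check of this RIA, the recursion can be executed over a cubic index space and compared with the ordinary matrix product. In the Python sketch below the 0-based indexing and the boundary conditions (a enters at j = 0, b enters at i = 0, c starts from 0 at k = 0) are assumptions, since the slide gives only the recursion itself; the function name is hypothetical.

```python
def matmul_via_ria(A, B):
    """Evaluate the standard output RIA for C = A * B on an n x n x n grid."""
    n = len(A)
    a, b, c = {}, {}, {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # a(i, j, k) = a(i, j-1, k): a propagates along j.
                a[(i, j, k)] = a[(i, j - 1, k)] if j > 0 else A[i][k]
                # b(i, j, k) = b(i-1, j, k): b propagates along i.
                b[(i, j, k)] = b[(i - 1, j, k)] if i > 0 else B[k][j]
                # c(i, j, k) = c(i, j, k-1) + a(i, j, k) b(i, j, k).
                prev = c[(i, j, k - 1)] if k > 0 else 0
                c[(i, j, k)] = prev + a[(i, j, k)] * b[(i, j, k)]
    # The accumulation along k is complete at k = n - 1.
    return [[c[(i, j, n - 1)] for j in range(n)] for i in range(n)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_via_ria(A, B))
```

The three propagation directions used here are exactly the fundamental edges (0, 1, 0), (1, 0, 0), and (0, 0, 1) that the systolic mapping then projects.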
SLIDE 26
  • Applying the scheduling inequality with $T_{mult\text{-}add} = 1$ and $T_{com} = 0$, we get $s_2 \geq 0$, $s_1 \geq 0$, $s_3 \geq 1$, $\gamma_c - \gamma_a \geq 0$, and $\gamma_c - \gamma_b \geq 0$. Take $\gamma_a = \gamma_b = \gamma_c = 0$ for linear scheduling.

  • Solution 1:

$s^T = (1, 1, 1)$, $d^T = (0, 0, 1)$, $p_1 = (1, 0, 0)$, $p_2 = (0, 1, 0)$, $P^T = (p_1 \; p_2)^T$

SLIDE 27
  • Solution 2:

$s^T = (1, 1, 1)$, $d^T = (1, 1, -1)$, $p_1 = (1, 0, 1)$, $p_2 = (0, 1, 1)$, $P^T = (p_1 \; p_2)^T$

Edge mapping for both solutions:

                   Sol. 1              Sol. 2
e                  P^T e     s^T e     P^T e     s^T e
a (0, 1, 0)        (0, 1)      1       (0, 1)      1
b (1, 0, 0)        (1, 0)      1       (1, 0)      1
c (0, 0, 1)        (0, 0)      1       (1, 1)      1

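The two solutions can be compared by mapping each fundamental edge through $P^T$ and $s^T$. The Python sketch below does this for the vectors quoted above; it treats $P^T$ as the matrix whose rows are $p_1$ and $p_2$, and the function and dictionary names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


# Fundamental edges of the matrix-multiplication DG.
EDGES = {"a": (0, 1, 0), "b": (1, 0, 0), "c": (0, 0, 1)}

SOLUTIONS = {
    "Sol. 1": {"s": (1, 1, 1), "d": (0, 0, 1), "p1": (1, 0, 0), "p2": (0, 1, 0)},
    "Sol. 2": {"s": (1, 1, 1), "d": (1, 1, -1), "p1": (1, 0, 1), "p2": (0, 1, 1)},
}


def edge_table(sol):
    """Map every fundamental edge e to (P^T e, s^T e) for one solution."""
    # Both rows of P^T must be orthogonal to the projection vector d.
    if dot(sol["p1"], sol["d"]) != 0 or dot(sol["p2"], sol["d"]) != 0:
        raise ValueError("P^T d must be zero")
    return {
        name: ((dot(sol["p1"], e), dot(sol["p2"], e)), dot(sol["s"], e))
        for name, e in EDGES.items()
    }


for label, sol in SOLUTIONS.items():
    print(label, edge_table(sol))
```

In Solution 1 the c edge maps to (0, 0), so results stay in place, whereas in Solution 2 all three variables move through the 2-D array.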