Chapter 7: Systolic Architecture Design, Keshab K. Parhi
- Chap. 7
2
- Systolic architectures are designed by using linear mapping techniques on regular dependence graphs (DGs).
- Regular dependence graph: the presence of an edge in a certain direction at any node in the DG represents the presence of an edge in the same direction at all nodes in the DG.
- A DG corresponds to a space representation: no time instance is assigned to any computation ⇒ $t = 0$.
- Systolic architectures have a space-time representation where each node is mapped to a certain processing element (PE) and is scheduled at a particular time instance.
- The systolic design methodology maps an N-dimensional DG to a lower-dimensional systolic architecture.
- Mapping of an N-dimensional DG to an (N-1)-dimensional systolic array is considered.
- Definitions:
Projection vector (also called iteration vector), $d = (d_1\ d_2)^T$. Two nodes that are displaced by $d$ or multiples of $d$ are executed by the same processor.
Processor space vector, $p^T = (p_1\ p_2)$. Any node with index $I^T = (i, j)$ would be executed by processor $p^T I = (p_1\ p_2)(i\ j)^T = p_1 i + p_2 j$.
Scheduling vector, $s^T = (s_1\ s_2)$. Any node with index $I$ would be executed at time $s^T I$.
Hardware utilization efficiency, $\mathrm{HUE} = 1/|s^T d|$. This is because two tasks executed by the same processor are spaced $|s^T d|$ time units apart.
The processor space vector and the projection vector must be orthogonal to each other ⇒ $p^T d = 0$.
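The definitions above can be sketched in a few lines of Python. This is only an illustration: the helper names (`dot`, `processor`, `time_slot`, `hue`) and the example vectors are assumptions made here, not part of the slides.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))

def processor(p, node):
    """Processor index p^T I for the node with index I."""
    return dot(p, node)

def time_slot(s, node):
    """Execution time s^T I for the node with index I."""
    return dot(s, node)

def hue(s, d):
    """Hardware utilization efficiency 1/|s^T d| (requires s^T d != 0)."""
    sd = dot(s, d)
    if sd == 0:
        raise ValueError("s^T d must be nonzero")
    return Fraction(1, abs(sd))

# Example vectors, assumed for illustration only.
d, p, s = (1, -1), (1, 1), (1, 0)
assert dot(p, d) == 0  # p and d are orthogonal

node = (2, 3)
shifted = (node[0] + d[0], node[1] + d[1])  # node displaced by d
same_pe = processor(p, node) == processor(p, shifted)
gap = time_slot(s, shifted) - time_slot(s, node)
```

Here `same_pe` confirms that a displacement by $d$ stays on the same processor, and `gap` is the spacing $s^T d$ between the two executions.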
If A and B are mapped to the same processor, then they cannot be executed at the same time, i.e., $s^T I_A \neq s^T I_B$, i.e., $s^T d \neq 0$.
Edge mapping: If an edge $e$ exists in the space representation or DG, then an edge $p^T e$ is introduced in the systolic array with $s^T e$ delays.
A DG can be transformed to a space-time representation by interpreting one of the spatial dimensions as the temporal dimension. For a 2-D DG, the general transformation is described by $i' = t = 0$, $j' = p^T I$, and $t' = s^T I$, i.e.,
$$\begin{bmatrix} i' \\ j' \\ t' \end{bmatrix} = T \begin{bmatrix} i \\ j \\ t \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ p_1 & p_2 & 0 \\ s_1 & s_2 & 0 \end{bmatrix} \begin{bmatrix} i \\ j \\ t \end{bmatrix}$$
$j'$ ⇒ processor axis, $t'$ ⇒ scheduling time instance.
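The space-time transformation can be tried out on a small DG. The sketch below is a minimal illustration; the function names `space_time` and `has_conflict`, the 4x3 grid, and the chosen vectors are assumptions made for this example.

```python
def space_time(p, s, nodes):
    """Map each DG node I = (i, j) to (j', t') = (p^T I, s^T I)."""
    return {(i, j): (p[0] * i + p[1] * j, s[0] * i + s[1] * j)
            for (i, j) in nodes}

def has_conflict(mapping):
    """True if two distinct nodes share the same (processor, time) pair."""
    images = list(mapping.values())
    return len(images) != len(set(images))

# A small 4x3 DG with p and s chosen for illustration.
nodes = [(i, j) for i in range(4) for j in range(3)]
mapping = space_time((0, 1), (1, 0), nodes)

# With s = (0, 1) and d = (1, 0) we would have s^T d = 0, so nodes
# on the same processor collide in time.
bad_mapping = space_time((0, 1), (0, 1), nodes)
```

A conflict-free mapping corresponds to the condition $s^T d \neq 0$ stated above; `bad_mapping` shows what goes wrong when it is violated.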
FIR Filter Design B1 (Broadcast Inputs, Move Results, Weights Stay)
$d^T = (1\ 0)$, $p^T = (0\ 1)$, $s^T = (1\ 0)$
Any node with index $I^T = (i, j)$ is mapped to processor $p^T I = j$ and is executed at time $s^T I = i$.
Since $s^T d = 1$ we have $\mathrm{HUE} = 1/|s^T d| = 1$.
Edge mapping: The 3 fundamental edges corresponding to weight, input, and result can be mapped to corresponding edges in the systolic array as per the following table:
| $e$ | $p^T e$ | $s^T e$ |
| --- | --- | --- |
| wt (1 0) | 0 | 1 |
| i/p (0 1) | 1 | 0 |
| result (1 -1) | -1 | 1 |
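The B1 edge-mapping table can be reproduced mechanically from $p$ and $s$. The following is a small sketch; `edge_mapping` and `FIR_EDGES` are names introduced here for illustration.

```python
def edge_mapping(p, s, edges):
    """Return {name: (p^T e, s^T e)} for each fundamental edge e."""
    table = {}
    for name, e in edges.items():
        move = p[0] * e[0] + p[1] * e[1]   # displacement between PEs
        delay = s[0] * e[0] + s[1] * e[1]  # number of delays on that link
        table[name] = (move, delay)
    return table

# The three fundamental edges of the FIR dependence graph.
FIR_EDGES = {"wt": (1, 0), "i/p": (0, 1), "result": (1, -1)}

b1 = edge_mapping(p=(0, 1), s=(1, 0), edges=FIR_EDGES)
```

Reading the result: the weight edge has $p^T e = 0$ (weights stay), the input edge has $s^T e = 0$ (inputs are broadcast), and the result edge moves by one PE with one delay.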
[Figure: Block diagram of B1 design]
[Figure: Low-level implementation of B1 design]
[Figure: Space-time representation of B1 design]
Design B2 (Broadcast Inputs, Move Weights, Results Stay)
$d^T = (1\ -1)$, $p^T = (1\ 1)$, $s^T = (1\ 0)$
Any node with index $I^T = (i, j)$ is mapped to processor $p^T I = i + j$ and is executed at time $s^T I = i$.
Since $s^T d = 1$ we have $\mathrm{HUE} = 1/|s^T d| = 1$.
Edge mapping:
| $e$ | $p^T e$ | $s^T e$ |
| --- | --- | --- |
| wt (1 0) | 1 | 1 |
| i/p (0 1) | 1 | 0 |
| result (1 -1) | 0 | 1 |
[Figure: Block diagram of B2 design]
[Figure: Low-level implementation of B2 design]
- Applying the space-time transformation we get:
$j' = p^T (i\ j)^T = i + j$
$t' = s^T (i\ j)^T = i$
[Figure: Space-time representation of B2 design]
Design F (Fan-In Results, Move Inputs, Weights Stay)
$d^T = (1\ 0)$, $p^T = (0\ 1)$, $s^T = (1\ 1)$
Since $s^T d = 1$ we have $\mathrm{HUE} = 1/|s^T d| = 1$.
Edge mapping:
| $e$ | $p^T e$ | $s^T e$ |
| --- | --- | --- |
| wt (1 0) | 0 | 1 |
| i/p (0 1) | 1 | 1 |
| result (1 -1) | -1 | 0 |
[Figure: Block diagram of F design]
[Figure: Low-level implementation of F design]
[Figure: Space-time representation of F design]
Design R1 (Results Stay, Inputs and Weights Move in Opposite Directions)
$d^T = (1\ -1)$, $p^T = (1\ 1)$, $s^T = (1\ -1)$
Since $s^T d = 2$ we have $\mathrm{HUE} = 1/|s^T d| = 1/2$.
Edge mapping:
| $e$ | $p^T e$ | $s^T e$ |
| --- | --- | --- |
| wt (1 0) | 1 | 1 |
| i/p (0 -1) | -1 | 1 |
| result (1 -1) | 0 | 2 |
[Figure: Block diagram of R1 design]
[Figure: Low-level implementation of R1 design]
Note: R1 can be obtained from B2 by a 2-slow transformation followed by retiming, after changing the direction of signal x.
Design R2 and Dual R2 (Results Stay, Inputs and Weights Move in Same Direction but at Different Speeds)
$d^T = (1\ -1)$, $p^T = (1\ 1)$; R2: $s^T = (2\ 1)$; Dual R2: $s^T = (1\ 2)$
Since $|s^T d| = 1$ for both of them ($s^T d = 1$ for R2 and $s^T d = -1$ for Dual R2), we have $\mathrm{HUE} = 1/|s^T d| = 1$ for both.
Edge mapping:
R2
| $e$ | $p^T e$ | $s^T e$ |
| --- | --- | --- |
| wt (1, 0) | 1 | 2 |
| i/p (0, 1) | 1 | 1 |
| result (1, -1) | 0 | 1 |
Dual R2
| $e$ | $p^T e$ | $s^T e$ |
| --- | --- | --- |
| wt (1, 0) | 1 | 1 |
| i/p (0, 1) | 1 | 2 |
| result (-1, 1) | 0 | 1 |
Note: The result edge in design Dual R2 has been reversed to guarantee $s^T e \geq 0$.
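The edge reversal in the note can be expressed as a simple rule: flip any fundamental edge whose delay $s^T e$ would come out negative. The sketch below illustrates only that rule; `orient_edges` is a hypothetical helper, and whether a given edge may legitimately be reversed depends on the algorithm (for the result edge it relies on the accumulation order being changeable).

```python
def orient_edges(s, edges):
    """Flip any edge whose delay s^T e would be negative."""
    fixed = {}
    for name, e in edges.items():
        delay = s[0] * e[0] + s[1] * e[1]
        fixed[name] = e if delay >= 0 else (-e[0], -e[1])
    return fixed

edges = {"wt": (1, 0), "i/p": (0, 1), "result": (1, -1)}

r2 = orient_edges((2, 1), edges)       # R2: no edge needs flipping
dual_r2 = orient_edges((1, 2), edges)  # Dual R2: result edge is flipped
```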
Design W1 (Weights Stay, Inputs and Results Move in Opposite Directions)
$d^T = (1\ 0)$, $p^T = (0\ 1)$, $s^T = (2\ 1)$
Since $s^T d = 2$ we have $\mathrm{HUE} = 1/|s^T d| = 1/2$.
Edge mapping:
| $e$ | $p^T e$ | $s^T e$ |
| --- | --- | --- |
| wt (1 0) | 0 | 2 |
| i/p (0 1) | 1 | 1 |
| result (1 -1) | -1 | 1 |
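Design W2 and Dual W2 (Weights Stay, Inputs and Results Move in Same Direction but at Different Speeds)
$d^T = (1\ 0)$, $p^T = (0\ 1)$; W2: $s^T = (1\ 2)$; Dual W2: $s^T = (1\ -1)$
Since $s^T d = 1$ for both of them we have $\mathrm{HUE} = 1/|s^T d| = 1$ for both.
Edge mapping:
W2
| $e$ | $p^T e$ | $s^T e$ |
| --- | --- | --- |
| wt (1, 0) | 0 | 1 |
| i/p (0, 1) | 1 | 2 |
| result (-1, 1) | 1 | 1 |
Dual W2
| $e$ | $p^T e$ | $s^T e$ |
| --- | --- | --- |
| wt (1, 0) | 0 | 1 |
| i/p (0, -1) | -1 | 1 |
| result (1, -1) | -1 | 2 |
Note: The result edge in W2 and the input edge in Dual W2 are reversed so that $s^T e \geq 0$.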
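As a cross-check of the FIR designs listed so far, the sketch below recomputes the HUE of each one from its $(d, p, s)$ triple and verifies $p^T d = 0$ and $s^T d \neq 0$ along the way. The dictionary `DESIGNS` and the helper `check` are names introduced here for illustration.

```python
from fractions import Fraction

# (d, p, s) for each FIR design, as given on the slides.
DESIGNS = {
    "B1":      ((1, 0),  (0, 1), (1, 0)),
    "B2":      ((1, -1), (1, 1), (1, 0)),
    "F":       ((1, 0),  (0, 1), (1, 1)),
    "R1":      ((1, -1), (1, 1), (1, -1)),
    "R2":      ((1, -1), (1, 1), (2, 1)),
    "Dual R2": ((1, -1), (1, 1), (1, 2)),
    "W1":      ((1, 0),  (0, 1), (2, 1)),
    "W2":      ((1, 0),  (0, 1), (1, 2)),
    "Dual W2": ((1, 0),  (0, 1), (1, -1)),
}

def check(d, p, s):
    """Return HUE = 1/|s^T d| after verifying p^T d = 0 and s^T d != 0."""
    pd = p[0] * d[0] + p[1] * d[1]
    sd = s[0] * d[0] + s[1] * d[1]
    if pd != 0 or sd == 0:
        raise ValueError("invalid design")
    return Fraction(1, abs(sd))

hue = {name: check(*vectors) for name, vectors in DESIGNS.items()}
half_rate = sorted(name for name, h in hue.items() if h == Fraction(1, 2))
```

Only R1 and W1 end up in `half_rate`, the two designs in which two signals move in opposite directions.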
- Relating Systolic Designs Using Transformations:
FIR systolic architectures obtained using the same projection vector and processor vector, but different scheduling vectors, can be derived from each other by using transformations like edge reversal, associativity, slow-down, retiming and pipelining.
- Example 1: R1 can be obtained from B2 by slow-down, edge reversal and retiming.
- Example 2:
Derivation of design F from B1 using cutset retiming
Selection of $s^T$ based on scheduling inequalities:
For a dependence relation X → Y, let $I_x = (i_x, j_x)^T$ and $I_y = (i_y, j_y)^T$ be the indices of the nodes X and Y, respectively. The scheduling inequality for this dependence is given by
$S_y \geq S_x + T_x$
where $T_x$ is the computation time of node X. The scheduling equations can be classified into the following two types:
Linear scheduling, where
$S_x = s^T I_x = (s_1\ s_2)(i_x\ j_x)^T$
$S_y = s^T I_y = (s_1\ s_2)(i_y\ j_y)^T$
Affine scheduling, where
$S_x = s^T I_x + \gamma_x = (s_1\ s_2)(i_x\ j_x)^T + \gamma_x$
$S_y = s^T I_y + \gamma_y = (s_1\ s_2)(i_y\ j_y)^T + \gamma_y$
So the scheduling inequality for affine scheduling is as follows:
$s^T I_y + \gamma_y \geq s^T I_x + \gamma_x + T_x$
Equivalently, for the edge $e = I_y - I_x$: $s^T e + \gamma_y - \gamma_x \geq T_x$.
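A single scheduling inequality is easy to test numerically. The following sketch checks $S_y \geq S_x + T_x$ for one dependence under linear or affine scheduling; the function names and the example node indices are hypothetical, chosen only to exercise the formula.

```python
def start_time(s, index, gamma=0):
    """Affine schedule S = s^T I + gamma (gamma = 0 gives linear scheduling)."""
    return s[0] * index[0] + s[1] * index[1] + gamma

def is_feasible(s, Ix, Iy, Tx, gamma_x=0, gamma_y=0):
    """Check S_y >= S_x + T_x for the dependence X -> Y."""
    return start_time(s, Iy, gamma_y) >= start_time(s, Ix, gamma_x) + Tx

# Hypothetical dependence: X at (2, 3) feeds Y at (3, 2), with T_x = 8.
ok = is_feasible((9, 1), (2, 3), (3, 2), Tx=8)
too_early = is_feasible((2, 1), (2, 3), (3, 2), Tx=8)
with_offset = is_feasible((2, 1), (2, 3), (3, 2), Tx=8, gamma_y=7)
```

The third call shows how an affine offset $\gamma_y$ can make an otherwise infeasible scheduling vector satisfy this particular dependence.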
Each edge of a DG leads to an inequality for the selection of the scheduling vector. The selection consists of 2 steps:
– Capture all fundamental edges. The reduced dependence graph (RDG) is used to capture the fundamental edges, and the regular iterative algorithm (RIA) description of the corresponding problem is used to construct RDGs.
– Construct the scheduling inequalities according to $s^T I_y + \gamma_y \geq s^T I_x + \gamma_x + T_x$ and solve them for feasible $s^T$.
- RIA Description: The RIA has two forms
⇒ The RIA is in standard input RIA form if the indices of the inputs are the same for all equations.
⇒ The RIA is in standard output RIA form if all the output indices are the same.
- For the FIR filtering example we have,
W(i+1, j) = W(i, j)
X(i, j+1) = X(i, j)
Y(i+1, j-1) = Y(i, j) + W(i+1, j-1) X(i+1, j-1)
The FIR filtering problem cannot be expressed in standard input RIA form. Expressing it in standard output RIA form we get,
W(i, j) = W(i-1, j)
X(i, j) = X(i, j-1)
Y(i, j) = Y(i-1, j+1) + W(i, j) X(i, j)
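The standard output RIA form can be executed directly. The sketch below evaluates it for an FIR filter under assumed conventions that the slides do not spell out: $i$ indexes the input sample, $j$ indexes the weight, values outside the DG are zero, and the output $y(n)$ is read at node $(n, 0)$. The function name `fir_ria` is introduced here.

```python
def fir_ria(w, x):
    """Evaluate y(n) = sum_j w[j] * x[n - j] via the standard output RIA.

    Assumptions: W(i, j) = w[j], X(i, j) = x[i], Y is 0 outside the DG,
    and y(n) is taken from node (n, 0).
    """
    taps = len(w)
    Y = {}
    for i in range(len(x)):          # row i only needs row i - 1
        for j in range(taps):
            prev = Y.get((i - 1, j + 1), 0)   # Y(i-1, j+1), 0 at the boundary
            Y[(i, j)] = prev + w[j] * x[i]    # Y(i,j) = Y(i-1,j+1) + W(i,j)X(i,j)
    return [Y[(n, 0)] for n in range(len(x))]

y = fir_ria([1, 2, 3], [1, 0, 0, 2])
```

Feeding a unit impulse first makes the weights appear in the output, which is a quick sanity check of the index conventions.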
- The reduced DG for FIR filtering is shown below.
[Figure: Reduced DG for FIR filtering]
Example: $T_{mult} = 5$, $T_{add} = 2$, $T_{com} = 1$
Applying the scheduling inequalities to the five edges of the above figure we get:
W → Y: $e = (0\ 0)^T$, $\gamma_y - \gamma_w \geq 0$
X → X: $e = (0\ 1)^T$, $s_2 + \gamma_x - \gamma_x \geq 1$
W → W: $e = (1\ 0)^T$, $s_1 + \gamma_w - \gamma_w \geq 1$
X → Y: $e = (0\ 0)^T$, $\gamma_y - \gamma_x \geq 0$
Y → Y: $e = (1\ -1)^T$, $s_1 - s_2 + \gamma_y - \gamma_y \geq 5 + 2 + 1$
For linear scheduling $\gamma_x = \gamma_y = \gamma_w = 0$. Solving we get $s_1 \geq 1$, $s_2 \geq 1$ and $s_1 - s_2 \geq 8$.
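The three linear-scheduling constraints can be solved by a brute-force search over small integers. This is a minimal sketch: the search bound of 12 is arbitrary, the right-hand sides of the X → X and W → W constraints are read here as $T_{com}$ (an interpretation of the value 1 on the slide), and "smallest" is taken to mean the smallest $s_1 + s_2$.

```python
T_MULT, T_ADD, T_COM = 5, 2, 1

def feasible(s1, s2):
    """Linear-scheduling constraints for the FIR reduced DG (all gammas = 0)."""
    return (s2 >= T_COM                             # X -> X, e = (0 1)
            and s1 >= T_COM                         # W -> W, e = (1 0)
            and s1 - s2 >= T_MULT + T_ADD + T_COM)  # Y -> Y, e = (1 -1)

# Search a small grid (the bound 12 is arbitrary) for feasible vectors.
solutions = [(s1, s2) for s1 in range(13) for s2 in range(13)
             if feasible(s1, s2)]
smallest = min(solutions, key=lambda s: s[0] + s[1])
```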
- Taking $s^T = (9\ 1)$, $d = (1\ -1)$ such that $s^T d \neq 0$, and $p^T = (1, 1)$ such that $p^T d = 0$, we get $\mathrm{HUE} = 1/8$. The edge mapping is as follows:
| $e$ | $p^T e$ | $s^T e$ |
| --- | --- | --- |
| wt (1 0) | 1 | 9 |
| i/p (0 1) | 1 | 1 |
| result (1 -1) | 0 | 8 |
[Figure: Systolic architecture for the example]
Matrix-Matrix Multiplication and 2-D Systolic Array Design
$C_{11} = a_{11} b_{11} + a_{12} b_{21}$
$C_{12} = a_{11} b_{12} + a_{12} b_{22}$
$C_{21} = a_{21} b_{11} + a_{22} b_{21}$
$C_{22} = a_{21} b_{12} + a_{22} b_{22}$
The iteration in standard output RIA form is as follows:
a(i, j, k) = a(i, j-1, k)
b(i, j, k) = b(i-1, j, k)
c(i, j, k) = c(i, j, k-1) + a(i, j, k) b(i, j, k)
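The three RIA equations can be stepped through directly to confirm that they compute a matrix product. The sketch below assumes boundary conditions that the slides leave implicit: $a$ enters at $j = 0$ as $A[i][k]$, $b$ enters at $i = 0$ as $B[k][j]$, $c$ starts at zero, and the result is read at the last $k$ (all 0-based). The function name `matmul_ria` is introduced here.

```python
def matmul_ria(A, B):
    """Multiply square matrices by stepping through the 3-D RIA.

    Assumed boundary conditions: a(i, 0, k) = A[i][k], b(0, j, k) = B[k][j],
    c(i, j, -1) = 0; the product is read from c(i, j, n - 1).
    """
    n = len(A)
    a, b, c = {}, {}, {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # a propagates along j, b propagates along i
                a[i, j, k] = A[i][k] if j == 0 else a[i, j - 1, k]
                b[i, j, k] = B[k][j] if i == 0 else b[i - 1, j, k]
                # c accumulates along k
                c[i, j, k] = c.get((i, j, k - 1), 0) + a[i, j, k] * b[i, j, k]
    return [[c[i, j, n - 1] for j in range(n)] for i in range(n)]

C = matmul_ria([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The propagation directions match the fundamental edges used later for the 2-D array: $a$ along $(0, 1, 0)$, $b$ along $(1, 0, 0)$ and $c$ along $(0, 0, 1)$.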
- Applying the scheduling inequality with $T_{mult\text{-}add} = 1$ and $T_{com} = 0$ we get $s_2 \geq 0$, $s_1 \geq 0$, $s_3 \geq 1$, $\gamma_c - \gamma_a \geq 0$ and $\gamma_c - \gamma_b \geq 0$. Take $\gamma_a = \gamma_b = \gamma_c = 0$ for linear scheduling.
- Solution 1:
$s^T = (1, 1, 1)$, $d^T = (0, 0, 1)$, $p_1 = (1, 0, 0)$, $p_2 = (0, 1, 0)$, $P^T = (p_1\ p_2)^T$
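- Solution 2:
$s^T = (1, 1, 1)$, $d^T = (1, 1, -1)$, $p_1 = (1, 0, 1)$, $p_2 = (0, 1, 1)$, $P^T = (p_1\ p_2)^T$
Edge mapping:
Sol. 1
| $e$ | $P^T e$ | $s^T e$ |
| --- | --- | --- |
| a (0, 1, 0) | (0, 1) | 1 |
| b (1, 0, 0) | (1, 0) | 1 |
| C (0, 0, 1) | (0, 0) | 1 |
Sol. 2
| $e$ | $P^T e$ | $s^T e$ |
| --- | --- | --- |
| a (0, 1, 0) | (0, 1) | 1 |
| b (1, 0, 0) | (1, 0) | 1 |
| C (0, 0, 1) | (1, 1) | 1 |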
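Both 2-D solutions can be checked with the same few lines: confirm that each processor vector is orthogonal to $d$, that $s^T d \neq 0$, and then compute $P^T e$ and $s^T e$ for the three fundamental edges. The helpers `dot` and `map_edges` are names introduced for this sketch.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def map_edges(s, d, p1, p2, edges):
    """Return {name: ((p1.e, p2.e), s.e)} after validating the design."""
    if dot(p1, d) != 0 or dot(p2, d) != 0:
        raise ValueError("P^T d must be zero")
    if dot(s, d) == 0:
        raise ValueError("s^T d must be nonzero")
    return {name: ((dot(p1, e), dot(p2, e)), dot(s, e))
            for name, e in edges.items()}

EDGES = {"a": (0, 1, 0), "b": (1, 0, 0), "c": (0, 0, 1)}

sol1 = map_edges((1, 1, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0), EDGES)
sol2 = map_edges((1, 1, 1), (1, 1, -1), (1, 0, 1), (0, 1, 1), EDGES)
```

The only difference between the two results is the $c$ edge: it maps to $(0, 0)$ in Solution 1, so the results stay in place, and to $(1, 1)$ in Solution 2, so they move diagonally through the array.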