
Chapter 7: Systolic Architecture Design, Keshab K. Parhi


  1. Chapter 7: Systolic Architecture Design Keshab K. Parhi

  2. • Systolic architectures are designed by applying linear mapping techniques to regular dependence graphs (DGs).
     • Regular dependence graph: the presence of an edge in a certain direction at any node in the DG implies the presence of an edge in the same direction at all nodes in the DG.
     • The DG corresponds to a space representation: no time instance is assigned to any computation, i.e. $t = 0$.
     • Systolic architectures have a space-time representation in which each node is mapped to a certain processing element (PE) and is scheduled at a particular time instance.
     • The systolic design methodology maps an N-dimensional DG to a lower-dimensional systolic architecture.
     • Here, the mapping of an N-dimensional DG to an (N-1)-dimensional systolic array is considered.

  3. Definitions:
     • Projection vector (also called iteration vector), $d = (d_1 \; d_2)^T$: two nodes that are displaced by $d$ or multiples of $d$ are executed by the same processor.
     • Processor space vector, $p^T = (p_1 \; p_2)$: any node with index $I^T = (i, j)$ is executed by processor $p^T I = (p_1 \; p_2)\,(i \; j)^T$.
     • Scheduling vector, $s^T = (s_1 \; s_2)$: any node with index $I$ is executed at time $s^T I$.
     • Hardware utilization efficiency, $\mathrm{HUE} = 1/|s^T d|$: two tasks executed by the same processor are spaced $|s^T d|$ time units apart.
     • The processor space vector and the projection vector must be orthogonal to each other: $p^T d = 0$.
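
The definitions above can be tried out numerically. The following is a minimal sketch, not part of the original slides: the helper names `dot`, `map_node` and `hue` are hypothetical, and the example vectors are chosen only for illustration.

```python
from fractions import Fraction


def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def map_node(index, p, s):
    """Return (processor, time) = (p^T I, s^T I) for a DG node index I = (i, j)."""
    return dot(p, index), dot(s, index)


def hue(s, d):
    """Hardware utilization efficiency, HUE = 1 / |s^T d|."""
    sd = dot(s, d)
    if sd == 0:
        # s^T d = 0 would schedule two nodes of one processor at the same time.
        raise ValueError("s^T d must be non-zero")
    return Fraction(1, abs(sd))


# Illustrative vectors (an assumption for this sketch).
d, p, s = (1, 0), (0, 1), (1, 0)
assert dot(p, d) == 0            # p and d must be orthogonal
print(map_node((3, 2), p, s))    # (2, 3): processor 2, time 3
print(hue(s, d))                 # 1
```

Using `Fraction` keeps the efficiency exact, so a value such as 1/2 is not rounded.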

  4. • If nodes A and B are mapped to the same processor, then they cannot be executed at the same time, i.e. $s^T I_A \ne s^T I_B$, i.e. $s^T d \ne 0$.
     • Edge mapping: if an edge $e$ exists in the space representation (DG), then an edge $p^T e$ is introduced in the systolic array with $s^T e$ delays.
     • A DG can be transformed to a space-time representation by interpreting one of the spatial dimensions as the temporal dimension. For a 2-D DG, the general transformation is described by $i' = t = 0$, $j' = p^T I$ and $t' = s^T I$, i.e.

     $$\begin{pmatrix} i' \\ j' \\ t' \end{pmatrix} = T \begin{pmatrix} i \\ j \\ t \end{pmatrix} = \begin{pmatrix} 0 \;\; 0 & 1 \\ p^T & 0 \\ s^T & 0 \end{pmatrix} \begin{pmatrix} i \\ j \\ t \end{pmatrix}$$

     where $j'$ is the processor axis and $t'$ is the scheduling time instance.
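
As a small illustration of the space-time transformation, the sketch below builds the 3×3 matrix $T$ from $p^T$ and $s^T$ and applies it to a node. The function name `space_time_transform` and the sample vectors are assumptions made for this example.

```python
def space_time_transform(index, p, s):
    """Map a 2-D DG node (i, j) to (i', j', t') = (0, p^T I, s^T I)."""
    i, j = index
    t = 0  # the DG carries no time information, so t = 0
    T = [
        [0, 0, 1],        # i' = t = 0
        [p[0], p[1], 0],  # j' = p^T I (processor axis)
        [s[0], s[1], 0],  # t' = s^T I (scheduling time instance)
    ]
    vec = (i, j, t)
    return tuple(sum(a * b for a, b in zip(row, vec)) for row in T)


# Sample vectors chosen for illustration only.
print(space_time_transform((2, 3), p=(1, 1), s=(1, 0)))  # (0, 5, 2)
```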

  5. FIR Filter Design B1 (Broadcast Inputs, Move Results, Weights Stay): $d^T = (1 \; 0)$, $p^T = (0 \; 1)$, $s^T = (1 \; 0)$
     • Any node with index $I^T = (i, j)$ is mapped to processor $p^T I = j$ and is executed at time $s^T I = i$.
     • Since $s^T d = 1$, we have $\mathrm{HUE} = 1/|s^T d| = 1$.
     • Edge mapping: the 3 fundamental edges corresponding to weight, input and result can be mapped to corresponding edges in the systolic array as per the following table:

     | Edge | $e$ | $p^T e$ | $s^T e$ |
     |---|---|---|---|
     | wt | (1 0) | 0 | 1 |
     | i/p | (0 1) | 1 | 0 |
     | result | (1 -1) | -1 | 1 |
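
The edge-mapping table can be reproduced mechanically. This is a minimal sketch assuming the three fundamental FIR edges listed for design B1; the names `edge_mapping` and `FIR_EDGES` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def edge_mapping(edges, p, s):
    """For each DG edge e, return (p^T e, s^T e): array direction and delay count."""
    return {name: (dot(p, e), dot(s, e)) for name, e in edges.items()}


# Fundamental edges of the FIR dependence graph, as used for design B1.
FIR_EDGES = {"wt": (1, 0), "i/p": (0, 1), "result": (1, -1)}

table = edge_mapping(FIR_EDGES, p=(0, 1), s=(1, 0))
for name, (direction, delays) in table.items():
    print(f"{name:>6}: p^T e = {direction:2d}, s^T e = {delays}")
```

A zero in the $p^T e$ column means the signal stays inside one PE, and a zero in the $s^T e$ column means it is broadcast without delay, which matches the "weights stay, inputs broadcast" description of B1.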

  6. [Figures: block diagram of the B1 design; low-level implementation of the B1 design]

  7. [Figure: space-time representation of the B1 design]

  8. Design B2 (Broadcast Inputs, Move Weights, Results Stay): $d^T = (1 \; -1)$, $p^T = (1 \; 1)$, $s^T = (1 \; 0)$
     • Any node with index $I^T = (i, j)$ is mapped to processor $p^T I = i + j$ and is executed at time $s^T I = i$.
     • Since $s^T d = 1$, we have $\mathrm{HUE} = 1/|s^T d| = 1$.
     • Edge mapping:

     | Edge | $e$ | $p^T e$ | $s^T e$ |
     |---|---|---|---|
     | wt | (1 0) | 1 | 1 |
     | i/p | (0 1) | 1 | 0 |
     | result | (1 -1) | 0 | 1 |

  9. [Figures: block diagram of the B2 design; low-level implementation of the B2 design]

  10. • Applying the space-time transformation, we get $j' = p^T (i \; j)^T = i + j$ and $t' = s^T (i \; j)^T = i$.
      [Figure: space-time representation of the B2 design]

  11. Design F (Fan-In Results, Move Inputs, Weights Stay): $d^T = (1 \; 0)$, $p^T = (0 \; 1)$, $s^T = (1 \; 1)$
      • Since $s^T d = 1$, we have $\mathrm{HUE} = 1/|s^T d| = 1$.
      • Edge mapping:

      | Edge | $e$ | $p^T e$ | $s^T e$ |
      |---|---|---|---|
      | wt | (1 0) | 0 | 1 |
      | i/p | (0 1) | 1 | 1 |
      | result | (1 -1) | -1 | 0 |

      [Figure: block diagram of the F design]

  12. [Figures: low-level implementation of the F design; space-time representation of the F design]

  13. Design R1 (Results Stay, Inputs and Weights Move in Opposite Directions): $d^T = (1 \; -1)$, $p^T = (1 \; 1)$, $s^T = (1 \; -1)$
      • Since $s^T d = 2$, we have $\mathrm{HUE} = 1/|s^T d| = 1/2$.
      • Edge mapping:

      | Edge | $e$ | $p^T e$ | $s^T e$ |
      |---|---|---|---|
      | wt | (1 0) | 1 | 1 |
      | i/p | (0 -1) | -1 | 1 |
      | result | (1 -1) | 0 | 2 |

      [Figure: block diagram of the R1 design]
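
To see what a HUE of 1/2 means in practice, the sketch below lists the time instances at which one PE is active for a small 4×4 block of DG nodes. The block size and the helper names `pe_schedule` and `gaps` are assumptions for illustration; the time origin is arbitrary, so negative times are harmless.

```python
from collections import defaultdict


def pe_schedule(p, s, size):
    """Group the firing times s^T I of a size x size block of nodes by processor p^T I."""
    busy = defaultdict(list)
    for i in range(size):
        for j in range(size):
            pe = p[0] * i + p[1] * j
            t = s[0] * i + s[1] * j
            busy[pe].append(t)
    return {pe: sorted(times) for pe, times in busy.items()}


def gaps(times):
    """Set of spacings between consecutive firing times of one PE."""
    return {b - a for a, b in zip(times, times[1:])}


r1 = pe_schedule(p=(1, 1), s=(1, -1), size=4)   # design R1
b2 = pe_schedule(p=(1, 1), s=(1, 0), size=4)    # design B2, same p and d
print(r1[3], gaps(r1[3]))  # [-3, -1, 1, 3] {2}
print(b2[3], gaps(b2[3]))  # [0, 1, 2, 3] {1}
```

In R1 each PE fires every second cycle, so it is idle half of the time, whereas in B2 the same PE fires every cycle.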

  14. [Figure: low-level implementation of the R1 design]
      Note: R1 can be obtained from B2 by a 2-slow transformation followed by retiming, after changing the direction of signal x.

  15. Design R2 and Dual R2 (Results Stay, Inputs and Weights Move in the Same Direction but at Different Speeds): $d^T = (1 \; -1)$, $p^T = (1 \; 1)$; R2: $s^T = (2 \; 1)$; Dual R2: $s^T = (1 \; 2)$
      • Since $|s^T d| = 1$ for both of them ($s^T d = 1$ for R2 and $s^T d = -1$ for dual R2), we have $\mathrm{HUE} = 1/|s^T d| = 1$ for both.
      • Edge mapping for R2:

      | Edge | $e$ | $p^T e$ | $s^T e$ |
      |---|---|---|---|
      | wt | (1, 0) | 1 | 2 |
      | i/p | (0, 1) | 1 | 1 |
      | result | (1, -1) | 0 | 1 |

      • Edge mapping for dual R2:

      | Edge | $e$ | $p^T e$ | $s^T e$ |
      |---|---|---|---|
      | wt | (1, 0) | 1 | 1 |
      | i/p | (0, 1) | 1 | 2 |
      | result | (-1, 1) | 0 | 1 |

      Note: the result edge in design dual R2 has been reversed to guarantee $s^T e \ge 0$.
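
The edge reversal mentioned in the note can be expressed as a simple rule: if $s^T e$ is negative, use $-e$ instead. The sketch below is an illustration under that reading; `orient_edge` is a hypothetical helper name.

```python
def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def orient_edge(e, s):
    """Return e, or -e when s^T e < 0, so that the mapped edge has a non-negative delay."""
    if dot(s, e) < 0:
        return tuple(-c for c in e)
    return tuple(e)


result_edge = (1, -1)
print(orient_edge(result_edge, s=(2, 1)))  # R2: (1, -1), since s^T e = 1
print(orient_edge(result_edge, s=(1, 2)))  # dual R2: (-1, 1), since s^T e = -1
```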

  16. Design W1 (Weights Stay, Inputs and Results Move in Opposite Directions): $d^T = (1 \; 0)$, $p^T = (0 \; 1)$, $s^T = (2 \; 1)$
      • Since $s^T d = 2$, we have $\mathrm{HUE} = 1/|s^T d| = 1/2$.
      • Edge mapping:

      | Edge | $e$ | $p^T e$ | $s^T e$ |
      |---|---|---|---|
      | wt | (1 0) | 0 | 2 |
      | i/p | (0 1) | 1 | 1 |
      | result | (1 -1) | -1 | 1 |

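  17. Design W2 and Dual W2 (Weights Stay, Inputs and Results Move in the Same Direction but at Different Speeds): $d^T = (1 \; 0)$, $p^T = (0 \; 1)$; W2: $s^T = (1 \; 2)$; Dual W2: $s^T = (1 \; -1)$
      • Since $s^T d = 1$ for both of them, we have $\mathrm{HUE} = 1/|s^T d| = 1$ for both.
      • Edge mapping for W2 (the result edge is taken as (-1, 1), which is the orientation consistent with the tabulated values and with $s^T e \ge 0$):

      | Edge | $e$ | $p^T e$ | $s^T e$ |
      |---|---|---|---|
      | wt | (1, 0) | 0 | 1 |
      | i/p | (0, 1) | 1 | 2 |
      | result | (-1, 1) | 1 | 1 |

      • Edge mapping for dual W2:

      | Edge | $e$ | $p^T e$ | $s^T e$ |
      |---|---|---|---|
      | wt | (1, 0) | 0 | 1 |
      | i/p | (0, -1) | -1 | 1 |
      | result | (1, -1) | -1 | 2 |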

  18. • Relating systolic designs using transformations: FIR systolic architectures obtained using the same projection vector and processor vector, but different scheduling vectors, can be derived from each other by using transformations such as edge reversal, associativity, slow-down, retiming and pipelining.
      • Example 1: R1 can be obtained from B2 by slow-down, edge reversal and retiming.

  19. • Example 2: derivation of design F from B1 using cutset retiming. [Figure]

  20. Selection of $s^T$ based on scheduling inequalities: consider a dependence relation X → Y, where $I_x = (i_x \; j_x)^T$ and $I_y = (i_y \; j_y)^T$ are the indices of nodes X and Y, respectively. The scheduling inequality for this dependence is $S_y \ge S_x + T_x$, where $T_x$ is the computation time of node X. The scheduling equations can be classified into the following two types:
      • Linear scheduling, where $S_x = s^T I_x = (s_1 \; s_2)(i_x \; j_x)^T$ and $S_y = s^T I_y = (s_1 \; s_2)(i_y \; j_y)^T$.
      • Affine scheduling, where $S_x = s^T I_x + \gamma_x = (s_1 \; s_2)(i_x \; j_x)^T + \gamma_x$ and $S_y = s^T I_y + \gamma_y = (s_1 \; s_2)(i_y \; j_y)^T + \gamma_y$.
      So the scheduling inequality for affine scheduling is $s^T I_y + \gamma_y \ge s^T I_x + \gamma_x + T_x$.
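
A single dependence X → Y can be checked against the inequality $S_y \ge S_x + T_x$ directly. The sketch below covers both linear scheduling (offsets left at 0) and affine scheduling; the function names and the sample numbers are assumptions for illustration.

```python
def start_time(s, index, gamma=0):
    """Start time S = s^T I + gamma (gamma = 0 gives linear scheduling)."""
    return s[0] * index[0] + s[1] * index[1] + gamma


def dependence_ok(s, i_x, i_y, t_x, gamma_x=0, gamma_y=0):
    """True when node Y starts no earlier than node X finishes: S_y >= S_x + T_x."""
    return start_time(s, i_y, gamma_y) >= start_time(s, i_x, gamma_x) + t_x


# Sample dependence from node (0, 0) to node (1, 0), illustrative values only.
print(dependence_ok((1, 0), (0, 0), (1, 0), t_x=1))             # True
print(dependence_ok((1, 0), (0, 0), (1, 0), t_x=2))             # False
print(dependence_ok((1, 0), (0, 0), (1, 0), t_x=2, gamma_y=1))  # True
```

The last call shows how an affine offset on Y can make an otherwise infeasible schedule valid.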

  21. Each edge of a DG leads to an inequality for the selection of the scheduling vector. The selection consists of 2 steps:
      (1) Capture all fundamental edges. The reduced dependence graph (RDG) is used to capture the fundamental edges, and the regular iterative algorithm (RIA) description of the corresponding problem is used to construct RDGs.
      (2) Construct the scheduling inequalities according to $s^T I_y + \gamma_y \ge s^T I_x + \gamma_x + T_x$ and solve them for a feasible $s^T$.

  22. • RIA description: the RIA has two forms.
        ⇒ The RIA is in standard input RIA form if the indices of the inputs are the same for all equations.
        ⇒ The RIA is in standard output RIA form if all the output indices are the same.
      • For the FIR filtering example we have

        $W(i+1, j) = W(i, j)$
        $X(i, j+1) = X(i, j)$
        $Y(i+1, j-1) = Y(i, j) + W(i+1, j-1)\,X(i+1, j-1)$

        The FIR filtering problem cannot be expressed in standard input RIA form. Expressing it in standard output RIA form, we get

        $W(i, j) = W(i-1, j)$
        $X(i, j) = X(i, j-1)$
        $Y(i, j) = Y(i-1, j+1) + W(i, j)\,X(i, j)$
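
The standard output RIA form can be evaluated node by node to confirm that it computes an FIR filter. The sketch below is an illustration with assumed boundary conditions that the slides do not state: $W(i, j)$ is taken to be the weight $w_j$, $X(i, j)$ the sample $x_i$, missing $Y$ values are treated as zero, and output $y(n)$ is read at node $(n, 0)$.

```python
def fir_via_ria(w, x):
    """Evaluate Y(i, j) = Y(i-1, j+1) + W(i, j) X(i, j) over the whole DG."""
    Y = {}
    for i in range(len(x)):
        for j in range(len(w)):
            w_ij = w[j]  # W(i, j) = W(i-1, j): weight depends on j only (assumed boundary)
            x_ij = x[i]  # X(i, j) = X(i, j-1): input depends on i only (assumed boundary)
            Y[(i, j)] = Y.get((i - 1, j + 1), 0) + w_ij * x_ij
    # Assumption: the finished sum for y(n) is available at node (n, 0).
    return [Y[(n, 0)] for n in range(len(x))]


def fir_direct(w, x):
    """Reference FIR: y(n) = sum_k w[k] x[n-k], with x[n] = 0 for n < 0."""
    return [
        sum(w[k] * x[n - k] for k in range(len(w)) if n - k >= 0)
        for n in range(len(x))
    ]


w, x = [1, 2, 3], [1, 0, 0, 1]
print(fir_via_ria(w, x))  # [1, 2, 3, 1]
print(fir_direct(w, x))   # [1, 2, 3, 1]
```

Partial sums travel along the result edge direction (1, -1), so every node on a line $i + j = n$ contributes one product to $y(n)$.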

  23. • [Figure: reduced DG for FIR filtering]
      • Example: $T_{mult} = 5$, $T_{add} = 2$, $T_{com} = 1$. Applying the scheduling inequalities to the five edges of the reduced DG, we get:

      | Edge | $e$ | Inequality |
      |---|---|---|
      | W → Y | $(0 \; 0)^T$ | $\gamma_y - \gamma_w \ge 0$ |
      | X → X | $(0 \; 1)^T$ | $s_2 + \gamma_x - \gamma_x \ge 1$ |
      | W → W | $(1 \; 0)^T$ | $s_1 + \gamma_w - \gamma_w \ge 1$ |
      | X → Y | $(0 \; 0)^T$ | $\gamma_y - \gamma_x \ge 0$ |
      | Y → Y | $(1 \; -1)^T$ | $s_1 - s_2 + \gamma_y - \gamma_y \ge 5 + 2 + 1$ |

      For linear scheduling, $\gamma_x = \gamma_y = \gamma_w = 0$. Solving, we get $s_1 \ge 1$, $s_2 \ge 1$ and $s_1 - s_2 \ge 8$.
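
For linear scheduling, the inequalities of this example can be solved by a small exhaustive search. The sketch below takes the five edges of the reduced DG as W→Y, X→X, W→W, X→Y and Y→Y with the right-hand sides 0, 1, 1, 0 and 5 + 2 + 1. The search range and the "smallest $|s_1| + |s_2|$" selection rule are assumptions made for illustration, not a criterion stated in the slides.

```python
from itertools import product

# (edge label, edge vector e, required minimum for s^T e) with all gamma = 0.
EDGES = [
    ("W->Y", (0, 0), 0),
    ("X->X", (0, 1), 1),
    ("W->W", (1, 0), 1),
    ("X->Y", (0, 0), 0),
    ("Y->Y", (1, -1), 5 + 2 + 1),
]


def feasible(s):
    """True when s^T e meets the bound of every edge."""
    return all(s[0] * e[0] + s[1] * e[1] >= bound for _, e, bound in EDGES)


def smallest_schedule(limit=20):
    """Feasible integer s in [-limit, limit]^2 with the smallest |s1| + |s2| (assumed criterion)."""
    candidates = [
        s for s in product(range(-limit, limit + 1), repeat=2) if feasible(s)
    ]
    return min(candidates, key=lambda s: (abs(s[0]) + abs(s[1]), s))


print(smallest_schedule())  # (9, 1)
```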

  24. • Taking $s^T = (9 \; 1)$, $d = (1 \; -1)^T$ such that $s^T d \ne 0$, and $p^T = (1 \; 1)$ such that $p^T d = 0$, we get $\mathrm{HUE} = 1/8$. The edge mapping is as follows:

      | Edge | $e$ | $p^T e$ | $s^T e$ |
      |---|---|---|---|
      | wt | (1 0) | 1 | 9 |
      | i/p | (0 1) | 1 | 1 |
      | result | (1 -1) | 0 | 8 |

      [Figure: systolic architecture for the example]
