Chapter 7: Systolic Architecture Design

Keshab K. Parhi

• Systolic architectures are designed by using linear mapping techniques on regular dependence graphs (DGs).
• Regular dependence graph: the presence of an edge in a certain direction at any node in the DG represents the presence of an edge in the same direction at all nodes in the DG.
• The DG corresponds to a space representation: no time instance is assigned to any computation ⇒ $t = 0$.
• Systolic architectures have a space-time representation in which each node is mapped to a certain processing element (PE) and is scheduled at a particular time instance.
• Systolic design methodology maps an N-dimensional DG to a lower-dimensional systolic architecture.
• Here, the mapping of an N-dimensional DG to an (N-1)-dimensional systolic array is considered.

Definitions:
• Projection vector (also called iteration vector): $d = (d_1\ d_2)^T$. Two nodes that are displaced by $d$ or multiples of $d$ are executed by the same processor.
• Processor space vector: $p^T = (p_1\ p_2)$. Any node with index $I^T = (i, j)$ is executed by processor $p^T I = (p_1\ p_2)\,(i\ j)^T$.
• Scheduling vector: $s^T = (s_1\ s_2)$. Any node with index $I$ is executed at time $s^T I$.
• Hardware utilization efficiency: HUE $= 1/|s^T d|$. This is because two tasks executed by the same processor are spaced $|s^T d|$ time units apart.
• The processor space vector and the projection vector must be orthogonal to each other ⇒ $p^T d = 0$.
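The definitions above reduce to a few dot products. Below is a minimal Python sketch, assuming 2-D integer vectors stored as tuples; the helper names `dot`, `check_mapping`, `processor_of` and `time_of` are illustrative and not taken from the text. It rejects a choice that violates $p^T d = 0$ or $s^T d \neq 0$ and otherwise returns the HUE as an exact fraction.

```python
from fractions import Fraction


def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def check_mapping(d, p, s):
    """Validate a (d, p, s) choice for a 2-D DG and return its HUE as a fraction."""
    if dot(p, d) != 0:
        raise ValueError("p^T d must be 0: the processor vector must be orthogonal to d")
    if dot(s, d) == 0:
        raise ValueError("s^T d must be nonzero: nodes on one PE would collide in time")
    return Fraction(1, abs(dot(s, d)))


def processor_of(p, node):
    """Processor index p^T I of the DG node I = (i, j)."""
    return dot(p, node)


def time_of(s, node):
    """Time step s^T I at which the DG node I = (i, j) is executed."""
    return dot(s, node)


# Vectors of design B1 from this chapter: d = (1 0), p = (0 1), s = (1 0).
print(check_mapping((1, 0), (0, 1), (1, 0)))   # 1
print(processor_of((0, 1), (3, 2)))            # 2 (node (i, j) runs on PE j)
print(time_of((1, 0), (3, 2)))                 # 3 (at time i)
```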

• If two nodes A and B are mapped to the same processor, they cannot be executed at the same time, i.e., $s^T I_A \neq s^T I_B$. Since $I_A - I_B$ is then a nonzero multiple of $d$, this requires $s^T d \neq 0$.
• Edge mapping: if an edge $e$ exists in the space representation (DG), then an edge $p^T e$ is introduced in the systolic array with $s^T e$ delays.
• A DG can be transformed into a space-time representation by interpreting one of the spatial dimensions as the temporal dimension. For a 2-D DG, the general transformation is described by $i' = t = 0$, $j' = p^T I$ and $t' = s^T I$, i.e.,

$$\begin{pmatrix} i' \\ j' \\ t' \end{pmatrix} = T \begin{pmatrix} i \\ j \\ t \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ p_1 & p_2 & 0 \\ s_1 & s_2 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \\ t \end{pmatrix}$$

where $j'$ is the processor axis and $t'$ is the scheduling time instance.
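The edge-mapping rule and the space-time matrix can be checked numerically. This is a small sketch under the same tuple representation; `space_time_matrix`, `transform` and `map_edge` are hypothetical helper names. The example uses the vectors of design B2 from this chapter, $p^T = (1\ 1)$ and $s^T = (1\ 0)$.

```python
def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def space_time_matrix(p, s):
    """3x3 matrix T with (i', j', t')^T = T (i, j, t)^T for a 2-D DG.

    Row (0 0 1) gives i' = t, row (p1 p2 0) gives j' = p^T I,
    and row (s1 s2 0) gives t' = s^T I.
    """
    return [[0, 0, 1],
            [p[0], p[1], 0],
            [s[0], s[1], 0]]


def transform(T, v):
    """Matrix-vector product for small integer matrices stored as lists of rows."""
    return [dot(row, v) for row in T]


def map_edge(p, s, e):
    """DG edge e becomes an array edge in direction p^T e carrying s^T e delays."""
    return dot(p, e), dot(s, e)


T = space_time_matrix(p=(1, 1), s=(1, 0))
print(transform(T, [2, 3, 0]))              # [0, 5, 2]: i' = 0, PE j' = 5, time t' = 2
print(map_edge((1, 1), (1, 0), (1, -1)))    # (0, 1): the result stays in its PE, 1 delay
```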

FIR Filter Design B1 (Broadcast Inputs, Move Results, Weights Stay)

$d^T = (1\ 0)$, $p^T = (0\ 1)$, $s^T = (1\ 0)$
• Any node with index $I^T = (i, j)$
  – is mapped to processor $p^T I = j$;
  – is executed at time $s^T I = i$.
• Since $s^T d = 1$, we have HUE $= 1/|s^T d| = 1$.
• Edge mapping: the 3 fundamental edges, corresponding to weight, input and result, are mapped to edges in the systolic array as per the following table:

  e              | $p^T e$ | $s^T e$
  wt (1 0)       |    0    |    1
  i/p (0 1)      |    1    |    0
  result (1 -1)  |   -1    |    1

[Figure: Block diagram of B1 design]
[Figure: Low-level implementation of B1 design]

[Figure: Space-time representation of B1 design]
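To see that the B1 edge table implements FIR filtering $y(n) = \sum_j w_j\,x(n-j)$, here is a cycle-level Python sketch. It assumes that node $(i, j)$ computes $w_j\,x(i)$ and contributes to output $y(i + j)$ (consistent with the result edge $(1\ {-1})$), that $x(n) = 0$ for $n < 0$, and that PE 0 delivers the output; `simulate_b1` and `fir_direct` are hypothetical names.

```python
def fir_direct(w, x):
    """Reference FIR: y(n) = sum_j w[j] * x[n - j], with x[k] = 0 for k < 0."""
    return [sum(w[j] * x[n - j] for j in range(len(w)) if n - j >= 0)
            for n in range(len(x))]


def simulate_b1(w, x):
    """Cycle-level sketch of design B1 (dataflow read off the edge table).

    PE j keeps weight w[j] (wt edge: p^T e = 0). At time i, input x[i] is
    broadcast to every PE (i/p edge: s^T e = 0). Each partial result moves one
    PE to the left through one delay (result edge: p^T e = -1, s^T e = 1).
    PE 0 emits y(i) at time i.
    """
    n_pe = len(w)
    regs = [0] * (n_pe + 1)   # regs[j]: delayed partial result leaving PE j; regs[n_pe] stays 0
    y = []
    for xi in x:              # one iteration per clock cycle i
        new = [w[j] * xi + regs[j + 1] for j in range(n_pe)]
        y.append(new[0])
        regs[:n_pe] = new     # latch the partial results into the delay elements
    return y


w = [1, 2, 3]
x = [1, 0, 0, 4, 5]
print(simulate_b1(w, x))   # [1, 2, 3, 4, 13]
print(fir_direct(w, x))    # [1, 2, 3, 4, 13]
```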

Design B2 (Broadcast Inputs, Move Weights, Results Stay)

$d^T = (1\ {-1})$, $p^T = (1\ 1)$, $s^T = (1\ 0)$
• Any node with index $I^T = (i, j)$
  – is mapped to processor $p^T I = i + j$;
  – is executed at time $s^T I = i$.
• Since $s^T d = 1$, we have HUE $= 1/|s^T d| = 1$.
• Edge mapping:

  e              | $p^T e$ | $s^T e$
  wt (1 0)       |    1    |    1
  i/p (0 1)      |    1    |    0
  result (1 -1)  |    0    |    1

[Figure: Block diagram of B2 design]
[Figure: Low-level implementation of B2 design]
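The same kind of cycle-level sketch for B2, under the same assumptions (node $(i, j)$ computes $w_j\,x(i)$ for output $y(i + j)$, and $x(n) = 0$ for $n < 0$; `simulate_b2` is a hypothetical name). Because results stay, PE $n$ owns $y(n)$. Only the first `len(x)` outputs are simulated, so the array is truncated to `len(x)` PEs.

```python
def simulate_b2(w, x):
    """Cycle-level sketch of design B2 for outputs y(0 .. len(x) - 1).

    Dataflow read off the edge table: PE n accumulates y(n) in place
    (result edge: p^T e = 0); x[i] is broadcast at time i (i/p edge: s^T e = 0);
    weights shift one PE to the right per cycle (wt edge: p^T e = 1, s^T e = 1),
    so at time i PE n holds w[n - i] (or 0 if that index is out of range).
    """
    n_pe = len(x)
    wreg = [w[n] if n < len(w) else 0 for n in range(n_pe)]   # weights at time 0
    acc = [0] * n_pe                                           # results stay in place
    for xi in x:                       # one iteration per clock cycle
        for n in range(n_pe):
            acc[n] += wreg[n] * xi     # broadcast input times local weight
        wreg = [0] + wreg[:-1]         # move every weight one PE to the right
    return acc


print(simulate_b2([1, 2, 3], [1, 0, 0, 4, 5]))   # [1, 2, 3, 4, 13]
```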

• Applying the space-time transformation, we get $j' = p^T (i\ j)^T = i + j$ and $t' = s^T (i\ j)^T = i$.

[Figure: Space-time representation of B2 design]
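The space-time picture of B2 can be regenerated as text. This sketch (hypothetical helper `space_time_b2`) places each node $(i, j)$ at processor $j' = i + j$ and time $t' = i$, checks that no two nodes occupy the same PE at the same time, and prints which tap $j$ each PE works on at each time step.

```python
def space_time_b2(n_in, n_taps):
    """Map each DG node (i, j) of design B2 to (j', t') = (i + j, i)."""
    return {(i, j): (i + j, i) for i in range(n_in) for j in range(n_taps)}


placement = space_time_b2(n_in=4, n_taps=3)

# A valid mapping never puts two nodes on the same PE at the same time.
assert len(set(placement.values())) == len(placement)

# Rows are time steps t', columns are processors j'; each entry is the tap index j.
n_pe = max(jp for jp, _ in placement.values()) + 1
for t in range(4):
    row = ["."] * n_pe
    for (i, j), (jp, tp) in placement.items():
        if tp == t:
            row[jp] = str(j)
    print(f"t'={t}: " + " ".join(row))
# t'=0: 0 1 2 . . .
# t'=1: . 0 1 2 . .
# t'=2: . . 0 1 2 .
# t'=3: . . . 0 1 2
```

Each diagonal of the picture shows one weight moving one PE to the right per cycle, while every column (one PE) accumulates a single output.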

Design F (Fan-In Results, Move Inputs, Weights Stay)

$d^T = (1\ 0)$, $p^T = (0\ 1)$, $s^T = (1\ 1)$
• Since $s^T d = 1$, we have HUE $= 1/|s^T d| = 1$.
• Edge mapping:

  e              | $p^T e$ | $s^T e$
  wt (1 0)       |    0    |    1
  i/p (0 1)      |    1    |    1
  result (1 -1)  |   -1    |    0

[Figure: Block diagram of F design]

[Figure: Low-level implementation of F design]
[Figure: Space-time representation of F design]
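A cycle-level sketch of design F, again assuming that node $(i, j)$ computes $w_j\,x(i)$ for output $y(i + j)$ and that $x(n) = 0$ for $n < 0$ (`simulate_f` is a hypothetical name). With $s^T = (1\ 1)$, all products for $y(n)$ are formed at time $n$, so the fan-in adder sums them in the same cycle (result edge with $s^T e = 0$).

```python
def simulate_f(w, x):
    """Cycle-level sketch of design F (dataflow read off the edge table).

    Weights stay (wt edge: p^T e = 0); inputs move one PE to the right per
    cycle (i/p edge: p^T e = 1, s^T e = 1); the products of all PEs are summed
    in the same cycle by a fan-in adder (result edge: s^T e = 0).
    """
    xreg = [0] * len(w)            # xreg[j]: input currently held by PE j
    y = []
    for xi in x:                   # one iteration per clock cycle n
        xreg = [xi] + xreg[:-1]    # new sample enters PE 0, older ones shift right
        y.append(sum(wj * xj for wj, xj in zip(w, xreg)))   # fan-in of all products
    return y


print(simulate_f([1, 2, 3], [1, 0, 0, 4, 5]))   # [1, 2, 3, 4, 13]
```

This is the familiar direct-form FIR structure: a tapped delay line on the input and a single adder tree.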

Design R1 (Results Stay, Inputs and Weights Move in Opposite Directions)

$d^T = (1\ {-1})$, $p^T = (1\ 1)$, $s^T = (1\ {-1})$
• Since $s^T d = 2$, we have HUE $= 1/|s^T d| = 1/2$.
• Edge mapping:

  e              | $p^T e$ | $s^T e$
  wt (1 0)       |    1    |    1
  i/p (0 -1)     |   -1    |    1
  result (1 -1)  |    0    |    2

[Figure: Block diagram of R1 design]

[Figure: Low-level implementation of R1 design]

Note: R1 can be obtained from B2 by a 2-slow transformation followed by retiming, after changing the direction of signal x.
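The HUE of $1/2$ in R1 can be seen by listing the times at which one processor is busy. This small Python check (names illustrative) rebuilds the edge table, with the input edge taken as $(0\ {-1})$, and lists the start times $s^T I = i - j$ of the nodes mapped to PE $n = i + j = 4$. The times are relative; a constant offset can make them non-negative.

```python
def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


d, p, s = (1, -1), (1, 1), (1, -1)   # design R1

# Edge table; the input edge is taken as (0, -1) so that its delay s^T e is non-negative.
edges = {"wt": (1, 0), "i/p": (0, -1), "result": (1, -1)}
for name, e in edges.items():
    print(name, dot(p, e), dot(s, e))
# wt 1 1
# i/p -1 1
# result 0 2

# Nodes (i, j) with i + j = 4 all run on PE 4; their start times are s^T I = i - j.
times = sorted(dot(s, (i, 4 - i)) for i in range(5))
print(times)            # [-4, -2, 0, 2, 4]
print(abs(dot(s, d)))   # 2: the PE is busy every other cycle, so HUE = 1/2
```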

Design R2 and Dual R2 (Results Stay, Inputs and Weights Move in the Same Direction but at Different Speeds)

$d^T = (1\ {-1})$, $p^T = (1\ 1)$; R2: $s^T = (2\ 1)$; Dual R2: $s^T = (1\ 2)$
• Since $|s^T d| = 1$ for both ($s^T d = 1$ for R2 and $-1$ for dual R2), we have HUE $= 1/|s^T d| = 1$ for both.
• Edge mapping, R2:

  e              | $p^T e$ | $s^T e$
  wt (1, 0)      |    1    |    2
  i/p (0, 1)     |    1    |    1
  result (1, -1) |    0    |    1

• Edge mapping, Dual R2:

  e              | $p^T e$ | $s^T e$
  wt (1, 0)      |    1    |    1
  i/p (0, 1)     |    1    |    2
  result (-1, 1) |    0    |    1

Note: the result edge in design dual R2 has been reversed to guarantee $s^T e \ge 0$.
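The need to reverse the result edge in dual R2 follows from the sign of $s^T e$. This sketch (hypothetical helper `edge_table`) builds both tables and simply reverses any edge whose delay would be negative; that is legitimate here for the result edge of dual R2, as the note states, but it is not a general legality check for edge reversal.

```python
def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def edge_table(p, s, edges):
    """Map each fundamental edge to (edge used, p^T e, s^T e).

    Any edge with s^T e < 0 is reversed (e -> -e) so that all delays are
    non-negative, mirroring the note for dual R2.
    """
    table = {}
    for name, e in edges.items():
        if dot(s, e) < 0:
            e = tuple(-c for c in e)
        table[name] = (e, dot(p, e), dot(s, e))
    return table


edges = {"wt": (1, 0), "i/p": (0, 1), "result": (1, -1)}
p = (1, 1)
r2 = edge_table(p, (2, 1), edges)
dual_r2 = edge_table(p, (1, 2), edges)
print(r2["result"])        # ((1, -1), 0, 1)
print(dual_r2["result"])   # ((-1, 1), 0, 1): reversed, since s^T e was -1
```

In R2 the weights need 2 delays per PE hop and the inputs 1, while dual R2 swaps the two; this is the "different speeds" in the design name.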

Design W1 (Weights Stay, Inputs and Results Move in Opposite Directions)

$d^T = (1\ 0)$, $p^T = (0\ 1)$, $s^T = (2\ 1)$
• Since $s^T d = 2$, we have HUE $= 1/|s^T d| = 1/2$.
• Edge mapping:

  e              | $p^T e$ | $s^T e$
  wt (1 0)       |    0    |    2
  i/p (0 1)      |    1    |    1
  result (1 -1)  |   -1    |    1
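A quick arithmetic check of the W1 edge mapping (names illustrative). With $s^T = (2\ 1)$, the input edge $(0\ 1)$ gives $p^T e = 1$ and $s^T e = 1$, while the result edge $(1\ {-1})$ gives $p^T e = -1$, so inputs and results travel in opposite directions.

```python
from fractions import Fraction


def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


d, p, s = (1, 0), (0, 1), (2, 1)   # design W1
edges = {"wt": (1, 0), "i/p": (0, 1), "result": (1, -1)}
table = {name: (dot(p, e), dot(s, e)) for name, e in edges.items()}
print(table)   # {'wt': (0, 2), 'i/p': (1, 1), 'result': (-1, 1)}

# Inputs travel toward higher PE indices, results toward lower ones.
assert table["i/p"][0] * table["result"][0] < 0
print(Fraction(1, abs(dot(s, d))))   # 1/2
```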

Design W2 and Dual W2 (Weights Stay, Inputs and Results Move in the Same Direction but at Different Speeds)

$d^T = (1\ 0)$, $p^T = (0\ 1)$; W2: $s^T = (1\ 2)$; Dual W2: $s^T = (1\ {-1})$
• Since $s^T d = 1$ for both, we have HUE $= 1/|s^T d| = 1$ for both.
• Edge mapping, W2:

  e              | $p^T e$ | $s^T e$
  wt (1, 0)      |    0    |    1
  i/p (0, 1)     |    1    |    2
  result (-1, 1) |    1    |    1

• Edge mapping, Dual W2:

  e              | $p^T e$ | $s^T e$
  wt (1, 0)      |    0    |    1
  i/p (0, -1)    |   -1    |    1
  result (1, -1) |   -1    |    2

Note: the result edge in design W2 has been reversed to guarantee $s^T e \ge 0$ (with $s^T = (1\ 2)$, the edge $(1, -1)$ would give $s^T e = -1$).
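The "different speeds" in W2 and dual W2 can be quantified as $p^T e / s^T e$ processors per clock cycle. This sketch (hypothetical helper `speeds`) uses edges with non-negative delays: for W2 the result edge is taken as $(-1\ 1)$, since $(1\ {-1})$ would give $s^T e = -1$.

```python
from fractions import Fraction


def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def speeds(p, s, edges):
    """Velocity of each moving signal in PEs per clock cycle: p^T e / s^T e."""
    out = {}
    for name, e in edges.items():
        hops, delays = dot(p, e), dot(s, e)
        if delays <= 0:
            raise ValueError(f"{name}: edge needs a positive delay (reverse it first)")
        out[name] = Fraction(hops, delays)
    return out


p = (0, 1)
w2 = speeds(p, (1, 2), {"i/p": (0, 1), "result": (-1, 1)})
dual_w2 = speeds(p, (1, -1), {"i/p": (0, -1), "result": (1, -1)})
print(w2)        # {'i/p': Fraction(1, 2), 'result': Fraction(1, 1)}
print(dual_w2)   # {'i/p': Fraction(-1, 1), 'result': Fraction(-1, 2)}
```

In both designs the two signals share a direction (same sign) but differ in speed by a factor of 2.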

• Relating systolic designs using transformations:
  – FIR systolic architectures obtained using the same projection vector and processor vector, but different scheduling vectors, can be derived from each other by using transformations such as edge reversal, associativity, slow-down, retiming and pipelining.
• Example 1: R1 can be obtained from B2 by slow-down, edge reversal and retiming.

• Example 2: Derivation of design F from B1 using cutset retiming.

• Selection of $s^T$ based on scheduling inequalities: consider a dependence relation X → Y, where $I_x^T = (i_x, j_x)$ and $I_y^T = (i_y, j_y)$ are the indices of nodes X and Y, respectively. The scheduling inequality for this dependence is

$$S_y \ge S_x + T_x,$$

where $T_x$ is the computation time of node X. Schedules can be classified into the following two types:
  – Linear scheduling, where $S_x = s^T I_x = (s_1\ s_2)(i_x\ j_x)^T$ and $S_y = s^T I_y = (s_1\ s_2)(i_y\ j_y)^T$.
  – Affine scheduling, where $S_x = s^T I_x + \gamma_x = (s_1\ s_2)(i_x\ j_x)^T + \gamma_x$ and $S_y = s^T I_y + \gamma_y = (s_1\ s_2)(i_y\ j_y)^T + \gamma_y$.

So the scheduling inequality for affine scheduling is

$$s^T I_y + \gamma_y \ge s^T I_x + \gamma_x + T_x.$$

Writing $e = I_y - I_x$ for the dependence edge, this becomes $s^T e + \gamma_y - \gamma_x \ge T_x$.
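The inequality can be checked directly for a candidate schedule. A minimal sketch with a hypothetical helper `satisfies`; the dependences and timings below are made-up illustrations, not taken from the slides.

```python
def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def satisfies(s, I_x, I_y, T_x, gamma_x=0, gamma_y=0):
    """Check S_y >= S_x + T_x for a dependence X -> Y under an affine schedule.

    S = s^T I + gamma; linear scheduling is the special case gamma_x = gamma_y = 0.
    """
    S_x = dot(s, I_x) + gamma_x
    S_y = dot(s, I_y) + gamma_y
    return S_y >= S_x + T_x


# Hypothetical dependence from node (0, 1) to node (1, 0), i.e. e = (1 -1)^T, with T_x = 8:
print(satisfies((9, 1), (0, 1), (1, 0), T_x=8))   # True:  9 >= 1 + 8
print(satisfies((1, 0), (0, 1), (1, 0), T_x=8))   # False: 1 <  0 + 8

# Hypothetical same-index dependence (e = 0) with T_x = 2: only an offset gamma_y can satisfy it.
print(satisfies((1, 0), (2, 3), (2, 3), T_x=2))              # False: 2 < 2 + 2
print(satisfies((1, 0), (2, 3), (2, 3), T_x=2, gamma_y=2))   # True:  4 >= 2 + 2
```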

• Each edge of a DG leads to an inequality for the selection of the scheduling vector. The selection consists of 2 steps:
  – Capture all fundamental edges. The reduced dependence graph (RDG) is used to capture the fundamental edges, and the regular iterative algorithm (RIA) description of the corresponding problem is used to construct the RDG.
  – Construct the scheduling inequalities according to $s^T I_y + \gamma_y \ge s^T I_x + \gamma_x + T_x$ and solve them for a feasible $s^T$.

• RIA description: the RIA has two forms.
  ⇒ The RIA is in standard input RIA form if the indices of the inputs are the same for all equations.
  ⇒ The RIA is in standard output RIA form if all the output indices are the same.
• For the FIR filtering example, we have

$$\begin{aligned} W(i+1, j) &= W(i, j) \\ X(i, j+1) &= X(i, j) \\ Y(i+1, j-1) &= Y(i, j) + W(i+1, j-1)\,X(i+1, j-1) \end{aligned}$$

The FIR filtering problem cannot be expressed in standard input RIA form. Expressing it in standard output RIA form, we get

$$\begin{aligned} W(i, j) &= W(i-1, j) \\ X(i, j) &= X(i, j-1) \\ Y(i, j) &= Y(i-1, j+1) + W(i, j)\,X(i, j) \end{aligned}$$
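The standard output RIA form can be evaluated directly over the DG grid to confirm that it computes $y(n) = \sum_j w_j\,x(n-j)$. This Python sketch assumes boundary values $W(i, j) = w_j$ and $X(i, j) = x(i)$, and $Y = 0$ outside the grid; `fir_by_ria` is a hypothetical name.

```python
def fir_by_ria(w, x):
    """Evaluate the standard output RIA form of FIR filtering over the DG grid.

    Assumes W(i, j) = w[j] (propagated unchanged along i), X(i, j) = x[i]
    (propagated unchanged along j), and Y = 0 outside the grid
    0 <= i < len(x), 0 <= j < len(w). Node (i, j) then belongs to output
    n = i + j, and y(n) is read from Y(n, 0).
    """
    Y = {}
    for i in range(len(x)):
        for j in range(len(w)):
            W = w[j]   # W(i, j) = W(i-1, j)
            X = x[i]   # X(i, j) = X(i, j-1)
            Y[i, j] = Y.get((i - 1, j + 1), 0) + W * X   # Y(i, j) = Y(i-1, j+1) + W X
    return [Y[n, 0] for n in range(len(x))]


print(fir_by_ria([1, 2, 3], [1, 0, 0, 4, 5]))   # [1, 2, 3, 4, 13]
```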

• The reduced DG for FIR filtering is shown below.

[Figure: Reduced DG for FIR filtering]

• Example: $T_{mult} = 5$, $T_{add} = 2$, $T_{com} = 1$. Applying the scheduling inequalities to the five edges of the reduced DG, we get:

  W → Y: $e = (0\ 0)^T$, $\gamma_y - \gamma_w \ge 0$
  X → X: $e = (0\ 1)^T$, $s_2 + \gamma_x - \gamma_x \ge 1$
  W → W: $e = (1\ 0)^T$, $s_1 + \gamma_w - \gamma_w \ge 1$
  X → Y: $e = (0\ 0)^T$, $\gamma_y - \gamma_x \ge 0$
  Y → Y: $e = (1\ {-1})^T$, $s_1 - s_2 + \gamma_y - \gamma_y \ge 5 + 2 + 1$

• For linear scheduling, $\gamma_x = \gamma_y = \gamma_w = 0$. Solving, we get $s_1 \ge 1$, $s_2 \ge 1$ and $s_1 - s_2 \ge 8$.
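For two unknowns, the linear-scheduling system can simply be searched exhaustively. This sketch encodes the five edges with their right-hand sides ($T = 0$ on the two $(0\ 0)$ edges, as in the inequalities) and picks the lexicographically smallest feasible pair, which is one reasonable tie-breaking choice; the search range of 0 to 11 is an arbitrary assumption.

```python
from itertools import product

T_MULT, T_ADD, T_COM = 5, 2, 1

# (source, target, e, T) for the five RDG edges; linear scheduling, so every gamma is 0.
EDGES = [
    ("W", "Y", (0, 0), 0),
    ("X", "X", (0, 1), T_COM),
    ("W", "W", (1, 0), T_COM),
    ("X", "Y", (0, 0), 0),
    ("Y", "Y", (1, -1), T_MULT + T_ADD + T_COM),
]


def feasible(s):
    """True if s^T e >= T holds for every edge (gamma terms cancel under linear scheduling)."""
    return all(s[0] * e[0] + s[1] * e[1] >= t for _src, _dst, e, t in EDGES)


candidates = [s for s in product(range(12), repeat=2) if feasible(s)]
best = min(candidates)   # lexicographically smallest (s1, s2)
print(best)              # (9, 1)
```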

• Taking $s^T = (9\ 1)$, $d = (1\ {-1})^T$ such that $s^T d \neq 0$, and $p^T = (1\ 1)$ such that $p^T d = 0$, we get $s^T d = 8$ and hence HUE $= 1/8$. The edge mapping is as follows:

  e              | $p^T e$ | $s^T e$
  wt (1 0)       |    1    |    9
  i/p (0 1)      |    1    |    1
  result (1 -1)  |    0    |    8

[Figure: Systolic architecture for the example]
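The HUE of $1/8$ can be confirmed by enumerating which time steps each processor is busy. This sketch (hypothetical helper `schedule`, with an arbitrary 6 × 3 DG) maps node $(i, j)$ to PE $i + j$ at time $9i + j$, checks that no PE runs two nodes at once, and checks that consecutive tasks on a PE are exactly $|s^T d| = 8$ cycles apart.

```python
from collections import defaultdict


def schedule(n_in, n_taps, p=(1, 1), s=(9, 1)):
    """Group DG nodes (i, j) by processor p^T I and list their start times s^T I."""
    busy = defaultdict(list)
    for i in range(n_in):
        for j in range(n_taps):
            busy[p[0] * i + p[1] * j].append(s[0] * i + s[1] * j)
    return {pe: sorted(ts) for pe, ts in busy.items()}


table = schedule(n_in=6, n_taps=3)
for pe, times in sorted(table.items()):
    assert len(set(times)) == len(times)               # no two tasks collide on a PE
    gaps = {b - a for a, b in zip(times, times[1:])}
    assert gaps <= {8}                                  # successive tasks are 8 cycles apart
print(table[4])   # [20, 28, 36]: PE 4 works in 1 of every 8 cycles, so HUE = 1/8
```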
