

SLIDE 1

VLSI programming Systolic Design

Book Parhi, Chp. 7
Rudolf Mak, r.h.mak@tue.nl
18-May-16, TU/e Computer Science

SLIDE 2

Agenda

  • Systolic arrays (what, where)
  • Regular Iterative Algorithms (RIAs)
  • Dependence graphs (regular, reduced)
  • Systolic design techniques
    – Binding (computations to PEs)
    – Scheduling (computations to time slots)
  • Examples
    – FIR filters, matrix multipliers

SLIDE 3

FSM reminder

(figure: Moore machine and Mealy machine, each drawn as combinational logic (CL) plus a state register)

Chaining Mealy machines may lead to too long critical paths!

SLIDE 4

Systolic system (Leiserson)

A systolic system is a set of interconnected Moore machines that operate synchronously and satisfy certain smallness (boundedness) conditions:

1. # states is bounded
2. # input ports is bounded
3. # output ports is bounded
4. # neighbor machines is bounded

("#" stands for "number of")
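The lock-step operation of such a set of Moore machines can be sketched in a few lines (the class, the toy transition function and the chain topology are my own illustration, not part of the deck):

```python
from dataclasses import dataclass

@dataclass
class MoorePE:
    """A Moore-machine PE: its output depends on the stored state only,
    never directly on the current inputs."""
    state: int = 0

    def output(self) -> int:
        return self.state            # Moore: output = f(state)

    def next_state(self, inp: int) -> int:
        return inp                   # toy transition: latch the input

def step(pes, external_in):
    """One synchronous clock tick for a linear chain of PEs: sample all
    outputs first, then update all states in lock step."""
    outs = [pe.output() for pe in pes]
    ins = [external_in] + outs[:-1]  # each PE feeds its right neighbor
    for pe, x in zip(pes, ins):
        pe.state = pe.next_state(x)
    return outs[-1]                  # output of the last PE in the chain

chain = [MoorePE() for _ in range(3)]
trace = [step(chain, x) for x in [7, 8, 9, 0, 0, 0]]
# each PE acts as one D-element, so an input emerges after 3 ticks
```

Because outputs are sampled before any state updates, chaining the machines cannot lengthen a combinational path.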

SLIDE 5

Systolic = Uniform Pipelined SDF

  • Uniform:
    – Each PE (Moore machine) computes the same set of combinatorial functions.
  • Regular:
    – All PEs are connected to a small finite number of neighboring PEs via one or more D-elements according to a regular topology. All connections are point-to-point connections.
  • Synchronous operation:
    – All PEs operate in lock step (fire concurrently); data is pumped through the system, much like the heart pumps blood through the body (hence the name systolic).

SLIDE 6

Relaxations

To obtain better systems, small relaxations to the systolic model are allowed:

1. Not all PEs are identical; small deviations are allowed, especially for PEs at the border of the system.
2. (A limited form of) broadcasting is allowed. This means that PEs have become Mealy machines.
   • These systems are called semi-systolic by Leiserson.
   • Parhi does not make the distinction. Instead he uses the notion fully pipelined for the Moore machine variant.
3. Connections need not be to nearest neighbors, but locality needs to be maintained.

SLIDE 7

Systolic system

(figure: a host connected to a linear array of PEs)

  • Systolic array: Moore machines, such as a dedicated computing engine on a FPGA.
  • Host: a Turing-equivalent machine, such as a PowerPC on a FPGA.

SLIDE 8

Application areas

  • Computationally intensive, regular
    – Basic linear algebra operations
    – Signal processing
    – Image processing
    – Order statistics, sorting
    – Dynamic programming
    – High performance computing
      • e.g., many-particle simulations (in chemistry, physics or astronomy)

SLIDE 9

FIR filter (N-tap)

Spec:

  y(n) = Σ_{k=0}^{N−1} h(k)·x(n−k),   n ≥ 0

RIA: a direct, one-dimensional translation of this spec does not work!!! The dependence of y(n) on x(n−k) has a displacement that varies with k, so it cannot be captured by constant index displacement vectors; a two-dimensional index space is needed.
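The spec itself can be stated executably as a reference; a minimal sketch (the function name and the zero-extension of x for n − k < 0 are my own choices):

```python
def fir(h, x):
    """Direct evaluation of the N-tap FIR spec
    y(n) = sum_{k=0}^{N-1} h(k) * x(n-k), with x(m) = 0 for m < 0."""
    N = len(h)
    return [sum(h[k] * x[n - k] for k in range(N) if n - k >= 0)
            for n in range(len(x))]

# 3-tap example with weights h0, h1, h2 = 1, 2, 3:
y = fir([1, 2, 3], [1, 1, 1, 1])
# y = [1, 3, 6, 6]
```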

SLIDE 10

Regular Iterative Algorithm

A RIA is a triple consisting of

1. An index space
2. A finite set of variables
3. A set of direct dependencies among indexed variables (given as equalities)
   • with associated index displacement vectors
   • also called fundamental edges by Parhi

Canonical forms:

1. Standard input
2. Standard output

Example (FIR): index space { (i, j) | 0 ≤ i, 0 ≤ j < N }, where x(i, 0) is input.

SLIDE 11

FIR-filter: RIA description

Standard output canonical form:

  h(i, j) = h(i−1, j),   h(0, j) = h(j)
  x(i, j) = x(i, j−1),   x(i, 0) = x(i)
  y(i, j) = y(i−1, j+1) + h(i, j)·x(i, j),   with y(i−1, N) taken as 0; output y(i) = y(i, 0)

Index displacement vectors:

  e_x = (0, 1),  e_h = (1, 0),  (0, 0) and (0, 0) for the h- and x-operands of the y-equation,  e_y = (1, −1)

LHS index = RHS index + IDV
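Assuming the recurrences above, the RIA can be evaluated node by node over the 2-D index space and checked against the direct spec (the dictionary-based evaluator is my own sketch):

```python
def fir_ria(h, x):
    """Evaluate the FIR RIA in standard output form over index space
    (i, j), 0 <= j < N (time index i, tap index j):
        h(i, j) = h(j),  x(i, j) = x(i)   (propagated constants)
        y(i, j) = y(i-1, j+1) + h(i, j) * x(i, j)
    with y(., N) taken as 0; the output stream is y(i) = y(i, 0)."""
    N, T = len(h), len(x)
    y = {}
    for i in range(T):
        for j in range(N - 1, -1, -1):
            prev = y.get((i - 1, j + 1), 0)   # 0 beyond the index space
            y[(i, j)] = prev + h[j] * x[i]
    return [y[(i, 0)] for i in range(T)]

# same result as the direct spec:
# fir_ria([1, 2, 3], [1, 1, 1, 1]) == [1, 3, 6, 6]
```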

SLIDE 12

Computational node

A node g consumes a tuple of inputs (a, b, c) and produces a tuple of outputs

  f1(a, b, c), f2(a, b, c), f3(a, b, c)

(figure: node g with three incoming and three outgoing edges)

SLIDE 13

Computational node from RIA

For the FIR RIA, the node g at index (i, j) consumes h(i−1, j), x(i, j−1) and y(i−1, j+1), and produces h(i, j), x(i, j) and y(i, j).

I(g) is the index vector, i.e., the sequence of coordinates of g in index-space.

SLIDE 14

Dependence graphs

1. The nodes of a dependence graph represent (small) computations. There is a separate node for each computation.
2. The edges of a dependence graph represent causal dependencies between computations, i.e., an edge from node u to node v indicates that the result of the computation performed by u is used in the computation performed by v.
3. There is no notion of time in a dependence graph. It is an (index-)space representation.

SLIDE 15

FIR: Dependence graph

(figure: dependence graph with inputs x(0)…x(4), outputs y(0)…y(4) and weights h(0), h(1), h(2))
SLIDE 17

Regular dependence graphs

A dependence graph G is regular when:

1. There is an injective mapping I from the nodes of G to a grid of points in the n-dimensional index space.
2. There exists a finite set E of vectors, called fundamental edges, such that every pair (u, v) of neighboring nodes is mapped to a pair of grid locations that differ by a fundamental edge e ∈ E, i.e., I(v) = I(u) + e.

SLIDE 18

FIR: DG in space representation

(figure: the dependence graph drawn on the grid with fundamental edges (1,−1), (1,0), (0,1))

  E = ( e_h | e_x | e_y ) = ( 1  0   1 )
                            ( 0  1  −1 )    fundamental edges

SLIDE 19

Systolic array design

The design of a systolic array for a computation given in the form of a regular dependence graph involves:

1. Choosing a processor space, i.e., a set of dimensions and a number of PEs per dimension (the array).
2. Mapping each computational node of the graph to a PE of the array.
3. For each PE scheduling the computations of the nodes mapped onto it, i.e., assigning each individual computation to a distinct time slot.

Similar to folding

SLIDE 20

Design parameters

An (n−1)-dimensional systolic design for an n-dimensional regular dependence graph is characterized by:

1. An (n−1) × n processor space matrix P:
   – P·I(x) is the processor that executes node x
2. An n-dimensional scheduling vector s:
   – s^T·I(x) is the time slot at which node x is executed
3. A projection (iteration) vector d:
   – I(x) − I(y) ∝ d implies P·I(x) = P·I(y)

SLIDE 21

Design constraints

  • Computations whose grid locations differ by a multiple of the projection vector execute on the same PE
    – I(x) − I(y) ∝ d implies P·I(x) = P·I(y)
    – hence P·d = 0
  • Computations that execute on the same PE must be scheduled in different time slots
    – s^T·I(x) is the time slot at which node x is executed
    – hence s^T·d ≠ 0
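Both constraints can be checked mechanically for any candidate (P, s, d); a small sketch (helper names are mine):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def check_design(P, s, d):
    """Systolic design constraints:
    P*d = 0    (nodes projected along d land on the same PE),
    s^T*d != 0 (those nodes get distinct time slots)."""
    return all(dot(row, d) == 0 for row in P) and dot(s, d) != 0

# B1 design from the deck: s = (1, 0), d = (1, 0), p = (0, 1)
assert check_design([[0, 1]], [1, 0], [1, 0])
# illegal: a schedule orthogonal to the projection vector
assert not check_design([[0, 1]], [0, 1], [1, 0])
```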

SLIDE 22

Processor allocation:

(figure: FIR dependence graph projected along d; each column of nodes maps to one processor)

  d^T = (1, 0),   p^T = (0, 1)

SLIDE 23

Scheduling:

(figure: FIR dependence graph with time slots 1, 2, 3, 4; all nodes in a column share a slot)

  d^T = (1, 0),   s^T = (1, 0)

SLIDE 24

Hardware Utilization Efficiency (HUE)

Let x and y be computations with index vectors I(x), I(y) that are executed on the same PE.

  • Then I(x) − I(y) = k·d for some integer k ≠ 0.

Let t_x be the time at which x is scheduled and t_y be the time at which y is scheduled.

  • Then t_x − t_y = s^T·(I(x) − I(y)) = k·s^T·d, so |t_x − t_y| ≥ |s^T·d|.

Hence, any PE executes at most 1 computation per |s^T·d| time slots. So

  HUE = 1 / |s^T·d|

Question: what do we call s^T·d?
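The HUE of the designs appearing later in the deck follows directly from this formula:

```python
def hue(s, d):
    """HUE = 1 / |s^T d|: each PE performs at most one computation
    per |s^T d| time slots."""
    std = sum(a * b for a, b in zip(s, d))
    assert std != 0, "invalid design: s^T d must be nonzero"
    return 1 / abs(std)

assert hue([1, 0], [1, 0]) == 1            # B1: fully utilized
assert hue([1, -1], [1, -1]) == 1 / 2      # R1: 2-slow
assert hue([1, 1, 1], [1, 1, 1]) == 1 / 3  # Kung-Leiserson: 3-slow
```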

SLIDE 25

From DG to systolic array

Map a DG onto a systolic array as follows:

  • Nodes:
    – map node x to processing element P·I(x)
  • Edges:
    – map edge x → y to connection P·I(x) → P·I(y)
    – insert s^T·e D-elements in this edge, where e = I(y) − I(x) is a fundamental edge

Note that there are only finitely many fundamental grid edges (independent of the size of the DG), and recall that each edge is a translation of a fundamental edge.
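Since only the finitely many fundamental edges matter, the whole array layout follows from a small table; a sketch for the B1 design (the edge names follow the deck, the code is mine):

```python
def edge_map(P, s, edges):
    """For each fundamental edge e: the PE-to-PE displacement P*e and
    the number of D-elements s^T*e inserted on that connection."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return {name: (tuple(dot(row, e) for row in P), dot(s, e))
            for name, e in edges.items()}

# B1: P = (0, 1), s = (1, 0)
m = edge_map([[0, 1]], [1, 0],
             {"e_h": (1, 0), "e_x": (0, 1), "e_y": (1, -1)})
# e_h stays on one PE with 1 delay, e_x hops one PE with 0 delays
# (broadcast), e_y hops one PE backwards with 1 delay
```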

SLIDE 26

B1: H-stay, X-broadcast, Y-move

  e         p^T·e   s^T·e
  (1, 0)      0       1
  (0, 1)      1       0
  (1, −1)    −1       1

  d^T = (1, 0),   p^T = (0, 1),   s^T = (1, 0)

(figure: linear array of PEs; the h-values stay in place, x is broadcast to all PEs, and y moves one PE per clock tick)
SLIDE 27

B1: H-stay, X-broadcast, Y-move

HUE = 1 / |s^T·d| = 1

(figure: three-PE array with weights h0, h1, h2, broadcast input x(i), internal streams u(i), v(i) and output y(i))

  y(i) = h0·x(i) + v(i−1)
  v(i) = h1·x(i) + u(i−1)
  u(i) = h2·x(i) + 0

  y(i) = h0·x(i) + h1·x(i−1) + h2·x(i−2)
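A cycle-accurate sketch of these three equations (the simulation loop is mine) reproduces the direct FIR output:

```python
def b1_fir(h, xs):
    """Simulate the 3-tap B1 array: x(i) is broadcast to all PEs in the
    same tick; the partial sums u, v live in D-elements between PEs."""
    h0, h1, h2 = h
    u = v = 0                      # D-element contents, initially 0
    ys = []
    for x in xs:                   # one clock tick per input sample
        ys.append(h0 * x + v)      # y(i) = h0*x(i) + v(i-1)
        u, v = h2 * x, h1 * x + u  # u(i) = h2*x(i); v(i) = h1*x(i) + u(i-1)
    return ys

# y(i) = h0*x(i) + h1*x(i-1) + h2*x(i-2)
assert b1_fir([1, 2, 3], [1, 1, 1, 1]) == [1, 3, 6, 6]
```

The simultaneous assignment to u and v mirrors the fact that all D-elements update on the same clock edge.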

SLIDE 28

Determining P, s, and d

  • Trial-and-error approach
    – Pick a combination and check whether the design constraints are fulfilled.
  • Constructive approach
    1. Determine a schedule s.
    2. Determine a projection vector d such that s^T·d ≠ 0.
    3. Let Q = (s^T·d)·I − d·s^T. Then Q is a matrix of rank n−1 such that Q·d = 0. By sweeping, a zero row can be created in Q. Drop this row to obtain an (n−1) × n matrix P.
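Step 3 can be carried out mechanically; a sketch using exact rational arithmetic (the function name and elimination routine are mine):

```python
from fractions import Fraction

def projection_matrix(s, d):
    """Step 3: Q = (s^T d)*I - d*s^T satisfies Q*d = 0 and has rank n-1.
    Sweeping (Gaussian elimination) leaves one zero row; dropping it
    yields an (n-1) x n processor matrix P with P*d = 0."""
    n = len(s)
    std = sum(a * b for a, b in zip(s, d))
    assert std != 0, "need s^T d != 0"
    Q = [[Fraction(std * (i == j) - d[i] * s[j]) for j in range(n)]
         for i in range(n)]
    r = 0
    for c in range(n):                      # forward sweep over columns
        piv = next((i for i in range(r, n) if Q[i][c] != 0), None)
        if piv is None:
            continue
        Q[r], Q[piv] = Q[piv], Q[r]
        for i in range(n):
            if i != r and Q[i][c] != 0:
                f = Q[i][c] / Q[r][c]
                Q[i] = [a - f * b for a, b in zip(Q[i], Q[r])]
        r += 1
    return [row for row in Q if any(row)]   # drop the zero row

# Kung-Leiserson: s = d = (1, 1, 1) gives a 2 x 3 matrix P with P*d = 0
P = projection_matrix([1, 1, 1], [1, 1, 1])
assert len(P) == 2
```

Check: Q·d = (s^T d)·d − d·(s^T d) = 0, so every row of the resulting P is orthogonal to d, as the design constraint requires.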
SLIDE 29

FIR-designs (Parhi)

        s^T       d^T       p^T      p^T·(e_h|e_x|e_y)   s^T·(e_h|e_x|e_y)
  B1    (1, 0)    (1, 0)    (0, 1)   (0, 1, −1)          (1, 0, 1)
  F     (1, 1)    (1, 0)    (0, 1)   (0, 1, −1)          (1, 1, 0)
  W1    (2, 1)    (1, 0)    (0, 1)   (0, 1, −1)          (2, 1, 1)
  W2    (1, 2)    (1, 0)    (0, 1)   (0, 1, −1)          (1, 2, −1)   e_y := −e_y
  DW2   (1, −1)   (1, 0)    (0, 1)   (0, 1, −1)          (1, −1, 2)   e_x := −e_x
  B2    (1, 0)    (1, −1)   (1, 1)   (1, 1, 0)           (1, 0, 1)
  R1    (1, −1)   (1, −1)   (1, 1)   (1, 1, 0)           (1, −1, 2)   e_x := −e_x
  R2    (2, 1)    (1, −1)   (1, 1)   (1, 1, 0)           (2, 1, 1)
  DR2   (1, 2)    (1, −1)   (1, 1)   (1, 1, 0)           (1, 2, −1)   e_y := −e_y

A negative entry in the s^T·e column means that the corresponding fundamental edge must be taken in the reverse direction.
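The last two columns of the table can be recomputed from the fundamental edges e_h = (1,0), e_x = (0,1), e_y = (1,−1); a spot check of a few rows (the verification code is mine):

```python
def verify_fir_designs():
    """Recompute the p^T*E and s^T*E columns of the FIR design table
    and check the design constraints for each row."""
    E = [(1, 0), (0, 1), (1, -1)]                  # (e_h | e_x | e_y)
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    designs = {                                    # name: (s, d, p)
        "B1": ((1, 0), (1, 0), (0, 1)),
        "F":  ((1, 1), (1, 0), (0, 1)),
        "W1": ((2, 1), (1, 0), (0, 1)),
        "R1": ((1, -1), (1, -1), (1, 1)),
    }
    out = {}
    for name, (s, d, p) in designs.items():
        assert dot(p, d) == 0 and dot(s, d) != 0   # design constraints
        out[name] = (tuple(dot(p, e) for e in E),
                     tuple(dot(s, e) for e in E))
    return out

rows = verify_fir_designs()
assert rows["B1"] == ((0, 1, -1), (1, 0, 1))
assert rows["R1"] == ((1, 1, 0), (1, -1, 2))
```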

SLIDE 30

Design R1: dependence graph

(figure: the FIR dependence graph redrawn with the x-edges reversed; inputs x(0)…x(4), outputs y(0)…y(4), weights h(0), h(1), h(2); fundamental edges (1,−1), (1,0), (0,−1))

  E = ( e_h | −e_x | e_y ) = ( 1   0   1 )
                             ( 0  −1  −1 )    fundamental edges

SLIDE 31

Space-time diagram R1

(figure: space-time diagram; lines of constant p^T·I(x) form the processor axis, lines of constant s^T·I(x) the time axis)

  d^T = (1, −1),   p^T = (1, 1),   s^T = (1, −1)

SLIDE 32

Processor allocation R1:

(figure: dependence graph partitioned along the projection direction; nodes with equal i + j map to the same processor)

  d^T = (1, −1),   p^T = (1, 1)

SLIDE 33

Scheduling R1:

(figure: dependence graph with time slots along s^T = (1, −1); successive computations on the same PE are two slots apart)

  d^T = (1, −1),   s^T = (1, −1)

SLIDE 34

R1: H-move, X-move, Y-stay

  e         p^T·e   s^T·e
  (1, 0)      1       1
  (0, −1)    −1       1
  (1, −1)     0       2

  d^T = (1, −1),   p^T = (1, 1),   s^T = (1, −1)

(figure: linear array of PEs; h and x move through the array in opposite directions, y stays in place)

SLIDE 35

R1: H-move, X-move, Y-stay

HUE = 1 / |s^T·d| = 1 / 2  (2-slow)

(figure: snapshot of the R1 array with weights h0, h1, h2; the x-stream, interleaved with don't-care slots, meets the weights while each PE accumulates one y-value)

SLIDE 36

R1: H-move, X-move, Y-stay

(figure: a later snapshot of the same R1 array)

SLIDE 37

(table: cycle-by-cycle space-time trace of the R1 array)

SLIDE 38

Matrix multiplication (N × N): RIA

  C(i, j) = Σ_{k=0}^{N−1} A(i, k)·B(k, j)

RIA (standard output form):

  a(i, j, k) = a(i, j−1, k),   a(i, 0, k) = A(i, k)
  b(i, j, k) = b(i−1, j, k),   b(0, j, k) = B(k, j)
  c(i, j, k) = c(i, j, k−1) + a(i, j, k)·b(i, j, k),   c(i, j, −1) = 0
  C(i, j) = c(i, j, N−1)
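Collapsing the RIA's propagated variables a(i,j,k) = A(i,k) and b(i,j,k) = B(k,j) gives a direct check that the recurrence computes C = A·B (the evaluator is my own sketch):

```python
def matmul_ria(A, B):
    """Evaluate the matrix-multiplication RIA over index space (i, j, k):
        c(i, j, k) = c(i, j, k-1) + a(i, j, k) * b(i, j, k)
    with c(i, j, -1) = 0 and C(i, j) = c(i, j, N-1)."""
    N = len(A)
    C = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            acc = 0                       # c(i, j, -1) = 0
            for k in range(N):
                acc += A[i][k] * B[k][j]  # a(i,j,k)=A(i,k), b(i,j,k)=B(k,j)
            C[i][j] = acc                 # C(i, j) = c(i, j, N-1)
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert matmul_ria(A, B) == [[19, 22], [43, 50]]
```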

SLIDE 39

Dependence graph for N = 3 (finite!)

(figure: 3-D dependence graph with axes i, j, k and the A-, B- and C-streams along the three edge directions)

SLIDE 40

Kung-Leiserson design

  • Scheduling vector
    – s^T = (1, 1, 1)
  • Projection vector
    – d^T = (1, 1, 1)
  • Processor space matrix
    – P = ( 1  0  −1 )
          ( 0  1  −1 )
  • HUE = 1 / 3

  e            P·e        s^T·e
  (1, 0, 0)    (1, 0)       1
  (0, 1, 0)    (0, 1)       1
  (0, 0, 1)    (−1, −1)     1
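The edge table can be recomputed from P and s (the helper function is mine); every fundamental edge ends up with exactly one D-element, which is the one-delay-per-edge property of the Kung-Leiserson array:

```python
def kl_edges():
    """Kung-Leiserson mapping: PE displacement P*e and delay count s^T*e
    for the three fundamental edges of the matrix-multiplication DG."""
    P = [[1, 0, -1], [0, 1, -1]]        # processor coords: x = i-k, y = j-k
    s = (1, 1, 1)
    edges = {"b": (1, 0, 0), "a": (0, 1, 0), "c": (0, 0, 1)}
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    return {n: (tuple(dot(row, e) for row in P), dot(s, e))
            for n, e in edges.items()}

m = kl_edges()
assert all(delays == 1 for _, delays in m.values())  # one D-element per edge
assert m["c"] == ((-1, -1), 1)   # the C-stream moves diagonally
```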

SLIDE 41

Kung-Leiserson (3×3)-matrix multiplication systolic array

(figure: hexagonal array with processor coordinates x = i − k, y = j − k)

Delay elements not drawn: one on each edge!

SLIDE 42

KL-array processor allocation (binding)

(figure: binding of dependence-graph nodes to the hexagonal array; note the unbalanced workload)

SLIDE 43

Dependence graph for N = 3

(figure: the 3-D dependence graph with the projection direction d drawn in)

SLIDE 44

KL-array 3-slow schedule

HUE = 1/3

SLIDE 45

KL-array details

In addition to the previous slides, the following issues must be addressed:

  • For both A and B there are 5 input streams. How are the matrix values distributed over them?
  • For C there are 5 output streams. How are the resulting values distributed over them?
  • How are results that become available at an internal PE propagated to the border?
  • How to operate this array for multiple multiplications? Flushing old values can be combined with getting internal results out.

SLIDE 46

Summary

1. Systolic architectures are attractive for implementation media like VLSI circuits and FPGAs.
2. Starting point for systolic design is a RIA (or a dependence graph).
3. RIAs can be mapped to systolic arrays in a systematic fashion.
4. Mapping uses simple linear algebra techniques.
5. A large variety of designs for a single problem can be obtained.

SLIDE 47

Exercise (systolic design)

1. An OCL system is a system that counts (#), for each window of size w on its input stream, the number of times the last received value occurs in that window, i.e., for n ≥ w − 1

     y(n) = #{ i | 0 ≤ i < w : x(n − i) = x(n) },

   where x is the input stream and y the output stream.

   a) Derive a RIA (in standard output form) for this system.
   b) Draw the dependence graph of this RIA for w = 4 (you need to draw only the part with n ≤ 6).
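The spec itself (not the asked-for RIA) can be pinned down with a reference function; a sketch using the notation above (the function name is mine):

```python
def ocl(xs, w=4):
    """Reference for the OCL spec: y(n) counts how often the last
    received value x(n) occurs in the window x(n-w+1), ..., x(n).
    Defined here for n >= w-1 only."""
    return [sum(1 for i in range(w) if xs[n - i] == xs[n])
            for n in range(w - 1, len(xs))]

# e.g. for x = 1, 2, 1, 1, 3, 1 and w = 4:
assert ocl([1, 2, 1, 1, 3, 1], w=4) == [3, 1, 3]
```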

SLIDE 48

Exercise (systolic design)

2. Consider the scheduling, projection and processor vectors

     s^T = (2, 1),   d^T = (1, 0),   p^T = (0, 1)

   a) Construct the systolic array that corresponds to these vectors. You may assume the existence of a comparator operator that takes two input streams and produces an output stream of ones and zeros, for equal and unequal input pairs respectively.
   b) Determine the slowness of your design.

3. Assume that the times to perform comparison and addition are given by Tcmp = 1 ns and Tadd = 3 ns, respectively. Give the maximum throughput and the latency of your design (taking slowness into account). Give the latency both in number of delays and in real time.

SLIDE 49

Exercise (systolic design)

4. Next, replace the scheduling vector by s^T = (1, 0). Compare the throughput and latency of the resulting systolic array with that of the one with s^T = (2, 1).

5. Consider the design of 4.

   a) Eliminate redundant operators, and optimize the throughput by pipelining. Give the resulting throughput and latency.
   b) Next retime the result of a), keeping throughput and latency fixed, to obtain the minimum number of delays.