Pipelined Computations

Slides for Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Barry Wilkinson and Michael Allen, Prentice Hall, 1999.


SLIDE 1

[Figure 5.1: Pipelined processes P0 through P5.]

Pipelined Computations

In the pipeline technique, the problem is divided into a series of tasks that have to be completed one after the other. In fact, this is the basis of sequential programming. Each task will be executed by a separate process or processor. This parallelism can be viewed as a form of functional decomposition. The problem is divided into separate functions that must be performed, but in this case, the functions are performed in succession. As we shall see, the input data is often broken up and processed separately.

SLIDE 2

[Figure 5.2: Pipeline for an unfolded loop; each of the five stages takes sin, adds one element a[i], and produces sout.]

Example: Add all the elements of array a to an accumulating sum:

for (i = 0; i < n; i++)
    sum = sum + a[i];

The loop could be “unfolded” to yield

sum = sum + a[0];
sum = sum + a[1];
sum = sum + a[2];
sum = sum + a[3];
sum = sum + a[4];
...

One pipeline solution: Stage i performs

sout = sin + a[i];
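As a quick check (an illustrative trace, not from the slides): with a = {1, 2, 3, 4, 5} and sin = 0 entering the first stage, the successive sout values are 1, 3, 6, 10, and 15, so the last stage emits the complete sum.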

SLIDE 3

[Figure 5.3: Pipeline for a frequency filter; the signal f(t) enters at the left, each stage (fin to fout) removes one frequency (f0, f1, f2, f3, f4), and the filtered signal emerges at the right.]

Example: A frequency filter. The objective here is to remove specific frequencies (say the frequencies f0, f1, f2, f3, etc.) from a (digitized) signal, f(t). The signal could enter the pipeline from the left, as in Figure 5.3.
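As a rough illustration (my own sketch, not from the slides), each stage might loop over the stream of samples, applying a hypothetical remove_frequency() routine for its assigned frequency. A real stage would be a stateful digital notch filter keeping a history of previous samples rather than acting on one sample at a time:

/* Sketch: stage i of the filter pipeline. remove_frequency() is a
   hypothetical notch-filter routine for frequency f[i]; in practice it
   would keep internal state (previous samples) across calls. */
for (k = 0; k < nsamples; k++) {
    recv(&sample, Pi-1);                      /* fin: next signal sample */
    sample = remove_frequency(sample, f[i]);  /* strip frequency f[i] */
    send(&sample, Pi+1);                      /* fout: filtered sample */
}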

SLIDE 4

Given that the problem can be divided into a series of sequential tasks, the pipelined approach can provide increased speed under the following three types of computations:

1. If more than one instance of the complete problem is to be executed
2. If a series of data items must be processed, each requiring multiple operations
3. If information to start the next process can be passed forward before the process has completed all its internal operations

SLIDE 5

[Figure 5.4: Space-time diagram of a pipeline; processes P0 through P5 against time, with instances 1 through 7 flowing through the stages. After a startup of p − 1 cycles, the m instances complete at one per cycle.]

“Type 1” Pipeline Space-Time Diagram

SLIDE 6

[Figure 5.5: Alternative space-time diagram; each instance 0 through 4 passes through processes P0 to P5 in turn.]

SLIDE 7

[Figure 5.6: Pipeline processing 10 data elements d0 through d9 with processes P0 to P9. (a) Pipeline structure: the input sequence d9 … d1d0 enters P0. (b) Timing diagram: the last result emerges after n + p − 1 pipeline cycles.]

“Type 2” Pipeline Space-Time Diagram

SLIDE 8

[Figure 5.7: Pipeline processing where information passes to the next stage before the end of the process: (a) processes with the same execution time; (b) processes not with the same execution time. Only the information sufficient to start the next process is transferred.]

“Type 3” Pipeline Space-Time Diagram

SLIDE 9

[Figure 5.8: Partitioning processes onto processors; twelve pipeline processes P0 through P11 assigned in groups of four to Processor 0, Processor 1, and Processor 2.]

If the number of stages is larger than the number of processors in any pipeline, a group of stages can be assigned to each processor (Figure 5.8).
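A sketch of this mapping (my own, in the slides' send()/recv() style): each processor runs its group of consecutive stages in sequence, so only values crossing a group boundary incur communication. stage_compute() is a hypothetical per-stage function:

/* Sketch: a processor assigned stages first ... last of the pipeline. */
recv(&x, previous_processor);      /* value entering the group */
for (s = first; s <= last; s++)
    x = stage_compute(s, x);       /* stages inside a group pass values locally */
send(&x, next_processor);          /* value leaving the group */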

SLIDE 10

[Figure 5.9: Multiprocessor system with a line configuration; a host connected to a chain of processors in the multiprocessor computer.]

Computing Platform for Pipelined Applications

SLIDE 11

[Figure 5.10: Pipelined addition; each stage Pj of P0 through P4 outputs the running partial sum Σ(i = 1 to j + 1) of its inputs, so P4 emits Σ(i = 1 to 5).]

Pipeline Program Examples

Adding Numbers

The basic code for process Pi:

recv(&accumulation, Pi-1);
accumulation = accumulation + number;
send(&accumulation, Pi+1);

except for the first process, P0, which is

send(&number, P1);

and the last process, Pn−1, which is

recv(&accumulation, Pn-2);
accumulation = accumulation + number;

SPMD program

if (process > 0) {
    recv(&accumulation, Pi-1);
    accumulation = accumulation + number;
} else
    accumulation = number;       /* P0 starts the running total */
if (process < n-1)
    send(&accumulation, Pi+1);

The final result is in the last process. Instead of addition, other arithmetic operations could be done.
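To make the SPMD version concrete, below is a complete runnable sketch in C with MPI (my choice of library; the slides use generic send() and recv()). Each process contributes rank + 1 as its number, so running with n processes should print n(n + 1)/2:

/* Pipelined addition: one MPI process per pipeline stage. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, n, number, accumulation;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &n);
    number = rank + 1;            /* this process's number (illustrative) */

    if (rank > 0) {               /* receive running total from the left */
        MPI_Recv(&accumulation, 1, MPI_INT, rank - 1, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        accumulation = accumulation + number;
    } else
        accumulation = number;    /* P0 starts the running total */

    if (rank < n - 1)             /* pass running total to the right */
        MPI_Send(&accumulation, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
    else                          /* final result is in the last process */
        printf("sum = %d\n", accumulation);

    MPI_Finalize();
    return 0;
}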

SLIDE 12

[Figure 5.11: Pipelined addition of numbers with a master process and ring configuration; the master feeds dn−1 … d2d1d0 into slave P0, and the sum returns to the master from Pn−1.]

SLIDE 13

[Figure 5.12: Pipelined addition of numbers with direct access to slave processes; the master distributes the numbers d0, d1, …, dn−1 directly to the slaves and receives the sum from Pn−1.]

SLIDE 14

Analysis

Our first pipeline example is Type 1. We will assume that each process performs similar actions in each pipeline cycle, and work out the computation and communication required in one pipeline cycle. The total execution time is

ttotal = (time for one pipeline cycle)(number of cycles)
ttotal = (tcomp + tcomm)(m + p − 1)

where there are m instances of the problem and p pipeline stages (processes). The average time for a computation is given by

ta = ttotal / m
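For example (illustrative numbers): with p = 6 processes and m = 95 instances, the pipeline takes m + p − 1 = 100 cycles in total, so ta = ttotal/m is about 1.05 pipeline cycles per result; as m grows, ta approaches one pipeline cycle per result.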

Single Instance of Problem

tcomp = 1
tcomm = 2(tstartup + tdata)
ttotal = (2(tstartup + tdata) + 1)n

The time complexity is Ο(n).

Multiple Instances of Problem

ttotal = (2(tstartup + tdata) + 1)(m + n − 1)

so that ta = ttotal/m ≈ 2(tstartup + tdata) + 1 for large m; that is, one pipeline cycle per instance.

Data Partitioning with Multiple Instances of Problem

tcomp = d
tcomm = 2(tstartup + tdata)
ttotal = (2(tstartup + tdata) + d)(m + n/d − 1)

As we increase d, the data partition size, the impact of the communication diminishes. But increasing the data partition also decreases the parallelism and often increases the execution time.
SLIDE 15

[Figure 5.13: Steps in insertion sort with five numbers; processes P0 through P4 over 10 pipeline cycles, the sequence 5, 4, 3, 1, 2 entering P0 and each process keeping the largest number it has seen so far.]

Sorting Numbers

A parallel version of insertion sort. (The sequential version is akin to placing playing cards in order by moving cards over to insert a card in position.) The basic algorithm for process Pi is

recv(&number, Pi-1);
if (number > x) {
    send(&x, Pi+1);
    x = number;
} else
    send(&number, Pi+1);

With n numbers, the number of values the ith process is to accept is known; it is given by n − i. The number to pass onward is also known; it is given by n − i − 1, since one of the numbers received is kept rather than passed onward. Hence, a simple counted loop could be used, as in the sketch below.
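A sketch of process Pi with that counted loop (my own wrapping of the basic algorithm above):

recv(&x, Pi-1);                    /* first of the n - i numbers is held in x */
for (j = 0; j < n - i - 1; j++) {  /* the remaining numbers are passed onward */
    recv(&number, Pi-1);
    if (number > x) {              /* keep the largest number seen so far */
        send(&x, Pi+1);
        x = number;
    } else
        send(&number, Pi+1);
}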

SLIDE 16

[Figure 5.14: Pipeline for sorting using insertion sort; the series of numbers xn−1 … x1x0 enters P0, and each stage compares, keeps the largest number seen (xmax), and passes the smaller numbers on.]

SLIDE 17

[Figure 5.15: Insertion sort with results returned to the master process using a bidirectional line configuration; the master sends dn−1 … d2d1d0 into P0 and the sorted sequence returns along the same line.]

Incorporating results being returned, process i could have the form

right_procno = n - i - 1;            /* number of processes to the right */
recv(&x, Pi-1);
for (j = 0; j < right_procno; j++) {
    recv(&number, Pi-1);
    if (number > x) {
        send(&x, Pi+1);
        x = number;
    } else
        send(&number, Pi+1);
}
send(&x, Pi-1);                      /* send the number held */
for (j = 0; j < right_procno; j++) { /* pass on the other numbers */
    recv(&number, Pi+1);
    send(&number, Pi-1);
}

SLIDE 18

[Figure 5.16: Insertion sort with results returned (shown for n = 5); processes P0 through P4 against time, with a sorting phase of 2n − 1 cycles followed by the return of the sorted numbers (n more cycles).]

Analysis

Sequential

The sequential time is

ts = (n − 1) + (n − 2) + … + 2 + 1 = n(n − 1)/2

Obviously a very poor sequential sorting algorithm, unsuitable except for very small n.

Parallel

Each pipeline cycle requires at least

tcomp = 1
tcomm = 2(tstartup + tdata)

The total execution time, ttotal, is given by

ttotal = (tcomp + tcomm)(2n − 1) = (1 + 2(tstartup + tdata))(2n − 1)
SLIDE 19

Prime Number Generation

Sieve of Eratosthenes

A series of all integers is generated from 2. The first number, 2, is prime and is kept. All multiples of this number are deleted, as they cannot be prime. The process is repeated with each remaining number. The algorithm removes nonprimes, leaving only primes.

Example: Suppose we want the prime numbers from 2 to 20. We start with all the numbers:

2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20

After considering 2, the multiples of 2 are marked as not prime and not considered further, leaving:

2, 3, 5, 7, 9, 11, 13, 15, 17, 19

After considering 3, its remaining multiples (9 and 15) are struck out as well, leaving:

2, 3, 5, 7, 11, 13, 17, 19

Subsequent numbers are considered in a similar fashion. However, to find the primes up to n, it is only necessary to start with numbers up to √n. All multiples of numbers greater than √n will already have been removed, as each is also a multiple of some number less than or equal to √n.

slide-20
SLIDE 20

172

Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers Barry Wilkinson and Michael Allen  Prentice Hall, 1999

Sequential Code

A sequential program for this problem usually employs an array with elements initialized to 1 (TRUE) and set to 0 (FALSE) when the index of the element is not a prime number. Letting the last number be n and the square root of n be sqrt_n, we might have

for (i = 2; i < n; i++)
    prime[i] = 1;                       /* initialize array */
for (i = 2; i <= sqrt_n; i++)           /* for each number up to sqrt(n) */
    if (prime[i] == 1)                  /* if identified as prime */
        for (j = i + i; j < n; j = j + i)
            prime[j] = 0;               /* strike out all multiples of i */

The elements in the array still set to 1 identify the primes (given by the array indices). Then a simple loop accessing the array can find the primes.
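That final pass might look like the following (a sketch; names follow the sequential code above):

for (i = 2; i < n; i++)     /* indices still marked 1 are the primes */
    if (prime[i] == 1)
        printf("%d\n", i);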

Sequential time

The number of iterations striking out multiples of primes will depend upon the prime. There are n/2 − 1 multiples of 2, n/3 − 1 multiples of 3, and so on. Hence, the total sequential time is given by

ts = (n/2 − 1) + (n/3 − 1) + (n/5 − 1) + … + (n/√n − 1)

assuming the computation in each iteration equates to one computational step. The sequential time complexity is Ο(n²).

SLIDE 21

[Figure 5.17: Pipeline for the sieve of Eratosthenes; the series of numbers xn−1 … x1x0 enters, P0 holds the 1st prime number and passes on only the numbers that are not its multiples, P1 holds the 2nd prime number, and so on.]

Pipelined Implementation

The code for a process, Pi, could be based upon

recv(&x, Pi-1);              /* x is this stage's prime */
/* repeat the following for each number: */
recv(&number, Pi-1);
if ((number % x) != 0)
    send(&number, Pi+1);

A simple for loop is not sufficient for repeating these actions, because each process will not receive the same quantity of numbers, and the quantity is not known beforehand. A general technique for dealing with this situation in pipelines is to use a "terminator" message, which is sent at the end of the sequence. Then each process could be

recv(&x, Pi-1);                      /* x is this stage's prime */
for (i = 0; i < n; i++) {
    recv(&number, Pi-1);
    if (number == terminator) {
        send(&number, Pi+1);         /* forward terminator so later stages also stop */
        break;
    }
    if ((number % x) != 0)
        send(&number, Pi+1);
}
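The slides do not show the stage that feeds the pipeline; a minimal sketch of such a generator process (the terminator value and bounds are my own choices) is:

/* Sketch: a master process generates the candidates and the terminator.
   terminator must be a value that can never be a candidate, e.g. -1. */
for (number = 2; number <= n; number++)
    send(&number, P0);       /* feed 2, 3, 4, ..., n into the pipeline */
number = terminator;
send(&number, P0);           /* signal the end of the sequence */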

SLIDE 22

Solving a System of Linear Equations — Special Case

The final example is Type 3, in which a process can continue with useful work after passing on information. The objective here is to solve a system of linear equations of the so-called upper-triangular form:

an−1,0·x0 + an−1,1·x1 + an−1,2·x2 + … + an−1,n−1·xn−1 = bn−1
...
a2,0·x0 + a2,1·x1 + a2,2·x2 = b2
a1,0·x0 + a1,1·x1 = b1
a0,0·x0 = b0

where the a's and b's are constants and the x's are unknowns to be found. The method used to solve for the unknowns x0, x1, x2, …, xn−1 is simple repeated "back" substitution. First, the unknown x0 is found from the last equation; i.e.,

x0 = b0 / a0,0

The value obtained for x0 is substituted into the next equation to obtain x1; i.e.,

x1 = (b1 − a1,0·x0) / a1,1

The values obtained for x1 and x0 are substituted into the next equation to obtain x2:

x2 = (b2 − a2,0·x0 − a2,1·x1) / a2,2

and so on until all the unknowns are found. Clearly, this algorithm can be implemented as a pipeline. The first pipeline stage computes x0 and passes x0 onto the second stage, which computes x1 from x0 and passes both x0 and x1 onto the next stage, which computes x2 from x0 and x1, and so on.
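As a quick numeric illustration (values chosen for the example): for the system 2x0 = 4; x0 + 3x1 = 11; 2x0 + x1 + 5x2 = 17, the successive stages would compute x0 = 4/2 = 2, then x1 = (11 − 1·2)/3 = 3, then x2 = (17 − 2·2 − 1·3)/5 = 2.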
SLIDE 23

[Figure 5.18: Solving an upper-triangular set of linear equations using a pipeline; P0 computes x0, P1 computes x1, P2 computes x2, and P3 computes x3, with each xi passed forward through the later stages.]

The ith process (0 < i < n) receives the values x0, x1, x2, …, xi−1 and computes xi from the equation

xi = (bi − Σ(j = 0 to i−1) ai,j·xj) / ai,i
SLIDE 24

Sequential Code

Given the constants ai,j and bk stored in arrays a[][] and b[], respectively, and the values for unknowns to be stored in an array, x[], the sequential code could be

x[0] = b[0]/a[0][0];           /* x[0] computed separately */
for (i = 1; i < n; i++) {      /* for remaining unknowns */
    sum = 0;
    for (j = 0; j < i; j++)
        sum = sum + a[i][j]*x[j];
    x[i] = (b[i] - sum)/a[i][i];
}

SLIDE 25

Parallel Code

The pseudocode of process Pi (0 < i < n) of one pipelined version could be

for (j = 0; j < i; j++) {
    recv(&x[j], Pi-1);       /* receive x[0] ... x[i-1] from the left */
    send(&x[j], Pi+1);       /* and pass each value on immediately */
}
sum = 0;
for (j = 0; j < i; j++)
    sum = sum + a[i][j]*x[j];
x[i] = (b[i] - sum)/a[i][i];
send(&x[i], Pi+1);

(P0 simply computes x0 and passes x0 on.) Now we have additional computations to do after receiving and resending values.

SLIDE 26

[Figure 5.19: Pipeline processing using back substitution; processes P0 through P5 against time, each process passing the first value onward before computing its final value.]

SLIDE 27

Analysis

For this pipeline, we cannot assume that the computational effort at each pipeline stage is the same. The first process, P0, performs one divide and one send(). The ith process (0 < i < n − 1) performs i recv()s, i send()s, i multiply/adds, one divide/subtract, and a final send(), a total of 2i + 1 communication times and 2i + 2 computational steps, assuming that multiply, add, divide, and subtract are each one step. The last process, Pn−1, performs n − 1 recv()s, n − 1 multiply/adds, and one divide/subtract, a total of n − 1 communication times and 2n − 1 computational steps.
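For instance, for i = 3, process P3 performs 3 recv()s, 3 send()s, and the final send() (2i + 1 = 7 communication times), together with 3 multiply/adds and the divide/subtract (2i + 2 = 8 computational steps), matching its column in Figure 5.20.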

[Figure 5.20: Operations in the back-substitution pipeline against time; one column per process P0 through P4, showing each process's sequence of divide, recv(), multiply/add, divide/subtract, and send() operations, with each send(xi) feeding the recv(xi) of the next process.]