Parallel Programming
Overview and Concepts
Outline
- Decomposition
  - Geometric decomposition
  - Task farm
  - Pipeline
  - Loop parallelism
- General parallelisation considerations
- Parallel code performance metrics and scaling
Why bother?
- Parallel programming is harder than its serial counterpart, so why bother? Because it lets us solve problems that are too big, or take too long, for a single processor.
- The work must be divided between different processors, and the processors must communicate with each other to coordinate their work. That communication costs time, so two general rules apply:
  1. Limit communication (especially the number of messages)
  2. Balance the load so all processors are equally busy
- The standard decomposition patterns differ mainly in the amount of interaction between their parallel tasks, and span a wide range between these extremes. The Sharpen practical is one example used to explore these trade-offs.
Decomposition
- How do we split problems up to solve efficiently in parallel? One of the most important decisions is how to split the problem up.
- Geometric decomposition: the problem domain is divided into regions, typically one per processor (this is the approach used in the CFD practical).
- Each processor works on its own region, and at defined intervals neighbouring regions exchange information on the boundaries.
- An uneven split will result in far greater overall runtime, because lightly loaded processors sit idle waiting for heavily loaded ones. The regions should therefore contain similar amounts of work, i.e. they should be load balanced.
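As an illustration of the idea (not code from the practical), here is a minimal Python sketch of a 1D geometric decomposition with one halo cell per boundary. The function names and the three-point smoothing stencil are invented for the example; a real code would keep one region per MPI process and exchange the halo cells with messages:

```python
# Sketch of 1D geometric decomposition with halo exchange (illustrative;
# a real code would use MPI, with one region per process).
def split_with_halos(data, nregions):
    """Split `data` into contiguous regions, each padded with one halo cell."""
    n = len(data)
    chunk = n // nregions
    regions = []
    for r in range(nregions):
        lo = r * chunk
        hi = (r + 1) * chunk if r < nregions - 1 else n
        left = data[lo - 1] if lo > 0 else data[lo]    # halo from left neighbour
        right = data[hi] if hi < n else data[hi - 1]   # halo from right neighbour
        regions.append([left] + data[lo:hi] + [right]) # (domain edges replicated)
        # (domain edges are simply replicated here)
    return regions

def smooth_region(region):
    """One three-point stencil update on a region's own points (halos read-only)."""
    return [(region[i - 1] + region[i] + region[i + 1]) / 3.0
            for i in range(1, len(region) - 1)]

data = [float(i) for i in range(8)]
regions = split_with_halos(data, 2)                  # two "processors"
result = sum((smooth_region(r) for r in regions), [])
```

Because each region carries up-to-date halo values, the two regions can be updated independently and still produce the same answer as a serial sweep over the whole array.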
Task farm (master/worker)
- The master splits the problem into many independent tasks. There are usually far more tasks than the number of workers, and tasks get allocated to idle workers as they finish.
- Workers send their results back to the master, which may acknowledge receipt of results before handing out more work.
- [Diagram: Master distributing tasks to Worker 1, Worker 2, Worker 3, … Worker n]
- This is the approach used in the Fractal practical.
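A minimal task-farm sketch, using Python threads purely for illustration (a real HPC task farm would typically use MPI, with a master rank handing tasks to worker ranks). The `work` function is a stand-in for an expensive, independent task:

```python
# Task farm sketch: more tasks than workers; idle workers pick up the next task.
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(task):
    """Stand-in for an expensive, independent task (e.g. one fractal tile)."""
    return task * task

tasks = list(range(20))                       # many more tasks than workers
results = {}
with ThreadPoolExecutor(max_workers=4) as farm:   # 4 "workers"
    futures = {farm.submit(work, t): t for t in tasks}
    for fut in as_completed(futures):             # results arrive in any order
        results[futures[fut]] = fut.result()      # master collects each result
```

Because tasks are claimed as workers become idle, the farm is naturally load balanced even when individual tasks take very different amounts of time.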
    function mapper(String name, String document):
        for each word w in document:
            emit (w, 1)

    function reducer(String word, Iterator partialCounts):
        sum = 0
        for each pc in partialCounts:
            sum += ParseInt(pc)
        emit (word, sum)

Worked example for the input "hello test this is a test hello":
- mapper emits: (hello,1), (test,1), (this,1), (is,1), (a,1), (test,1), (hello,1)
- grouper groups by key: (hello,[1,1]), (test,[1,1]), (this,[1]), (is,[1]), (a,[1])
- reducer sums each group: (hello,2), (test,2), (this,1), (is,1), (a,1)
- Mapper (user supplies this code): takes a (local) list of key-value pairs and, for each pair, returns another (intermediate) key-value pair.
- Reducer (user supplies this code): one reducer for each intermediate key; takes the intermediate key-value pairs, performs a reduction and returns another (usually shorter) list of final key-values.
- Grouper (part of the runtime): groups the intermediate pairs by key.
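The word-count pseudocode translates almost directly into runnable Python. The `mapper`, `grouper` and `reducer` functions below mirror the three roles described above, with the grouper implemented as a simple dictionary of lists:

```python
# Runnable word-count sketch mirroring the mapper/grouper/reducer pseudocode.
from collections import defaultdict

def mapper(name, document):
    """For each word in the document, emit an intermediate (word, 1) pair."""
    return [(w, 1) for w in document.split()]

def grouper(pairs):
    """Group intermediate pairs by key: (w, 1), (w, 1) -> w: [1, 1]."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(word, partial_counts):
    """Reduce one key's partial counts to a final (word, total) pair."""
    return (word, sum(partial_counts))

pairs = mapper("doc1", "hello test this is a test hello")
counts = dict(reducer(w, pcs) for w, pcs in grouper(pairs).items())
# counts == {'hello': 2, 'test': 2, 'this': 1, 'is': 1, 'a': 1}
```

In a real MapReduce framework many mappers and reducers run in parallel on different machines, and the grouping step (the "shuffle") moves each key's pairs to the machine running that key's reducer.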
Pipeline
- Data is processed by flowing through a sequence of stages and being operated on at each stage.
- Each stage can be placed on a different processor, and each processor communicates with the processor holding the next stage.
- [Diagram: Data → Stage 1 → Stage 2 → Stage 3 → Stage 4 → Stage 5 → Result]
- Pipelines are becoming more and more relevant to large, distributed scientific workflows.
Loop parallelism
- Many scientific codes spend most of their runtime in a small number of computationally intensive loops; the iterations of these loops are distributed across processors or threads.
- Usually an incremental approach: start from a working code and parallelise one loop at a time (e.g. with OpenMP directives, as in the Sharpen practical).
- If significant sections of the code can not be parallelised then these factors can dominate (Amdahl's law): everything outside the parallel loops still runs in serial.
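The Sharpen practical does this with OpenMP in a compiled language (e.g. `#pragma omp parallel for` in C). Purely as an illustration of the same idea, here is a Python sketch that splits the iterations of an intensive loop over a pool of workers; `body` is an invented stand-in for the loop body:

```python
# Loop-parallelism sketch: iterations of one loop are split over workers.
from concurrent.futures import ThreadPoolExecutor
import math

def body(i):
    """One independent loop iteration (stand-in for real work)."""
    return math.sqrt(i)

# Serial form:  results = [body(i) for i in range(100)]
# Parallel form: the same iterations, distributed over 4 workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(body, range(100)))

total = sum(results)    # combine, exactly as the serial code would
```

The key requirement is the same as with OpenMP: the iterations must be independent of each other, otherwise the loop cannot be parallelised this way.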
Performance metrics
- How is my parallel code performing and scaling?
- Speed-up: S(N,P) = T(N,1) / T(N,P), where N is the size of the problem and P the number of processors. Parallel efficiency: E(N,P) = S(N,P) / P.
- Amdahl's law: if a fraction α of the runtime is inherently serial, then S(N,P) = 1 / (α + (1−α)/P), so the speed-up can never exceed 1/α no matter how many processors are used.
- "The performance improvement to be gained by parallelisation is limited by the proportion of the code which is serial" – Gene Amdahl, 1967
- This effect is visible in practice in the Sharpen & CFD practicals.
- Gustafson's law: the serial fraction becomes less important if the problem size grows with the number of processors. If the serial work stays fixed while the parallel work per process/task grows, then the serial component will not dominate.
- As in the CFD practical: due to the scaling of N, effectively the serial fraction becomes α/P, giving a scaled speed-up S(N,P) = α + (1−α)P.
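Both laws are easy to evaluate numerically. A minimal sketch, assuming the serial fraction α is known (`alpha` below):

```python
# Predicted speed-up on P processors for a code with serial fraction alpha.

def amdahl_speedup(alpha, p):
    """Amdahl: fixed problem size N; serial fraction alpha limits speed-up to 1/alpha."""
    return 1.0 / (alpha + (1.0 - alpha) / p)

def gustafson_speedup(alpha, p):
    """Gustafson: N grows with P, so the serial fraction effectively becomes alpha/P."""
    return alpha + (1.0 - alpha) * p

# With 5% serial code on 100 processors:
#   amdahl_speedup(0.05, 100)    ~ 16.8  (far below the ideal 100)
#   gustafson_speedup(0.05, 100) ~ 95.1  (scaled problem, close to ideal)
```

This is why a seemingly small serial fraction matters so much at scale: with α = 0.05, adding processors beyond a few hundred buys almost nothing for a fixed-size problem.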
Scaling
- Scaling describes how the runtime changes as the number of processors is increased.
- Strong scaling: fix the total problem size and increase the number of processors.
- Weak scaling: increase the problem size with the number of processors, keeping the amount of work per processor the same.
- Strong scaling is harder to achieve than weak scaling, because the serial fraction and communication costs grow relative to the useful work per processor.
[Chart: example runtime (s) vs number of processors]
[Chart: speed-up vs number of processors, comparing linear (ideal) speed-up with the actual measured speed-up]
Summary
- A parallel code can tackle problems far beyond the reach of a single processor, limited only by the size of the machine it can take advantage of.
- Speed-up and efficiency measure how well a parallel code performs and scales, and Amdahl's law bounds what is achievable for a fixed problem size.
- There are several standard approaches to parallelising a serial problem: geometric decomposition, task farm, pipeline and loop parallelism.