Parallel Programming Patterns
Overview and Concepts
Outline
- Why parallel programming?
- Decomposition
- Geometric decomposition
- Task farm
- Pipeline
- Loop parallelism
- Performance metrics and scaling
- Amdahl's law
Parallel programming is harder than its serial counterpart, so why bother?
The work must be split between different processors, and any data that moves between processors requires communication. Two rules follow:
1. Limit communication (especially the number of messages)
2. Balance the load so all processors are equally busy
Patterns differ in the amount of interaction between their parallel tasks; the patterns covered here span the range between the extremes.
Practical: Sharpen
How do we split problems up to solve efficiently in parallel?
One of the first decisions is how to split the problem up.
Example: domain decompositions, as used in the CFD practical.
Image from ITWM: http://www.itwm.fraunhofer.de/en/departments/flow-and-material-simulation/mechanics-of-materials/domain-decomposition-and-parallel-mesh-generation.html
Communication happens at defined intervals, exchanging information on the boundaries between subdomains. Unequal subdomains result in far greater work for some processors than others, so subdomains should be equal in size, i.e. they should be load balanced.
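The idea above can be sketched in a few lines. This is a minimal, illustrative example of a 1D geometric decomposition with a boundary (halo) exchange; the three-point-average update rule, the zero values at the outer boundaries, and the two-subdomain split are assumptions for the sketch, not part of the course material.

```python
# Minimal sketch of 1D geometric decomposition with halo exchange.
# The update rule (three-point average), the fixed zero outer
# boundaries, and the subdomain count are illustrative assumptions.

def split(data, nproc):
    """Split a list into nproc contiguous, equally sized subdomains."""
    chunk = len(data) // nproc
    return [data[i * chunk:(i + 1) * chunk] for i in range(nproc)]

def exchange_halos(subdomains):
    """Each subdomain receives one boundary cell from each neighbour."""
    padded = []
    for i, sub in enumerate(subdomains):
        left = subdomains[i - 1][-1] if i > 0 else 0.0
        right = subdomains[i + 1][0] if i < len(subdomains) - 1 else 0.0
        padded.append([left] + sub + [right])
    return padded

def update(padded):
    """Three-point average on interior points (the 'compute' step)."""
    return [[(p[j - 1] + p[j] + p[j + 1]) / 3.0
             for j in range(1, len(p) - 1)] for p in padded]

data = [float(i) for i in range(8)]
new = update(exchange_halos(split(data, 2)))  # two "processors"
```

Note that the halo exchange happens once per update: this is the "communication at defined intervals" described above, and it is the only point at which the subdomains interact.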
Task farm: a master hands out independent tasks to workers. There are usually many more tasks than workers, and tasks get allocated to idle workers.
[Diagram: Master connected to Worker 1, Worker 2, Worker 3, …, Worker n]
The master acknowledges receipt of results and sends the newly idle worker more work.
Practical: Fractal
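The load-balancing property of a task farm can be shown with a small deterministic simulation: whenever a worker becomes idle, the master hands it the next task. The task costs and worker count below are illustrative assumptions.

```python
# Deterministic simulation of a task farm: the master always hands
# the next task to the first worker to fall idle, so uneven task
# costs are balanced automatically. Costs and worker count are
# illustrative.
import heapq

def task_farm(costs, nworkers):
    """Return per-worker busy time when tasks go to the first idle worker."""
    free_at = [(0.0, w) for w in range(nworkers)]  # (time free, worker id)
    busy = [0.0] * nworkers
    for cost in costs:
        t, w = heapq.heappop(free_at)      # first worker to fall idle
        busy[w] += cost
        heapq.heappush(free_at, (t + cost, w))
    return busy

# 8 tasks of very uneven cost, 3 workers
busy = task_farm([5, 1, 1, 1, 1, 1, 1, 1], 3)
```

Even though one task is five times the cost of the others, the cheap tasks flow to whichever workers are idle, so no worker sits unused while work remains.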
A pipeline involves data flowing through a sequence of stages and being operated on at each stage. Each processor communicates only with the processor holding the next stage.
Data → Stage 1 → Stage 2 → Stage 3 → Stage 4 → Stage 5 → Result
Each processor performs a different task, or stage, of the pipeline; data streams through the pipeline.
[Diagram: data items 1–4 advancing one stage per time step, so several items are in flight at once]
Pipelines are becoming more and more relevant to large, distributed scientific workflows.
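A pipeline can be sketched with chained generators, each representing one stage. The stage functions below (add, double, subtract) are illustrative stand-ins for real processing steps:

```python
# Sketch of a pipeline: data items flow through a fixed sequence of
# stages, each stage transforming an item before passing it on.
# The stage bodies are illustrative stand-ins.

def stage1(items):
    for x in items:
        yield x + 1          # e.g. read / preprocess

def stage2(items):
    for x in items:
        yield x * 2          # e.g. compute

def stage3(items):
    for x in items:
        yield x - 3          # e.g. postprocess / write

data = range(4)
result = list(stage3(stage2(stage1(data))))  # items stream stage by stage
```

Because each stage yields items one at a time, an item moves to the next stage as soon as it is ready, mirroring how several items occupy different stages of a parallel pipeline at once.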
Loop parallelism: existing working code is parallelised incrementally by targeting its computationally intensive loops, typically with OpenMP. The regions between the loops run in serial; if significant parts of the code cannot be parallelised, then these serial factors can dominate (Amdahl's law).
Practical: Sharpen
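The shape of loop parallelism, serial regions surrounding one parallelised hot loop, can be sketched as below. In C this would be an OpenMP `#pragma omp parallel for` on the loop; here a thread pool maps over the iterations instead, and the loop body is an illustrative assumption:

```python
# Sketch of loop parallelism: the iterations of one hot loop are
# divided among workers; everything outside the loop stays serial.
# The loop body (squaring) is illustrative.
from concurrent.futures import ThreadPoolExecutor

def body(i):
    return i * i                          # illustrative loop body

inputs = list(range(10))                  # serial region before the loop
with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(body, inputs))  # the parallelised loop
total = sum(squares)                      # serial region after the loop
```

The setup and the final reduction stay serial; only the loop iterations are shared out, which is exactly why the serial regions come to dominate once the loop itself is fast (Amdahl's law).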
How is my parallel code performing and scaling?
Runtime is written T(N, P), where N is the size of the problem and P the number of processors.
Strong scaling: how the runtime changes as the number of processors is increased, for a fixed problem size. Weak scaling: the problem size grows with the number of processors, keeping the amount of work per processor the same. Strong scaling is harder to achieve than weak scaling.
[Figure: speed-up vs number of processors (up to ~300), with the actual speed-up falling away from the linear ideal]
[Figure: runtime (s) vs number of processors (1–20), actual vs ideal]
“The performance improvement to be gained by parallelisation is limited by the proportion of the code which is serial” Gene Amdahl, 1967
Practicals: Sharpen & CFD
T(N, P) = α T(N, 1) + (1 − α) T(N, 1) / P
S(N, P) = T(N, 1) / T(N, P) = P / (α P + (1 − α))
where α is the serial fraction of the code.
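Amdahl's law is easy to evaluate directly. The sketch below (with an assumed serial fraction of 0.1) shows the key consequence: the speed-up saturates at 1/α no matter how many processors are used.

```python
# Amdahl's law: speed-up for serial fraction alpha on P processors.
def amdahl_speedup(alpha, p):
    return p / (alpha * p + (1.0 - alpha))

# With alpha = 0.1 (an illustrative value), the speed-up approaches
# but never reaches 1 / alpha = 10.
s16 = amdahl_speedup(0.1, 16)        # ~6.4 on 16 processors
s_large = amdahl_speedup(0.1, 10**6)  # still below 10
```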
With weak scaling, the serial part becomes less important as the problem grows. Split the runtime into serial and parallel components:
T(N, P) = Tserial(N, P) + Tparallel(N, P) = α T(1, 1) + (1 − α) N T(1, 1) / P
On one processor:
T(N, 1) = α T(1, 1) + (1 − α) N T(1, 1)
Speed-up:
S(N, P) = T(N, 1) / T(N, P) = (α + (1 − α) N) / (α + (1 − α) N / P)
Scaling the problem size with the processor count, N = P, gives Gustafson's law:
S(P, P) = α + (1 − α) P
E(P, P) = S(P, P) / P = α / P + (1 − α)
If the amount of work per processor is kept constant as processors are added, then the serial component will not dominate as it does for a fixed problem size.
Practical: CFD
Due to the scaling of the problem size, the serial fraction effectively becomes α/P.
Example speed-ups (these values correspond to a serial fraction α = 0.1):

Number of processors | Strong scaling (Amdahl's law) | Weak scaling (Gustafson's law)
16                   | 6.4                           | 14.5
1024                 | 9.9                           | 921.7
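The table values can be reproduced from the two laws. This sketch assumes the serial fraction α = 0.1, which is the value the tabulated speed-ups correspond to:

```python
# Reproduce the strong- vs weak-scaling speed-ups in the table above,
# assuming a serial fraction alpha = 0.1.
def amdahl(alpha, p):          # strong scaling: fixed problem size
    return p / (alpha * p + (1.0 - alpha))

def gustafson(alpha, p):       # weak scaling: problem size grows with p
    return alpha + (1.0 - alpha) * p

rows = [(p, round(amdahl(0.1, p), 1), round(gustafson(0.1, p), 1))
        for p in (16, 1024)]
```

The contrast is stark: on 1024 processors, strong scaling yields a speed-up under 10, while weak scaling yields over 900.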
Example: four people moving boxes.

Person   | Anna | Paul | David | Helen | Total
# boxes  | 6    | 1    | 3     | 2     | 12

Load-imbalance factor: LIF = maximum load / average load.
Here the average load is 3 boxes, so LIF = 6 / 3 = 2; a perfectly balanced load has LIF = 1.
Summary: decomposition is the key step in parallelising a serial problem. Choose a pattern the problem can take advantage of, so that the resulting code performs well and scales.