
Parallel Programming and High-Performance Computing, Part 3: Foundations (PowerPoint presentation)



  1. Parallel Programming and High-Performance Computing
     Part 3: Foundations
     Dr. Ralf-Peter Mundani, CeSIM / IGSSE, Technische Universität München

  2. 3 Foundations: Overview
     • terms and definitions
     • process interaction for MemMS
     • process interaction for MesMS
     • example of a parallel program
     "A distributed system is one that prevents you from working because of the failure of a machine that you had never heard of." (Leslie Lamport)

  3. 3 Foundations: Terms and Definitions
     • sequential vs. parallel: an algorithm analysis
       – sequential algorithms are characterised as follows:
         • all instructions U are processed in a certain sequence
         • this sequence is given by the causal ordering of U, i.e. the causal dependencies on other instructions' results
       – hence, for the set U a partial order ≤ can be declared (formalised below)
         • x ≤ y for x, y ∈ U
         • ≤ represents a reflexive, antisymmetric, transitive relation
       – often, for (U, ≤) more than one sequence can be found such that all computations (on the monoprocessor) are executed correctly
       [diagram: three valid execution orders for the same (U, ≤): sequence 1, sequence 2 (blockwise), sequence 3]
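Stated compactly, in the slide's own notation, the three properties required of ≤ read:

          x ≤ x                               for all x ∈ U           (reflexive)
          x ≤ y ∧ y ≤ x  ⇒  x = y             for all x, y ∈ U        (antisymmetric)
          x ≤ y ∧ y ≤ z  ⇒  x ≤ z             for all x, y, z ∈ U     (transitive)

Every total order that respects all pairs of (U, ≤), i.e. every linear extension of the partial order, is a correct sequential execution order; this is exactly why more than one valid sequence can exist.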

  4. 3 Foundations: Terms and Definitions
     • sequential vs. parallel: an algorithm analysis (cont'd)
       – first step towards a parallel program: concurrency
         • via (U, ≤), identification of independent blocks (of instructions)
         • simple parallel processing of independent blocks possible (due to only a few communication / synchronisation points)
       [diagram: independent blocks scheduled concurrently over time steps 1-5]
       – suited for both parallel processing (multiprocessor) and distributed processing (metacomputer, grid)

  5. 3 Foundations: Terms and Definitions
     • sequential vs. parallel: an algorithm analysis (cont'd)
       – further parallelisation of sequential blocks
         • subdivision of suitable blocks (loop constructs, e.g.) for parallel processing, as in the sketch below
         • here, communication / synchronisation is indispensable
       [diagram: blocks A-E subdivided and scheduled over time steps 1-5]
       – mostly suitable for parallel processing (MemMS and MesMS)
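A minimal sketch of such a loop subdivision on a shared-memory machine, assuming OpenMP is available (array size and per-element work are illustrative, not from the slides); compiled with, e.g., gcc -fopenmp:

          #include <stdio.h>

          #define N 1000000

          int main(void) {
              static double a[N];

              /* the iterations are independent, so the runtime may
                 subdivide the index range among the processors */
              #pragma omp parallel for
              for (int i = 0; i < N; ++i)
                  a[i] = 2.0 * i;        /* illustrative per-element work */

              printf("a[N-1] = %f\n", a[N - 1]);
              return 0;
          }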

  6. 3 Foundations: Terms and Definitions
     • general design questions
       – several considerations have to be taken into account when writing a parallel program (either from scratch or based on an existing sequential program)
       – standard questions comprise:
         • which parts of the (sequential) program can be parallelised?
         • what kind of structure is to be used for parallelisation?
         • which parallel programming model is to be used?
         • which parallel programming language is to be used?
         • what kind of compiler is to be used?
         • what about load balancing strategies?
         • what kind of architecture is the target machine?

  7. 3 Foundations: Terms and Definitions
     • dependence analysis
       – processes / (blocks of) instructions cannot be executed simultaneously if there exist dependencies between them
       – hence, a dependence analysis of a given algorithm is necessary
       – example (iterations independent, hence parallelisable):
             for_all_processes (i = 0; i < N; ++i)
                 a[i] = 0
       – what about the following code (see the worked check below)?
             for_all_processes (i = 1; i < N; ++i)
                 x = i − 2*i + i*i
                 a[i] = a[x]
       – as it is not always obvious, an algorithmic way of recognising dependencies (via the compiler, e.g.) would be preferable
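To see why the second loop is problematic, simplify the index expression (a worked check, not part of the original slide):

          x = i − 2*i + i*i = i² − i = i·(i − 1)

so iteration i reads a[i·(i − 1)]: iteration 2 reads a[2], its own element, but iteration 3 reads a[6], the element written by iteration 6. If iterations 3 and 6 run on different processes, the read may see either the old or the new value of a[6], i.e. a race. Moreover, every iteration writes the shared scalar x, an output dependence of its own unless x is made private to each process.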

  8. 3 Foundations: Terms and Definitions
     • dependence analysis (cont'd)
       – Bernstein (1966) established a set of conditions sufficient for determining whether two processes can be executed in parallel
       – definitions
         • Iᵢ (input): set of memory locations read by process Pᵢ
         • Oᵢ (output): set of memory locations written by process Pᵢ
       – Bernstein's conditions
             I₁ ∩ O₂ = ∅
             I₂ ∩ O₁ = ∅
             O₁ ∩ O₂ = ∅
       – example (see the sketch below)
             P₁: a = x + y
             P₂: b = x + z
             I₁ = {x, y}, O₁ = {a}, I₂ = {x, z}, O₂ = {b}  ⇒  all conditions fulfilled
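Since all three conditions hold, P₁ and P₂ may indeed run concurrently. A minimal shared-memory sketch, assuming OpenMP (the variable values are illustrative):

          #include <stdio.h>

          int main(void) {
              int x = 1, y = 2, z = 3;   /* illustrative input values */
              int a, b;

              /* P1 and P2 both read x but write disjoint locations
                 (a resp. b), so Bernstein's conditions are fulfilled */
              #pragma omp parallel sections
              {
                  #pragma omp section
                  a = x + y;             /* P1 */
                  #pragma omp section
                  b = x + z;             /* P2 */
              }

              printf("a = %d, b = %d\n", a, b);
              return 0;
          }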

  9. 3 Foundations: Terms and Definitions
     • dependence analysis (cont'd)
       – further example
             P₁: a = x + y
             P₂: b = a + b
             I₁ = {x, y}, O₁ = {a}, I₂ = {a, b}, O₂ = {b}  ⇒  I₂ ∩ O₁ ≠ ∅
       – Bernstein's conditions help to identify instruction-level parallelism or coarser parallelism (loops, e.g.)
       – hence, sometimes dependencies within loops can be resolved
       – example: two loops with dependencies; which one can be resolved?
             loop A:                           loop B:
             for (i = 2; i < 100; ++i)         for (i = 2; i < 100; ++i)
                 a[i] = a[i−1] + 4                 a[i] = a[i−2] + 4

  10. 3 Foundations: Terms and Definitions
     • dependence analysis (cont'd)
       – expansion of loop B
             a[2] = a[0] + 4
             a[3] = a[1] + 4
             a[4] = a[2] + 4
             a[5] = a[3] + 4
             a[6] = a[4] + 4
             a[7] = a[5] + 4
             …
       – hence, a[3] can only be computed after a[1], a[4] after a[2], …  ⇒  the computation can be split into two independent loops (see the compilable sketch below)
             a[0] = …                          a[1] = …
             for (i = 1; i < 50; ++i)          for (i = 1; i < 50; ++i)
                 j = 2*i                           j = 2*i + 1
                 a[j] = a[j−2] + 4                 a[j] = a[j−2] + 4
       – many other techniques for recognising / creating parallelism exist (see also Chapter 4: Dependence Analysis)
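Folded back into compilable form, the two chains can be given to two processors, e.g. with OpenMP sections (a sketch; the start values of a[0] and a[1] are illustrative):

          #include <stdio.h>

          #define N 100

          int main(void) {
              int a[N];
              a[0] = 0;                       /* illustrative start values */
              a[1] = 1;

              /* each chain is sequential internally, but the two chains
                 touch disjoint (even resp. odd) elements of a */
              #pragma omp parallel sections
              {
                  #pragma omp section
                  for (int i = 1; i < N / 2; ++i) {   /* even chain */
                      int j = 2 * i;
                      a[j] = a[j - 2] + 4;
                  }
                  #pragma omp section
                  for (int i = 1; i < N / 2; ++i) {   /* odd chain */
                      int j = 2 * i + 1;
                      a[j] = a[j - 2] + 4;
                  }
              }

              printf("a[98] = %d, a[99] = %d\n", a[98], a[99]);
              return 0;
          }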

  11. 3 Foundations: Terms and Definitions
     • structures of parallel programs
       – examples of structures
             parallel program
                 – function parallelism
                       – macropipelining
                       – …
                 – data parallelism
                       – static
                       – dynamic
                             – commissioning
                             – order acceptance
                 – competitive parallelism
                 – …

  12. 3 Foundations: Terms and Definitions
     • structures of parallel programs (cont'd)
       – function parallelism
         • parallel execution (on different processors) of components such as functions, procedures, or blocks of instructions
         • drawbacks
           – separate program for each processor necessary
           – limited degree of parallelism ⇒ limited scalability
         • macropipelining for data transfer between single components (see the sketch below)
           – overlapping parallelism similar to pipelining in processors
           – one component (producer) hands its processed data to the next one (consumer) ⇒ stream of results
           – components should be of the same complexity (otherwise idle times)
           – data transfer can either be synchronous (all components communicate simultaneously) or asynchronous (buffered)
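A minimal sketch of the asynchronous (buffered) variant with two pipeline stages, assuming POSIX threads; buffer size, item count, and the per-item "processing" are illustrative:

          #include <stdio.h>
          #include <pthread.h>

          #define BUF_SIZE 8
          #define ITEMS    32

          static int buf[BUF_SIZE];
          static int count = 0, head = 0, tail = 0;
          static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
          static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
          static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

          static void *producer(void *arg) {
              for (int i = 0; i < ITEMS; ++i) {
                  int item = i * i;               /* illustrative "processing" */
                  pthread_mutex_lock(&lock);
                  while (count == BUF_SIZE)       /* buffer full: wait */
                      pthread_cond_wait(&not_full, &lock);
                  buf[tail] = item;
                  tail = (tail + 1) % BUF_SIZE;
                  ++count;
                  pthread_cond_signal(&not_empty);
                  pthread_mutex_unlock(&lock);
              }
              return NULL;
          }

          static void *consumer(void *arg) {
              for (int i = 0; i < ITEMS; ++i) {
                  pthread_mutex_lock(&lock);
                  while (count == 0)              /* buffer empty: wait */
                      pthread_cond_wait(&not_empty, &lock);
                  int item = buf[head];
                  head = (head + 1) % BUF_SIZE;
                  --count;
                  pthread_cond_signal(&not_full);
                  pthread_mutex_unlock(&lock);
                  printf("consumed %d\n", item);  /* next pipeline stage */
              }
              return NULL;
          }

          int main(void) {
              pthread_t p, c;
              pthread_create(&p, NULL, producer, NULL);
              pthread_create(&c, NULL, consumer, NULL);
              pthread_join(p, NULL);
              pthread_join(c, NULL);
              return 0;
          }

The bounded buffer decouples the two stages: the producer only blocks when the buffer is full, the consumer only when it is empty, which smooths out differences in stage complexity.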

  13. 3 Foundations: Terms and Definitions
     • structures of parallel programs (cont'd)
       – data parallelism (1)
         • parallel execution of the same instructions (functions or even programs) on different parts of the data (SIMD)
         • advantages
           – only one program for all processors necessary
           – in most cases ideal scalability
         • drawback: explicit distribution of data necessary (MesMS)
         • structuring of data-parallel programs (see the sketch below)
           – static: the compiler decides about parallel and sequential processing of concurrent parts
           – dynamic: decision about parallel processing at run time, i.e. a dynamic structure allows for load balancing (at the expense of higher organisation / synchronisation costs)
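On a shared-memory system the distinction can be illustrated with OpenMP's loop schedules (an analogy chosen here, not the slides' own example): schedule(static) fixes the mapping of iterations to threads before the loop runs, whereas schedule(dynamic) hands out chunks at run time, so idle threads can grab more work.

          #include <math.h>
          #include <stdio.h>

          #define N 100000

          int main(void) {
              static double a[N];

              /* static structure: iteration ranges fixed up front,
                 no scheduling overhead at run time */
              #pragma omp parallel for schedule(static)
              for (int i = 0; i < N; ++i)
                  a[i] = sin((double)i);

              /* dynamic structure: chunks of 100 iterations handed out
                 at run time; balances load at the cost of bookkeeping */
              #pragma omp parallel for schedule(dynamic, 100)
              for (int i = 0; i < N; ++i)
                  a[i] += cos((double)i);

              printf("a[1] = %f\n", a[1]);
              return 0;
          }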

  14. 3 Foundations: Terms and Definitions
     • structures of parallel programs (cont'd)
       – data parallelism (2)
         • dynamic structuring
           – commissioning (master-slave), see the sketch below
             » one master process assigns data to slave processes
             » both master and slave program necessary
             » master becomes a potential bottleneck in case of too many slaves (⇒ hierarchical organisation)
           – order polling (bag-of-tasks)
             » processes pick the next part of the available data "from a bag" as soon as they have finished their computations
             » mostly suitable for MemMS, as the bag has to be accessible from all processes (⇒ communication overhead for MesMS)
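A minimal commissioning (master-slave) sketch, assuming MPI; the task count, the message tags, and process() are illustrative placeholders. Rank 0 acts as the master and hands out task indices; slaves return one result per task until they receive a stop message:

          #include <stdio.h>
          #include <mpi.h>

          #define TASKS    100       /* illustrative number of work items */
          #define TAG_WORK 1
          #define TAG_STOP 2

          /* illustrative per-task computation done by a slave */
          static double process(int task) { return 2.0 * task; }

          int main(int argc, char **argv) {
              int rank, size;
              MPI_Init(&argc, &argv);
              MPI_Comm_rank(MPI_COMM_WORLD, &rank);
              MPI_Comm_size(MPI_COMM_WORLD, &size);

              if (rank == 0) {                  /* master: assigns data */
                  int next = 0, active = 0;
                  double result;
                  MPI_Status st;
                  /* hand one initial task to every slave */
                  for (int s = 1; s < size; ++s) {
                      if (next < TASKS) {
                          MPI_Send(&next, 1, MPI_INT, s, TAG_WORK, MPI_COMM_WORLD);
                          ++next; ++active;
                      } else {
                          MPI_Send(&next, 0, MPI_INT, s, TAG_STOP, MPI_COMM_WORLD);
                      }
                  }
                  /* collect a result, then reassign work or stop the slave */
                  while (active > 0) {
                      MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                               MPI_COMM_WORLD, &st);
                      if (next < TASKS) {
                          MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                                   MPI_COMM_WORLD);
                          ++next;
                      } else {
                          MPI_Send(&next, 0, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                                   MPI_COMM_WORLD);
                          --active;
                      }
                  }
              } else {                          /* slave: processes tasks */
                  int task;
                  MPI_Status st;
                  while (1) {
                      MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                      if (st.MPI_TAG == TAG_STOP) break;
                      double result = process(task);
                      MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_WORK, MPI_COMM_WORLD);
                  }
              }
              MPI_Finalize();
              return 0;
          }

A bag-of-tasks on shared memory would instead replace this message traffic with a shared counter ("the bag") that every process advances atomically to claim its next piece of work.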
