Parallel Programming and High-Performance Computing
Part 3: Foundations

SLIDE 1

Technische Universität München

Parallel Programming and High-Performance Computing

Part 3: Foundations

  • Dr. Ralf-Peter Mundani

CeSIM / IGSSE

SLIDE 2

3 Foundations

Overview

  • terms and definitions
  • process interaction for MemMS
  • process interaction for MesMS
  • example of a parallel program

"A distributed system is one that prevents you from working because of the failure of a machine that you had never heard of."

—Leslie Lamport

SLIDE 3

3 Foundations

Terms and Definitions

  • sequential vs. parallel: an algorithm analysis

– sequential algorithms are characterised as follows

  • all instructions U are processed in a certain sequence
  • this sequence is given by the causal ordering of U, i. e. the causal dependencies on other instructions’ results

– hence, for the set U a partial order ≤ can be declared

  • x ≤ y for x, y ∈ U
  • ≤ representing a reflexive, antisymmetric, transitive relation

– often, for (U, ≤) more than one sequence can be found so that all computations (on the monoprocessor) are executed correctly

[diagram: three possible execution sequences (blockwise) that are all consistent with the partial order ≤]

SLIDE 4

3 Foundations

Terms and Definitions

  • sequential vs. parallel: an algorithm analysis (cont’d)

– first step towards a parallel program: concurrency

  • via (U, ≤) identification of independent blocks (of instructions)
  • simple parallel processing of independent blocks possible (due to only a few communication / synchronisation points)

– suited for both parallel processing (multiprocessor) and distributed processing (metacomputer, grid)

[diagram: independent blocks of (U, ≤) processed concurrently over time]

SLIDE 5

3 Foundations

Terms and Definitions

  • sequential vs. parallel: an algorithm analysis (cont’d)

– further parallelisation of sequential blocks

  • subdivision of suitable blocks (loop constructs, e. g.) for parallel processing
  • here, communication / synchronisation is indispensable

– mostly suitable for parallel processing (MemMS and MesMS)

[diagram: blocks A–E over time, with suitable blocks further subdivided for parallel processing]

SLIDE 6

3 Foundations

Terms and Definitions

  • general design questions

– several considerations have to be taken into account for writing a parallel program (either from scratch or based on an existing sequential program)
– standard questions comprise

  • which part of the (sequential) program can be parallelised
  • what kind of structure to be used for parallelisation
  • which parallel programming model to be used
  • which parallel programming language to be used
  • what kind of compiler to be used
  • what about load balancing strategies
  • what kind of architecture is the target machine
SLIDE 7

3 Foundations

Terms and Definitions

  • dependence analysis

– processes / (blocks of) instructions cannot be executed simultaneously if there exist dependencies between them
– hence, a dependence analysis of a given algorithm is necessary
– example

  for_all_processes (i = 0; i < N; ++i)
      a[i] = 0

– what about the following code

  for_all_processes (i = 1; i < N; ++i)
      x = i − 2*i + i*i
      a[i] = a[x]

– as it is not always obvious, an algorithmic way of recognising dependencies (via the compiler, e. g.) would be preferable

SLIDE 8

3 Foundations

Terms and Definitions

  • dependence analysis (cont’d)

– BERNSTEIN (1966) established a set of conditions sufficient for determining whether two processes can be executed in parallel
– definitions

  • Ii (input): set of memory locations read by process Pi
  • Oi (output): set of memory locations written by process Pi

– BERNSTEIN’s conditions

  I1 ∩ O2 = ∅
  I2 ∩ O1 = ∅
  O1 ∩ O2 = ∅

– example

  P1: a = x + y
  P2: b = x + z

  I1 = {x, y}, O1 = {a}, I2 = {x, z}, O2 = {b}  ⇒  all conditions fulfilled, P1 and P2 can be executed in parallel (see the sketch below)
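A minimal C sketch (my illustration, not part of the slides; OpenMP chosen only to express the concurrency): since P1 and P2 fulfil BERNSTEIN's conditions, they may safely run as two parallel sections.

    /* Hedged sketch: P1 and P2 fulfil BERNSTEIN's conditions, so they may
       run concurrently, expressed here as two OpenMP sections. */
    #include <stdio.h>

    int main(void)
    {
        int x = 1, y = 2, z = 3, a, b;

        #pragma omp parallel sections
        {
            #pragma omp section
            a = x + y;                        /* P1: I1 = {x, y}, O1 = {a} */
            #pragma omp section
            b = x + z;                        /* P2: I2 = {x, z}, O2 = {b} */
        }

        printf("a = %d, b = %d\n", a, b);     /* result independent of execution order */
        return 0;
    }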

SLIDE 9

3 Foundations

Terms and Definitions

  • dependence analysis (cont’d)

– further example

  P1: a = x + y
  P2: b = a + b

  I1 = {x, y}, O1 = {a}, I2 = {a, b}, O2 = {b}  ⇒  I2 ∩ O1 ≠ ∅, hence P1 and P2 cannot be executed in parallel

– BERNSTEIN’s conditions help to identify instruction-level parallelism or coarser parallelism (loops, e. g.)
– hence, sometimes dependencies within loops can be resolved
– example: two loops with dependencies – which one can be resolved?

  loop A:                           loop B:
  for (i = 2; i < 100; ++i)         for (i = 2; i < 100; ++i)
      a[i] = a[i−1] + 4                 a[i] = a[i−2] + 4

SLIDE 10

3 Foundations

Terms and Definitions

  • dependence analysis (cont’d)

– expansion of loop B

  a[2] = a[0] + 4
  a[3] = a[1] + 4
  a[4] = a[2] + 4
  a[5] = a[3] + 4
  a[6] = a[4] + 4
  a[7] = a[5] + 4
  …

– hence, a[3] can only be computed after a[1], a[4] after a[2], …
– the computation can be split into two independent loops (a C sketch follows below)

  a[0] = …                          a[1] = …
  for (i = 1; i < 50; ++i)          for (i = 1; i < 50; ++i)
      j = 2*i                           j = 2*i + 1
      a[j] = a[j−2] + 4                 a[j] = a[j−2] + 4

– many other techniques for recognising / creating parallelism exist (see also Chapter 4: Dependence Analysis)
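A hedged C sketch of the same splitting (the OpenMP annotation is my illustration, not part of the slide): the even and the odd chain carry no dependence on each other and can therefore be processed in parallel.

    /* Sketch: loop B split into two independent chains (even / odd indices). */
    #include <stdio.h>

    #define N 100

    int main(void)
    {
        static double a[N] = {1.0, 2.0};          /* a[0], a[1] given */

        #pragma omp parallel sections
        {
            #pragma omp section                   /* even chain: a[2], a[4], ... */
            for (int i = 1; i < N / 2; ++i) {
                int j = 2 * i;
                a[j] = a[j - 2] + 4;
            }
            #pragma omp section                   /* odd chain: a[3], a[5], ... */
            for (int i = 1; i < N / 2; ++i) {
                int j = 2 * i + 1;
                a[j] = a[j - 2] + 4;
            }
        }

        printf("a[98] = %g, a[99] = %g\n", a[98], a[99]);
        return 0;
    }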


SLIDE 11

3 Foundations

Terms and Definitions

  • structures of parallel programs

– examples of structures

[diagram: structures of parallel programs — function parallelism (incl. macropipelining), data parallelism (static or dynamic, the latter via commissioning or order acceptance), competitive parallelism, …]

SLIDE 12

3 Foundations

Terms and Definitions

  • structures of parallel programs (cont’d)

– function parallelism

  • parallel execution (on different processors) of components such as functions, procedures, or blocks of instructions
  • drawback

– separate program for each processor necessary
– limited degree of parallelism ⇒ limited scalability

  • macropipelining for data transfer between single components

– overlapping parallelism similar to pipelining in processors
– one component (producer) hands its processed data to the next one (consumer) ⇒ stream of results
– components should be of the same complexity (otherwise idle times)
– data transfer can either be synchronous (all components communicate simultaneously) or asynchronous (buffered)

SLIDE 13

3 Foundations

Terms and Definitions

  • structures of parallel programs (cont’d)

– data parallelism (1)

  • parallel execution of the same instructions (functions or even programs) on different parts of the data (SIMD)
  • advantages

– only one program for all processors necessary
– in most cases ideal scalability

  • drawback: explicit distribution of data necessary (MesMS)
  • structuring of data parallel programs

– static: compiler decides about parallel and sequential processing of concurrent parts
– dynamic: decision about parallel processing at run time, i. e. a dynamic structure allows for load balancing (at the expense of higher organisation / synchronisation costs)

SLIDE 14

3 Foundations

Terms and Definitions

  • structures of parallel programs (cont’d)

– data parallelism (2)

  • dynamic structuring

– commissioning (master-slave)
  » one master process assigns data to slave processes
  » both a master and a slave program are necessary
  » the master becomes a potential bottleneck in case of too many slaves (⇒ hierarchical organisation)
– order polling (bag-of-tasks)
  » processes pick the next part of available data “from a bag” as soon as they have finished their computations (see the sketch below)
  » mostly suitable for MemMS as the bag has to be accessible from all processes (⇒ communication overhead for MesMS)
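A minimal Pthreads sketch of the bag-of-tasks idea on shared memory (my illustration; process(), the task count, and the thread count are placeholders): the "bag" is simply a shared index protected by a mutex.

    /* Bag-of-tasks on shared memory: each thread repeatedly picks the next
       unprocessed task index from a mutex-protected shared counter. */
    #include <pthread.h>
    #include <stdio.h>

    #define NTASKS   1000
    #define NTHREADS 4

    static int next_task = 0;                              /* the "bag" */
    static pthread_mutex_t bag = PTHREAD_MUTEX_INITIALIZER;

    static void process(int t) { (void)t; /* placeholder for real work */ }

    static void *worker(void *arg)
    {
        for (;;) {
            pthread_mutex_lock(&bag);                      /* pick the next task */
            int t = (next_task < NTASKS) ? next_task++ : -1;
            pthread_mutex_unlock(&bag);
            if (t < 0) break;                              /* bag is empty */
            process(t);
        }
        return arg;
    }

    int main(void)
    {
        pthread_t th[NTHREADS];
        for (int i = 0; i < NTHREADS; ++i) pthread_create(&th[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; ++i) pthread_join(th[i], NULL);
        puts("all tasks processed");
        return 0;
    }

OpenMP's dynamic loop scheduling realises the same idea without hand-written locking.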

SLIDE 15

3 Foundations

Terms and Definitions

  • structures of parallel programs (cont’d)

– competitive parallelism

  • parallel execution of different processes (based on different algorithms or strategies) all solving the same problem

  • advantages

– as soon as the first process has found the solution, the computations of all remaining processes are allowed to stop
– on average, superlinear speed-up possible

  • drawback

– lots of different programs necessary

  • nevertheless, rare case of parallel programs
  • examples

– sorting algorithms
– theorem proving within computational semantics

SLIDE 16

3 Foundations

Terms and Definitions

  • parallel programming models

– representation of parallel activities and their respective data
– definition of abstractions for

  • sequential activities (process, task, thread, e. g.)
  • dependencies among sequential activities (messages, remote memory accesses, synchronisation, e. g.)

– examples of (abstract) parallel programming models

  • memory-coupling: shared address space; direct access to shared variables; synchronisation mechanism
  • message-coupling: exchange of messages; communication either based on ports (⇒ channels) or on process identifiers
  • data parallelism: same operation applied to all data elements simultaneously; data distribution done by the programmer

– in any case, a parallel programming language is necessary

SLIDE 17

3 Foundations

Terms and Definitions

  • parallel programming languages

– explicit parallelism

  • parallel programming interfaces

– extension of sequential languages (C, Fortran, e. g.) by additional parallel language constructs
– implementation via procedure calls from respective libraries
– examples: MPI, PVM, Linda

  • parallel programming environments

– parallel programming interface plus additional tools such as compiler, libraries, debugger, …
– most (machine-dependent) environments come along with a parallel computer
– example: MPICH

SLIDE 18

3 Foundations

Terms and Definitions

  • parallel programming languages (cont’d)

– implicit parallelism

  • mapping of programs (written in a sequential language) to the parallel computer via compiler directives

  • primarily for the parallelisation of loops
  • only minor modifications of source code necessary
  • level of parallelism

– block level for parallelising compilers (⇒ threads)
– instruction / sub-instruction level for vectorising compilers

  • example: OpenMP (parallelising), Intel compiler (vectorising)
SLIDE 19

3 Foundations

Terms and Definitions

  • differences between processes and threads

– process (heavy-weight process)

  • functional unit with separate, exclusive memory for instructions and data (the so-called environment)
  • context: register values of the processor executing a process
  • difficult to handle: process creation, process changes (both environment and context), access protection, inter-process communication

– thread (light-weight process)

  • threads share the environment, i. e. the address space, of a process
  • a change of thread only implies a change of context
  • but a synchronisation mechanism is necessary
  • standardisation effort: POSIX-Thread or Pthread (1995)
SLIDE 20

3 Foundations

Terms and Definitions

  • administration of threads

– observation: programs are almost never perfectly parallel, i. e. sequential and parallel parts alternate
– three different models for the administration of parallel threads

  • fork/join
  • single program, multiple data (SPMD)
  • reusable thread pool

[diagram: alternation of sequential (initialise / finalise) and parallel parts over time]

SLIDE 21

3 Foundations

Terms and Definitions

  • administration of threads (cont’d)

– fork/join-model

  • main process starts execution of the program
  • at the beginning of each parallel block, the main process generates threads (fork operation) that can work in parallel
  • at the end of each parallel block, all threads are synchronised with the main process and terminated (join operation)
  • generation / termination is repeated for each block ⇒ overhead

[diagram: fork/join model, threads created for every parallel part and terminated afterwards]
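A hedged OpenMP sketch of the fork/join model (assuming an OpenMP-capable C compiler): each parallel region forks a team of threads and joins it again at the end of the region.

    /* Fork/join: threads are created at every parallel region ("fork") and
       synchronised/terminated at its end ("join"). */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        printf("sequential (initialise)\n");          /* main thread only */

        #pragma omp parallel                          /* fork */
        printf("parallel work by thread %d\n", omp_get_thread_num());
                                                      /* implicit join here */
        printf("sequential part\n");

        #pragma omp parallel                          /* fork again -> overhead */
        printf("more parallel work by thread %d\n", omp_get_thread_num());

        printf("sequential (finalise)\n");
        return 0;
    }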

SLIDE 22

3 Foundations

Terms and Definitions

  • administration of threads (cont’d)

– SPMD-model

  • partially avoids the overhead of the fork/join-model
  • threads are generated once at program start and collectively terminate at program end
  • sequential parts are executed by all threads ⇒ redundant multiple processing
  • parallel parts are executed by just one thread each

[diagram: SPMD model, threads exist from program start to program end]
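The slide describes threads; a hedged sketch of the same SPMD structure with MPI processes (the classic SPMD setting) looks as follows: every process executes the same program from start to end, sequential parts are executed redundantly, and the parallel work is split by rank.

    /* SPMD: one program, started once per process; work is split by rank. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);                  /* processes exist from program start */

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int n = 16;                              /* sequential part: done redundantly */

        for (int i = rank; i < n; i += size)     /* parallel part: split by rank */
            printf("rank %d handles element %d\n", rank, i);

        MPI_Finalize();                          /* collective termination at program end */
        return 0;
    }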

SLIDE 23

3 Foundations

Terms and Definitions

  • administration of threads (cont’d)

– reusable-thread-pool-model

  • combines both models (fork/join and SPMD) to avoid overhead (generation / termination) and redundant processing
  • threads are generated the first time they are needed
  • at the end of a parallel block, threads are set into the idle state, to be reactivated at the beginning of the next one
  • hence, sequential parts are processed by just one thread

[diagram: reusable thread pool, idle threads reactivated for each parallel part]

SLIDE 24

3 Foundations

Terms and Definitions

  • development phases

[diagram: development phases (specification, decomposition, performance estimation, mapping, coding, debugging, performance analysis, visualisation, dynamic load balancing, verification), grouped into early and late phases of design]
SLIDE 25

3 Foundations

Terms and Definitions

  • development phases (cont’d)

[diagram: development phases (as on SLIDE 24)]

– definition of parallel processes and communication points
– usage of specific tools and parallel programming environments (LAM, mpC, ASKALON, SUN Studio Tools, e. g.)
– depending on that, the development of a parallel program can be manual, semi-automatic, or automatic (parallelising compiler)
– programming language with explicit or implicit parallelism (OpenMP, MPI, PVM, Linda, Orca, e. g.)

SLIDE 26

3 Foundations

Terms and Definitions

  • development phases (cont’d)

[diagram: development phases (as on SLIDE 24)]

– parallelisation (especially for already existing sequential programs)
– rough estimation of resources, run times, memory usage, I/O, and communication ⇒ profilers and simulators
– mapping of processes (and data) to processors

SLIDE 27

3 Foundations

Terms and Definitions

  • development phases (cont’d)

[diagram: development phases (as on SLIDE 24)]

– besides performance evaluation, nothing is done here on the target architecture
  ↓ development runs (for optimisation)
  ↓ production runs (for results)

SLIDE 28

3 Foundations

Terms and Definitions

  • development phases (cont’d)

[diagram: development phases (as on SLIDE 24)]

– execution of the program on the target architecture (at least once) necessary for monitoring purposes
– interactive parallel debuggers for testing (xpdb, e. g.)
– performance analysis and visualisation of program execution on the target architecture based on the utilisation of resources and communication events (VAMPIR for MPI programs, e. g.)
– observation of load distribution (Ganglia, e. g.) for manual or automated load balancing strategies

SLIDE 29

3 Foundations

Terms and Definitions

  • alternative phase model

– heuristic approach with four phases (according to FOSTER)

  • partitioning: instructions and respective data are subdivided into smaller tasks (ignoring machine-specific aspects) for a maximum degree of parallelism
  • communication: regulation of necessary communication
  • agglomeration: cost and performance evaluation of the first two phases ⇒ bundling of tasks possible
  • mapping: tasks are mapped statically (via compiler) or dynamically during run time (via load balancing strategies) to processors for maximum utilisation of processors and minimum communication costs

– the first two phases focus on parallelism and scalability
– the latter two phases focus on locality and parallel efficiency

SLIDE 30

3 Foundations

Overview

  • terms and definitions
  • process interaction for MemMS
  • process interaction for MesMS
  • example of a parallel program
SLIDE 31

3 Foundations

Process Interaction for MemMS

  • principles

– processes depend on each other if they have to be executed in a certain order; this can have two reasons

  • cooperation: processes execute parts of a common task

    – producer/consumer: one process generates data to be processed by another one
    – client/server: same as above, but the second process also returns some data (the result of a computation, e. g.)
    – …

  • competition: activities of one process hinder other processes

– synchronisation: management of cooperation / competition of processes ⇒ ordering of the processes’ activities
– communication: data exchange among processes
– MemMS: realised via shared variables with read / write access

SLIDE 32

3 Foundations

Process Interaction for MemMS

  • synchronisation

– two types of synchronisation can be distinguished

  • unilateral: if activity A2 depends on the results of activity A1, then A1 has to be executed before A2 (i. e. A2 has to wait until A1 finishes); synchronisation does not affect A1
  • multilateral: the order of execution of A1 and A2 does not matter, but A1 and A2 are not allowed to be executed in parallel (due to write / write or write / read conflicts, e. g.)

– activities affected by multilateral synchronisation are mutually exclusive, i. e. they cannot be executed in parallel and appear atomic to each other (no activity can interrupt another one)
– instructions requiring mutual exclusion are called critical sections
– synchronisation might lead to deadlocks (mutual blocking) or lockout (“starvation”) of processes, i. e. an indefinitely long delay

SLIDE 33

3 Foundations

Process Interaction for MemMS

  • synchronisation (cont’d)

– necessary and sufficient constraints for deadlocks

  • resources are only exclusively useable
  • resources cannot be withdrawn from a process
  • processes do not release assigned resources while waiting for the allocation of other resources
  • there exists a cyclic chain of processes, each using at least one resource needed by the next process within the chain

[diagram: processes P1, P2 and resources A, B forming a cyclic wait, with edges for “resource requested by process” and “resource allocated to process”]

SLIDE 34

3 Foundations

Process Interaction for MemMS

  • synchronisation (cont’d)

– possibilities to handle deadlocks

  • deadlock detection

– techniques to detect deadlocks (identification of cycles in waiting graphs, e. g.) and measures to eliminate them (rollback, e. g.)

  • deadlock avoidance

– by rules: paying attention that at least one of the four constraints for deadlocks is not fulfilled
– by requirements analysis: analysing future resource allocations of processes and forbidding states that could lead to deadlocks (HABERMANN’s / banker’s algorithm, well known from OS, e. g.)

SLIDE 35

3 Foundations

Process Interaction for MemMS

  • methods of synchronisation

– lock variable / mutex
– semaphore
– monitor
– barrier

SLIDE 36

3 Foundations

Process Interaction for MemMS

  • lock variable / mutex

– used to control the access to critical sections
– when entering a critical section, a process

  • has to wait until the respective lock is open
  • enters and closes the lock, thus no other process can follow
  • opens the lock and leaves when finished
  • lock / unlock have to be executed by the same process

– lock variables are abstract data types consisting of

  • a boolean variable of type mutex
  • at least two functions lock and unlock
  • further functions (Pthreads): init, destroy, trylock, …

– the function lock consists of two operations “test” and “set”, which together form a non-interruptible (i. e. atomic) activity
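A minimal Pthreads sketch (my illustration; the counter and the two threads are arbitrary): pthread_mutex_lock/unlock play the role of the lock/unlock operations around the critical section.

    /* A critical section protected by a Pthreads mutex (lock variable). */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;                               /* shared data */
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;  /* the lock */

    static void *work(void *arg)
    {
        for (int i = 0; i < 100000; ++i) {
            pthread_mutex_lock(&m);    /* wait until the lock is open, then close it */
            ++counter;                 /* critical section */
            pthread_mutex_unlock(&m);  /* open the lock again */
        }
        return arg;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, work, NULL);
        pthread_create(&t2, NULL, work, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);   /* 200000 with correct locking */
        return 0;
    }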

SLIDE 37

3 Foundations

Process Interaction for MemMS

  • semaphore

– abstract data type consisting of

  • nonnegative variable of type integer (semaphore counter)
  • two atomic operations P (“passeeren”) and V (“vrijgeven”)

– after initialisation of semaphore S the counter can only be manipulated with the operations P(S) and V(S)

  • P(S): if S > 0 then S = S − 1, else the process executing P(S) will be suspended
  • V(S): S = S + 1

– after a V-operation, any suspended process may be reactivated (busy waiting); alternative: always the next process in the queue
– binary semaphore: has only the values “0” and “1” (similar to a lock variable, but P and V can be executed by different processes)
– general semaphore: can take any nonnegative value

SLIDE 38

3 Foundations

Process Interaction for MemMS

  • semaphore (cont’d)

– the initial value of the semaphore counter defines the maximum number of processes that can enter a critical section simultaneously
– critical section enclosed by operations P and V

  # mutual exclusion (binary)
  semaphore s; s = 1
  execute p1 and p2 in parallel

  begin procedure p1                begin procedure p2
      while (true) do                   while (true) do
          P(s)                              P(s)
          critical section 1                critical section 2
          V(s)                              V(s)
      od                                od
  end                               end

SLIDE 39

3 Foundations

Process Interaction for MemMS

  • semaphore (cont’d)

– consumer/producer-problem: semaphore indicates difference between produced and consumed elements
– assumption: unlimited buffer, atomic operations store and remove

  # consumer/producer (general)
  semaphore s; s = 0
  execute producer and consumer in parallel

  begin procedure producer          begin procedure consumer
      while (true) do                   while (true) do
          produce X                         P(s)
          store X                           remove X
          V(s)                              consume X
      od                                od
  end                               end
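A hedged C sketch of the same scheme with a POSIX unnamed semaphore (sem_wait corresponds to P, sem_post to V); the fixed buffer size and the trivially "atomic" store/remove are simplifications for illustration only.

    /* Producer/consumer with a POSIX semaphore counting stored elements. */
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    #define ITEMS 10

    static int buffer[ITEMS];
    static int in = 0, out = 0;
    static sem_t s;                                  /* counts stored elements */

    static void *producer(void *arg)
    {
        for (int x = 0; x < ITEMS; ++x) {
            buffer[in++] = x;                        /* produce and store X */
            sem_post(&s);                            /* V(s) */
        }
        return arg;
    }

    static void *consumer(void *arg)
    {
        for (int i = 0; i < ITEMS; ++i) {
            sem_wait(&s);                            /* P(s): wait for an element */
            printf("consumed %d\n", buffer[out++]);  /* remove and consume X */
        }
        return arg;
    }

    int main(void)
    {
        pthread_t p, c;
        sem_init(&s, 0, 0);                          /* counter starts at 0 */
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        sem_destroy(&s);
        return 0;
    }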

SLIDE 40

3 Foundations

Process Interaction for MemMS

  • monitor

– semaphores solve synchronisation on a very low level ⇒ already one wrong semaphore operation might cause the breakdown of the entire system
– better: synchronisation on a higher level with monitors

  • abstract data type with an implicit synchronisation mechanism, i. e. implementation details (such as access to shared data or mutual exclusion) are hidden from the user
  • all access operations are mutually exclusive, thus all resources (controlled by the monitor) are only exclusively useable

– monitors consist of

  • several monitor variables and monitor procedures
  • a monitor body (instructions executed after program start for initialisation of the monitor variables)

SLIDE 41

3 Foundations

Process Interaction for MemMS

  • monitor (cont’d)

– access to monitor-bound variables is only possible via monitor procedures; direct access from outside the monitor is not possible
– only one process can enter a monitor at each point in time; all others are suspended and have to wait outside the monitor
– synchronisation via condition variables (based on mutex)

  • wait(c): the calling process is blocked and appended to an internal queue of processes also blocked due to condition c, i. e. immediate entry to the monitor is possible for other processes
  • signal(c): if the queue for condition c is not empty, the process at the queue’s head is reactivated (and also preferred to processes waiting outside for entering the monitor)

– condition variables are monitor-bound and only accessible via the operations wait and signal (⇒ no manipulation from outside)
SLIDE 42

3 Foundations

Process Interaction for MemMS

  • monitor (cont’d)

– consumer/producer-problem with limited buffer (1)

  # consumer/producer
  const size = …
  monitor limitedbuffer
      buffer[size] of integer
      in, out: integer
      n: integer
      notempty, notfull: condition

      begin procedure store(X)
          if n = size then wait(notfull)
          buffer[in] = X; in = in + 1
          if in = size then set in = 0
          n = n + 1
          signal(notempty)
      end

[diagram: ring buffer example with out = 3, in = 10, n = 7]

SLIDE 43

3 Foundations

Process Interaction for MemMS

  • monitor (cont’d)

– consumer/producer-problem with limited buffer (2)

      begin procedure remove(X)
          if n = 0 then wait(notempty)
          X = buffer[out]; out = out + 1
          if out = size then set out = 0
          n = n − 1
          signal(notfull)
      end

      monitor body: in = 0; out = 0; n = 0

  begin procedure producer          begin procedure consumer
      while (true) do                   while (true) do
          produce X                         remove X
          store X                           consume X
      od                                od
  end                               end
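C offers no monitor construct; a common realisation uses a mutex plus condition variables. The following hedged sketch rebuilds the limited buffer that way (buffer size, function names, and the small demo are my choices): the mutex provides the monitor's implicit mutual exclusion, and pthread_cond_wait/signal correspond to wait(c)/signal(c).

    /* The limited-buffer monitor expressed with a Pthreads mutex and
       condition variables. */
    #include <pthread.h>
    #include <stdio.h>

    #define SIZE 8

    static int buffer[SIZE];
    static int in = 0, out = 0, n = 0;                   /* monitor variables */
    static pthread_mutex_t mon = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t notempty = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t notfull  = PTHREAD_COND_INITIALIZER;

    static void store(int x)                             /* monitor procedure */
    {
        pthread_mutex_lock(&mon);                        /* enter the monitor */
        while (n == SIZE)
            pthread_cond_wait(&notfull, &mon);           /* wait(notfull) */
        buffer[in] = x; in = (in + 1) % SIZE; n = n + 1;
        pthread_cond_signal(&notempty);                  /* signal(notempty) */
        pthread_mutex_unlock(&mon);                      /* leave the monitor */
    }

    static int remove_item(void)                         /* monitor procedure */
    {
        pthread_mutex_lock(&mon);
        while (n == 0)
            pthread_cond_wait(&notempty, &mon);          /* wait(notempty) */
        int x = buffer[out]; out = (out + 1) % SIZE; n = n - 1;
        pthread_cond_signal(&notfull);                   /* signal(notfull) */
        pthread_mutex_unlock(&mon);
        return x;
    }

    static void *producer(void *arg) { for (int x = 0; x < 32; ++x) store(x); return arg; }
    static void *consumer(void *arg) { for (int i = 0; i < 32; ++i) printf("%d\n", remove_item()); return arg; }

    int main(void)
    {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }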

SLIDE 44

3 Foundations

Process Interaction for MemMS

  • monitor (cont’d)

– some remarks

  • compared to semaphores, once correctly programmed, monitors cannot be disturbed by adding further processes
  • semaphores can be implemented via monitors and vice versa
  • nowadays, multithreaded OS use lock variables and condition variables instead of monitors
  • nevertheless, monitors are used within Java for the synchronisation of threads

SLIDE 45

3 Foundations

Process Interaction for MemMS

  • barrier

– synchronisation point for several processes, i. e. each process has to wait until the last one has also arrived
– initialisation of counter C before usage with the number of processes that should wait (init-barrier operation)
– each process executes a wait-barrier operation

  • counter C is decremented by one
  • a process is suspended if C > 0, otherwise all processes are reactivated and the counter C is set back to the initial value

– useful for setting all processes (after independent processing steps) into the same state and for debugging purposes
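A minimal sketch with a POSIX barrier (my illustration; the thread count is arbitrary): pthread_barrier_init corresponds to the init-barrier operation, pthread_barrier_wait to the wait-barrier operation.

    /* Synchronisation point: every thread waits until the last one arrives. */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static pthread_barrier_t barrier;

    static void *work(void *arg)
    {
        long id = (long)arg;
        printf("thread %ld: independent processing step\n", id);
        pthread_barrier_wait(&barrier);      /* wait for the last thread */
        printf("thread %ld: all threads are in the same state now\n", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t th[NTHREADS];
        pthread_barrier_init(&barrier, NULL, NTHREADS);   /* init counter C */
        for (long i = 0; i < NTHREADS; ++i)
            pthread_create(&th[i], NULL, work, (void *)i);
        for (int i = 0; i < NTHREADS; ++i)
            pthread_join(th[i], NULL);
        pthread_barrier_destroy(&barrier);
        return 0;
    }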

SLIDE 46

3 Foundations

Overview

  • terms and definitions
  • process interaction for MemMS
  • process interaction for MesMS
  • example of a parallel program
SLIDE 47

3 Foundations

Process Interaction for MesMS

  • message passing paradigm

– no shared memory for synchronisation and communication
– hence, a transfer mechanism for information interchange is necessary
– message passing

  • messages: data units transferred between processes
  • send / receive operations instead of read / write operations

– implicit (sequential) order during send/receive-stage

  • a message can only be received after a prior send
  • communication via message passing (independent of the transferred data) leads to an implicit synchronisation
  • synchronisation due to availability / unavailability of messages
  • messages are resources that do not exist before the send operation and, in general, also not after the receive operation

SLIDE 48

3 Foundations

Process Interaction for MesMS

  • messages

– created whenever a process performs a send
– necessary information to be provided by the sender

  • destination (process, node, communication channel, e. g.)
  • unique identifier of the message (number, e. g.)
  • memory (so-called send buffer) containing the data to be transferred with the message
  • data type and number of elements within the send buffer

– data type and number of elements have to match on the receiver side, otherwise a correct interpretation of the data is in doubt
slide-49
SLIDE 49

Technische Universität München

  • Dr. Ralf-Peter Mundani - Parallel Programming and High-Performance Computing - Summer Term 2008

3−49

3 Foundations

Process Interaction for MesMS

  • sending messages

– send operations can be

  • synchronous / asynchronous: a process performing a send is either blocked (synchronous) or not (asynchronous) until a respective receive operation is executed
  • buffered / unbuffered: a send operation may first copy the data from the send buffer to a system buffer (buffered) or directly perform the transfer (unbuffered ⇒ faster, but risk of overwriting the send buffer due to possible parallel execution of transfer (NIC) and next send operation (CPU))
  • blocking / non-blocking: a send operation can either be blocked until the send buffer has been emptied (blocking) or immediately give control to the next instruction of the sending process (non-blocking ⇒ risk of overwriting the send buffer)

SLIDE 50

3 Foundations

Process Interaction for MesMS

  • receiving messages

– a process has to specify which message to receive (via message identifier or wildcard) and where to store the data (the so-called receive buffer)
– receive operations can be

  • destructive / non-destructive: a receive operation copies the data transferred with the message into the receive buffer and either destroys the message (destructive) or keeps it for later usage (non-destructive)
  • synchronous / asynchronous: a process performing a receive is either blocked until a message has arrived (synchronous) or not (asynchronous), thus it can continue with its execution and check again at a later point in time
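A hedged MPI sketch of the ingredients listed on the last two slides (ranks, tag, and buffer contents are arbitrary): destination, message identifier (tag), send/receive buffer, data type, and element count all appear as arguments of MPI_Send and MPI_Recv; both calls are blocking.

    /* Blocking send/receive of four doubles from rank 0 to rank 1. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {                              /* sender */
            double sendbuf[4] = {1.0, 2.0, 3.0, 4.0};
            MPI_Send(sendbuf, 4, MPI_DOUBLE,          /* send buffer, count, type */
                     1, 42, MPI_COMM_WORLD);          /* destination, identifier (tag) */
        } else if (rank == 1) {                       /* receiver */
            double recvbuf[4];
            MPI_Recv(recvbuf, 4, MPI_DOUBLE,          /* type and count must match */
                     0, 42, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("received %.1f ... %.1f\n", recvbuf[0], recvbuf[3]);
        }

        MPI_Finalize();
        return 0;
    }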

SLIDE 51

3 Foundations

Process Interaction for MesMS

  • synchronisation characteristics

– synchronous message passing

  • both sender and receiver use synchronous operations
  • content and destination of a message are known on both sides at the same time
  • hence, the message can be transferred directly from the send buffer to the receive buffer

– asynchronous message passing

  • at least the sender or the receiver uses asynchronous operations (typically the sender)
  • as not both processes are available for communication at the same time, some buffer for the message transfer from sender to receiver is necessary

SLIDE 52

3 Foundations

Process Interaction for MesMS

  • addressing modes

– different addressing modes can be distinguished

  • direct naming: process identifiers are used for sender and receiver ⇒ identifiers have to be known during development
  • mailbox: global memory where processes can store (send) and remove (receive) messages (used in the Distributed Execution and Communication Kernel (DECK), e. g.)
  • port: a port is bound to one process and can be used in one direction only, i. e. either for sending or for receiving messages
  • connection / channel: for the usage of ports the setup of connections or channels is required, i. e. the connection of a send port of one process with the receive port of another process ⇒ data written by the sender to its port can be read by the receiver on the other port

SLIDE 53

3 Foundations

Process Interaction for MesMS

  • activity-oriented communication

– the communication discussed so far can be seen as data-oriented
– different approach: communication that concentrates on the activities of other processes
– basic schema of activity-oriented communication

  • the client sends a service request to the server
  • the server performs the requested activity while the client waits for a response
  • when finished, the server sends its response (maybe together with the computed data) back to the client
  • hence, two data-oriented communications are necessary

– this is also referred to as remote procedure call (RPC)
– in general, RPCs are synchronous communications as either the client (busy server) or the server (no client) might be blocked

SLIDE 54

3 Foundations

Overview

  • terms and definitions
  • process interaction for MemMS
  • process interaction for MesMS
  • example of a parallel program
SLIDE 55

3 Foundations

Example of a Parallel Program

  • warehousing

– a problem from the field of nonlinear optimisation
– some company sells one certain product P
– disposal of P at discrete points in time t0 < t1 < … < tN, i. e. within a complete planning period of N intervals [ti, ti+1], i = 0, …, N−1
– the company runs a warehouse that is supplied with P at the beginning of each interval [ti, ti+1], i = 0, …, N−1
– objective: planning of warehousing to minimise the overall costs
– let

  • Ai: warehouse stock of P (before delivery) at time ti
  • Ri: request for P within interval [ti, ti+1]
  • Ui: delivery quantity of P at the beginning of interval [ti, ti+1]
  • A0 = α ≥ 0: warehouse stock of P at time t0
SLIDE 56

3 Foundations

Example of a Parallel Program

  • warehousing (cont’d)

– changes of Ai within the following intervals are computed via the warehouse balance equation Ai+1 = Ai − Ri + Ui, i = 0, …, N−1
– costs (delivery, purchase, warehousing, e. g.) within each interval are given via the function fi(Ai, Ui); all Ui should be chosen in such a way as to minimise the overall costs for the entire planning period [t0, tN]
– furthermore, the warehouse stock AN at time tN should also be very small, i. e. the objective function looks as follows

  f(A, U) = ∑ (i = 0, …, N−1) fi(Ai, Ui) + ρ · AN²,   with ρ ≥ 0
SLIDE 57

3 Foundations

Example of a Parallel Program

  • warehousing (cont’d)

– altogether, this leads to the following problem (WH)

  minimise   f(A, U) = ∑ (i = 0, …, N−1) fi(Ai, Ui) + ρ · AN²

  subject to the constraints Ai+1 = Ai − Ri + Ui, i = 0, …, N−1 and A0 = α ≥ 0

– the variable U defines a series of decisions, also called a policy
– for each U a unique A can be found, called the corresponding state
– if (Ã, Ũ) is a solution of (WH), then Ũ is called an optimal policy
– in general, the requests Ri are unknown, but such models can be used to examine the dependence of an optimal policy on typical request profiles
– question: how to find an optimal policy, i. e. Ũ

SLIDE 58

3 Foundations

Example of a Parallel Program

  • warehousing (cont’d)

– solution via mutation-selection-method (MS-solver)

  • biologically motivated ⇒ make random mutations of the current iteration point and select the “good ones”

  • basic structure

  1) choose starting point x(0) ∈ R^N; set k = 0
  2) compute new v(k) via random mutation of x(k)
  3) if f(v(k)) < f(x(k)) set x(k+1) = v(k), otherwise set x(k+1) = x(k)
  4) set k = k + 1; continue with step 2

  • possible mutation

  vi(k) = xi(k) + σk · di,   i = 1, …, N

  with search direction di ∈ [−0.5, 0.5] and step size σk

  • sequential algorithm so far; next step: parallelisation
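A sequential C sketch of the mutation-selection loop under stated assumptions (the dummy objective f and the fixed iteration count stand in for the warehousing model and the halt condition):

    /* Mutation-selection method, sequential version (steps 1-4). */
    #include <stdlib.h>

    #define N 16

    static double f(const double x[N])               /* dummy objective: ||x||^2 */
    {
        double s = 0.0;
        for (int i = 0; i < N; ++i) s += x[i] * x[i];
        return s;
    }

    static void ms_solver(double x[N], double sigma, int kmax)
    {
        double v[N];
        for (int k = 0; k < kmax; ++k) {             /* halt condition: k < kmax */
            for (int i = 0; i < N; ++i) {            /* random mutation of x(k) */
                double d = (double)rand() / RAND_MAX - 0.5;   /* d in [-0.5, 0.5] */
                v[i] = x[i] + sigma * d;
            }
            if (f(v) < f(x))                         /* selection */
                for (int i = 0; i < N; ++i)
                    x[i] = v[i];                     /* x(k+1) = v(k) */
            /* otherwise x(k+1) = x(k): nothing to do */
        }
    }

    int main(void)
    {
        double x[N] = {1.0};                         /* arbitrary starting point */
        ms_solver(x, 0.1, 1000);
        return 0;
    }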

SLIDE 59

3 Foundations

Example of a Parallel Program

  • warehousing (cont’d)

– parallelisation of warehousing problem

  • basic questions

– which parts to be parallelised
– which structure: function or data parallelism
– which model: shared or distributed memory

  • possible candidates

– computation of (WH), i. e. fi(Ai, Ui), i = 0, …, N−1
– computation of new v(k) (step 2 of the MS-solver)

SLIDE 60

3 Foundations

Example of a Parallel Program

  • warehousing (cont’d)

– variant A: data parallelism (1)

  • all fvali = fi(Ai, Ui) can be computed independently

– each process computes some fvali
– no communication and synchronisation necessary
– works perfectly for both MemMS and MesMS

  • summation of all fvali

– each process computes a partial sum SP = ∑ fvali
– MemMS: each process computes S = S + SP, but synchronisation is necessary due to parallel write access (see the sketch below)
– MesMS: each process sends its SP to one dedicated (master) process that computes the global sum S
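A hedged OpenMP sketch of the MemMS case (fi() is a placeholder for fi(Ai, Ui)): each thread accumulates a partial sum SP, and the reduction clause provides the exclusive update of the global sum S.

    /* Variant A on shared memory: partial sums per thread, combined safely. */
    #include <stdio.h>

    #define N 1024

    static double fi(int i) { return (double)i; }   /* placeholder for fi(Ai, Ui) */

    int main(void)
    {
        double S = 0.0;

        #pragma omp parallel for reduction(+:S)
        for (int i = 0; i < N; ++i)
            S += fi(i);                             /* fval_i, summed without write conflicts */

        printf("S = %f\n", S);
        return 0;
    }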

SLIDE 61

3 Foundations

Example of a Parallel Program

  • warehousing (cont’d)

– variant A: data parallelism (2)

  • computation of v(k) can be processed independently

– each process computes some parts of v(k), i. e. some components vi(k)
– MemMS: each process updates v(k) with its computed components vi(k); no synchronisation necessary
– MesMS: each process sends its computed components vi(k) to one dedicated (master) process that assembles v(k)
– here, MemMS are advantageous

SLIDE 62

3 Foundations

Example of a Parallel Program

  • warehousing (cont’d)

– variant A: data parallelism (3)

  • parallel program for shared memory (MemMS)

  choose starting point x(0) and set k = 0
  while (halt condition not true) do
      parallel block
          compute some vi(k) and update v(k)
      wait for all threads to be finished
      evaluate constraints Ai+1 = Ai − Ri + Ui
      parallel block
          compute some fi(Ai, Ui) and partial sum SP
          compute S = S + SP with exclusive write access
      if f(v(k)) < f(x(k)) then x(k+1) = v(k) else x(k+1) = x(k)
      k = k + 1
  od
SLIDE 63

3 Foundations

Example of a Parallel Program

  • warehousing (cont’d)

– variant A: data parallelism (4)

  • parallel program for distributed memory (MesMS)

  choose starting point x(0) and set k = 0
  while (halt condition not true) do
      compute some vi(k) and send to master process (MP)
      MP: receive all vi(k) and assemble v(k)
      MP: evaluate constraints Ai and distribute to all
      compute some fi(Ai, Ui)
      compute partial sum SP and send to MP
      MP: receive all SP and compute S
      MP: check for f(v(k)) < f(x(k)) and update x(k+1)
      MP: send new x(k+1) to all
      k = k + 1
  od
SLIDE 64

3 Foundations

Example of a Parallel Program

  • warehousing (cont’d)

– variant B: function parallelism (1)

  • compute different mutations of v(k) and proceed with the best one

– each process computes a different mutation vP(k)
– each process computes f(vP(k))
– find the minimal f(vmin(k)) within all f(vP(k))
– check if f(vmin(k)) < f(x(k)) and set x(k+1) correspondingly
– MemMS: synchronisation necessary due to finding the global minimum f(vmin(k)) of the local values f(vP(k))
– MesMS: each process sends its f(vP(k)) to one dedicated (master) process and retrieves x(k+1)

SLIDE 65

3 Foundations

Example of a Parallel Program

  • warehousing (cont’d)

– variant B: function parallelism (2)

  • parallel program for shared memory (MemMS)

  choose starting point x(0) and set k = 0
  while (halt condition not true) do
      parallel block
          compute vP(k) and f(vP(k))
      wait for all threads to be finished
      find f(vmin(k)) within all f(vP(k))
      check for f(vmin(k)) < f(x(k)) and update x(k+1)
      k = k + 1
  od
SLIDE 66

3 Foundations

Example of a Parallel Program

  • warehousing (cont’d)

– variant B: function parallelism (3)

  • parallel program for distributed memory (MesMS)

  choose starting point x(0) and set k = 0
  while (halt condition not true) do
      compute vP(k)
      compute f(vP(k)) and send to master process (MP)
      MP: receive all f(vP(k)) and find f(vmin(k))
      MP: check for f(vmin(k)) < f(x(k)) and update x(k+1)
      MP: send new x(k+1) to all
      k = k + 1
  od
SLIDE 67

3 Foundations

Example of a Parallel Program

  • warehousing (cont’d)

– variant C: competitive parallelism

  • parallel program for shared memory (MemMS)

  STOP = false
  choose starting point x(0) and set k = 0
  parallel block
      while (halt condition not true) do
          compute one iteration x(k) with algorithm AlgP
          check for STOP, else continue
          k = k + 1
      od
      if (STOP) finish else STOP = true