Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206

Today
• Characteristics of Tasks and Interactions (3.3).
• Mapping Techniques for Load Balancing (3.4).
• Methods for Containing Interaction Overhead (3.5).
• Parallel Algorithm Models (3.6).
24-02-2006 Alexandre David, MVP'06

So Far…
• Decomposition techniques.
  • Identify tasks.
  • Analyze with task dependency & interaction graphs.
  • Map tasks to processes.
• Now: properties of tasks that affect a good mapping.
  • Task generation, size, and size of data.

Task Generation
• Static task generation.
  • Tasks are known beforehand.
  • Applies to well-structured problems.
• Dynamic task generation.
  • Tasks generated on-the-fly.
  • Tasks & task dependency graph not available beforehand.

Task Sizes
• Relative amount of time for completion.
• Uniform: same size for all tasks.
  • Matrix multiplication.
• Non-uniform.
  • Optimization & search problems.

Size of Data Associated with Tasks
• Important for locality.
• Different types of data with different sizes.
  • Input/output/intermediate data.
• Size of context: determines whether communication with other tasks is cheap or expensive.

Characteristics of Task Interactions
• Static interactions.
  • Tasks and interactions known beforehand.
  • And interactions happen at pre-determined times.
• Dynamic interactions.
  • Timing of interaction unknown.
  • Or set of tasks not known in advance.

Characteristics of Task Interactions
• Regular interactions.
  • The interaction graph follows a pattern.
• Irregular interactions.
  • No pattern.

Example: Image Dithering

Example: Sparse Matrix * Vector
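
Why sparse matrix-vector multiplication counts as an irregular interaction can be sketched in a few lines. The sketch assumes that task i computes y[i] and owns b[i], so it must talk to the owner of every b[j] where row i has a non-zero in column j. The names `spmv`, `spmv_interactions` and the sample data are hypothetical, not from the slides.

```python
def spmv(rows, b):
    """Multiply a sparse matrix by a dense vector.

    rows[i] is a dict {column: value} holding the non-zeros of row i.
    """
    return [sum(v * b[j] for j, v in cols.items()) for cols in rows]


def spmv_interactions(rows):
    """For each task i, list the other tasks whose b[j] it has to read."""
    return {i: sorted(j for j in cols if j != i) for i, cols in enumerate(rows)}


# Assumed 4x4 example matrix, stored row by row.
rows = [{0: 2, 3: 1}, {1: 3}, {0: 1, 2: 4}, {1: 5, 3: 6}]
b = [1, 2, 3, 4]

y = spmv(rows, b)
partners = spmv_interactions(rows)
```

The partner lists follow the sparsity pattern of the matrix, so they are known only once the matrix is known and follow no fixed pattern.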

Characteristics of Task Interactions
• Data sharing interactions:
  • Read-only interactions.
    • Read only data associated with other tasks.
  • Read-write interactions.
    • Read & modify data of other tasks.

Characteristics of Task Interactions
• One-way interactions.
  • Only one task initiates and completes the communication without interrupting the other one.
• Two-way interactions.
  • Producer-consumer model.

Mapping Techniques for Load Balancing
• Map tasks onto processes.
• Goal: minimize overheads.
  • Communication.
  • Idling.
• Uneven load distribution may cause idling.
• Constraints from task dependency → wait for other tasks.

Example

Mapping Techniques
• Static mapping.
  • Finding an optimal mapping is an NP-complete problem for non-uniform tasks.
  • Suitable when data is large compared to computation.
• Dynamic mapping.
  • Dynamically generated tasks.
  • Task size unknown.
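
Because an optimal static mapping of non-uniform tasks is hard to find, heuristics are common in practice. The sketch below uses a greedy "largest task first" rule. This heuristic is not from the slides and does not guarantee an optimal mapping. The function name and the task sizes are illustrative assumptions.

```python
import heapq


def greedy_static_mapping(task_sizes, p):
    """Assign each task to the currently least-loaded of p processes.

    Tasks are considered from largest to smallest (a heuristic only).
    Returns (assignment, loads): task index -> process, and load per process.
    """
    heap = [(0, proc) for proc in range(p)]  # (current load, process id)
    heapq.heapify(heap)
    assignment = {}
    order = sorted(range(len(task_sizes)), key=lambda t: -task_sizes[t])
    for task in order:
        load, proc = heapq.heappop(heap)
        assignment[task] = proc
        heapq.heappush(heap, (load + task_sizes[task], proc))
    loads = [0] * p
    for task, proc in assignment.items():
        loads[proc] += task_sizes[task]
    return assignment, loads


assignment, loads = greedy_static_mapping([7, 5, 4, 3, 1], 2)
```

With these sizes the two processes end up with equal load, so neither idles while the other finishes.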

Schemes for Static Mapping
• Mappings based on data partitioning.
• Mappings based on task graph partitioning.
• Hybrid mappings.

Array Distribution Scheme
• Combine with “owner computes” rule to partition into sub-tasks.
1-D block distribution scheme.

Block Distribution cont.
Generalize to higher dimensions: 4x4, 2x8.
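
A block distribution boils down to an index-to-owner function. The following is a minimal sketch, assuming the number of processes divides the array size evenly and that processes on a grid are numbered row by row. The function names are illustrative.

```python
def block_owner(i, n, p):
    """Process owning index i under a 1-D block distribution.

    Assumes p divides n, so each process holds n // p consecutive indices.
    """
    return i // (n // p)


def block_owner_2d(i, j, n, grid_rows, grid_cols):
    """Rank of the process owning entry (i, j) of an n x n array.

    Processes form a grid_rows x grid_cols grid, numbered row by row.
    """
    return block_owner(i, n, grid_rows) * grid_cols + block_owner(j, n, grid_cols)


owners_1d = [block_owner(i, 16, 4) for i in range(16)]
corner_4x4 = block_owner_2d(15, 15, 16, 4, 4)
entry_2x8 = block_owner_2d(0, 8, 16, 2, 8)
```

Under the "owner computes" rule, the process returned here is also the one that performs every computation producing that entry.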

Example: Matrix * Matrix
• Partition output of C = A*B.
• Each entry needs the same amount of computation.
• Blocks on 1 or 2 dimensions.
• Different data sharing patterns.
• Higher dimensional distributions
  • mean we can use more processes.
  • sometimes reduce interaction.
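
The claim that a higher dimensional distribution can reduce interaction can be checked with a rough count of input elements. The sketch assumes n x n matrices, p a perfect square that divides n evenly, and that a process needs every row of A and every column of B that touches its block of C. The function names are illustrative.

```python
import math


def input_elements_1d(n, p):
    """Elements of A and B read by one process when C is split by rows."""
    # n/p rows of A (n elements each) plus all of B.
    return (n // p) * n + n * n


def input_elements_2d(n, p):
    """Elements of A and B read by one process when C is split into a grid."""
    q = math.isqrt(p)  # the process grid is q x q
    # n/q rows of A plus n/q columns of B, n elements each.
    return 2 * (n // q) * n


reads_1d = input_elements_1d(16, 16)
reads_2d = input_elements_2d(16, 16)
```

For a 16 x 16 product on 16 processes the 2-D split needs fewer than half the input elements per process.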

Imbalance Problem
• If the amount of computation associated with data varies a lot, then block decomposition leads to imbalances.
• Example: LU factorization (or Gaussian elimination).

LU Factorization
• Non-singular square matrix A (invertible).
• A = L*U.
• Useful for solving linear equations.

LU Factorization
In practice we work on A.
N steps.

LU Algorithm
  Proc LU(A)
  begin
    for k := 1 to n-1 do
      for j := k+1 to n do
        A[j,k] := A[j,k]/A[k,k]
      endfor
      for j := k+1 to n do
        for i := k+1 to n do
          A[i,j] := A[i,j] - A[i,k]*A[k,j]
        endfor
      endfor
    endfor
  end
Figure annotations: Normalize L; U[k,j] := A[k,j]/L[k,k]; labels U[k,k], L[j,k], L[i,k], U[k,j] on the regions A, L and U.
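
A minimal runnable sketch of the in-place procedure above, assuming 0-based indices and that no pivoting is needed (every diagonal entry used as a divisor is non-zero). The function name `lu_in_place` is illustrative.

```python
def lu_in_place(a):
    """Overwrite the square matrix a (list of lists) with its LU factors.

    Afterwards the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle, diagonal included, holds U.
    Assumes every pivot a[k][k] is non-zero; no pivoting is done.
    """
    n = len(a)
    for k in range(n - 1):
        # Normalize column k below the diagonal: this becomes column k of L.
        for j in range(k + 1, n):
            a[j][k] = a[j][k] / a[k][k]
        # Update the trailing submatrix.
        for j in range(k + 1, n):
            for i in range(k + 1, n):
                a[i][j] = a[i][j] - a[i][k] * a[k][j]
    return a


factors = lu_in_place([[4.0, 3.0], [6.0, 3.0]])
```

The active part of the matrix shrinks with every step k, which is the source of the imbalance under a block distribution.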

Another Variant
  for k := 1 to n-1 do
    for j := k+1 to n do
      A[k,j] := A[k,j]/A[k,k]
      for i := k+1 to n do
        A[i,j] := A[i,j] - A[i,k]*A[k,j]
      endfor
    endfor
  endfor

Decomposition

Cyclic and Block-Cyclic Distributions
• Idea:
  • Partition an array into many more blocks than available processes.
  • Assign partitions (tasks) to processes in a round-robin manner.
  • → each process gets several non-adjacent blocks.

Block-Cyclic Distributions
a) Partition 16x16 into 2*4 groups of 2 rows: αp groups of n/(αp) rows.
b) Partition 16x16 into square blocks of size 4*4 distributed on 2*2 processes: α²p groups of n/(α²p) squares.
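
The effect of case (a) on an LU-style computation can be estimated with a small sketch. It assumes rows are distributed, that αp divides n, and that work is counted as the number of inner-loop updates applied to the rows a process owns. The function names are illustrative, and α = 1 is used to model the plain block distribution.

```python
def block_cyclic_owner(i, n, p, alpha):
    """Owner of row i when n rows are cut into alpha*p groups dealt round-robin.

    Assumes alpha * p divides n. alpha = 1 gives a plain block distribution.
    """
    group_size = n // (alpha * p)
    return (i // group_size) % p


def lu_row_updates(n, p, alpha):
    """Count the element updates of an in-place LU per owning process."""
    work = [0] * p
    for k in range(n - 1):
        for i in range(k + 1, n):
            # Row i receives n - k - 1 element updates at step k.
            work[block_cyclic_owner(i, n, p, alpha)] += n - k - 1
    return work


owners = [block_cyclic_owner(i, 16, 4, 2) for i in range(16)]
block_load = lu_row_updates(16, 4, 1)   # plain block distribution
cyclic_load = lu_row_updates(16, 4, 2)  # 8 groups of 2 rows, as in case (a)
```

Both distributions do the same total work, but the gap between the busiest and the least busy process is halved in the block-cyclic case for this example.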

Randomized Distributions
Irregular distribution with regular mapping! Not good.

1-D Randomized Distribution
Permutation
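
One way to read the permutation step is to shuffle the block indices and then hand out the shuffled list in equal contiguous chunks. The sketch assumes the number of processes divides the number of blocks. The function name and the fixed seed are illustrative choices.

```python
import random


def randomized_block_mapping(num_blocks, p, seed=0):
    """Map blocks to processes through a random permutation.

    Returns a list where owner[b] is the process holding block b.
    Assumes p divides num_blocks so every process gets the same count.
    """
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    perm = list(range(num_blocks))
    rng.shuffle(perm)
    per_proc = num_blocks // p
    owner = [None] * num_blocks
    for position, block in enumerate(perm):
        # Consecutive positions in the permuted list go to the same process.
        owner[block] = position // per_proc
    return owner


owner = randomized_block_mapping(12, 4)
blocks_per_process = [owner.count(proc) for proc in range(4)]
```

Every process still receives the same number of blocks, but which blocks it receives no longer follows the regular pattern of the data.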

2-D Randomized Distribution
2-D block random distribution. Block mapping.

Graph Partitioning
• For sparse data structures and data dependent interaction patterns.
• Numerical simulations. Discretize the problem and represent it as a mesh.
• Sparse matrix: assign equal number of nodes to processes & minimize interaction.
• Example: simulation of dispersion of a water contaminant in Lake Superior.

Discretization

Partitioning Lake Superior
Random partitioning. Partitioning with minimum edge cut.
Finding an exact optimal partitioning is an NP-complete problem.
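
The quantity being minimized, the edge cut, is easy to compute for a given partition. The sketch below uses a hypothetical 2 x 4 mesh (nodes 0..3 on the top row, 4..7 on the bottom row) instead of the lake mesh, and compares a contiguous partition with an interleaved one. All names and data are illustrative.

```python
def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


# Assumed 2 x 4 mesh: horizontal edges, then vertical edges.
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7),
         (0, 4), (1, 5), (2, 6), (3, 7)]

# Left half versus right half of the mesh.
contiguous = {0: 0, 1: 0, 4: 0, 5: 0, 2: 1, 3: 1, 6: 1, 7: 1}
# Alternate columns between the two partitions.
interleaved = {node: (node % 4) % 2 for node in range(8)}

cut_contiguous = edge_cut(edges, contiguous)
cut_interleaved = edge_cut(edges, interleaved)
```

Both partitions give four nodes to each process, yet the interleaved one cuts three times as many edges, which means three times as many interactions.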

Mappings Based on Task Partitioning
• Partition the task dependency graph.
• Good when static task dependency graph with known task sizes.
Mapping on 8 processes.

Sparse Matrix * Vector

Sparse Matrix * Vector

Hierarchical Mappings
• Combine several mapping techniques in a structured (hierarchical) way.
• Task mapping of a binary tree (quicksort) does not use all processors.
• Mapping based on task dependency graph (hierarchy) & block.

Binary Tree -> Hierarchical Block Mapping

Schemes for Dynamic Mapping
• Centralized Schemes.
  • Master manages pool of tasks.
  • Slaves obtain work.
  • Limited scalability.
• Distributed Schemes.
  • Processes exchange tasks to balance work.
  • Not simple, many issues.
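
A centralized scheme can be sketched with a shared queue that workers drain until it is empty. The sketch uses threads from Python's standard library purely to show the structure. The function name and the example work function are assumptions, and threads in CPython will not speed up CPU-bound work.

```python
import queue
import threading


def run_work_pool(tasks, num_workers, work):
    """Run work(task) for every task, with workers pulling from one pool."""
    pool = queue.Queue()
    for task in tasks:
        pool.put(task)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = pool.get_nowait()  # ask the central pool for work
            except queue.Empty:
                return  # nothing left: this worker is done
            value = work(task)
            with lock:
                results[task] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


squares = run_work_pool(range(10), 3, lambda x: x * x)
```

Every worker contends for the same queue, which is where the limited scalability of centralized schemes comes from.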

Minimizing Interaction Overheads
• Maximize data locality.
  • Minimize volume of data-exchange.
  • Minimize frequency of interactions.
• Minimize contention and hot spots.
  • Sharing a link, the same memory block, etc.
  • Re-design original algorithm to change the interaction pattern.

Minimizing Interaction Overheads
• Overlapping computations with interactions, to reduce idling.
  • Initiate interactions in advance.
  • Non-blocking communications.
  • Multi-threading.
• Replicating data or computation.
• Group communication instead of point to point.
• Overlapping interactions.
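
Initiating an interaction in advance can be sketched as a prefetch: the request for the next block is issued before the current block is processed. Here `fetch` is a hypothetical stand-in for a slow remote read, and a background thread plays the role of a non-blocking communication.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(block_id):
    """Hypothetical stand-in for a slow remote read of one data block."""
    return list(range(block_id * 4, block_id * 4 + 4))


def process_blocks(num_blocks):
    """Sum every block while the next one is fetched in the background."""
    totals = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, 0)  # start the first transfer early
        for block_id in range(num_blocks):
            data = pending.result()  # wait only if the data is not here yet
            if block_id + 1 < num_blocks:
                # Issue the next transfer before computing on this block.
                pending = pool.submit(fetch, block_id + 1)
            totals.append(sum(data))  # computation overlaps the transfer
    return totals


totals = process_blocks(3)
```

The process idles only when a transfer takes longer than the computation it overlaps with.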

Overlapping Interactions

Parallel Algorithm Models
• Data parallel model.
  • Tasks statically mapped.
  • Similar operations on different data.
  • SIMD.
• Task graph model.
  • Start from task dependency graph.
  • Use task interaction graph to promote locality.

Parallel Algorithm Models
• Work pool (or task pool) model.
  • No pre-mapping, centralized or not.
• Master-slave model.
  • Master generates work for slaves; allocation static or dynamic.
• Pipeline or producer-consumer model.
  • Stream of data traverses processes: stream parallelism.
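
The pipeline model can be sketched as a chain of stages connected by queues, each stage consuming what the previous one produces. This is a minimal illustration using threads. The names `stage`, `run_pipeline` and the `DONE` marker are assumptions, not part of the slides.

```python
import queue
import threading

DONE = object()  # marker telling a stage that the stream has ended


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # forward the end-of-stream marker
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Stream items through one thread per function, in order."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


piped = run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10])
```

While the second stage works on one item, the first stage can already start on the next, which is the stream parallelism the model refers to.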
