Parallel Numerical Algorithms Chapter 3 Dense Linear Systems - PowerPoint PPT Presentation

Triangular Systems Parallel Algorithms Wavefront Algorithms Parallel Numerical Algorithms Chapter 3 – Dense Linear Systems Section 3.3 – Triangular Linear Systems Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 1 / 42

Triangular Systems Parallel Algorithms Wavefront Algorithms Outline Triangular Systems 1 Parallel Algorithms 2 Wavefront Algorithms 3 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 2 / 42

Triangular Systems Parallel Algorithms Wavefront Algorithms Triangular Matrices Matrix L is lower triangular if all entries above its main diagonal are zero, ℓ ij = 0 for i < j Matrix U is upper triangular if all entries below its main diagonal are zero, u ij = 0 for i > j Triangular matrices are important because triangular linear systems are easily solved by successive substitution Most direct methods for solving general linear systems first reduce matrix to triangular form and then solve resulting equivalent triangular system(s) Triangular systems are also frequently used as preconditioners in iterative methods for solving linear systems Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 3 / 42

Triangular Systems Parallel Algorithms Wavefront Algorithms Forward Substitution For lower triangular system Lx = b , solution can be obtained by forward substitution i − 1 � � � x i = b i − ℓ ij x j /ℓ ii , i = 1 , . . . , n j =1 for j = 1 to n x j = b j /ℓ jj { compute soln component } for i = j + 1 to n b i = b i − ℓ ij x j { update right-hand side } end end Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 4 / 42

Triangular Systems Parallel Algorithms Wavefront Algorithms Back Substitution For upper triangular system Ux = b , solution can be obtained by back substitution n � � � x i = b i − u ij x j /u ii , i = n, . . . , 1 j = i +1 for j = n to 1 x j = b j /u jj { compute soln component } for i = 1 to j − 1 b i = b i − u ij x j { update right-hand side } end end Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 5 / 42

Triangular Systems Parallel Algorithms Wavefront Algorithms Solving Triangular Systems Forward or back substitution requires about n 2 / 2 multiplications and similar number of additions, so serial exeuction time is T 1 = Θ( γn 2 ) We will consider only lower triangular systems, as analogous algorithms for upper triangular systems are similar Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 6 / 42

Triangular Systems Parallel Algorithms Wavefront Algorithms Loop Orderings for Forward Substitution for j = 1 to n for i = 1 to n x j = b j /ℓ jj for j = 1 to i − 1 for i = j + 1 to n b i = b i − ℓ ij x j b i = b i − ℓ ij x j end end x i = b i /ℓ ii end end right-looking left-looking immediate-update delayed-update data-driven demand-driven fan-out fan-in Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 7 / 42

Triangular Systems Fine-Grain Algorithm Parallel Algorithms 2-D Algorithm Wavefront Algorithms Parallel Algorithm Partition For i = 2 , . . . , n , j = 1 , . . . , i − 1 , fine-grain task ( i, j ) stores ℓ ij and computes product ℓ ij x j For i = 1 , . . . , n , fine-grain task ( i, i ) stores ℓ ii and b i , collects sum t i = � i − 1 j =1 ℓ ij x j , and computes and stores x i = ( b i − t i ) /ℓ ii yielding 2-D triangular array of n ( n + 1) / 2 fine-grain tasks Communicate For j = 1 , . . . , n − 1 , task ( j, j ) broadcasts x j to tasks ( i, j ) , i = j + 1 , . . . , n For i = 2 , . . . , n , sum reduction of products ℓ ij x j across tasks ( i, j ) , j = 1 , . . . , i , with task ( i, i ) as root Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 8 / 42

Triangular Systems Fine-Grain Algorithm Parallel Algorithms 2-D Algorithm Wavefront Algorithms Fine-Grain Tasks and Communication ℓ 11 b 1 x 1 ℓ 22 ℓ 21 b 2 x 2 ℓ 33 ℓ 31 ℓ 32 b 3 x 3 ℓ 44 ℓ 41 ℓ 42 ℓ 43 b 4 x 4 ℓ 55 ℓ 51 ℓ 52 ℓ 53 ℓ 54 b 5 x 5 ℓ 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 b 6 x 6 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 9 / 42

Triangular Systems Fine-Grain Algorithm Parallel Algorithms 2-D Algorithm Wavefront Algorithms Fine-Grain Parallel Algorithm if i = j then t = 0 if i > 1 then recv sum reduction of t across tasks ( i, k ) , k = 1 , . . . , i end x i = ( b i − t ) /ℓ ii broadcast x i to tasks ( k, i ) , k = i + 1 , . . . , n else recv broadcast of x j from task ( j, j ) t = ℓ ij x j reduce t across tasks ( i, k ) , k = 1 , . . . , i end Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 10 / 42

Triangular Systems Fine-Grain Algorithm Parallel Algorithms 2-D Algorithm Wavefront Algorithms Fine-Grain Algorithm If communication is suitably pipelined, then fine-grain algorithm can achieve Θ( n ) execution time, but uses Θ( n 2 ) tasks, so it is inefficient If there are multiple right-hand-side vectors b , then successive solutions can be pipelined to increase overall efficiency Agglomerating fine-grain tasks yields more reasonable number of tasks and improves ratio of computation to communication Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 11 / 42

Triangular Systems Fine-Grain Algorithm Parallel Algorithms 2-D Algorithm Wavefront Algorithms Agglomeration Agglomerate With n × n array of fine-grain tasks, natural strategies are 2-D: combine k × k subarray of fine-grain tasks to form each coarse-grain task, yielding ( n/k ) 2 coarse-grain tasks 1-D column: combine n fine-grain tasks in each column into coarse-grain task, yielding n coarse-grain tasks 1-D row: combine n fine-grain tasks in each row into coarse-grain task, yielding n coarse-grain tasks Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 12 / 42

Triangular Systems Fine-Grain Algorithm Parallel Algorithms 2-D Algorithm Wavefront Algorithms 2-D Agglomeration ℓ 11 b 1 x 1 ℓ 22 ℓ 21 b 2 x 2 ℓ 33 ℓ 31 ℓ 32 b 3 x 3 ℓ 44 ℓ 41 ℓ 42 ℓ 43 b 4 x 4 ℓ 55 ℓ 51 ℓ 52 ℓ 53 ℓ 54 b 5 x 5 ℓ 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 b 6 x 6 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 13 / 42

Triangular Systems Fine-Grain Algorithm Parallel Algorithms 2-D Algorithm Wavefront Algorithms 1-D Column Agglomeration ℓ 11 b 1 x 1 ℓ 22 ℓ 21 b 2 x 2 ℓ 33 ℓ 31 ℓ 32 b 3 x 3 ℓ 44 ℓ 41 ℓ 42 ℓ 43 b 4 x 4 ℓ 55 ℓ 51 ℓ 52 ℓ 53 ℓ 54 b 5 x 5 ℓ 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 b 6 x 6 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 14 / 42

Triangular Systems Fine-Grain Algorithm Parallel Algorithms 2-D Algorithm Wavefront Algorithms 1-D Row Agglomeration ℓ 11 b 1 x 1 ℓ 22 ℓ 21 b 2 x 2 ℓ 33 ℓ 31 ℓ 32 b 3 x 3 ℓ 44 ℓ 41 ℓ 42 ℓ 43 b 4 x 4 ℓ 55 ℓ 51 ℓ 52 ℓ 53 ℓ 54 b 5 x 5 ℓ 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 b 6 x 6 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 15 / 42

Triangular Systems Fine-Grain Algorithm Parallel Algorithms 2-D Algorithm Wavefront Algorithms Mapping Map 2-D: assign ( n/k ) 2 /p coarse-grain tasks to each of p processors using any desired mapping in each dimension, treating target network as 2-D mesh 1-D: assign n/p coarse-grain tasks to each of p processors using any desired mapping, treating target network as 1-D mesh Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 16 / 42

Triangular Systems Fine-Grain Algorithm Parallel Algorithms 2-D Algorithm Wavefront Algorithms 1-D Column Agglomeration, Block Mapping ℓ 11 b 1 x 1 ℓ 22 ℓ 21 b 2 x 2 ℓ 33 ℓ 31 ℓ 32 b 3 x 3 ℓ 44 ℓ 41 ℓ 42 ℓ 43 b 4 x 4 ℓ 55 ℓ 51 ℓ 52 ℓ 53 ℓ 54 b 5 x 5 ℓ 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 b 6 x 6 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 17 / 42

Triangular Systems Fine-Grain Algorithm Parallel Algorithms 2-D Algorithm Wavefront Algorithms 1-D Column Agglomeration, Cyclic Mapping ℓ 11 b 1 x 1 ℓ 22 ℓ 21 b 2 x 2 ℓ 33 ℓ 31 ℓ 32 b 3 x 3 ℓ 44 ℓ 41 ℓ 42 ℓ 43 b 4 x 4 ℓ 55 ℓ 51 ℓ 54 ℓ 52 ℓ 53 b 5 x 5 ℓ 66 ℓ 61 ℓ 64 ℓ 62 ℓ 65 ℓ 63 b 6 x 6 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 18 / 42

Triangular Systems Fine-Grain Algorithm Parallel Algorithms 2-D Algorithm Wavefront Algorithms 1-D Block-Cyclic Algorithm Execution Time With block-size b , 1D partitioning requires n/b broadcasts of b -items for row-agglomeration requires n/b reductions of b -items for column-agglomeration in both cases O ( b 2 ) work must be done to solve for b entries of x between each of the n/b collectives The overall execution time is � � α ( n/b ) log( p ) + βn + γ ( n 2 /p + nb ) T p ( n, b ) = Θ Selecting block-size b = n/p , parallel execution time is � � αp log( p ) + βn + γn 2 /p T p ( n, n/p ) = Θ Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 19 / 42

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems - PowerPoint PPT Presentation

Triangular Systems Parallel Algorithms Wavefront Algorithms Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular Linear Systems Michael T. Heath and Edgar Solomonik Department of Computer Science

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.2 Parallel

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.1 Parallel Algorithm

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.3 Parallel

+ Design of Parallel Algorithms Models of Parallel Computation + Chapter Overview: Algorithms

Parallel Algorithms Parallel Algorithms Examples Examples Concepts & Definitions

+ Design of Parallel Algorithms Parallel Algorithm Analysis Tools + Topic Overview n Sources of

+ Design of Parallel Algorithms Parallel Algorithm Analysis Tools + Topic Overview n Sources

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular

+ Design of Parallel Algorithms Parallel Sorting Algorithms + Topic Overview n Issues in

Parallel Numerical Algorithms Chapter 1 Parallel Computing Michael T. Heath and Edgar

Parallel Numerical Algorithms Chapter 1 Parallel Computing Michael T. Heath and Edgar

Parallel Numerical Algorithms for Heterogeneous Parallel Computers Antonio M. Vidal Maci a

+ Design of Parallel Algorithms Parallel Dense Matrix Algorithms + Topic Overview n

Introduction to Parallel Computing George Karypis Principles of Parallel Algorithm Design

Overview Why Parallel Sorting? Parallel Quicksort Bitonic Sort Parallel Merge Sort

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.2 LU Factorization

Distributed Systems Principles and Paradigms Chapter 04 (version February 18, 2008 ) Maarten van

Practical Auction Design Peter Cramton Professor of Economics, University of Maryland Chairman,

A New Way to Infer CSM Properties Ryan Foley University of Illinois Single Degenerate Wind

Quantum Monte Carlo calculations of Neutrino-Nucleus Interactions PONDD Physics Opportunities in

CMP722 ADVANCED COMPUTER VISION Lecture #9 Graph Networks Aykut Erdem // Hacettepe

Clustering CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani

D ISTRIBUTED S YSTEMS [COMP9243] Migration: a file can transparently move to another server

Parallel Programming and Heterogeneous Computing Non-Uniform Memory Access Max Plauth, Sven

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems - PowerPoint PPT Presentation

Triangular Systems Parallel Algorithms Wavefront Algorithms Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular Linear Systems Michael T. Heath and Edgar Solomonik Department of Computer Science

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.2 Parallel

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.1 Parallel Algorithm

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.3 Parallel

+ Design of Parallel Algorithms Models of Parallel Computation + Chapter Overview: Algorithms

Parallel Algorithms Parallel Algorithms Examples Examples Concepts &amp; Definitions

+ Design of Parallel Algorithms Parallel Algorithm Analysis Tools + Topic Overview n Sources of

+ Design of Parallel Algorithms Parallel Algorithm Analysis Tools + Topic Overview n Sources

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular

+ Design of Parallel Algorithms Parallel Sorting Algorithms + Topic Overview n Issues in

Parallel Numerical Algorithms Chapter 1 Parallel Computing Michael T. Heath and Edgar

Parallel Numerical Algorithms Chapter 1 Parallel Computing Michael T. Heath and Edgar

Parallel Numerical Algorithms for Heterogeneous Parallel Computers Antonio M. Vidal Maci a

+ Design of Parallel Algorithms Parallel Dense Matrix Algorithms + Topic Overview n

Introduction to Parallel Computing George Karypis Principles of Parallel Algorithm Design

Overview Why Parallel Sorting? Parallel Quicksort Bitonic Sort Parallel Merge Sort

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.2 LU Factorization

Distributed Systems Principles and Paradigms Chapter 04 (version February 18, 2008 ) Maarten van

Practical Auction Design Peter Cramton Professor of Economics, University of Maryland Chairman,

A New Way to Infer CSM Properties Ryan Foley University of Illinois Single Degenerate Wind

Quantum Monte Carlo calculations of Neutrino-Nucleus Interactions PONDD Physics Opportunities in

CMP722 ADVANCED COMPUTER VISION Lecture #9 Graph Networks Aykut Erdem // Hacettepe

Clustering CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani

D ISTRIBUTED S YSTEMS [COMP9243] Migration: a file can transparently move to another server

Parallel Programming and Heterogeneous Computing Non-Uniform Memory Access Max Plauth, Sven

Parallel Algorithms Parallel Algorithms Examples Examples Concepts & Definitions