Numerical Algorithms


  1. Numerical Algorithms: Matrices — A Review

An n × m matrix has n rows (indexed 0 to n − 1) and m columns (indexed 0 to m − 1):

    A = \begin{pmatrix}
      a_{0,0}   & a_{0,1}   & \cdots & a_{0,m-2}   & a_{0,m-1}   \\
      a_{1,0}   & a_{1,1}   & \cdots & a_{1,m-2}   & a_{1,m-1}   \\
      \vdots    &           &        &             & \vdots      \\
      a_{n-2,0} & a_{n-2,1} & \cdots & a_{n-2,m-2} & a_{n-2,m-1} \\
      a_{n-1,0} & a_{n-1,1} & \cdots & a_{n-1,m-2} & a_{n-1,m-1}
    \end{pmatrix}

Figure 10.1: An n × m matrix.

Source: Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers, Barry Wilkinson and Michael Allen, Prentice Hall, 1999.

  2. Matrix Addition

Matrix addition simply involves adding corresponding elements of each matrix to form the result matrix. Given the elements of A as a_{i,j} and the elements of B as b_{i,j}, each element of C is computed as

    c_{i,j} = a_{i,j} + b_{i,j}    (0 ≤ i < n, 0 ≤ j < m)
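As a concrete sketch (not from the book; the fixed dimensions N and M are assumed purely for illustration), sequential matrix addition in C:

    #define N 4
    #define M 4

    /* c = a + b: add corresponding elements of two N x M matrices */
    void matrix_add(double a[N][M], double b[N][M], double c[N][M])
    {
        int i, j;
        for (i = 0; i < N; i++)
            for (j = 0; j < M; j++)
                c[i][j] = a[i][j] + b[i][j];
    }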

  3. Matrix Multiplication

Multiplication of two matrices, A and B, produces the matrix C whose elements, c_{i,j} (0 ≤ i < n, 0 ≤ j < m), are computed as follows:

    c_{i,j} = \sum_{k=0}^{l-1} a_{i,k} b_{k,j}

where A is an n × l matrix and B is an l × m matrix. Each element c_{i,j} is formed by multiplying row i of A with column j of B and summing the results.

Figure 10.2: Matrix multiplication, C = A × B.
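For example (an illustrative calculation, not on the slide), with n = l = m = 2:

    \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
    \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}
    =
    \begin{pmatrix} 1\cdot5 + 2\cdot7 & 1\cdot6 + 2\cdot8 \\ 3\cdot5 + 4\cdot7 & 3\cdot6 + 4\cdot8 \end{pmatrix}
    =
    \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}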

  4. Matrix-Vector Multiplication

A vector is a matrix with one column; i.e., an n × 1 matrix. Matrix-vector multiplication follows directly from the definition of matrix-matrix multiplication by making B an n × 1 matrix (with A an n × n matrix). The result will be an n × 1 matrix (vector), each element c_i being the row sum of the products of row i of A with b.

Figure 10.3: Matrix-vector multiplication, c = A × b.
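A minimal C sketch of the operation (illustrative only; the fixed size N is assumed):

    #define N 4

    /* c = A * b: each c[i] is the dot product of row i of A with b */
    void mat_vec(double a[N][N], double b[N], double c[N])
    {
        int i, k;
        for (i = 0; i < N; i++) {
            c[i] = 0.0;
            for (k = 0; k < N; k++)
                c[i] = c[i] + a[i][k] * b[k];
        }
    }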

  5. Relationship of Matrices to Linear Equations

A system of linear equations can be written in matrix form:

    Ax = b

The matrix A holds the a constants, x is a vector of the unknowns, and b is a vector of the b constants.
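For instance (an illustrative system, not from the slide), the equations 2x_0 + x_1 = 5 and x_0 + 3x_1 = 10 become

    \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}
    \begin{pmatrix} x_0 \\ x_1 \end{pmatrix}
    =
    \begin{pmatrix} 5 \\ 10 \end{pmatrix}

with solution x_0 = 1, x_1 = 3.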

  6. Implementing Matrix Multiplication: Sequential Code

Assume throughout that the matrices are square (n × n matrices). The sequential code to compute A × B could simply be

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            c[i][j] = 0;
            for (k = 0; k < n; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }

This algorithm requires n³ multiplications and n³ additions, leading to a sequential time complexity of Ο(n³).

  7. Parallel Code

Parallel implementations are usually based upon the direct sequential matrix multiplication algorithm. The sequential code has independent loops. With n processors (and n × n matrices), we can expect a parallel time complexity of Ο(n²), and this is easily obtainable. It is also quite easy to obtain a time complexity of Ο(n) with n² processors, where one element each of A and B is assigned to each processor. These implementations are cost optimal [since Ο(n³) = n × Ο(n²) = n² × Ο(n)]. It is possible to obtain Ο(log n) with n³ processors by parallelizing the inner loop, but this is not cost optimal [since Ο(n³) ≠ n³ × Ο(log n)]. Ο(log n) is the lower bound for parallel matrix multiplication. A minimal shared-memory sketch of the independent-loops idea follows.
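To make the independent-loops point concrete, here is a minimal shared-memory sketch using OpenMP (an illustration, not the book's message-passing code; the size N is assumed). Each (i, j) element of C can be computed without synchronization, so the outer loop is simply divided among threads:

    #include <omp.h>

    #define N 512

    double a[N][N], b[N][N], c[N][N];

    /* Every c[i][j] is independent, so rows of C are
       distributed across threads with no synchronization. */
    void parallel_matmul(void)
    {
        int i, j, k;
        #pragma omp parallel for private(j, k)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) {
                c[i][j] = 0;
                for (k = 0; k < N; k++)
                    c[i][j] = c[i][j] + a[i][k] * b[k][j];
            }
    }

With n threads, each thread performs n² element-updates, matching the Ο(n²) time with n processors discussed above.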

  8. Partitioning into Submatrices

Usually, we want to use far fewer than n processors with n × n matrices because of the size of n.

Submatrices

Each matrix can be divided into blocks of elements called submatrices. These submatrices can be manipulated as if they were single matrix elements. Suppose each matrix is divided into s² submatrices, each of n/s × n/s elements. Using the notation A_{p,q} for the submatrix in submatrix row p and submatrix column q:

    for (p = 0; p < s; p++)
        for (q = 0; q < s; q++) {
            C_{p,q} = 0;                /* clear elements of submatrix */
            for (r = 0; r < s; r++)     /* submatrix multiplication and */
                C_{p,q} = C_{p,q} + A_{p,r} * B_{r,q};  /* add to accumulating submatrix */
        }

The line C_{p,q} = C_{p,q} + A_{p,r} * B_{r,q}; means multiply submatrices A_{p,r} and B_{r,q} using matrix multiplication and add the result to submatrix C_{p,q} using matrix addition. This arrangement is known as block matrix multiplication; a C sketch with the submatrix operations expanded into element loops is given after the figure caption below.

Figure 10.4: Block matrix multiplication.
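As a hedged illustration of the same idea (not the book's code; the sizes N and S are assumed), a C sketch of block matrix multiplication with every submatrix operation expanded into element loops:

    #define N 8
    #define S 2          /* S x S grid of submatrices */
    #define M (N / S)    /* each submatrix is M x M */

    double a[N][N], b[N][N], c[N][N];

    /* C_{p,q} = sum over r of A_{p,r} * B_{r,q}, all in element loops */
    void block_matmul(void)
    {
        int p, q, r, i, j, k;
        for (p = 0; p < S; p++)
            for (q = 0; q < S; q++) {
                for (i = 0; i < M; i++)        /* clear submatrix C_{p,q} */
                    for (j = 0; j < M; j++)
                        c[p*M + i][q*M + j] = 0;
                for (r = 0; r < S; r++)        /* accumulate submatrix products */
                    for (i = 0; i < M; i++)
                        for (j = 0; j < M; j++)
                            for (k = 0; k < M; k++)
                                c[p*M + i][q*M + j] +=
                                    a[p*M + i][r*M + k] * b[r*M + k][q*M + j];
            }
    }

The block structure changes only the order in which the same n³ multiply-adds are performed, which is why it can be substituted into any of the parallel methods.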

  9. Submatrix Multiplication Example

(a) The matrices, with n = 4 and s = 2 (four 2 × 2 submatrices each):

    A = \begin{pmatrix}
      a_{0,0} & a_{0,1} & a_{0,2} & a_{0,3} \\
      a_{1,0} & a_{1,1} & a_{1,2} & a_{1,3} \\
      a_{2,0} & a_{2,1} & a_{2,2} & a_{2,3} \\
      a_{3,0} & a_{3,1} & a_{3,2} & a_{3,3}
    \end{pmatrix}, \quad
    B = \begin{pmatrix}
      b_{0,0} & b_{0,1} & b_{0,2} & b_{0,3} \\
      b_{1,0} & b_{1,1} & b_{1,2} & b_{1,3} \\
      b_{2,0} & b_{2,1} & b_{2,2} & b_{2,3} \\
      b_{3,0} & b_{3,1} & b_{3,2} & b_{3,3}
    \end{pmatrix}

(b) Multiplying A_{0,0} × B_{0,0} and adding A_{0,1} × B_{1,0} to obtain C_{0,0}:

    A_{0,0} B_{0,0} + A_{0,1} B_{1,0}
    = \begin{pmatrix} a_{0,0} & a_{0,1} \\ a_{1,0} & a_{1,1} \end{pmatrix}
      \begin{pmatrix} b_{0,0} & b_{0,1} \\ b_{1,0} & b_{1,1} \end{pmatrix}
    + \begin{pmatrix} a_{0,2} & a_{0,3} \\ a_{1,2} & a_{1,3} \end{pmatrix}
      \begin{pmatrix} b_{2,0} & b_{2,1} \\ b_{3,0} & b_{3,1} \end{pmatrix}

    = \begin{pmatrix}
        a_{0,0}b_{0,0} + a_{0,1}b_{1,0} & a_{0,0}b_{0,1} + a_{0,1}b_{1,1} \\
        a_{1,0}b_{0,0} + a_{1,1}b_{1,0} & a_{1,0}b_{0,1} + a_{1,1}b_{1,1}
      \end{pmatrix}
    + \begin{pmatrix}
        a_{0,2}b_{2,0} + a_{0,3}b_{3,0} & a_{0,2}b_{2,1} + a_{0,3}b_{3,1} \\
        a_{1,2}b_{2,0} + a_{1,3}b_{3,0} & a_{1,2}b_{2,1} + a_{1,3}b_{3,1}
      \end{pmatrix}

    = \begin{pmatrix}
        a_{0,0}b_{0,0} + a_{0,1}b_{1,0} + a_{0,2}b_{2,0} + a_{0,3}b_{3,0} &
        a_{0,0}b_{0,1} + a_{0,1}b_{1,1} + a_{0,2}b_{2,1} + a_{0,3}b_{3,1} \\
        a_{1,0}b_{0,0} + a_{1,1}b_{1,0} + a_{1,2}b_{2,0} + a_{1,3}b_{3,0} &
        a_{1,0}b_{0,1} + a_{1,1}b_{1,1} + a_{1,2}b_{2,1} + a_{1,3}b_{3,1}
      \end{pmatrix} = C_{0,0}

Figure 10.5: Submatrix multiplication.

  10. Direct Implementation

Allocate one processor to compute each element of C. Then n² processors would be needed. One row of elements of A and one column of elements of B are needed by each processor. Some of the same elements must be sent to more than one processor. Using submatrices, one processor would instead compute one m × m submatrix of C.

Figure 10.6: Direct implementation of matrix multiplication. Processor P_{i,j} receives row i of A (a[i][]) and column j of B (b[][j]) and computes c[i][j].
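A minimal sketch of what each processor P_{i,j} computes once it holds its row and column (illustrative only; the message-passing that distributes a[i][] and b[][j] and collects the result is omitted):

    /* Work done by processor P_{i,j}: the dot product of its
       row of A with its column of B, received from the master. */
    double compute_element(double a_row[], double b_col[], int n)
    {
        double sum = 0.0;
        int k;
        for (k = 0; k < n; k++)
            sum += a_row[k] * b_col[k];
        return sum;    /* this is c[i][j], sent back to the master */
    }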

  11. Analysis

Assume n × n matrices of individual elements (not submatrices).

Communication

With separate messages to each of the n² slave processors (each receiving a row of A and a column of B, i.e., 2n elements, and returning a single element):

    t_comm = n²(t_startup + 2n·t_data) + n²(t_startup + t_data)
           = n²(2t_startup + (2n + 1)t_data)

A broadcast of the matrices along a single bus, followed by individual result messages, would yield

    t_comm = (t_startup + n²·t_data) + n²(t_startup + t_data)

The dominant time is now in returning the results, as t_startup is usually significantly larger than t_data. A gather routine should reduce this time.

Computation

Each slave performs, in parallel, n multiplications and n additions; i.e.,

    t_comp = 2n
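As a rough illustration (with assumed values, not from the book): take n = 100, t_data = 1 time unit, and t_startup = 1000 units. The broadcast version then costs

    t_comm = (1000 + 100² × 1) + 100² × (1000 + 1) = 11,000 + 10,010,000

so nearly all of the communication time lies in the n² result-return messages, each paying the large t_startup, which is why gathering the results in fewer messages helps.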

  12. Performance Improvement

By using a tree construction, n numbers can be added in log n steps using n processors.

Figure 10.7: Accumulation of c_{0,0} using a tree construction. Processors P_0, P_1, P_2, P_3 form the products a_{0,0}b_{0,0}, a_{0,1}b_{1,0}, a_{0,2}b_{2,0}, a_{0,3}b_{3,0}; P_0 and P_2 each add in a neighbor's product, and P_0 then forms the final sum.

This gives a computational time complexity of Ο(log n) using n³ processors, instead of Ο(n) using n² processors. A sketch of the reduction pattern follows.
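A minimal sketch of the pairwise (tree) accumulation, written as a sequential loop over the combining steps (an illustration with an assumed power-of-two N, not the book's code); on a parallel machine, all additions within one step would run simultaneously on different processors:

    #define N 4    /* number of terms; assumed a power of two */

    /* Tree reduction: sums terms[0..N-1] in log2(N) combining steps.
       At step s, position i absorbs the partial sum at position i+s;
       all additions within a step are independent of each other. */
    double tree_sum(double terms[N])
    {
        int s, i;
        for (s = 1; s < N; s *= 2)          /* log2(N) steps */
            for (i = 0; i + s < N; i += 2 * s)
                terms[i] += terms[i + s];
        return terms[0];                    /* final sum, e.g. c_{0,0} */
    }

With N = 4 this reproduces the figure: P_0 += P_1 and P_2 += P_3 in step one, then P_0 += P_2 in step two.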

  13. Submatrices

In every method, we can substitute submatrices for matrix elements to reduce the number of processors. Let us select m × m submatrices and s = n/m. Then there are s² submatrices in each matrix and s² processors.

Communication

Each of the s² slave processors must separately receive one row and one column of submatrices (nm elements each), and must separately return a C submatrix (m² elements) to the master processor, giving a communication time of

    t_comm = s²{2(t_startup + nm·t_data) + (t_startup + m²·t_data)}
           = (n/m)²{3t_startup + (m² + 2nm)t_data}

Alternatively, the complete matrices could be broadcast to every processor. As the submatrix size m is increased, the data transmission time of each message increases but the number of messages decreases.

Computation

One sequential submatrix multiplication requires m³ multiplications and m³ additions. A submatrix addition requires m² additions. Hence, with s submatrix multiplications and additions per processor,

    t_comp = s(2m³ + m²) = Ο(sm³) = Ο(nm²)
