Use of parallel matrix algorithms for Laplace partial differential equations (CS140, UCSB, Tao Yang)


  1. Use of parallel matrix algorithms for Laplace partial differential equations
     A steady-state heat-flow problem on a rectangular 10 cm × 20 cm metal sheet: one edge is held at 100 degrees, the other three edges are held at 0 degrees. What are the steady-state temperatures at the interior points?
     [Figure: the 10 cm × 20 cm sheet with interior grid points $u_{11}$, $u_{21}$, $u_{31}$; three edges held at 0 degrees and one edge held at 100 degrees.]

  2. The mathematical model
     Laplace equation:
       \frac{\partial^2 u(x, y)}{\partial x^2} + \frac{\partial^2 u(x, y)}{\partial y^2} = 0
     with the boundary conditions:
       u(x, 0) = 0, \quad u(x, 10) = 0, \quad u(0, y) = 0, \quad u(20, y) = 100.
     Finite difference method to solve this PDE:
     • Discretize the region: divide the domain into a grid with spacing h along each axis.
     • At each grid point (ih, jh), let u(ih, jh) = u_{i,j}. Set up a linear equation using an approximate formula for numerical differentiation.
     • Solve the linear equations to find the values u_{i,j} at all interior points.

  3. Approximating numerical differentiation
       f'(x) \approx \frac{f(x + h) - f(x)}{h} \quad\text{or}\quad f'(x) \approx \frac{f(x) - f(x - h)}{h}
       f''(x) \approx \frac{f'(x + h) - f'(x)}{h} \approx \frac{\frac{f(x + h) - f(x)}{h} - \frac{f(x) - f(x - h)}{h}}{h}
     Thus
       f''(x) \approx \frac{f(x + h) + f(x - h) - 2 f(x)}{h^2}
     Then
       \frac{\partial^2 u(x_i, y_j)}{\partial x^2} \approx \frac{u_{i+1,j} - 2 u_{i,j} + u_{i-1,j}}{h^2},
       \qquad
       \frac{\partial^2 u(x_i, y_j)}{\partial y^2} \approx \frac{u_{i,j+1} - 2 u_{i,j} + u_{i,j-1}}{h^2}
     Adding the two equations above (their sum is zero by the Laplace equation) gives
       u_{i+1,j} - 2 u_{i,j} + u_{i-1,j} + u_{i,j+1} - 2 u_{i,j} + u_{i,j-1} = 0,
     and therefore
       4 u_{i,j} - u_{i+1,j} - u_{i-1,j} - u_{i,j+1} - u_{i,j-1} = 0.
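
     As a quick sanity check of the last second-derivative formula (not part of the slides), a short Python snippet comparing it with the exact second derivative of sin(x), whose second derivative is -sin(x):

       import numpy as np

       def second_derivative(f, x, h):
           # central difference: f''(x) ~ (f(x+h) + f(x-h) - 2 f(x)) / h^2
           return (f(x + h) + f(x - h) - 2.0 * f(x)) / h**2

       x, h = 1.0, 1e-3
       print(second_derivative(np.sin, x, h), -np.sin(x))
       # the two values agree to roughly seven decimal places, consistent with the O(h^2) error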

  4. Example of derived linear heat equations
     [Figure: the 10 cm × 20 cm sheet again, with interior points $u_{11}$, $u_{21}$, $u_{31}$; three edges at 0 degrees, one edge at 100 degrees.]
     For this case, let u_{11} = x_1, u_{21} = x_2, u_{31} = x_3.
       At u_{11}: \quad 4 x_1 - 0 - 0 - x_2 - 0 = 0
       At u_{21}: \quad 4 x_2 - x_1 - 0 - x_3 - 0 = 0
       At u_{31}: \quad 4 x_3 - x_2 - 0 - 100 - 0 = 0
     In matrix form:
       \begin{pmatrix} 4 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 4 \end{pmatrix}
       \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
       =
       \begin{pmatrix} 0 \\ 0 \\ 100 \end{pmatrix}
     Solution: x_1 = 1.786, x_2 = 7.143, x_3 = 26.786.
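
     Solving this 3 × 3 system numerically reproduces the slide's solution. A small sketch (NumPy is my choice of tool here, not part of the slides):

       import numpy as np

       A = np.array([[ 4.0, -1.0,  0.0],
                     [-1.0,  4.0, -1.0],
                     [ 0.0, -1.0,  4.0]])
       b = np.array([0.0, 0.0, 100.0])
       print(np.linalg.solve(A, b))   # approximately [1.786, 7.143, 26.786]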

  5. Linear heat equations for a general 2D grid
     Given a general (n+2) × (n+2) grid, there are n^2 equations:
       4 u_{i,j} - u_{i+1,j} - u_{i-1,j} - u_{i,j+1} - u_{i,j-1} = 0 \quad\text{for } 1 \le i, j \le n,
     or, equivalently,
       u_{i,j} = ( u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} ) / 4.
     Example: r = 2, n = 6.
     [Figure: a 6 × 6 interior grid (r = 2, n = 6); three edges held at temperature U0 and one edge held at U1.]

  6. We order the unknowns as
       ( u_{11}, u_{12}, \dots, u_{1n}, u_{21}, u_{22}, \dots, u_{2n}, \dots, u_{n1}, \dots, u_{nn} ).
     For n = 2 the ordering is
       \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
       =
       \begin{pmatrix} u_{11} \\ u_{12} \\ u_{21} \\ u_{22} \end{pmatrix}
     and the system is
       \begin{pmatrix}
        4 & -1 & -1 &  0 \\
       -1 &  4 &  0 & -1 \\
       -1 &  0 &  4 & -1 \\
        0 & -1 & -1 &  4
       \end{pmatrix}
       \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
       =
       \begin{pmatrix} u_{01} + u_{10} \\ u_{02} + u_{13} \\ u_{20} + u_{31} \\ u_{32} + u_{23} \end{pmatrix}
     where the right-hand side collects the known boundary values adjacent to each unknown.

  7. In general, the left-hand-side matrix is the n^2 × n^2 block tridiagonal matrix
       \begin{pmatrix}
        T & -I &        &        &    \\
       -I &  T & -I     &        &    \\
          & -I &  T     & \ddots &    \\
          &    & \ddots & \ddots & -I \\
          &    &        & -I     &  T
       \end{pmatrix}_{n^2 \times n^2}
     where T is the n × n tridiagonal matrix
       T = \begin{pmatrix}
        4 & -1 &        &        &    \\
       -1 &  4 & -1     &        &    \\
          & -1 &  4     & \ddots &    \\
          &    & \ddots & \ddots & -1 \\
          &    &        & -1     &  4
       \end{pmatrix}_{n \times n}
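
     For small n this coefficient matrix can be assembled directly with Kronecker products. A NumPy sketch (my construction, not from the slides; for n = 2 it reproduces the 4 × 4 matrix of slide 6; in practice the matrix would be stored in a sparse format):

       import numpy as np

       def laplace_matrix(n):
           # S has 1s on the first super- and sub-diagonal and 0s elsewhere
           S = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
           T = 4.0 * np.eye(n) - S                      # tridiagonal (-1, 4, -1)
           # block tridiagonal: T blocks on the diagonal, -I blocks next to them
           return np.kron(np.eye(n), T) - np.kron(S, np.eye(n))

       print(laplace_matrix(2))   # matches the 4 x 4 matrix shown on slide 6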

  8. and I is the n × n identity matrix
       I = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}_{n \times n}
     The matrix is very sparse, so direct methods for solving this system take too much time; iterative methods are used instead.

  9. The Jacobi iterative method
     Given 4 u_{i,j} - u_{i+1,j} - u_{i-1,j} - u_{i,j+1} - u_{i,j-1} = 0 for 1 ≤ i, j ≤ n, the Jacobi program is:
       Repeat
         For i = 1 to n
           For j = 1 to n
             u^{new}_{i,j} = 0.25 ( u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} )
           EndFor
         EndFor
       Until \| u^{new} - u \| < \epsilon, copying u^{new} into u before the next sweep.
     This is called a 5-point stencil computation because u_{i,j} depends on its 4 neighbours.
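
     The slide's pseudocode can be made concrete. A minimal NumPy sketch, assuming the grid layout of slide 1 (boundary values kept in the outer ring of an (n+2) × (n+2) array, with the heated edge stored in the last column; the function name, tolerance, and step cap are mine):

       import numpy as np

       def jacobi(n, eps=1e-4, max_steps=10_000):
           u = np.zeros((n + 2, n + 2))
           u[:, -1] = 100.0                      # the edge held at 100 degrees; the others stay at 0
           for _ in range(max_steps):
               u_new = u.copy()
               # 5-point stencil: each interior point becomes the average of its 4 neighbours
               u_new[1:-1, 1:-1] = 0.25 * (u[2:, 1:-1] + u[:-2, 1:-1] +
                                           u[1:-1, 2:] + u[1:-1, :-2])
               if np.max(np.abs(u_new - u)) < eps:
                   return u_new
               u = u_new
           return u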

 10. The Gauss-Seidel method
       Repeat
         u^{old} = u
         For i = 1 to n
           For j = 1 to n
             u_{i,j} = 0.25 ( u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} )
           EndFor
         EndFor
       Until \| u - u^{old} \| < \epsilon
     Unlike Jacobi, each update overwrites u_{i,j} in place, so newly computed values are used within the same sweep.
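
     A sketch of one Gauss-Seidel sweep in the same NumPy layout as above (an assumption, not from the slides). The sweep updates u in place and returns the largest change, which the caller compares against ε as on the slide:

       import numpy as np

       def gauss_seidel_step(u):
           u_old = u.copy()
           n = u.shape[0] - 2
           for i in range(1, n + 1):
               for j in range(1, n + 1):
                   # updated values of earlier points are used immediately
                   u[i, j] = 0.25 * (u[i + 1, j] + u[i - 1, j] + u[i, j + 1] + u[i, j - 1])
           return np.max(np.abs(u - u_old))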

 11. Parallel Jacobi method
     Assume a mesh of n × n processors and assign u_{i,j} to processor p_{i,j}. The SPMD Jacobi program at processor p_{i,j}:
       Repeat
         Collect u_{i+1,j}, u_{i-1,j}, u_{i,j+1}, u_{i,j-1} from the four neighbours p_{i+1,j}, p_{i-1,j}, p_{i,j+1}, p_{i,j-1}.
         u^{new}_{i,j} = 0.25 ( u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} )
         diff_{i,j} = | u^{new}_{i,j} - u_{i,j} |
         Do a global reduction to obtain M, the maximum of diff_{i,j} over all processors.
       Until M < \epsilon

 12. Performance evaluation
     • Each computation step takes ω = 5 operations.
     • There are 4 communication messages to receive; assuming sequential receives, communication costs 4(α + β).
     • Assume the global reduction takes (α + β) log n.
     • The sequential time is Seq = K ω n^2, where K is the number of steps.
     • Assume ω = 0.5, β = 0.1, α = 100, n = 500, p^2 = 2500.
     • The parallel time is PT = K ( ω + (4 + log n)(α + β) ), so
         Speedup = \frac{\omega n^2}{\omega + (4 + \log n)(\alpha + \beta)} \approx 192,
         Efficiency = Speedup / p^2 \approx 7.7\%.

 13. Grid partitioning
     • Reduce the number of processors and increase the granularity of the computation.
     • Map the n × n grid onto the processors with a 2D block method: assume a p × p processor mesh and give each processor a block of side γ = n / p (also written r in the following slides).
     Example: r = 2, n = 6.
     [Figure: the 6 × 6 interior grid split into 2 × 2 blocks, one block per processor; three edges held at temperature U0 and one edge held at U1.]

 14. Code partitioning
     Rewrite the kernel of the sequential code in blocked form:
       For bi = 1 to p
         For bj = 1 to p
           For i = (bi - 1) γ + 1 to bi γ
             For j = (bj - 1) γ + 1 to bj γ
               u^{new}_{i,j} = 0.25 ( u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} )
             EndFor
           EndFor
         EndFor
       EndFor
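
     A direct Python transcription of the blocked loops (a sketch; the array layout with boundary values in row/column 0 and n+1, and the names u, u_new, p, gamma, are assumptions, with n = p * gamma):

       def blocked_sweep(u, u_new, p, gamma):
           # u and u_new are (n+2) x (n+2) arrays; the outer ring holds boundary values
           for bi in range(1, p + 1):
               for bj in range(1, p + 1):
                   for i in range((bi - 1) * gamma + 1, bi * gamma + 1):
                       for j in range((bj - 1) * gamma + 1, bj * gamma + 1):
                           u_new[i, j] = 0.25 * (u[i + 1, j] + u[i - 1, j] +
                                                 u[i, j + 1] + u[i, j - 1])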

 15. Parallel SPMD code
     On processor p_{bi,bj}:
       Repeat
         Collect the block boundary data from the four neighbouring processors.
         For i = (bi - 1) γ + 1 to bi γ
           For j = (bj - 1) γ + 1 to bj γ
             u^{new}_{i,j} = 0.25 ( u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} )
           EndFor
         EndFor
         Compute the local maximum difference diff_{bi,bj} between the old and new values.
         Do a global reduction to obtain M, the maximum of diff_{bi,bj} over all processors.
       Until M < \epsilon
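
     The slides give only SPMD pseudocode; below is a minimal sketch of the same structure using mpi4py and NumPy (my choice of tools). The Cartesian process grid, ghost-cell layout, variable names (r, eps), and boundary handling are all assumptions, so this is an illustration rather than the course's implementation:

       from mpi4py import MPI
       import numpy as np

       comm = MPI.COMM_WORLD
       p = int(round(comm.Get_size() ** 0.5))             # assume a p x p process mesh
       cart = comm.Create_cart([p, p], periods=[False, False])
       up, down = cart.Shift(0, 1)                        # (source, dest) along rows
       left, right = cart.Shift(1, 1)                     # (source, dest) along columns

       r = 4                                              # local block size r = n / p (example value)
       eps = 1e-4

       # Local block plus a one-cell ghost layer; fixed boundary temperatures live in
       # the ghost cells of the processes on the edge of the mesh.
       u = np.zeros((r + 2, r + 2))
       if cart.Get_coords(cart.Get_rank())[1] == p - 1:
           u[:, -1] = 100.0                               # the edge held at 100 degrees

       while True:
           # Halo exchange with the four neighbours (MPI.PROC_NULL neighbours are no-ops,
           # so ghost cells on the physical boundary keep their fixed values).
           exchanges = [((1, slice(None)),  up,    (-1, slice(None)), down),
                        ((-2, slice(None)), down,  (0, slice(None)),  up),
                        ((slice(None), 1),  left,  (slice(None), -1), right),
                        ((slice(None), -2), right, (slice(None), 0),  left)]
           for send_ix, dest, recv_ix, src in exchanges:
               recvbuf = np.ascontiguousarray(u[recv_ix])   # keeps the old value if src is PROC_NULL
               cart.Sendrecv(np.ascontiguousarray(u[send_ix]), dest=dest,
                             recvbuf=recvbuf, source=src)
               u[recv_ix] = recvbuf

           # Local Jacobi sweep on the block interior, then the local maximum change.
           u_new = u.copy()
           u_new[1:-1, 1:-1] = 0.25 * (u[2:, 1:-1] + u[:-2, 1:-1] +
                                       u[1:-1, 2:] + u[1:-1, :-2])
           local_diff = float(np.max(np.abs(u_new[1:-1, 1:-1] - u[1:-1, 1:-1])))
           u = u_new

           # Global reduction: stop when the largest change anywhere is below eps.
           if cart.allreduce(local_diff, op=MPI.MAX) < eps:
               break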

 16. Performance evaluation
     • At each processor, each computation step takes ω r^2 operations.
     • The communication cost is 4(α + r β).
     • Assume the global reduction takes (α + β) log p.
     • The number of steps is K.
     • Assume ω = 0.5, β = 0.1, α = 100, n = 500, r = 100, p^2 = 25.
         PT = K ( r^2 ω + (4 + \log p)(α + r β) )
         Speedup = \frac{\omega r^2 p^2}{r^2 \omega + (4 + \log p)(\alpha + r \beta)} \approx 21.2
         Efficiency \approx 84\%
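
     For reference, this performance model is easy to evaluate directly. A small Python sketch (the function name and the use of log base 2 for the reduction term are my choices, so the result differs slightly from the slide's ≈ 21.2 and 84% depending on the log base and rounding used):

       import math

       def block_model(omega, alpha, beta, n, p):
           r = n // p                                            # block side length, r = n / p
           step = r * r * omega + (4 + math.log2(p)) * (alpha + r * beta)
           speedup = (omega * n * n) / step                      # = omega * r^2 * p^2 / per-step time
           return speedup, speedup / (p * p)                     # (speedup, efficiency)

       print(block_model(omega=0.5, alpha=100.0, beta=0.1, n=500, p=5))
       # roughly (22, 0.88) with log base 2; close to the slide's figures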
