new approaches to the direct solution of large scale
play

New Approaches to the Direct Solution of Large-Scale Banded Linear - PowerPoint PPT Presentation

www.bsc.es New Approaches to the Direct Solution of Large-Scale Banded Linear Systems Samuel Rodrguez Bernabeu samuel.rodriguez@bsc.es San Jos, California 2 Solving the linear system after reordering 10 Million degrees of freedom, double


  1. www.bsc.es New Approaches to the Direct Solution of Large-Scale Banded Linear Systems Samuel Rodríguez Bernabeu samuel.rodriguez@bsc.es San José, California

  2. 2

  3. Solving the linear system after reordering 10 Million degrees of freedom, double precision, complex linear system with a bandwidth of 2500 may require 1490 GB just to store LU matrix factors using LU factorization with partial pivoting. It is possible to solve it with less than 60 GB 3

  4. Solving the linear system 4

  5. Solving the linear system after reordering → Azad, Ariful, et al. "The Reverse Cuthill-McKee Algorithm in Distributed-Memory." arXiv preprint arXiv:1610.08128 (2016).. 5

  6. BUILDING BLOCK: LM-SPIKE: MEMORY FRIENDLY DIRECT SOLVER

  7. SPIKE algorithm Algorithm 1: SPIKE solver pseudocode. Given A , P and x , solves the A 1 x = P T b for x and returns the Px � PAP T � linear system Data: A , P , b B 1 Result: P T x 1 A := PAP T A 2 C 2 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions( A , nrhs ) B 2 4 for i in p do  0 � V i = A − 1 A 3 5 C 3 i B i  C i � W i = A − 1 6 i 0 B 3 F i = A − 1 i b i 7 A 4 8 end C 4 x = SolveReducedSystem( V top,bottom , W top,bottom , F top,bottom 9 ¯ ) i i i 10 for i in p do  0 ✓ �  C i � ◆ x top x i = A − 1 x bottom ¯ ¯ F i − 11 i +1 − i i − 1 B i 0 12 end 13 x = P T x Illustration block matrix decomposition layout used by the SPIKE algorithm for a 4 partitions case ( p = 4). fully asynchronous work prevents scalability A parallel hybrid banded system solver: the SPIKE algorithm. Parallel computing , 32 (2), 177-194. Polizzi, E., & Sameh, A. H. (2006). 7

  8. SPIKE algorithm Algorithm 1: SPIKE solver pseudocode. Given A , P and x , solves the A 1 x = P T b for x and returns the Px � PAP T � linear system Data: A , P , b B 1 Result: P T x 1 A := PAP T A 2 C 2 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions( A , nrhs ) B 2 4 for i in p do  0 � V i = A − 1 A 3 5 C 3 i B i  C i � W i = A − 1 6 i 0 B 3 F i = A − 1 i b i 7 A 4 8 end C 4 x = SolveReducedSystem( V top,bottom , W top,bottom , F top,bottom 9 ¯ ) i i i 10 for i in p do  0 ✓ �  C i � ◆ x top x i = A − 1 x bottom ¯ ¯ F i − 11 i +1 − i i − 1 B i 0 12 end 13 x = P T x Illustration block matrix decomposition layout used by the SPIKE algorithm for a 4 partitions case ( p = 4). fully asynchronous work prevents scalability A parallel hybrid banded system solver: the SPIKE algorithm. Parallel computing , 32 (2), 177-194. Polizzi, E., & Sameh, A. H. (2006). 8

  9. Controlling memory consumption PARDISO LM-SPIKE (p=5) LM-SPIKE (p=20) 9

  10. FIRST IDEA: LOCAL MATRIX BANDWIDTH REDUCTION

  11. Could we reduce the bandwidth locally ? → 11

  12. Could we reduce the bandwidth locally ? → 12

  13. Could we reduce the bandwidth locally ? →  �  �  �  A − 1 B � A B I A I 0 0 = CA − 1 D − CA − 1 B C D I I 0 0 13

  14. Could we reduce the bandwidth locally ? +20 Million degrees of freedom linear system arising from frequency domain discretization of Maxwell’s equations for CSEM. Notice that the inclusion of high-conductivity air layer makes the problem highly ill-conditioned. 14

  15. SECOND IDEA: COMPUTING A SUBSET OF THE INVERSE (JUST FOR DIAGONALLY DOMINANT CASES?)

  16. Computing selected elements of the inverse A − 1 A − 1 A − 1     b 11 11 12 13 Algorithm 1: SPIKE solver pseudocode. Given A , P and x , solves the x = A − 1 b = A − 1 A − 1 A − 1 b 21 x = P T b for x and returns the Px � PAP T � linear system     21 22 23 A − 1 A − 1 A − 1 b 31 31 32 33 Data: A , P , b Result: P T x 1 A := PAP T x 11 = A − 1 11 b 11 + A − 1 12 b 21 + A − 1 13 b 31 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions( A , nrhs ) 4 for i in p do  0 � V i = A − 1 5 i B i  C i � W i = A − 1 6 i 0 F i = A − 1 i b i 7 8 end x = SolveReducedSystem( V top,bottom , W top,bottom , F top,bottom 9 ¯ ) i i i 10 for i in p do  0 ✓ �  C i � ◆ x top x i = A − 1 x bottom ¯ ¯ F i − 11 i +1 − i i − 1 B i 0 12 end 13 x = P T x Kuzmin, A., Luisier, M., & Schenk, O. (2013, August). Fast methods for computing selected elements of the Green’s function in massively parallel nanoelectronic device simulations. In European Conference on Parallel Processing (pp. 533-544). Springer Berlin Heidelberg. 16

  17. Computing selected elements of the inverse A − 1 A − 1 A − 1     b 11 11 12 13 Algorithm 1: SPIKE solver pseudocode. Given A , P and x , solves the x = A − 1 b = A − 1 A − 1 A − 1 b 21 x = P T b for x and returns the Px � PAP T � linear system     21 22 23 A − 1 A − 1 A − 1 b 31 31 32 33 Data: A , P , b Result: P T x 1 A := PAP T x 11 = A − 1 11 b 11 + A − 1 12 b 21 + A − 1 13 b 31 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions( A , nrhs ) 4 for i in p do ?  0 � V i = A − 1 Diagonal Dominance 5 i B i y  C i � W i = A − 1 6 i 0 F i = A − 1 i b i 7 x 11 ≈ A − 1 11 b 11 8 end x = SolveReducedSystem( V top,bottom , W top,bottom , F top,bottom 9 ¯ ) i i i 10 for i in p do  0 ✓ �  C i � ◆ x top x i = A − 1 x bottom ¯ ¯ F i − 11 i +1 − i i − 1 B i 0 12 end 13 x = P T x Demko, S., Moss, W. F., & Smith, P. W. (1984). Decay rates for inverses of band matrices. Mathematics of computation , 43 (168), 491-499. 17

  18. Computing selected elements of the inverse A − 1 A − 1 A − 1     b 11 11 12 13 Algorithm 1: SPIKE solver pseudocode. Given A , P and x , solves the x = A − 1 b = A − 1 A − 1 A − 1 b 21 x = P T b for x and returns the Px � PAP T � linear system     21 22 23 A − 1 A − 1 A − 1 b 31 31 32 33 Data: A , P , b Result: P T x 1 A := PAP T x 11 = A − 1 11 b 11 + A − 1 12 b 21 + A − 1 13 b 31 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions( A , nrhs ) 4 for i in p do ?  0 � V i = A − 1 Diagonal Dominance 5 i B i y  C i � W i = A − 1 6 i 0 F i = A − 1 i b i 7 x 11 ≈ A − 1 11 b 11 8 end x = SolveReducedSystem( V top,bottom , W top,bottom , F top,bottom 9 ¯ ) ? i i i 10 for i in p do  0 via Neumann Series ✓ �  C i � ◆ y x top x i = A − 1 x bottom ¯ ¯ F i − 11 i +1 − i i − 1 B i 0 12 end 13 x = P T x ∞ A − 1 = ( I � A ) k X ( ) k I � A k ∞ < 1 k =0 Demko, S., Moss, W. F., & Smith, P. W. (1984). Decay rates for inverses of band matrices. Mathematics of computation , 43 (168), 491-499. 18

  19. Computing selected elements of the inverse ∞ A − 1 = ( I � A ) k X ( ) k I � A k ∞ < 1 k =0 = · ( I − A ) k ( I − A ) k − 1 ( I − A ) Truncated SPIKE Neumann Truncated SPIKE O ( n O (( k u + k l ) 2 ) Memory 2 × ( k u + k l )) O ( n 2 × ( k u + k l ) 2 ) O (( k u + k l ) 3 ) FLOPs 1 . 83 × 10 − 13 8 . 10 × 10 − 13 Relative Error Space and time complexity comparison between LU (diagonal boosting) and Neumann series approach applied to the factorization stage of the SPIKE al- gorithm ( p = 2). Relative error is referred to the solution of the whole linear system using SPIKE. Mikkelsen, C. C. K., & Manguoglu, M. (2008). Analysis of the truncated SPIKE algorithm. 19 SIAM Journal on Matrix Analysis and Applications , 30 (4), 1500-1519.

  20. Computing selected elements of the inverse 400x 250x 20

  21. Could we extend this to more general cases? Direct evaluation of A − 1 : ∞ A − 1 = ( I � A ) k X ( ) k I � A k ∞ < 1 k =0 ∞ 1 � − 1 = I � P − 1 ( I � cA ) k I � P − 1 ( I � cA ) k ∞ < 1 � k X P − 1 A � � ( ) c k =0 Indirect or Nested evaluation of A − 1 : ∞ ( I � A ) − 1 = X A k ( ) k A k ∞ < 1 k =0 ∞ ( I � cA ) − 1 = X c k A k ( ) k cA k ∞ < 1 k =0 ! j ∞ ∞ ∞ I − ( I − A ) − 1 i − 1 A − 1 = − ( I − A ) − 1 h A k + X X X A k = ? ⇐ ⇒ k =0 j =0 k =0 21

  22. Takeways Ninja skills are required, but not su ffi cient It is possible to use direct solvers for solving large 3D problems. We’re working on on-the-fly factorizations for diagonal blocks. 22

  23. www.bsc.es Thank you! samuel.rodriguez@bsc.es 23

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend