sparse approximate inverse preconditioners revisited
play

Sparse Approximate Inverse Preconditioners Revisited Salvatore - PowerPoint PPT Presentation

Sparse Approximate Inverse Preconditioners Revisited Salvatore Filippone Daniele Bertaccini salvatore.filippone@uniroma2.it Universit` a di Roma Tor Vergata Due Giorni Algebra Lineare Numerica Dip. Matematica, Uni. Genova, Sep. 16 2012 S.


  1. Sparse Approximate Inverse Preconditioners Revisited Salvatore Filippone Daniele Bertaccini salvatore.filippone@uniroma2.it Universit` a di Roma Tor Vergata Due Giorni Algebra Lineare Numerica Dip. Matematica, Uni. Genova, Sep. 16 2012 S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 1 / 35

  2. Outline Background: why sparse inverses; Approximate inverse algorithmic variants; Applications and tests; Recipes; Directions; S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 2 / 35

  3. Background S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 3 / 35

  4. Preconditioning for Krylov Methods Krylov methods: search for the solution to Ax = b in x m ∈ { x 0 + K m } , b − Ax m ⊥ L m where K m ( A, r 0 ) = span { r 0 , Ar 0 , A 2 r 0 , . . . , A m − 1 r 0 } Now, choose a transformation M Ax = b ⇔ M − 1 Ax = M − 1 b, such that: 1 M can be computed easily; 2 M − 1 can be applied easily; 3 M − 1 A ≈ I . Time to solution: T tot = T prec + N it × T it If you can find a good compromise, you’ll get convergence in N it ≪ n iterations at an affordable cost T it Often: no convergence without preconditioner. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 4 / 35

  5. Approximate inverses Seek a matrix G = M − 1 to minimize � I − GA � 2 F This approach is theoretically attractive but exceedingly expensive. Alternative strategies: Biconjugation (Benzi, Cullum, Tuma around 2000) Inversion of incomplete factors (Van Duin 1999) Recent work by Rafiei, Bollh¨ ofer (2011) on biconjugation. All of them compute A − 1 ≈ ZD − 1 W T Development directions: efficient and robust implementation, embed in multilevel and preconditioner update frameworks. Joint work with D. Bertaccini, starting from (Benzi, Bertaccini: 2003), (Sgallari, Bertaccini 2010) S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 5 / 35

  6. Approximate inverses: why? Interest is renewed by the appearance on the computing scene of GPGPUs: fast becoming a key tool in scientific computing: price–performance ratio is extremely appealing. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 6 / 35

  7. Motivations S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 7 / 35

  8. Motivations Lots of work (including Barbieri, Cardellini and Filippone, 2009) on matrix multiply and other dense algorithms. Key algorithm (enabler for LAPACK); “Simple”: Can be studied thoroughly; allows many variations; Reusable algorithmic patterns; Situation MUCH more complicated for sparse kernels. In particular: Sparse triangular system solves are intractable (See Barbieri, Cardellini, Filippone and Rouson, 2011). Machine hours grant PSBLAS-GPU at CASPUR, for a hybrid GPU-MPI version (under active development). S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 8 / 35

  9. Approximate Inverses: Algorithmic variants S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 9 / 35

  10. Approximate inverses: Biconjugation Given two A -biconjugate sets of vectors W and Z we have   p 1 0 . . . 0 0 p 2 . . . 0 W T AZ = D =    ;  . . .  ... . . .   . . .  0 0 . . . p n it follows easily that A − 1 = � (1 /p i ) z i w T i . An obvious idea is then: Apply a biconjugation process with W and Z triangular, and sparsify while building. (Benzi & Tuma) Critical issue: how many dot products do we actually perform. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 10 / 35

  11. Approximate inverses: Biconjugation Right looking variant, stabilized (Benzi, Cullum, Tuma) 1: w (0) ← z (0) ← e i 1 ≤ i ≤ n i i 2: for i = 1 , . . . , n do v i ← Az ( i − 1) 3: i for j = i, i + 1 , . . . , n do 4: p ( i − 1) i z ( i − 1) ← v T ; 5: j j end for 6: for j = i + 1 , . . . , n do 7: � p ( i − 1) � z ( i ) ← z ( i − 1) z ( i − 1) j − ; 8: j j i p ( i − 1) i end for 9: 10: end for 11: z i ← z ( i − 1) , p i ← p ( i − 1) , 1 ≤ i ≤ n i i Better behaviour in difficult cases. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 11 / 35

  12. Approximate inverses: Biconjugation Left looking variant 1: Let z (0) ← e 1 , p (0) ← a 11 1 1 2: for i = 2 , . . . , n do z (0) ← e i 3: i for j = 1 , . . . , i − 1 do 4: p ( j − 1) j z ( j − 1) ← a T 5: i i − ( p ( j − 1) z ( j ) ← z ( j − 1) ) z ( j − 1) i 6: i i p ( j − 1) j j end for 7: p ( i − 1) i z ( i − 1) ← a T 8: i i 9: end for Note: the drop strategy is applied just once per column, to the end result of all updates. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 12 / 35

  13. Approximate inverses: sparse inversion of sparse factors Given one factor of an LDU decomposition 1 � e i u T � ( I + e i u T U = I + i = i ) i = n − 1 its inverse is also triangular; thus n − 1 U − 1 = � ( I − e i u T � u T i ) = I + e i ˆ i i =1 and therefore n − 1 � u T i = − u T ( I − e j u T ˆ j ) i j = i +1 It is natural to 1 Start from a sparse factor 2 Apply a drop strategy to avoid a dense inverse S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 13 / 35

  14. Approximate inverses: sparse inversion of sparse factors Numerical drop strategy: 1: for j = 1 to n − 1 do u T i ← − u T ˆ 2: i u T j location of first nonzero in ˆ 3: i while j < n do 4: u T α ← − ˆ i e j 5: if | α | > ǫ then 6: u T u T i + αu T ˆ i ← ˆ 7: j else 8: u T ˆ i ( j ) ← 0 9: end if 10: u T j location of next nonzero in ˆ 11: i end while 12: 13: end for Can be implemented reusing all the building blocks of ILUT. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 14 / 35

  15. Approximate inverses: sparse inversion of sparse factors Fill level drop strategy: 1: for j = 1 to n − 1 do u T i ← − u T ˆ 2: i u T j location of first nonzero in ˆ 3: i while j < n do 4: if level ij ≤ p then 5: u T α ← − ˆ i e j 6: u T u T i + αu T ˆ i ← ˆ 7: j update fill levels level ik = min( level ik , level ij + 1) 8: else 9: u T ˆ i ( j ) ← 0 10: end if 11: u T j location of next nonzero in ˆ 12: i end while 13: 14: end for Again, easy to implement from the building blocks of ILU. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 15 / 35

  16. Approximate inverses: sparse inversion of sparse factors Need to adjust the input matrix: we apply a scaling factor α given by the maximum absolute value of the coefficients in A to compute α − 1 A = LDU, and therefore the input to the inversion step is given by A = L ( αD ) U. For numerical drop tolerance this improves behaviour on nonsymmetric matrices; level-fill strategies are a bit more robust. To be investigated: scale each row independently. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 16 / 35

  17. Approximate inverses: computational load From Van Duin: nnz U � � C fact = O nnz L n nnz U � � C invrt = O nnz ˆ U n C subst = O ( nnz U ) This is true of the sparse inversion of sparse factos; biconjugation is much more expensive. Problem: this is “just” a FLOP count. Does not take into account many other issues, such as: Efficiency of search through the nonzeros; Size of resulting data structure; Memory access patterns. In any case: Approximate inverses are VERY expensive to compute; Hence the interest of update formulations. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 17 / 35

  18. Approximate inverses: Algorithmic Variants on GPU 2D convection-diffusion PDE, centered finite differences, Dirichlet unit square, AMD processor, NVIDIA GTX 285. Case size NOPREC AINVK 0,1 AINVT 0,1,.01,.001 tpr it tslv tpr it tslv tpr it tslv pde0100 10000 0.0 214 0.11 0.05 74 0.07 0.40 110 0.09 pde0200 40000 0.0 423 0.52 0.19 154 0.29 0.81 215 0.38 pde0300 90000 0.0 629 0.88 0.43 242 0.58 1.64 326 0.74 pde0400 160000 0.0 832 1.36 0.77 325 0.92 2.85 430 1.11 pde0500 250000 0.0 1043 1.99 1.19 394 1.34 4.40 543 1.64 pde0600 360000 0.0 1246 2.88 1.71 481 1.96 6.08 639 2.29 pde0700 490000 0.0 1476 4.22 2.33 556 2.74 8.28 755 3.21 pde0800 640000 0.0 1749 5.49 3.12 654 3.84 10.59 867 4.36 pde0900 810000 0.0 1965 7.47 3.93 724 5.10 13.54 1042 6.17 pde1000 1000000 0.0 † 2000 8.78 4.90 810 6.64 16.64 1106 7.64 pde1100 1210000 0.0 † 2000 10.19 5.95 883 8.49 19.89 1280 10.34 Note: preconditioner computed on CPU, solve on GPU. Different number of nonzeros among the two variants. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 18 / 35

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend