Sparse Approximate Inverse Preconditioners Revisited Salvatore - PowerPoint PPT Presentation

Sparse Approximate Inverse Preconditioners Revisited Salvatore Filippone Daniele Bertaccini salvatore.filippone@uniroma2.it Universit` a di Roma Tor Vergata Due Giorni Algebra Lineare Numerica Dip. Matematica, Uni. Genova, Sep. 16 2012 S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 1 / 35

Outline Background: why sparse inverses; Approximate inverse algorithmic variants; Applications and tests; Recipes; Directions; S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 2 / 35

Background S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 3 / 35

Preconditioning for Krylov Methods Krylov methods: search for the solution to Ax = b in x m ∈ { x 0 + K m } , b − Ax m ⊥ L m where K m ( A, r 0 ) = span { r 0 , Ar 0 , A 2 r 0 , . . . , A m − 1 r 0 } Now, choose a transformation M Ax = b ⇔ M − 1 Ax = M − 1 b, such that: 1 M can be computed easily; 2 M − 1 can be applied easily; 3 M − 1 A ≈ I . Time to solution: T tot = T prec + N it × T it If you can find a good compromise, you’ll get convergence in N it ≪ n iterations at an affordable cost T it Often: no convergence without preconditioner. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 4 / 35

Approximate inverses Seek a matrix G = M − 1 to minimize � I − GA � 2 F This approach is theoretically attractive but exceedingly expensive. Alternative strategies: Biconjugation (Benzi, Cullum, Tuma around 2000) Inversion of incomplete factors (Van Duin 1999) Recent work by Rafiei, Bollh¨ ofer (2011) on biconjugation. All of them compute A − 1 ≈ ZD − 1 W T Development directions: efficient and robust implementation, embed in multilevel and preconditioner update frameworks. Joint work with D. Bertaccini, starting from (Benzi, Bertaccini: 2003), (Sgallari, Bertaccini 2010) S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 5 / 35

Approximate inverses: why? Interest is renewed by the appearance on the computing scene of GPGPUs: fast becoming a key tool in scientific computing: price–performance ratio is extremely appealing. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 6 / 35

Motivations S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 7 / 35

Motivations Lots of work (including Barbieri, Cardellini and Filippone, 2009) on matrix multiply and other dense algorithms. Key algorithm (enabler for LAPACK); “Simple”: Can be studied thoroughly; allows many variations; Reusable algorithmic patterns; Situation MUCH more complicated for sparse kernels. In particular: Sparse triangular system solves are intractable (See Barbieri, Cardellini, Filippone and Rouson, 2011). Machine hours grant PSBLAS-GPU at CASPUR, for a hybrid GPU-MPI version (under active development). S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 8 / 35

Approximate Inverses: Algorithmic variants S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 9 / 35

Approximate inverses: Biconjugation Given two A -biconjugate sets of vectors W and Z we have   p 1 0 . . . 0 0 p 2 . . . 0 W T AZ = D =    ;  . . .  ... . . .   . . .  0 0 . . . p n it follows easily that A − 1 = � (1 /p i ) z i w T i . An obvious idea is then: Apply a biconjugation process with W and Z triangular, and sparsify while building. (Benzi & Tuma) Critical issue: how many dot products do we actually perform. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 10 / 35

Approximate inverses: Biconjugation Right looking variant, stabilized (Benzi, Cullum, Tuma) 1: w (0) ← z (0) ← e i 1 ≤ i ≤ n i i 2: for i = 1 , . . . , n do v i ← Az ( i − 1) 3: i for j = i, i + 1 , . . . , n do 4: p ( i − 1) i z ( i − 1) ← v T ; 5: j j end for 6: for j = i + 1 , . . . , n do 7: � p ( i − 1) � z ( i ) ← z ( i − 1) z ( i − 1) j − ; 8: j j i p ( i − 1) i end for 9: 10: end for 11: z i ← z ( i − 1) , p i ← p ( i − 1) , 1 ≤ i ≤ n i i Better behaviour in difficult cases. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 11 / 35

Approximate inverses: Biconjugation Left looking variant 1: Let z (0) ← e 1 , p (0) ← a 11 1 1 2: for i = 2 , . . . , n do z (0) ← e i 3: i for j = 1 , . . . , i − 1 do 4: p ( j − 1) j z ( j − 1) ← a T 5: i i − ( p ( j − 1) z ( j ) ← z ( j − 1) ) z ( j − 1) i 6: i i p ( j − 1) j j end for 7: p ( i − 1) i z ( i − 1) ← a T 8: i i 9: end for Note: the drop strategy is applied just once per column, to the end result of all updates. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 12 / 35

Approximate inverses: sparse inversion of sparse factors Given one factor of an LDU decomposition 1 � e i u T � ( I + e i u T U = I + i = i ) i = n − 1 its inverse is also triangular; thus n − 1 U − 1 = � ( I − e i u T � u T i ) = I + e i ˆ i i =1 and therefore n − 1 � u T i = − u T ( I − e j u T ˆ j ) i j = i +1 It is natural to 1 Start from a sparse factor 2 Apply a drop strategy to avoid a dense inverse S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 13 / 35

Approximate inverses: sparse inversion of sparse factors Numerical drop strategy: 1: for j = 1 to n − 1 do u T i ← − u T ˆ 2: i u T j location of first nonzero in ˆ 3: i while j < n do 4: u T α ← − ˆ i e j 5: if | α | > ǫ then 6: u T u T i + αu T ˆ i ← ˆ 7: j else 8: u T ˆ i ( j ) ← 0 9: end if 10: u T j location of next nonzero in ˆ 11: i end while 12: 13: end for Can be implemented reusing all the building blocks of ILUT. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 14 / 35

Approximate inverses: sparse inversion of sparse factors Fill level drop strategy: 1: for j = 1 to n − 1 do u T i ← − u T ˆ 2: i u T j location of first nonzero in ˆ 3: i while j < n do 4: if level ij ≤ p then 5: u T α ← − ˆ i e j 6: u T u T i + αu T ˆ i ← ˆ 7: j update fill levels level ik = min( level ik , level ij + 1) 8: else 9: u T ˆ i ( j ) ← 0 10: end if 11: u T j location of next nonzero in ˆ 12: i end while 13: 14: end for Again, easy to implement from the building blocks of ILU. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 15 / 35

Approximate inverses: sparse inversion of sparse factors Need to adjust the input matrix: we apply a scaling factor α given by the maximum absolute value of the coefficients in A to compute α − 1 A = LDU, and therefore the input to the inversion step is given by A = L ( αD ) U. For numerical drop tolerance this improves behaviour on nonsymmetric matrices; level-fill strategies are a bit more robust. To be investigated: scale each row independently. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 16 / 35

Approximate inverses: computational load From Van Duin: nnz U � � C fact = O nnz L n nnz U � � C invrt = O nnz ˆ U n C subst = O ( nnz U ) This is true of the sparse inversion of sparse factos; biconjugation is much more expensive. Problem: this is “just” a FLOP count. Does not take into account many other issues, such as: Efficiency of search through the nonzeros; Size of resulting data structure; Memory access patterns. In any case: Approximate inverses are VERY expensive to compute; Hence the interest of update formulations. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 17 / 35

Approximate inverses: Algorithmic Variants on GPU 2D convection-diffusion PDE, centered finite differences, Dirichlet unit square, AMD processor, NVIDIA GTX 285. Case size NOPREC AINVK 0,1 AINVT 0,1,.01,.001 tpr it tslv tpr it tslv tpr it tslv pde0100 10000 0.0 214 0.11 0.05 74 0.07 0.40 110 0.09 pde0200 40000 0.0 423 0.52 0.19 154 0.29 0.81 215 0.38 pde0300 90000 0.0 629 0.88 0.43 242 0.58 1.64 326 0.74 pde0400 160000 0.0 832 1.36 0.77 325 0.92 2.85 430 1.11 pde0500 250000 0.0 1043 1.99 1.19 394 1.34 4.40 543 1.64 pde0600 360000 0.0 1246 2.88 1.71 481 1.96 6.08 639 2.29 pde0700 490000 0.0 1476 4.22 2.33 556 2.74 8.28 755 3.21 pde0800 640000 0.0 1749 5.49 3.12 654 3.84 10.59 867 4.36 pde0900 810000 0.0 1965 7.47 3.93 724 5.10 13.54 1042 6.17 pde1000 1000000 0.0 † 2000 8.78 4.90 810 6.64 16.64 1106 7.64 pde1100 1210000 0.0 † 2000 10.19 5.95 883 8.49 19.89 1280 10.34 Note: preconditioner computed on CPU, solve on GPU. Different number of nonzeros among the two variants. S. Filippone (Roma TV) Approx. Inverse Precond. DIMA, Sep. 2012 18 / 35

Sparse Approximate Inverse Preconditioners Revisited Salvatore - PowerPoint PPT Presentation

Sparse Approximate Inverse Preconditioners Revisited Salvatore Filippone Daniele Bertaccini salvatore.filippone@uniroma2.it Universit` a di Roma Tor Vergata Due Giorni Algebra Lineare Numerica Dip. Matematica, Uni. Genova, Sep. 16 2012 S.

Dynamic Inverse Problems: Schmitt Efficient Algorithms and Approximate Inverse Problems

Sparse Matrices Example Of Sparse Matrices diagonal tridiagonal sparse many elements are

Statistical Inverse Problems and abstract inverse problems examples Instrumental Variables

Hom and Ext, Revisited Justin Lyle Lawrence, KS justin.lyle@ku.edu April 28, 2018 JL Hom and

Multilevel Schwarz and Multigrid preconditioners for the Bidomain system S. Scacchi Universit`

Preconditioners for computations of subduction zone modelling Sander Rhebergen, Richard Katz,

Sparse Approximate Inverse and Hybrid Algorithms for Matrix Computations Vassil Alexandrov

Sparse Matrices sparse many elements are zero dense few elements are zero Example Of

Inverse Kinematics Inverse Kinematics Inverse Kinematics Carnegie Carnegie Sebastian Grassia

Course on Inverse Problems Albert Tarantola Lesson VI: a) General Formulation of the Inverse

Approximate Computing Is Dead; Long Live Approximate Computing Adrian Sampson Cornell Hardware

Approximate Nearest Neighbors Search Approximate Nearest Neighbors Search in High Dimensions in

Compression, inversion and sparse approximate PCA of dense kernel matrices in near linear

Sparse tensors are a natural way of representing real-world data 1 Sparse tensors are a natural

MLSS 06 - Canberra Elements Hierarchical Basis Sparse Grids Sparse Grids Combination

CNBC Matlab Mini-Course Sparse Matrices Sparse matrices provide an efficient means to store

THE CASE MOLLA SALI VS. GREECE ECHR 19 TH DECEMBER 2018, APPL. 20452/2014 D R M ARCO R IZZUTI

Adaptive Interventions: What are they? Why do we need them? and How can we study them? Daniel

Group or Team Level Intervention Can We Question the Importance of Teams at Workplace in Present

Anita L. Stewart, Ph.D. Institute for Health & Aging Center for Aging in Diverse Communities

Performance Hakim Weatherspoon CS 3410 Computer Science Cornell University [Weatherspoon,

1 Definition of CPU execution time CPI -- Cycles per instruction CPU execution_time = CPU

Notes from SUSY2018 Conference Romain Madar 06/2018 Romain Madar Notes from SUSY2018 Conference

Wheeled Rob 17. Wheeled Robots Guy Campion, Woojin Chung 17.2.5 Characterization of Robot

Sparse Approximate Inverse Preconditioners Revisited Salvatore - PowerPoint PPT Presentation

Sparse Approximate Inverse Preconditioners Revisited Salvatore Filippone Daniele Bertaccini salvatore.filippone@uniroma2.it Universit` a di Roma Tor Vergata Due Giorni Algebra Lineare Numerica Dip. Matematica, Uni. Genova, Sep. 16 2012 S.

Dynamic Inverse Problems: Schmitt Efficient Algorithms and Approximate Inverse Problems

Sparse Matrices Example Of Sparse Matrices diagonal tridiagonal sparse many elements are

Statistical Inverse Problems and abstract inverse problems examples Instrumental Variables

Hom and Ext, Revisited Justin Lyle Lawrence, KS justin.lyle@ku.edu April 28, 2018 JL Hom and

Multilevel Schwarz and Multigrid preconditioners for the Bidomain system S. Scacchi Universit`

Preconditioners for computations of subduction zone modelling Sander Rhebergen, Richard Katz,

Sparse Approximate Inverse and Hybrid Algorithms for Matrix Computations Vassil Alexandrov

Sparse Matrices sparse many elements are zero dense few elements are zero Example Of

Inverse Kinematics Inverse Kinematics Inverse Kinematics Carnegie Carnegie Sebastian Grassia

Course on Inverse Problems Albert Tarantola Lesson VI: a) General Formulation of the Inverse

Approximate Computing Is Dead; Long Live Approximate Computing Adrian Sampson Cornell Hardware

Approximate Nearest Neighbors Search Approximate Nearest Neighbors Search in High Dimensions in

Compression, inversion and sparse approximate PCA of dense kernel matrices in near linear

Sparse tensors are a natural way of representing real-world data 1 Sparse tensors are a natural

MLSS 06 - Canberra Elements Hierarchical Basis Sparse Grids Sparse Grids Combination

CNBC Matlab Mini-Course Sparse Matrices Sparse matrices provide an efficient means to store

THE CASE MOLLA SALI VS. GREECE ECHR 19 TH DECEMBER 2018, APPL. 20452/2014 D R M ARCO R IZZUTI

Adaptive Interventions: What are they? Why do we need them? and How can we study them? Daniel

Group or Team Level Intervention Can We Question the Importance of Teams at Workplace in Present

Anita L. Stewart, Ph.D. Institute for Health &amp; Aging Center for Aging in Diverse Communities

Performance Hakim Weatherspoon CS 3410 Computer Science Cornell University [Weatherspoon,

1 Definition of CPU execution time CPI -- Cycles per instruction CPU execution_time = CPU

Notes from SUSY2018 Conference Romain Madar 06/2018 Romain Madar Notes from SUSY2018 Conference

Wheeled Rob 17. Wheeled Robots Guy Campion, Woojin Chung 17.2.5 Characterization of Robot

Anita L. Stewart, Ph.D. Institute for Health & Aging Center for Aging in Diverse Communities