ILAS 2013 Using ILU to estimate the diagonal of the inverse of a - - PowerPoint PPT Presentation
ILAS 2013: Using ILU to estimate the diagonal of the inverse of a matrix
Andreas Stathopoulos, Lingfei Wu, Jesse Laeuchli (College of William and Mary)
Vasilis Kalantzis, Stratis Gallopoulos (University of Patras, Greece)
Acknowledgments: NSF, DOE SciDAC
The problem
Given a large N × N matrix A and a function f, find the trace of f(A): Tr(f(A))
Common functions: f(A) = A^{-1}, f(A) = log(A), f(A) = R_i^T A^{-1} R_j
Applications: UQ, Data Mining, Quantum Monte Carlo, Lattice QCD
Our focus: f(A) = A^{-1}
[ 2 ]
The methods
Currently all methods are based on Monte Carlo (Hutchinson 1989).
If x is a vector of random Z_2 (Rademacher) variables,
    x_i = +1 with probability 1/2, −1 with probability 1/2,
then E(x^T A^{-1} x) = Tr(A^{-1}).

Monte Carlo Trace:
    for i = 1:n
        x = randZ2(N,1)
        sum = sum + x^T A^{-1} x
    trace = sum / n
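The pseudocode above can be sketched concretely. This is a minimal numpy version, assuming a `solve(x)` callback that returns A^{-1} x (in the talk's setting this would be a CG solve); the matrix and sample count below are illustrative only:

```python
import numpy as np

def hutchinson_trace(solve, n, num_samples, seed=None):
    """Hutchinson's Monte Carlo estimator of Tr(A^{-1}).

    `solve(x)` must return A^{-1} x; Rademacher (Z_2) probe vectors
    with entries +/-1 are used, as on the slide."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.choice([-1.0, 1.0], size=n)   # random Z_2 vector
        total += x @ solve(x)                 # x^T A^{-1} x
    return total / num_samples

# Small check against the exact trace of A^{-1}.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
est = hutchinson_trace(lambda x: np.linalg.solve(A, x), 2, 2000, seed=0)
exact = np.trace(np.linalg.inv(A))            # 4/3 for this matrix
```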
[ 3 ]
The methods (cont.)
Two problems:
- Large number of samples needed
- How to compute x^T A^{-1} x
[ 4 ]
The methods (cont.)
Computing x^T A^{-1} x:
- Solve Ay = x with CG, then compute y^T x
- Find a quadrature for x^T A^{-1} x with Lanczos (Golub '69, Bai '95, Meurant '06, '09, Strakos '11, ...)
[ 5 ]
The methods (cont.)
O(100-1000) statistically independent right-hand sides are needed.
Recycling (de Sturler) and deflation (Morgan, A.S. '07, ...) speed up the Krylov methods.
[ 6 ]
Selecting the vectors in x^T A^{-1} x
Random:
- x ∈ Z_2^N: best variance for real matrices (Hutchinson 1989)
- x = randn(N,1): worse variance than Z_2
- x = e_i: variance depends only on diag(A^{-1}); a single large element can dominate
- x = F^T e_i: mixing of diagonal elements (Toledo et al. 2010), F = DFT or F = Hadamard
Deterministic:
- x = H^T e_i, i = 1, ..., 2^k: Hadamard vectors in natural order (Bekas et al. 2007)
- Probing: x_i^m = 1 if i ∈ C_m, 0 otherwise; assumes a multicolored graph (Tang et al. 2011)
Random-deterministic:
- Hierarchical probing for lattices (A.S., J.L. 2013)
[ 7 ]
Variance of the estimators
Rademacher vectors x_i ∈ Z_2^N:
    Tr = (1/s) Σ_{i=1}^{s} x_i^T A^{-1} x_i
    Var(Tr) = (2/s) ||Ã^{-1}||_F^2 = (2/s) Σ_{i ≠ j} (A^{-1}_{ij})^2   (off-diagonal part of A^{-1})
Unit vectors x = e_{j(i)}:
    Tr = (N/s) Σ_{i=1}^{s} A^{-1}_{j(i),j(i)}
    Var(Tr) = (N^2/s) Var(diag(A^{-1}))
[Figure: magnitude and variance of the entries of A^{-1}]
[ 8 ]
Approximating diag(A^{-1}) from ILU
Consider an incomplete LU factorization of A: [L,U] = ILU(A).
If U^{-1}L^{-1} is a good approximation to A^{-1}, compute the trace from M = diag(U^{-1}L^{-1}).
Computing M needs only one pass over L, U (Erisman, Tinney '75).
Let E = U^{-1}L^{-1} − A^{-1}. In some cases Tr(E) can be sufficiently close to zero.
However, what if |Tr(E)| is not small?
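As a sketch of the quantity M = diag(U^{-1}L^{-1}), the snippet below builds an ILU with scipy and extracts the diagonal of the approximate inverse by solving against unit vectors. This is for clarity only; it is not the one-pass Erisman-Tinney algorithm, and the tridiagonal test matrix is an assumption chosen so that the ILU is actually exact:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu

# Toy tridiagonal matrix; its LU has no fill, so ILU here is exact.
n = 50
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
ilu = spilu(A, drop_tol=0.0)           # incomplete LU: A ~ L U

# M_i = e_i^T U^{-1} L^{-1} e_i, the i-th diagonal entry of the
# ILU-based approximate inverse (Erisman-Tinney gets all of these
# in one pass over L and U; here we just solve per unit vector).
I = np.eye(n)
M = np.array([ilu.solve(I[:, i])[i] for i in range(n)])

trace_estimate = M.sum()               # Tr(M) ~ Tr(A^{-1})
exact = np.trace(np.linalg.inv(A.toarray()))
```

For a real ILU with dropping, `trace_estimate` differs from `exact` by Tr(E), which is what the rest of the talk corrects for.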
[ 9 ]
ILU gives more info
Observation: even if Tr(E) is large, M may approximate the pattern of diag(A^{-1}), and/or E may have smaller variance.
- Examples: a small Laplacian and DW2048
[Figure: sorted diagonal elements, comparing the inverse of A, the inverse of the ILU, and their difference]
[ 10 ]
Capture the pattern better by fitting p(M) to diag(A^{-1})
Find p: min_p || p(M) − diag(A^{-1}) || on a set of m indices
- Induce smoothness on M by sorting
- Use m equispaced indices to capture the range of values
- Compute A^{-1}_{jj} at these indices
- Fit M_j to A^{-1}_{jj} using MATLAB's LinearModel.stepwise
When ILU is a good preconditioner, Tr(p(M)) can be accurate to O(1e-3)!
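The fitting step above can be sketched as follows. Note this uses a plain polynomial least-squares fit (`np.polyfit`) as a stand-in for the MATLAB `LinearModel.stepwise` used in the talk; the synthetic data and degree are assumptions for illustration:

```python
import numpy as np

def fit_diagonal(M, sample_idx, sampled_exact, degree=3):
    """Fit a low-degree polynomial p so that p(M_j) ~ (A^{-1})_{jj}.

    `M` holds diag(U^{-1}L^{-1}) (sorted for smoothness, per the slide);
    `sample_idx` are the m equispaced indices where the exact entries
    `sampled_exact` were computed (one solve A y = e_j each)."""
    coeffs = np.polyfit(M[sample_idx], sampled_exact, degree)
    return np.polyval(coeffs, M)       # corrected diagonal estimate

# Synthetic check: the exact diagonal is a cubic function of M,
# so a degree-3 fit from 8 points recovers it.
M = np.linspace(0.0, 1.0, 100)
exact_diag = 0.5 + 2.0 * M - 0.3 * M**3
idx = np.linspace(0, 99, 8, dtype=int)       # m equispaced sample points
fitted = fit_diagonal(M, idx, exact_diag[idx])
err = np.abs(fitted - exact_diag).max()
```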
[ 11 ]
Examples of fitting: TOLS4000, DW2048, af23560, conf6
[Figures: sorted diagonal elements comparing diag(A^{-1}), diag((LU)^{-1}), and the fitted diag((LU)^{-1}) for each matrix; for conf6, real vs. imaginary parts of the diagonal elements]
[ 12 ]
Improving on Tr(M) and Tr(p(M))
- MC on E = M − diag(A^{-1}): potentially smaller variance on the diagonal
- MC on E2 = p(M) − diag(A^{-1}): m inversions for fitting, s − m inversions for MC; further variance improvement
- MC with importance sampling based on M or p(M)
Or is traditional Hutchinson better?
- MC with Z_2^N on A^{-1}
- MC with Z_2^N on E
Depends on the approximation properties of the ILU.
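The "MC on E2" idea above can be sketched as a correction to Tr(p(M)): write Tr(A^{-1}) = Tr(p(M)) − Tr(E2) and estimate Tr(E2) with unit-vector Monte Carlo on the residual diagonal. The function names, the synthetic diagonal, and the noise model below are assumptions for illustration:

```python
import numpy as np

def trace_with_diag_correction(Mfit, exact_entry, num_samples, seed=None):
    """Tr(A^{-1}) ~ Tr(p(M)) - MC estimate of Tr(E2),
    where E2 = p(M) - diag(A^{-1}).

    `Mfit` is the fitted diagonal p(M); `exact_entry(j)` returns
    (A^{-1})_{jj} (one linear solve per sample).  The unit-vector MC
    has variance (N^2/s) Var(diag(E2)), small when the fit is good."""
    rng = np.random.default_rng(seed)
    n = len(Mfit)
    idx = rng.integers(0, n, size=num_samples)      # uniform indices j(i)
    resid = np.array([Mfit[j] - exact_entry(j) for j in idx])
    return Mfit.sum() - n * resid.mean()

# Synthetic check: a known diagonal plus a small "fitting residual".
A_diag = np.linspace(1.0, 5.0, 20)
exact_diag = 1.0 / A_diag                            # diag(A^{-1}) for diagonal A
rng = np.random.default_rng(1)
Mfit = exact_diag + 0.01 * rng.standard_normal(20)   # imperfect fit
est = trace_with_diag_correction(Mfit, lambda j: exact_diag[j], 500, seed=2)
```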
[ 13 ]
Experiments: AF23560, conf6 (k_c − 10^{-8})
[Figures: trace estimate vs. number of samples for Bai/af23560 and conf6.0-QCD, comparing MC on diag(A^{-1}), diag(E), diag(E2), A^{-1}, and E against the exact Tr(A^{-1})]
[ 14 ]
Experiments: TOLS4000 with ILU(A + 10I), DW2048 with ILU(A + 0.01I)
[Figures: trace estimate vs. number of samples for Bai/tols4000 and Bai/dw2048, comparing MC on diag(A^{-1}), diag(E), diag(E2), A^{-1}, and E against the exact Tr(A^{-1})]
[ 15 ]
Experiments: Fitting allows ILU(A + σI)
Often ILU on A itself is not possible, ill-conditioned, or too expensive. Better results are obtained with a better-conditioned ILU(A + σI), letting the fitting fix the diagonal.
[Figure: tols4000, statistical error (standard deviation) vs. shift σ, for Std(diag(E2)), Std(diag(A^{-1})), Std(diag(E)), Std(A^{-1}), Std(E)]
[ 16 ]
Experiments: Sometimes Z_2^N is better
A QCD matrix (49K) close to k_c, and EPB3.
[Figures: trace estimate vs. number of samples for matb5 and epb3, comparing MC on diag(E2), A^{-1}, and E]
Here E had a smaller off-diagonal variance; this is not easily predictable by theory.
[ 17 ]
Dynamically identifying the smallest variance
For every fitting point i = 1, ..., m, compute a_i = A^{-1} e_i
- Based on a_ii = A^{-1}_{ii}, update estimates for:
    var(diag(A^{-1})), var(diag(E)) (from a_ii − M_i), var(diag(E2)) (from a_ii − p(M_i))
- Use ||a_i||^2 − a_ii^2 to update the estimate for var(MC on A^{-1}) ∝ ||A^{-1}||_F^2
- Compute μ_i = U^{-1}L^{-1} e_i and update the estimate for var(MC on E) (from ||a_i − μ_i||^2 − (a_ii − μ_ii)^2)
Large differences in various methods would show after a few points
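The bookkeeping above amounts to tracking a few sample variances as the fitting solves come in. A minimal sketch, using synthetic data in place of the real solves (the three error scales below are assumptions chosen so that each estimator behaves as the slide describes):

```python
import numpy as np

def estimate_variances(diag_exact, M, Mfit):
    """From the fitting points computed so far, estimate the per-entry
    variances that drive the error of each diagonal-based estimator:
    var(diag(A^{-1})), var(diag(E)), and var(diag(E2))."""
    return {
        "diag(A^-1)": np.var(diag_exact),          # plain unit-vector MC
        "diag(E)":    np.var(diag_exact - M),      # MC on E = M - diag(A^{-1})
        "diag(E2)":   np.var(diag_exact - Mfit),   # MC on E2 = p(M) - diag(A^{-1})
    }

# Synthetic fitting points: ILU diagonal has moderate error,
# the fitted diagonal has a much smaller residual.
rng = np.random.default_rng(0)
d = 10.0 + 5.0 * rng.standard_normal(200)          # "exact" diagonal entries
M = d + 1.0 * rng.standard_normal(200)             # ILU diagonal
Mfit = d + 0.05 * rng.standard_normal(200)         # fitted diagonal
v = estimate_variances(d, M, Mfit)
```

With only a handful of points these sample variances already separate the methods, which is the point of the slide.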
[ 18 ]
Dynamically identifying the smallest variance
The estimated variance converges to the actual variance; relative differences are apparent almost immediately.
[Figures: matrices cage8 and bwm200, estimated vs. actual Var(diag(E2)), Var(A^{-1}), and Var(E) as a function of the number of fitting points]
[ 19 ]
Dynamically identifying the smallest variance
Given a total of s allowed steps, ask which method will give the smallest error at s. E.g., for the matb5 QCD matrix:
[Figure: predicted error at the 200th quadrature vs. number of points used in fitting, for Rademacher on A^{-1}, Rademacher on E, and the unit basis on E2]
After 10 steps, excellent match between estimated and observed variances
[ 20 ]
Conclusions
A method to approximate Tr(A^{-1}) based on ILU:
- Negligible additional computational cost
- Very good accuracy if the ILU is effective
- Fitting improves accuracy
- MC on the fitted diagonal improves speed too