  1. ILAS 2013: Using ILU to estimate the diagonal of the inverse of a matrix
     Andreas Stathopoulos, Lingfei Wu, Jesse Laeuchli (College of William and Mary)
     Vasilis Kalantzis, Stratis Gallopoulos (University of Patras, Greece)
     Acknowledgments: NSF, DOE SciDAC

  2. The problem
     Given a large N × N matrix A and a function f, find the trace of f(A): Tr(f(A)).
     Common functions: f(A) = A^{-1}; f(A) = log(A); f(A) = R_i^T A^{-1} R_j.
     Applications: UQ, data mining, quantum Monte Carlo, lattice QCD.
     Our focus: f(A) = A^{-1}.

  3. The methods
     Currently all methods are based on Monte Carlo (Hutchinson 1989).
     If x is a vector of random Z_2 variables,
         x_i = +1 with probability 1/2, x_i = -1 with probability 1/2,
     then E(x^T A^{-1} x) = Tr(A^{-1}).
     Monte Carlo Trace:
         for i = 1:n
             x = randZ2(N,1)
             sum = sum + x^T A^{-1} x
         trace = sum / n
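The estimator on slide 3 can be sketched in a few lines. Below is a minimal Python version; the function name and the small SPD test matrix are illustrative, and a dense solve stands in for the iterative (CG) solve used in practice:

```python
import numpy as np

def hutchinson_trace_inv(A, n_samples=200, rng=None):
    """Hutchinson's estimator for Tr(A^{-1}) using Rademacher (Z_2) vectors.

    Each sample draws x with entries +/-1 (probability 1/2 each), solves
    A y = x (standing in for a CG solve), and averages x^T y = x^T A^{-1} x.
    """
    rng = np.random.default_rng(rng)
    N = A.shape[0]
    total = 0.0
    for _ in range(n_samples):
        x = rng.choice([-1.0, 1.0], size=N)  # random Z_2 (Rademacher) vector
        y = np.linalg.solve(A, x)
        total += x @ y
    return total / n_samples

# Small SPD test matrix whose exact trace-of-inverse is easy to compute.
A = np.diag([1.0, 2.0, 4.0]) + 0.1 * np.ones((3, 3))
est = hutchinson_trace_inv(A, n_samples=5000, rng=0)
exact = np.trace(np.linalg.inv(A))
```

The estimator is unbiased, so with enough samples the running mean settles near the exact trace; the two problems the next slides raise (sample count and the cost of each solve) are both visible in the loop.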

  4. The methods
     The same Monte Carlo trace estimator raises two problems:
     a large number of samples is needed, and each sample requires
     computing x^T A^{-1} x.

  5. The methods
     Computing each quadratic form x^T A^{-1} x:
         solve Ay = x with CG and compute y^T x, or use Lanczos quadrature
     (Golub '69, Bai '95, Meurant '06, '09, Strakos '11, ...)

  6. The methods
     The estimator needs O(100s-1000s) statistically independent right-hand sides.
     Recycling (de Sturler) and deflation (Morgan, A.S. '07, ...) speed up the
     Krylov methods.

  7. Selecting the vectors in x^T A^{-1} x
     Random:
     • x ∈ Z_2^N: best variance for real matrices (Hutchinson 1989)
     • x = randn(N,1): worse variance than Z_2
     • x = e_i: variance depends only on diag(A^{-1}); a single large element
       can dominate
     • x = F^T e_i: mixing of diagonal elements (Toledo et al. 2010),
       with F = DFT or F = Hadamard
     Deterministic:
     • x = H^T e_i, i = 1,...,2^k: Hadamard in natural order (Bekas et al. 2007)
     • Probing: x_m(i) = 1 if i ∈ C_m, 0 else; assumes a multicolored graph
       (Tang et al. 2011)
     Random-deterministic:
     • Hierarchical probing for lattices (A.S., J.L. 2013)

  8. Variance of the estimators
     Rademacher vectors x_i ∈ Z_2^N:
         Tr = (1/s) Σ_{i=1}^s x_i^T A^{-1} x_i,
         Var(Tr) = (2/s) ||Ã^{-1}||_F^2 = (2/s) Σ_{i≠j} (A^{-1}_{ij})^2,
     where Ã^{-1} is A^{-1} with its diagonal removed.
     Diagonal sampling x = e_{j(i)}:
         Tr = (N/s) Σ_{i=1}^s A^{-1}_{j(i),j(i)},
         Var(Tr) = (N^2/s) Var(diag(A^{-1})).
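The Rademacher variance formula above can be checked exactly on a small matrix by enumerating all 2^N sign vectors, instead of sampling. A sketch (the helper name and the 6 × 6 symmetric test matrix are illustrative):

```python
import numpy as np
from itertools import product

def rademacher_variance_exact(B):
    """Exact variance of x^T B x over all Rademacher x, by enumeration."""
    N = B.shape[0]
    vals = [np.array(s, dtype=float) @ B @ np.array(s, dtype=float)
            for s in product([-1, 1], repeat=N)]
    return np.var(vals)  # population variance: x is uniform over all 2^N vectors

# A symmetric B plays the role of A^{-1} for SPD A.
B = np.random.default_rng(0).standard_normal((6, 6))
B = (B + B.T) / 2
formula = 2 * (np.sum(B ** 2) - np.sum(np.diag(B) ** 2))  # 2 * sum_{i!=j} B_ij^2
exact = rademacher_variance_exact(B)
```

For symmetric B the enumerated variance matches 2 Σ_{i≠j} B_ij^2, i.e. the per-sample version of the slide's (2/s) ||Ã^{-1}||_F^2.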

  9. Approximating diag(A^{-1}) from ILU
     Consider an incomplete LU factorization of A: [L,U] = ILU(A).
     If U^{-1}L^{-1} is a good approximation to A^{-1}, compute the trace from
     M = diag(U^{-1}L^{-1}).
     Computing M needs only one pass over L, U (Erisman, Tinney '75).
     Let E = U^{-1}L^{-1} - A^{-1}.
     In some cases Tr(E) can be sufficiently close to zero.
     However, what if |Tr(E)| is not small?
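A simple way to reproduce M = diag(U^{-1}L^{-1}) is to apply the ILU solve to each unit vector. This is not the one-pass Erisman-Tinney scheme the slide cites, just an easy-to-verify stand-in using SciPy's `spilu`; the function name and test matrix are illustrative:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def ilu_diag_estimate(A_dense, drop_tol=1e-4):
    """M = diag(U^{-1} L^{-1}) from an incomplete LU, obtained by applying
    the ILU solve to each unit vector (a reference computation, not the
    one-pass Erisman-Tinney scheme)."""
    A = sp.csc_matrix(A_dense)
    N = A.shape[0]
    ilu = spla.spilu(A, drop_tol=drop_tol)
    M = np.empty(N)
    for j in range(N):
        e = np.zeros(N)
        e[j] = 1.0
        M[j] = ilu.solve(e)[j]  # j-th entry of U^{-1} L^{-1} e_j
    return M

# With a tiny drop tolerance the ILU is essentially an exact LU, so M
# should match diag(A^{-1}) on this small test matrix.
A_dense = np.diag([2.0, 3.0, 4.0]) + 0.1 * np.ones((3, 3))
M = ilu_diag_estimate(A_dense, drop_tol=1e-12)
ref = np.diag(np.linalg.inv(A_dense))
```

With a realistic drop tolerance M only approximates diag(A^{-1}), which is exactly the gap Tr(E) measures.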

  10. ILU gives more info
      Observation: even if Tr(E) is large, M may approximate the pattern of
      diag(A^{-1}), and/or E may have smaller variance.
      Example: a small Laplacian and DW2048.
      [Figure: diagonal elements of A^{-1}, of the ILU inverse, and their
      difference, plotted against the sorted index of the diagonal element.]

  11. Capture the pattern better by fitting p(M) to diag(A^{-1})
      Find p(): min ||p(M) - diag(A^{-1})|| on a set of m indices.
      • Induce smoothness on M by sorting
      • Use m equispaced indices to capture the range of values
      • Compute A^{-1}_{jj} at these indices
      • Fit M_j to A^{-1}_{jj} using MATLAB's LinearModel.stepwise
      When the ILU is a good preconditioner, Tr(p(M)) can be accurate to O(1e-3)!
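The fitting step can be imitated with an ordinary least-squares polynomial in place of MATLAB's stepwise regression. A sketch on synthetic data, where the quadratic relationship, the helper name, and the sample counts are all illustrative assumptions:

```python
import numpy as np

def fit_diagonal(M, exact_at_idx, idx, degree=2):
    """Least-squares polynomial fit of exact diagonal entries against the
    ILU-based values M at m sample indices; returns p(M) for all indices.
    (A simple stand-in for the talk's LinearModel.stepwise fit.)"""
    coeffs = np.polyfit(M[idx], exact_at_idx, degree)
    return np.polyval(coeffs, M)

# Synthetic data: pretend diag(A^{-1}) is a smooth (here quadratic) function
# of the sorted ILU diagonal M, and recover it from 10 equispaced samples.
M = np.linspace(0.0, 1.0, 100)               # sorted surrogate diagonal
true_diag = 2.0 + 3.0 * M - M ** 2           # synthetic "diag(A^{-1})"
idx = np.linspace(0, 99, 10).astype(int)     # m equispaced fitting indices
p_of_M = fit_diagonal(M, true_diag[idx], idx)
```

Only the m sampled entries of diag(A^{-1}) require solves; the fit then extends the correction to all N diagonal entries, which is why Tr(p(M)) can be far more accurate than Tr(M).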

  12. Examples of fitting: TOLS4000, DW2048, af23560, conf6
      [Figure: four panels comparing diag(A^{-1}), diag((LU)^{-1}), and the
      fitted diag((LU)^{-1}) against the sorted index of the diagonal element;
      the conf6 panel shows real vs. imaginary parts of the diagonal elements.]

  13. Improving on Tr(M) and Tr(p(M))
      • MC on E = M - diag(A^{-1}): potentially smaller variance on the diagonal
      • MC on E2 = p(M) - diag(A^{-1}): m inversions for the fitting,
        s - m inversions for MC; further variance improvement
      • MC with importance sampling based on M or p(M)
      Or is traditional Hutchinson better?
      • MC with Z_2^N on A^{-1}
      • MC with Z_2^N on E
      It depends on the approximation properties of the ILU.
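The E2 idea above, taking the surrogate trace Tr(p(M)) and running Monte Carlo only on the (hopefully small) diagonal error, can be sketched as follows; a dense inverse stands in for per-column solves, and all names are illustrative:

```python
import numpy as np

def corrected_trace(A, pM, n_samples=100, rng=None):
    """Tr(A^{-1}) ~= Tr(p(M)) minus a Monte Carlo estimate of Tr(E2),
    where E2 = p(M) - diag(A^{-1}).  The MC variance now depends on
    diag(E2), which is small whenever the fit is good."""
    rng = np.random.default_rng(rng)
    N = A.shape[0]
    Ainv = np.linalg.inv(A)                 # stands in for solves A y = e_j
    j = rng.integers(0, N, size=n_samples)  # sampled diagonal indices
    e2_mean = np.mean(pM[j] - Ainv[j, j])   # MC estimate of Tr(E2)/N
    return pM.sum() - N * e2_mean

# If p(M) happened to match diag(A^{-1}) exactly, the correction term
# vanishes and the estimate is exact regardless of the sampling.
A = np.diag([2.0, 3.0, 5.0]) + 0.2 * np.ones((3, 3))
pM_exact = np.diag(np.linalg.inv(A))
est = corrected_trace(A, pM_exact, n_samples=50, rng=1)
```

The limiting case in the usage shows the mechanism: the closer p(M) is to diag(A^{-1}), the smaller the variance of the residual estimator.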

  14. Experiments: AF23560, conf6 (k_c - 10^{-8})
      [Figure: trace estimates vs. number of samples for MC on diag(A^{-1}),
      diag(E), diag(E2), A^{-1}, and E, compared against the exact Tr(A^{-1}),
      for af23560 and conf6.0-QCD.]

  15. Experiments: TOLS4000 with ILU(A + 10I), DW2048 with ILU(A + 0.01I)
      [Figure: trace estimates vs. number of samples for MC on diag(A^{-1}),
      diag(E), diag(E2), A^{-1}, and E, compared against the exact Tr(A^{-1}),
      for tols4000 and dw2048.]

  16. Fitting allows ILU(A + σI)
      Often an ILU of A itself is not possible, is ill-conditioned, or is too
      expensive. Better results are obtained with a better conditioned
      ILU(A + σI), letting the fitting fix the diagonal.
      [Figure: matrix tols4000, standard deviation of each estimator
      (diag(A^{-1}), diag(E), diag(E2), A^{-1}, E) as the shift σ varies.]

  17. Experiments: sometimes Z_2^N is better
      QCD matrix (49K) close to k_c (matb5), and EPB3.
      [Figure: trace estimates vs. number of samples for MC on diag(E2),
      A^{-1}, and E, for matb5 and epb3.]
      Here E had smaller off-diagonal variance; this is not easily predictable
      by theory.

  18. Dynamically identifying the smallest variance
      For every fitting point i = 1,...,m, compute a_i = A^{-1} e_i.
      • Based on a_ii = A^{-1}_ii, update estimates for Var(diag(A^{-1})),
        Var(diag(E)) (from a_ii - M_i), and Var(diag(E2)) (from a_ii - p(M_i))
      • Use ||a_i||^2 - a_ii^2 to update the estimate for Var(MC on A^{-1}),
        which is driven by the off-diagonal part of ||A^{-1}||_F^2
      • Compute µ_i = U^{-1}L^{-1} e_i and update the estimate for Var(MC on E)
        (from ||a_i - µ_i||^2 - (a_ii - µ_ii)^2)
      Large differences between the methods show up after only a few points.
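The bookkeeping on slide 18 can be sketched as follows, assuming `ilu_solve` applies U^{-1}L^{-1} and `pM` is the fitted surrogate (both names, and the small test matrix, are illustrative). With an exact "ILU" and an exact fit, the E- and E2-based variance estimates collapse to zero, which is exactly what the dynamic test would detect:

```python
import numpy as np

def estimate_method_variances(A, ilu_solve, pM, idx):
    """Reuse the m fitting solves a_i = A^{-1} e_i to estimate the
    quantities that drive each estimator's variance."""
    N = A.shape[0]
    diagA, diagE, diagE2, offA, offE = [], [], [], [], []
    for i in idx:
        e = np.zeros(N)
        e[i] = 1.0
        a = np.linalg.solve(A, e)                 # a_i = A^{-1} e_i
        mu = ilu_solve(e)                         # mu_i = U^{-1} L^{-1} e_i
        diagA.append(a[i])                        # feeds Var(diag(A^{-1}))
        diagE.append(a[i] - mu[i])                # feeds Var(diag(E))
        diagE2.append(a[i] - pM[i])               # feeds Var(diag(E2))
        offA.append(a @ a - a[i] ** 2)            # feeds Var(MC on A^{-1})
        d = a - mu
        offE.append(d @ d - (a[i] - mu[i]) ** 2)  # feeds Var(MC on E)
    return {
        "var_diag_Ainv": float(np.var(diagA)),
        "var_diag_E": float(np.var(diagE)),
        "var_diag_E2": float(np.var(diagE2)),
        "offdiag_Ainv": float(np.mean(offA)),
        "offdiag_E": float(np.mean(offE)),
    }

# Limiting case: a "perfect" ILU and a perfect fit, so the E/E2 estimators
# have (near) zero variance while plain diagonal sampling does not.
A = np.diag([2.0, 3.0, 4.0, 5.0]) + 0.1 * np.ones((4, 4))
ilu_solve = lambda e: np.linalg.solve(A, e)
pM = np.diag(np.linalg.inv(A))
stats = estimate_method_variances(A, ilu_solve, pM, idx=range(4))
```

In practice the m solves are already paid for by the fitting, so comparing these running estimates after a few points costs essentially nothing extra.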
