ILAS 2013 Using ILU to estimate the diagonal of the inverse of a - - PowerPoint PPT Presentation
ILAS 2013: Using ILU to estimate the diagonal of the inverse of a matrix
Andreas Stathopoulos, Lingfei Wu, Jesse Laeuchli (College of William and Mary)
Vasilis Kalantzis, Stratis Gallopoulos (University of Patras, Greece)
Acknowledgments: NSF, DOE SciDAC
The problem
Given a large N × N matrix A and a function f, find the trace of f(A): Tr(f(A))
Common functions: f(A) = A^{-1}, f(A) = log(A), f(A) = R_i^T A^{-1} R_j
Applications: UQ, Data Mining, Quantum Monte Carlo, Lattice QCD
Our focus: f(A) = A^{-1}
[ 2 ]
The methods
Currently all methods are based on Monte Carlo (Hutchinson 1989).
If x is a vector of random Z_2 (Rademacher) variables,
    x_i = +1 with probability 1/2, −1 with probability 1/2,
then E(x^T A^{-1} x) = Tr(A^{-1}).

Monte Carlo Trace:
    for i = 1:n
        x = randZ2(N,1)
        sum = sum + x^T A^{-1} x
    trace = sum / n
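The pseudocode above can be sketched concretely. This is a minimal numpy version, assuming a `solve(x)` callback that returns A^{-1} x (in the talk's setting this would be a CG solve); the matrix and sample count below are illustrative only:

```python
import numpy as np

def hutchinson_trace(solve, n, num_samples, seed=None):
    """Hutchinson's Monte Carlo estimator of Tr(A^{-1}).

    `solve(x)` must return A^{-1} x; Rademacher (Z_2) probe vectors
    with entries +/-1 are used, as on the slide."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.choice([-1.0, 1.0], size=n)   # random Z_2 vector
        total += x @ solve(x)                 # x^T A^{-1} x
    return total / num_samples

# Small check against the exact trace of A^{-1}.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
est = hutchinson_trace(lambda x: np.linalg.solve(A, x), 2, 2000, seed=0)
exact = np.trace(np.linalg.inv(A))            # 4/3 for this matrix
```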
[ 3 ]
The methods (cont.)
Two problems:
- Large number of samples needed
- How to compute x^T A^{-1} x
[ 4 ]
The methods (cont.)
Computing x^T A^{-1} x:
- Solve Ay = x with CG, then compute y^T x
- Find a quadrature for x^T A^{-1} x with Lanczos (Golub '69, Bai '95, Meurant '06, '09, Strakos '11, ...)
[ 5 ]
The methods (cont.)
O(100-1000) statistically independent right-hand sides are needed.
Recycling (de Sturler) and deflation (Morgan, A.S. '07, ...) speed up the Krylov methods.
[ 6 ]
Selecting the vectors in x^T A^{-1} x
Random:
- x ∈ Z_2^N: best variance for real matrices (Hutchinson 1989)
- x = randn(N,1): worse variance than Z_2
- x = e_i: variance depends only on diag(A^{-1}); a single large element can dominate
- x = F^T e_i: mixing of diagonal elements (Toledo et al. 2010), F = DFT or F = Hadamard
Deterministic:
- x = H^T e_i, i = 1, ..., 2^k: Hadamard vectors in natural order (Bekas et al. 2007)
- Probing: x_i^m = 1 if i ∈ C_m, 0 otherwise; assumes a multicolored graph (Tang et al. 2011)
Random-deterministic:
- Hierarchical probing for lattices (A.S., J.L. 2013)
[ 7 ]
Variance of the estimators
Rademacher vectors x_i ∈ Z_2^N:
    Tr = (1/s) Σ_{i=1}^{s} x_i^T A^{-1} x_i
    Var(Tr) = (2/s) ||Ã^{-1}||_F^2 = (2/s) Σ_{i ≠ j} (A^{-1}_{ij})^2   (off-diagonal part of A^{-1})
Unit vectors x = e_{j(i)}:
    Tr = (N/s) Σ_{i=1}^{s} A^{-1}_{j(i),j(i)}
    Var(Tr) = (N^2/s) Var(diag(A^{-1}))
[Figure: magnitude and variance of the entries of A^{-1}]
[ 8 ]
Approximating diag(A^{-1}) from ILU
Consider an incomplete LU factorization of A: [L,U] = ILU(A).
If U^{-1}L^{-1} is a good approximation to A^{-1}, compute the trace from M = diag(U^{-1}L^{-1}).
Computing M needs only one pass over L, U (Erisman, Tinney '75).
Let E = U^{-1}L^{-1} − A^{-1}. In some cases Tr(E) can be sufficiently close to zero.
However, what if |Tr(E)| is not small?
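As a sketch of the quantity M = diag(U^{-1}L^{-1}), the snippet below builds an ILU with scipy and extracts the diagonal of the approximate inverse by solving against unit vectors. This is for clarity only; it is not the one-pass Erisman-Tinney algorithm, and the tridiagonal test matrix is an assumption chosen so that the ILU is actually exact:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu

# Toy tridiagonal matrix; its LU has no fill, so ILU here is exact.
n = 50
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
ilu = spilu(A, drop_tol=0.0)           # incomplete LU: A ~ L U

# M_i = e_i^T U^{-1} L^{-1} e_i, the i-th diagonal entry of the
# ILU-based approximate inverse (Erisman-Tinney gets all of these
# in one pass over L and U; here we just solve per unit vector).
I = np.eye(n)
M = np.array([ilu.solve(I[:, i])[i] for i in range(n)])

trace_estimate = M.sum()               # Tr(M) ~ Tr(A^{-1})
exact = np.trace(np.linalg.inv(A.toarray()))
```

For a real ILU with dropping, `trace_estimate` differs from `exact` by Tr(E), which is what the rest of the talk corrects for.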
[ 9 ]
ILU gives more info
Observation: even if Tr(E) is large, M may approximate the pattern of diag(A^{-1}), and/or E may have smaller variance.
- Examples: a small Laplacian and DW2048
[Figure: sorted diagonal elements, comparing the inverse of A, the inverse of the ILU, and their difference]
[ 10 ]
Capture the pattern better by fitting p(M) to diag(A^{-1})
Find p: min_p || p(M) − diag(A^{-1}) || on a set of m indices
- Induce smoothness on M by sorting
- Use m equispaced indices to capture the range of values
- Compute A^{-1}_{jj} at these indices
- Fit M_j to A^{-1}_{jj} using MATLAB's LinearModel.stepwise
When ILU is a good preconditioner, Tr(p(M)) can be accurate to O(1e-3)!
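The fitting step above can be sketched as follows. Note this uses a plain polynomial least-squares fit (`np.polyfit`) as a stand-in for the MATLAB `LinearModel.stepwise` used in the talk; the synthetic data and degree are assumptions for illustration:

```python
import numpy as np

def fit_diagonal(M, sample_idx, sampled_exact, degree=3):
    """Fit a low-degree polynomial p so that p(M_j) ~ (A^{-1})_{jj}.

    `M` holds diag(U^{-1}L^{-1}) (sorted for smoothness, per the slide);
    `sample_idx` are the m equispaced indices where the exact entries
    `sampled_exact` were computed (one solve A y = e_j each)."""
    coeffs = np.polyfit(M[sample_idx], sampled_exact, degree)
    return np.polyval(coeffs, M)       # corrected diagonal estimate

# Synthetic check: the exact diagonal is a cubic function of M,
# so a degree-3 fit from 8 points recovers it.
M = np.linspace(0.0, 1.0, 100)
exact_diag = 0.5 + 2.0 * M - 0.3 * M**3
idx = np.linspace(0, 99, 8, dtype=int)       # m equispaced sample points
fitted = fit_diagonal(M, idx, exact_diag[idx])
err = np.abs(fitted - exact_diag).max()
```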
[ 11 ]
Examples of fitting: TOLS4000, DW2048, af23560, conf6
[Figures: sorted diagonal elements comparing diag(A^{-1}), diag((LU)^{-1}), and the fitted diag((LU)^{-1}) for each matrix; for conf6, real vs. imaginary parts of the diagonal elements]
[ 12 ]
Improving on Tr(M) and Tr(p(M))
- MC on E = M − diag(A^{-1}): potentially smaller variance on the diagonal
- MC on E2 = p(M) − diag(A^{-1}): m inversions for fitting, s − m inversions for MC; further variance improvement
- MC with importance sampling based on M or p(M)
Or is traditional Hutchinson better?
- MC with Z_2^N on A^{-1}
- MC with Z_2^N on E
Depends on the approximation properties of the ILU.
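The "MC on E2" idea above can be sketched as a correction to Tr(p(M)): write Tr(A^{-1}) = Tr(p(M)) − Tr(E2) and estimate Tr(E2) with unit-vector Monte Carlo on the residual diagonal. The function names, the synthetic diagonal, and the noise model below are assumptions for illustration:

```python
import numpy as np

def trace_with_diag_correction(Mfit, exact_entry, num_samples, seed=None):
    """Tr(A^{-1}) ~ Tr(p(M)) - MC estimate of Tr(E2),
    where E2 = p(M) - diag(A^{-1}).

    `Mfit` is the fitted diagonal p(M); `exact_entry(j)` returns
    (A^{-1})_{jj} (one linear solve per sample).  The unit-vector MC
    has variance (N^2/s) Var(diag(E2)), small when the fit is good."""
    rng = np.random.default_rng(seed)
    n = len(Mfit)
    idx = rng.integers(0, n, size=num_samples)      # uniform indices j(i)
    resid = np.array([Mfit[j] - exact_entry(j) for j in idx])
    return Mfit.sum() - n * resid.mean()

# Synthetic check: a known diagonal plus a small "fitting residual".
A_diag = np.linspace(1.0, 5.0, 20)
exact_diag = 1.0 / A_diag                            # diag(A^{-1}) for diagonal A
rng = np.random.default_rng(1)
Mfit = exact_diag + 0.01 * rng.standard_normal(20)   # imperfect fit
est = trace_with_diag_correction(Mfit, lambda j: exact_diag[j], 500, seed=2)
```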
[ 13 ]
Experiments: AF23560, conf6 (k_c − 10^{-8})
[Figures: trace estimate vs. number of samples for Bai/af23560 and conf6.0-QCD, comparing MC on diag(A^{-1}), diag(E), diag(E2), A^{-1}, and E against the exact Tr(A^{-1})]
[ 14 ]
Experiments: TOLS4000 with ILU(A + 10I), DW2048 with ILU(A + 0.01I)
[Figures: trace estimate vs. number of samples for Bai/tols4000 and Bai/dw2048, comparing MC on diag(A^{-1}), diag(E), diag(E2), A^{-1}, and E against the exact Tr(A^{-1})]
[ 15 ]
Experiments: Fitting allows ILU(A + σI)
Often ILU on A itself is not possible, ill-conditioned, or too expensive. Better results are obtained with a better-conditioned ILU(A + σI), letting the fitting fix the diagonal.
[Figure: tols4000, statistical error (standard deviation) vs. shift σ, for Std(diag(E2)), Std(diag(A^{-1})), Std(diag(E)), Std(A^{-1}), Std(E)]
[ 16 ]
Experiments: Sometimes Z_2^N is better
A QCD matrix (49K) close to k_c, and EPB3.
[Figures: trace estimate vs. number of samples for matb5 and epb3, comparing MC on diag(E2), A^{-1}, and E]
Here E had a smaller off-diagonal variance; this is not easily predictable by theory.
[ 17 ]
Dynamically identifying the smallest variance
For every fitting point i = 1, ..., m, compute a_i = A^{-1} e_i
- Based on a_ii = A^{-1}_{ii}, update estimates for:
    var(diag(A^{-1})), var(diag(E)) (from a_ii − M_i), var(diag(E2)) (from a_ii − p(M_i))
- Use ||a_i||^2 − a_ii^2 to update the estimate for var(MC on A^{-1}) ∝ ||A^{-1}||_F^2
- Compute μ_i = U^{-1}L^{-1} e_i and update the estimate for var(MC on E) (from ||a_i − μ_i||^2 − (a_ii − μ_ii)^2)
Large differences in various methods would show after a few points
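The bookkeeping above amounts to tracking a few sample variances as the fitting solves come in. A minimal sketch, using synthetic data in place of the real solves (the three error scales below are assumptions chosen so that each estimator behaves as the slide describes):

```python
import numpy as np

def estimate_variances(diag_exact, M, Mfit):
    """From the fitting points computed so far, estimate the per-entry
    variances that drive the error of each diagonal-based estimator:
    var(diag(A^{-1})), var(diag(E)), and var(diag(E2))."""
    return {
        "diag(A^-1)": np.var(diag_exact),          # plain unit-vector MC
        "diag(E)":    np.var(diag_exact - M),      # MC on E = M - diag(A^{-1})
        "diag(E2)":   np.var(diag_exact - Mfit),   # MC on E2 = p(M) - diag(A^{-1})
    }

# Synthetic fitting points: ILU diagonal has moderate error,
# the fitted diagonal has a much smaller residual.
rng = np.random.default_rng(0)
d = 10.0 + 5.0 * rng.standard_normal(200)          # "exact" diagonal entries
M = d + 1.0 * rng.standard_normal(200)             # ILU diagonal
Mfit = d + 0.05 * rng.standard_normal(200)         # fitted diagonal
v = estimate_variances(d, M, Mfit)
```

With only a handful of points these sample variances already separate the methods, which is the point of the slide.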
[ 18 ]
Dynamically identifying the smallest variance
The estimated variance converges to the actual variance; relative differences are apparent almost immediately.
[Figures: matrices cage8 and bwm200, estimated vs. actual Var(diag(E2)), Var(A^{-1}), and Var(E) as a function of the number of fitting points]
[ 19 ]
Dynamically identifying the smallest variance
Given a total of s allowed steps, ask which method will give the smallest error at s. E.g., for the matb5 QCD matrix:
[Figure: predicted error at the 200th quadrature vs. number of points used in fitting, for Rademacher on A^{-1}, Rademacher on E, and the unit basis on E2]
After 10 steps, excellent match between estimated and observed variances
[ 20 ]
Conclusions
A method to approximate Tr(A^{-1}) based on ILU:
- Negligible additional computational cost
- Very good accuracy if the ILU is effective
- Fitting improves accuracy
- MC on the fitted diagonal improves speed too