SLIDE 1

ILAS 2013: Using ILU to estimate the diagonal of the inverse of a matrix. Andreas Stathopoulos, Lingfei Wu, Jesse Laeuchli (College of William and Mary); Vasilis Kalantzis, Stratis Gallopoulos (University of Patras, Greece). Acknowledgments: NSF, DOE SciDAC.

SLIDE 2

The problem. Given a large N×N matrix A and a function f, find the trace of f(A): Tr(f(A)).

Common functions:
  - f(A) = A^{-1}
  - f(A) = log(A)
  - bilinear forms R_i^T A^{-1} R_j

Applications: UQ, data mining, Quantum Monte Carlo, Lattice QCD. Our focus: f(A) = A^{-1}.

SLIDE 3

The methods. Currently all methods are based on Monte Carlo (Hutchinson 1989). If x is a vector of random Z_2 variables,

  x_i = +1 with probability 1/2, or -1 with probability 1/2,

then E(x^T A^{-1} x) = Tr(A^{-1}).

Monte Carlo trace:

  for i = 1:n
    x = randZ2(N,1)
    sum = sum + x^T A^{-1} x
  trace = sum/n
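The pseudocode above can be sketched in Python. This is a minimal illustration: the dense solve stands in for whatever method computes A^{-1}x, and the test matrix is an arbitrary choice, not one from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def hutchinson_trace(A, n=200):
    """Hutchinson estimator for Tr(A^{-1}) with Z_2 (Rademacher) vectors."""
    N = A.shape[0]
    total = 0.0
    for _ in range(n):
        x = rng.choice([-1.0, 1.0], size=N)   # x = randZ2(N,1)
        total += x @ np.linalg.solve(A, x)    # x^T A^{-1} x
    return total / n

# Small example: a diagonally dominant A, so the estimator variance is low
A = 3.0 * np.eye(10) + 0.1 * np.ones((10, 10))
est = hutchinson_trace(A)
exact = np.trace(np.linalg.inv(A))
```

In practice each sample would require an iterative solve of Ay = x, which is exactly the cost issue the next slides address.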

SLIDE 4

The methods (continued). The same Hutchinson estimator raises two problems:

  - the large number of samples required
  - how to compute x^T A^{-1} x

SLIDE 5

The methods (continued). To compute x^T A^{-1} x:

  - solve Ay = x with CG and compute y^T x, or
  - find the quadrature x^T A^{-1} x with Lanczos (Golub '69, Bai '95, Meurant '06, '09, Strakos '11, ...)

SLIDE 6

The methods (continued). The estimator needs O(100-1000s) statistically independent right-hand sides. Recycling (de Sturler) and deflation (Morgan, A.S. '07, ...) speed up the Krylov methods.

SLIDE 7

Selecting the vectors in x^T A^{-1} x

Random:
  - x ∈ Z_2^N: best variance for real matrices (Hutchinson 1989)
  - x = randn(N,1): worse variance than Z_2
  - x = e_i: variance depends only on diag(A^{-1}); a single large element can dominate
  - x = F^T e_i: mixing of diagonal elements (Toledo et al. 2010), with F = DFT or F = Hadamard

Deterministic:
  - x = H^T e_i, i = 1,...,2^k: Hadamard vectors in natural order (Bekas et al. 2007)
  - x^(m)_i = 1 if i ∈ C_m, 0 otherwise: probing; assumes a multicolored graph (Tang et al. 2011)

Random-deterministic:
  - hierarchical probing for lattices (A.S., J.L. 2013)

SLIDE 8

Variance of the estimators

Rademacher vectors x_i ∈ Z_2^N:

  Tr = (1/s) Σ_{i=1}^s x_i^T A^{-1} x_i
  Var(Tr) = (2/s) ||Ã^{-1}||_F^2 = (2/s) Σ_{i≠j} (A^{-1}_{ij})^2,

where Ã^{-1} denotes A^{-1} with its diagonal removed.

Unit vectors x = e_{j(i)}:

  Tr = (N/s) Σ_{i=1}^s A^{-1}_{j(i),j(i)}
  Var(Tr) = (N^2/s) Var(diag(A^{-1}))

(Figure: magnitude and variance of the elements of A^{-1}.)
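The two formulas can be checked numerically. This is an illustrative sketch on an arbitrary small symmetric matrix (not one of the talk's test matrices), with an empirical Monte Carlo check of the Rademacher formula:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary small symmetric test matrix so that A^{-1} is explicit
N, s = 8, 10
R = rng.standard_normal((N, N))
A = 4.0 * np.eye(N) + 0.1 * (R + R.T)
Ainv = np.linalg.inv(A)
d = np.diag(Ainv)

# Rademacher estimator: Var = (2/s) * sum_{i != j} (A^{-1}_{ij})^2
var_rademacher = (2.0 / s) * (np.sum(Ainv**2) - np.sum(d**2))

# Unit-vector estimator: Var = (N^2/s) * Var(diag(A^{-1}))
var_unit = (N**2 / s) * np.var(d)

# Empirical single-sample variance of x^T A^{-1} x over many Rademacher x;
# it should match s * var_rademacher (the formula above divides by s)
samples = [x @ Ainv @ x for x in rng.choice([-1.0, 1.0], size=(20000, N))]
empirical = np.var(samples)
```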

SLIDE 9

Approximating diag(A^{-1}) from an ILU. Consider an incomplete LU of A: [L,U] = ilu(A). If U^{-1}L^{-1} is a good approximation to A^{-1}, compute the trace from M = diag(U^{-1}L^{-1}). Computing M needs only one pass over L and U (Erisman, Tinney '75). Let E = U^{-1}L^{-1} − A^{-1}. In some cases Tr(E) is sufficiently close to zero. But what if |Tr(E)| is not small?
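As an illustration, diag(U^{-1}L^{-1}) can be formed column by column from SciPy's incomplete LU. This is not the one-pass Erisman-Tinney algorithm, just a transparent way to compute the same M; the function name and test matrix are sketch choices.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def ilu_diag(A, drop_tol=1e-4):
    """Approximate diag(A^{-1}) by M = diag(U^{-1} L^{-1}) from an ILU.
    Applies the ILU solve to each unit vector for clarity; the
    Erisman-Tinney method gets this diagonal in one pass over L, U."""
    ilu = spla.spilu(sp.csc_matrix(A), drop_tol=drop_tol)
    N = A.shape[0]
    M = np.empty(N)
    for j in range(N):
        e = np.zeros(N)
        e[j] = 1.0
        M[j] = ilu.solve(e)[j]          # (U^{-1} L^{-1})_{jj}
    return M

# 1D Laplacian: its LU factorization has no fill, so the ILU is exact here
N = 50
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(N, N), format="csc")
M = ilu_diag(A)
exact = np.diag(np.linalg.inv(A.toarray()))
```

For a matrix where the ILU is inexact, M only approximates diag(A^{-1}), which is precisely the situation the following slides discuss.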

SLIDE 10

ILU gives more information. Observation: even if Tr(E) is large, M may approximate the pattern of diag(A^{-1}), and/or E may have smaller variance.

  - Example: a small Laplacian and DW2048

(Figure: sorted diagonal elements, comparing the diagonal of A^{-1}, the diagonal of the ILU inverse, and their difference.)

SLIDE 11

Capture the pattern better by fitting p(M) to diag(A^{-1}). Find p() that minimizes ||p(M) − diag(A^{-1})|| on a set of m indices:

  - induce smoothness on M by sorting
  - use m equispaced indices to capture the range of values
  - compute A^{-1}_{jj} at these indices
  - fit M_j to A^{-1}_{jj} using MATLAB's LinearModel.stepwise

When the ILU is a good preconditioner, Tr(p(M)) can be accurate to O(1e-3)!
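The fitting step can be sketched in Python, with np.polyfit standing in for MATLAB's LinearModel.stepwise; the function names, the polynomial model, and the toy "ILU diagonal" are all illustrative assumptions.

```python
import numpy as np

def fitted_trace(M, solve, m=20, degree=3):
    """Fit p(M_j) ~ A^{-1}_{jj} on m sampled indices and return Tr(p(M)).
    `solve(b)` must return A^{-1} b; np.polyfit replaces stepwise regression."""
    N = len(M)
    order = np.argsort(M)                                  # sorting induces smoothness
    sample = order[np.linspace(0, N - 1, m).astype(int)]   # m equispaced indices
    exact = np.empty(m)
    for k, j in enumerate(sample):                         # m exact entries A^{-1}_{jj}
        e = np.zeros(N)
        e[j] = 1.0
        exact[k] = solve(e)[j]
    p = np.polyfit(M[sample], exact, degree)               # fit p(M_j) ~ A^{-1}_{jj}
    return np.polyval(p, M).sum()                          # Tr(p(M))

# Toy example: a scaled-off "ILU diagonal" that the fit corrects
N = 100
A = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
Ainv = np.linalg.inv(A)
M = 1.01 * np.diag(Ainv)               # stand-in for diag(U^{-1}L^{-1})
tr = fitted_trace(M, lambda b: np.linalg.solve(A, b))
```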

SLIDE 12

Examples of fitting: TOLS4000, DW2048, af23560, conf6.

(Figures: sorted diagonal elements for each matrix, comparing diag(A^{-1}), diag((LU)^{-1}), and the fitted diag((LU)^{-1}); for conf6 the real and imaginary parts of the diagonal elements are compared.)

SLIDE 13

Improving on Tr(M) and Tr(p(M)):

  - MC on E = M − diag(A^{-1}): potentially smaller variance on the diagonal
  - MC on E2 = p(M) − diag(A^{-1}): m inversions for the fitting, s − m inversions for MC; further variance improvement
  - MC with importance sampling based on M or p(M)

Or is traditional Hutchinson better?

  - MC with Z_2^N on A^{-1}
  - MC with Z_2^N on E

It depends on the approximation properties of the ILU.
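The first option, MC with unit vectors on E = M − diag(A^{-1}), can be sketched as follows. This is a minimal illustration with assumed helper names; the solver and test matrix are not from the talk.

```python
import numpy as np

rng = np.random.default_rng(2)

def mc_trace_on_E(M, solve, s=100):
    """Estimate Tr(A^{-1}) = sum(M) - Tr(E) by sampling s diagonal entries
    of E = M - diag(A^{-1}). The variance depends on diag(E), which is
    small when M tracks diag(A^{-1}) well."""
    N = len(M)
    idx = rng.integers(0, N, size=s)
    e_sum = 0.0
    for j in idx:
        b = np.zeros(N)
        b[j] = 1.0
        e_sum += M[j] - solve(b)[j]      # a sample of E_jj
    return M.sum() - (N / s) * e_sum     # Tr(M) - estimated Tr(E)

# If M were exact, E = 0 and the estimator is exact for any s
N = 60
A = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
M = np.diag(np.linalg.inv(A))
tr = mc_trace_on_E(M, lambda b: np.linalg.solve(A, b))
```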

SLIDE 14

Experiments: AF23560 and conf6 (at κ_c − 10^{-8}).

(Figures: trace estimate vs. number of samples for Bai/af23560 and conf6.0-QCD, comparing MC on diag(A^{-1}), diag(E), and diag(E2) with Hutchinson MC on A^{-1} and on E, against the exact Tr(A^{-1}).)

SLIDE 15

Experiments: TOLS4000 with ILU(A + 10I), DW2048 with ILU(A + 0.01I).

(Figures: trace estimate vs. number of samples for Bai/tols4000 and Bai/dw2048, comparing MC on diag(A^{-1}), diag(E), and diag(E2) with MC on A^{-1} and on E, against the exact Tr(A^{-1}).)

SLIDE 16

Experiments: fitting allows ILU(A + σI). Often the ILU of A itself is not possible, is ill-conditioned, or is too expensive. Better results are obtained by using a better-conditioned ILU(A + σI) and letting the fitting fix the diagonal.

(Figure: statistical error for tols4000 as the shift σ varies; standard deviations of diag(E2), diag(A^{-1}), diag(E), A^{-1}, and E.)

SLIDE 17

Experiments: sometimes Z_2^N is better. A QCD matrix (49K, matb5) close to κ_c, and EPB3.

(Figures: trace estimate vs. number of samples for matb5 and epb3, comparing MC on diag(E2) with MC on A^{-1} and on E.)

Here E had a smaller off-diagonal variance, which is not easily predictable by theory.

SLIDE 18

Dynamically identifying the smallest variance. For every fitting point i = 1,...,m, compute a_i = A^{-1} e_i. Then:

  - based on a_ii = A^{-1}_{ii}, update the estimates of Var(diag(A^{-1})), of Var(diag(E)) (from a_ii − M_i), and of Var(diag(E2)) (from a_ii − p(M_i))
  - use ||a_i||^2 − a_ii^2 to update the estimate of the off-diagonal norm ||Ã^{-1}||_F^2 that drives Var(MC on A^{-1})
  - compute μ_i = U^{-1}L^{-1} e_i and update the estimate of Var(MC on E) (from ||a_i − μ_i||^2 − (a_ii − μ_ii)^2)

Large differences between the methods show after only a few points.
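The bookkeeping above can be sketched as a small monitor class. The class, its names, and the per-column scalings are illustrative assumptions; each update consumes one fitting point's exact column a_i and ILU column μ_i.

```python
import numpy as np

class VarianceMonitor:
    """Running variance estimates that drive each MC estimator, updated
    once per fitting point (a sketch; scalings are per-column averages)."""
    def __init__(self, N):
        self.N = N
        self.diagE, self.offA, self.offE = [], [], []

    def update(self, i, a, mu, M_i):
        # a = A^{-1} e_i (exact column), mu = U^{-1} L^{-1} e_i, M_i = mu[i]
        self.diagE.append(a[i] - M_i)           # sample of diag(E)
        self.offA.append(a @ a - a[i] ** 2)     # off-diag mass of column i of A^{-1}
        d = a - mu
        self.offE.append(d @ d - d[i] ** 2)     # off-diag mass of column i of E

    def report(self):
        # N * (mean column off-diag mass) estimates ||offdiag(.)||_F^2
        return {
            "Var(diag(E))": float(np.var(self.diagE)),
            "offdiag_Ainv_F2": self.N * float(np.mean(self.offA)),
            "offdiag_E_F2": self.N * float(np.mean(self.offE)),
        }

# Toy run with an exact "ILU", so every E-related estimate should be ~0
N = 40
A = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
Ainv = np.linalg.inv(A)
mon = VarianceMonitor(N)
for i in range(10):
    a = Ainv[:, i]
    mon.update(i, a, a, a[i])      # mu = a: the ILU is exact here
stats = mon.report()
```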

SLIDE 19

Dynamically identifying the smallest variance. The estimated variances converge to the actual variances, and the relative differences are apparent almost immediately.

(Figures: estimated vs. actual Var(diag(E2)), Var(A^{-1}), and Var(E) as a function of the number of fitting points, for the matrices cage8 and bwm200.)

SLIDE 20

Dynamically identifying the smallest variance. Given a total of s allowed steps, ask which method will give the smallest error at step s. E.g., for the matb5 QCD matrix:

(Figure: predicted error at the 200th quadrature for Rademacher on A^{-1}, Rademacher on E, and the unit basis on E2, vs. the number of points used in the fitting.)

After 10 steps there is an excellent match between the estimated and observed variances.

SLIDE 21
Conclusions. A method to approximate Tr(A^{-1}) based on ILU:

  - negligible additional computational cost
  - very good accuracy when the ILU is effective
  - fitting improves the accuracy
  - MC on the fitted diagonal improves speed too

It is easy to monitor and choose the most appropriate MC estimator.
