ERCIM 2013: Fitting techniques for estimating the trace of the inverse of a matrix (PowerPoint presentation)


SLIDE 1

ERCIM 2013: Fitting techniques for estimating the trace of the inverse of a matrix. Andreas Stathopoulos, Lingfei Wu, Jesse Laeuchli (College of William and Mary); Vasilis Kalantzis (University of Minnesota); Stratis Gallopoulos (University of Patras, Greece). Acknowledgments: NSF, DOE SciDAC.

SLIDE 2

The problem. Given a large N×N matrix A and a function f, find the trace of f(A): Tr(f(A)).

Common functions: f(A) = A^{-1}, log(A), exp(A), R_i^T A^{-1} R_j, ...

Applications: UQ, data mining, quantum Monte Carlo, lattice QCD. Our focus: f(A) = A^{-1}, but the techniques are general.

SLIDE 3

Standard underlying method: Monte Carlo (Hutchinson 1989)

If x is a vector of random Z_2 (Rademacher) variables,
  x_i = +1 with probability 1/2,
        −1 with probability 1/2,
then E(x^T A^{-1} x) = Tr(A^{-1}).

Monte Carlo trace:
  for i = 1:n
    x = randZ2(N,1)
    sum = sum + x^T A^{-1} x
  end
  trace = sum/n

SLIDE 4

Standard underlying method: Monte Carlo (Hutchinson 1989), continued

Two problems:
  1. The large number of samples required
  2. How to compute x^T A^{-1} x

SLIDE 5

Standard underlying method: Monte Carlo (Hutchinson 1989), continued

Computing x^T A^{-1} x: solve Ay = x vs. quadrature on x^T A^{-1} x (Golub '69, Bai '95, Meurant '06, '09, Strakos '11).
O(100–1000s) of statistically independent right-hand sides.
Recycling (de Sturler), deflation (Morgan, A.S. '07).
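The pseudocode above can be sketched as follows (a minimal NumPy sketch; a dense direct solve stands in for the iterative solver with recycling/deflation that the slides assume, and the function name is illustrative):

```python
import numpy as np

def hutchinson_trace_inv(A, n_samples, rng=None):
    """Hutchinson's estimator for Tr(A^{-1}) with Rademacher (Z2) probes.

    Each sample solves A y = x instead of forming A^{-1} explicitly."""
    rng = np.random.default_rng(rng)
    N = A.shape[0]
    total = 0.0
    for _ in range(n_samples):
        x = rng.choice([-1.0, 1.0], size=N)  # randZ2(N,1)
        y = np.linalg.solve(A, x)            # in practice: an iterative solver
        total += x @ y                       # accumulate x^T A^{-1} x
    return total / n_samples
```

For a diagonal A the estimator is exact for every sample, since x_i^2 = 1; for general matrices the off-diagonal entries of A^{-1} contribute noise, which motivates the variance discussion below.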

SLIDE 6

Selecting the vectors in x^T A^{-1} x to minimize variance or error

Random:
  • x ∈ Z_2^N: best variance for real matrices (Hutchinson 1989)
  • x = e_i: variance depends only on diag(A^{-1})
  • x = F^T e_i: mixing, F = DFT or Hadamard (Avron et al. 2010)

Deterministic:
  • x = H e_i, i = 1,...,2^k: Hadamard in natural order (Bekas et al. 2007)
  • x^m_i = 1 if i ∈ C_m, 0 otherwise: probing; assumes a multicolored graph (Tang et al. 2011)
  • x = H(p_m, k_i): hierarchical probing for lattices (A.S., J.L. 2013); maintains the benefits of probing but is cheap and incremental

SLIDE 7

Variance of the estimators

Rademacher vectors x_i ∈ Z_2^N:
  Tr = (1/s) Σ_{i=1..s} x_i^T A^{-1} x_i
  Var(Tr) = (2/s) ‖Ã^{-1}‖_F^2 = (2/s) Σ_{i≠j} (A^{-1}_{ij})^2

Diagonal, x = e_{j(i)}:
  Tr = (N/s) Σ_{i=1..s} A^{-1}_{j(i),j(i)}
  Var(Tr) = (N^2/s) Var(diag(A^{-1}))

[Figure: magnitude and variance of the entries of A^{-1}]

Unclear which method is best a priori.
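The two single-sample variance formulas can be evaluated directly on a small dense inverse (an illustrative sketch; the function name and inputs are not from the slides):

```python
import numpy as np

def estimator_variances(Ainv):
    """Single-sample variances of the two trace estimators, given A^{-1}."""
    N = Ainv.shape[0]
    off = Ainv - np.diag(np.diag(Ainv))
    var_rademacher = 2.0 * np.sum(off**2)        # 2 * sum_{i != j} (A^{-1}_{ij})^2
    var_diagonal = N**2 * np.var(np.diag(Ainv))  # N^2 * Var(diag(A^{-1}))
    return var_rademacher, var_diagonal
```

When A^{-1} is diagonal the Rademacher variance vanishes while the diagonal-sampling variance generally does not, and vice versa for a matrix with a flat diagonal and large off-diagonal mass, which is exactly why neither method dominates a priori.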

SLIDE 8

Why focus on the diagonal method? The trace is the integral of a 1-D signal (the diagonal). Can we improve on Monte Carlo? Not without external information about the distribution of the diagonal elements.

Our goals:
  • What if we have an approximation M ≈ diag(A^{-1})?
  • Is Tr(M) ≈ Tr(A^{-1}) sufficient?
  • If not, can we use a fitting p(M) (regression/interpolation/quadrature)?
  • Can the fitting reduce Var(p(M) − diag(A^{-1}))?

SLIDE 9

Approximations to diag(A^{-1})

  • Inexpensive bounds on the diagonal elements (Robinson and Wathen '92): e.g., for A SPD, 1/A_ii often captures the pattern of diag(A^{-1})
  • Let [L,U] = ILU(A) (incomplete LU) and M = diag(U^{-1} L^{-1}): requires only the A^{-1}_{ij} entries in the sparsity pattern of L, U (Erisman and Tinney '75)
  • Eigen/singular vectors: M = diag(X Λ^{-1} Y^T) for the nev smallest eigenvalues. Already available from deflating multiple right-hand sides! The number of eigenvectors can be increased while solving Ax = e_i (eigCG)
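The eigenvector option, for a symmetric A, might be sketched like this (a dense `eigh` stands in for the eigCG eigenpairs of the slides; the function name is illustrative):

```python
import numpy as np

def diag_inv_approx_from_eigs(A, nev):
    """Approximate diag(A^{-1}) from the nev smallest eigenpairs of a
    symmetric A: M = diag(X Lambda^{-1} X^T)."""
    w, X = np.linalg.eigh(A)                 # ascending eigenvalues
    Xs, ws = X[:, :nev], w[:nev]             # nev smallest eigenpairs
    # row-wise diagonal of X Lambda^{-1} X^T without forming the full matrix
    return np.einsum('ij,j,ij->i', Xs, 1.0 / ws, Xs)
```

With nev = N this reproduces diag(A^{-1}) exactly; with a few smallest eigenpairs it captures the dominant pattern, as in the Laplacian example on the next slide.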

SLIDE 10

For some problems M captures the pattern of diag(A^{-1}) well

Laplacian: delsq(numgrid('S',34))

[Figure: index of diagonal vs. diag(A^{-1}), diag(XΛ^{-1}X^T), and diag((LU)^{-1})]

Deflation: 15 smallest eigenpairs; ILU('crout','row',0.01).
The traces are not close, but
  Var(diag(XΛ^{-1}X^T − A^{-1})) = 4e-4
  Var(diag((LU)^{-1} − A^{-1})) = 1e-2
so MC on diag(A^{-1} − M) can be competitive with Hutchinson's method.

SLIDE 11

In some cases the approximation is pointless

Rajat10, a circuit-simulation matrix (size 30202)

[Figure: index of diagonal element vs. diag(A^{-1}) and diag(M); M from the 100 smallest singular triplets]

SLIDE 12

Capture the pattern better by fitting M to D = diag(A^{-1})

MC resolves a shift D = c + M, but not a scale D = bM (the variance may even increase!).

Approach 1. Least-squares fit with bM + c
  1. Solve D_i = e_i^T A^{-1} e_i for i ∈ S, a set of k indices
  2. Find [b,c] = argmin{ ‖D(S) − (bM(S) + c)‖^2 : b,c ∈ ℝ }

Not many points (linear systems) are needed, typically 10-20.
Significant improvement in the estimation of the trace.
Reduces the variance for potentially continuing with MC.
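Approach 1 might look like the following sketch (dense solves and `np.polyfit` stand in for the iterative solver and the two-parameter least-squares problem; the function name is illustrative):

```python
import numpy as np

def fit_linear_model(A, M, S):
    """Approach 1 sketch: fit D ~ b*M + c on a small index set S, where
    D_i = e_i^T A^{-1} e_i is computed by solving A y = e_i."""
    N = A.shape[0]
    D_S = np.empty(len(S))
    for k, i in enumerate(S):
        e = np.zeros(N)
        e[i] = 1.0
        D_S[k] = np.linalg.solve(A, e)[i]  # D_i = (A^{-1})_{ii}
    b, c = np.polyfit(M[S], D_S, 1)        # least-squares line through the samples
    return b * M + c                       # fitted approximation to diag(A^{-1})
```

Summing the returned vector gives the trace estimate Tr(bM + c); the residual D − (bM + c) is what a follow-up MC phase would then sample.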

SLIDE 13

Approach 1 example: D ≈ bM + c

Matrix RDB5000, 50 smallest singular triplets, k = 20 points used to fit. The accuracy of the systems and singular vectors is 1e-6.

[Figure: index of diagonal vs. diag(A^{-1}) and M (left), and vs. the fitted diag(bM + c) (right)]

  Tr(A^{-1}) = −267.880      Var(D) = 6.1e-3
  Tr(bM + c) = −267.544      Var(D − M) = 4.3e-4
  Rel. err.  = 1e-3          Var(D − bM − c) = 4.3e-5

SLIDE 14

Better fitting

The linear model preserves the shape of M and thus relies too much on the quality of M. Interpolating with a higher-degree polynomial could be noisy.

Approach 2, basic: Piecewise Cubic Hermite Interpolation (PCHIP)
  1. Solve D_i = e_i^T A^{-1} e_i for i ∈ S, a set of k indices
  2. Fit p(M(S)) = D(S)

For PCHIP to effectively capture the pattern (global and local) of D, it needs:
  • smoothness of the approximant
  • the elements of M(S) to appear in increasing order
  • to capture the whole range of values of D
  • to capture where most of the action in D is happening

SLIDE 15

Approach 2. Piecewise Cubic Hermite Interpolation (PCHIP)

  1. [M̃, J] = sort(M), to obtain a CDF-like, smooth graph
  2. Choose Q, a set of k indices: {1,2} ∈ Q, and the remaining k−2 are chosen to minimize the trapezoidal-rule integration error of M̃. Do not consider indices that produce non-unique M̃_i values.

[Figure: M and diag(A^{-1}) in the original ordering (left); sorted M with the selected points (right)]

SLIDE 16

Approach 2. Piecewise Cubic Hermite Interpolation (PCHIP)

  1. [M̃, J] = sort(M)
  2. Choose Q, a set of k indices
  3. S = J(Q), the corresponding indices in the original ordering
  4. Solve D_i = e_i^T A^{-1} e_i for i ∈ S

[Figure: sorted M, diag(A^{-1}), and the computed entries A^{-1}_{ii}]

SLIDE 17

Approach 2. Piecewise Cubic Hermite Interpolation (PCHIP)

  1. [M̃, J] = sort(M)
  2. Choose Q, a set of k indices
  3. S = J(Q), the original-order indices
  4. Solve D_i = e_i^T A^{-1} e_i for i ∈ S
  5. PCHIP fit p(M(S)) = D(S); use p(M) ≈ D

[Figure: sorted M, diag(A^{-1}), the computed entries A^{-1}_{ii} (left), and the fit p(M) (right)]

SLIDE 18

Approach 2. Piecewise Cubic Hermite Interpolation (PCHIP)

  1. [M̃, J] = sort(M)
  2. Choose Q, a set of k indices
  3. S = J(Q), the original-order indices
  4. Solve D_i = e_i^T A^{-1} e_i for i ∈ S
  5. PCHIP fit p(M(S)) = D(S); use p(M) ≈ D

If step 4 also computes eigenvectors, the points can be updated incrementally.

[Figure: sorted M, diag(A^{-1}), the computed entries A^{-1}_{ii} (left), and the fit p(M) (right)]
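Steps 1-5 can be put together as a sketch (uniformly spaced sample indices replace the trapezoidal-error selection rule of step 2, and dense solves replace the iterative solver; SciPy's `PchipInterpolator` provides the PCHIP fit):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def pchip_diag_fit(A, M, k):
    """Approach 2 sketch: sort M, select k sample indices over the sorted
    range, solve for those diagonal entries, and PCHIP-interpolate."""
    N = A.shape[0]
    J = np.argsort(M)                                      # step 1: sort(M)
    Q = np.unique(np.linspace(0, N - 1, k).astype(int))    # step 2 (simplified)
    Q = Q[np.concatenate(([True], np.diff(M[J][Q]) > 0))]  # drop duplicate M values
    S = J[Q]                                               # step 3: original-order indices
    # step 4: D_i = e_i^T A^{-1} e_i via one solve per selected index
    D_S = np.array([np.linalg.solve(A, np.eye(N)[:, i])[i] for i in S])
    p = PchipInterpolator(M[S], D_S)                       # step 5: fit p(M(S)) = D(S)
    return p(M)                                            # p(M) approximates diag(A^{-1})
```

Summing the returned vector gives the trace estimate; with k = N the fit interpolates every diagonal entry exactly.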

SLIDE 19

Approach 2 (PCHIP): improvement over bM + c

[Figure: three zoomed regions of the sorted-M axis comparing diag(A^{-1}), p(M), and bM + c]

SLIDE 20

Fitting examples, nev = k = 100: OLM5000, SiNa, KUU

[Figures: for each matrix, diag(A^{-1}), M, and the fit p(M), plotted against the index of sorted M]

SLIDE 21

Very good eigenvalue approximation

[Figures: relative error of the trace vs. the number of deflated singular vectors for OLM5000, RDB5000, SiNa, and KUU, comparing bM + c and p(M)]

SLIDE 22

Variance: Z_2 on E = A^{-1} − YΣ^{-1}X^T vs. MC on the diagonal D − p(M)

[Figures: variance vs. the number of deflated eigenvectors or singular vectors for OLM5000, RDB5000, SiNa, and KUU, comparing Var(D − p(M)), Var(diag(A^{-1})), Var(diag(E)), the Z_2 (Hutchinson) variances of A^{-1} and E, and Var(A − aM)]

SLIDE 23

When to use it? Estimate dynamically:

1. Relative trace error, by cross-validation:
  (a) Use m subsets S_i ⊂ S
  (b) Fit p(M̃(S_i)) and compute the mean error ε_i over the held-out points S − S_i
  (c) Confidence interval for the error: ±2 sqrt(Var(ε_i))

2. Variance of (D − p(M)) vs. Z_2 on E:
  (a) Compute a_j = A^{-1} e_j, j ∈ S
  (b) From the diagonal entries a_jj, update the estimates of Var(D), Var(D − M), Var(D − p(M))
  (c) From the entries a_ij and μ_i = YΣ^{-1}X^T e_i, update the Hutchinson variance estimates Var(A) = 2‖Ã‖_F^2 and Var(A − YΣ^{-1}X^T)

Large differences between the methods show up after only a few points.
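The cross-validation idea in item 1 might be sketched as follows (assumes distinct M values; the held-out-subset scheme and the function name are illustrative, not the slides' exact rule):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def cv_fit_error(M_S, D_S, m=5, rng=0):
    """Cross-validation sketch of the fit error: repeatedly fit PCHIP on a
    subset S_i of the sample points, record the mean error on the held-out
    points, and return the mean error with a +-2*sqrt(Var) confidence width."""
    rng = np.random.default_rng(rng)
    order = np.argsort(M_S)
    x, y = M_S[order], D_S[order]
    errs = []
    for _ in range(m):
        # hold out a few interior points; keep the endpoints in the fit
        hold = rng.choice(np.arange(1, len(x) - 1),
                          size=max(1, len(x) // 5), replace=False)
        keep = np.setdiff1d(np.arange(len(x)), hold)
        p = PchipInterpolator(x[keep], y[keep])
        errs.append(np.mean(np.abs(p(x[hold]) - y[hold])))
    errs = np.asarray(errs)
    return errs.mean(), 2.0 * np.sqrt(errs.var())
```

A small mean error with a tight confidence width suggests the fitted p(M) is trustworthy; a wide interval suggests switching to, or continuing with, the stochastic estimator.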

SLIDE 24

Dynamically identifying the smallest variance

The estimated variance converges to the actual variance; the relative differences are apparent almost immediately.

[Figures: estimated vs. actual Var(diag(E_2)), Var(A^{-1}), and Var(E) as a function of the number of fitting points, for the matrices cage8 and bwm200]

SLIDE 25

Dynamically identifying the smallest variance

If a total of s steps is allowed, which method will give the smallest error at step s? E.g., the matb5 QCD matrix:

[Figure: predicted error at the 200th quadrature vs. the number of points used in the fitting, for Rademacher on A, Rademacher on E, and the unit basis on E_2]

After 10 steps, an excellent match between the estimated and observed variances.

SLIDE 26

Conclusions

If M approximates D qualitatively well, our technique combines deterministic regression and stochastic estimation to achieve good accuracy on Σ D_i with as few samples as possible.

  • Most eigenvectors are a by-product of solving the right-hand sides (samples).
  • Fitting achieves good accuracy early (it is less expensive than MC).
  • Fitting may or may not improve the variance.
  • Dynamic monitoring is possible; some improvements are needed.
