Solving Large Dense Linear Systems with Covariance Matrices Jie - PowerPoint PPT Presentation

Solving Large Dense Linear Systems with Covariance Matrices Jie Chen Mathematics and Computer Science Division Argonne National Laboratory Solving Large Dense Linear Systems with Covariance Matrices – p. 1/30 SIAM LA, June 19, 2012

Introduction Covariance matrices appear in every corner of statistical analysis: Multivariate statistics Stochastic processes Sampling Max likelihood fitting Interpolation; kriging Regression and classification Prediction; forecasting The handling of covariance matrices incurs many matrix computations Solving Large Dense Linear Systems with Covariance Matrices – p. 2/30 SIAM LA, June 19, 2012

Introduction Example: Sampling Generate a random vector from an n -dimensional normal distribution with mean µ and covariance matrix K . Steps: 1. Compute a Cholesky factorization K = LL T 2. Generate a random vector z with i.i.d. standard normal variables 3. The vector y = Lz + µ is one such sample, because ... E [ y ] = L · E [ z ] + µ = µ cov[ y ] = E [( y − µ )( y − µ ) T ] = E [( Lz )( z T L T )] = K Can replace L by K 1 / 2 , so need to compute K 1 / 2 z . Solving Large Dense Linear Systems with Covariance Matrices – p. 3/30 SIAM LA, June 19, 2012

Introduction Example: Maximum likelihood estimation [Opposite of sampling:] Given a vector y , what is the most likely normal distribution it comes from? Assuming that mean is zero and that the covariance matrix K is parameterized by θ , then maximize the likelihood � � − 1 2 y T K − 1 y L ( θ ) := (2 π ) − n 2 (det K ) − 1 2 exp Can use any optimization method to solve max log L or ∇ log L = 0 Need to evaluate log(det K ) and K − 1 y � N 1 log(det K ) = tr(log K ) ≈ i =1 u i (log K ) u i N [log(det K )] ′ = tr( K − 1 K ′ K − 1 ) ≈ � N 1 i =1 u i ( K − 1 K ′ K − 1 ) u i N Solving Large Dense Linear Systems with Covariance Matrices – p. 4/30 SIAM LA, June 19, 2012

Introduction Example: Interpolation Given some points x i ( i = 1 , . . . , n ) and their function values f ( x i ) , what is the function value of an unknown point x 0 ? If we assume that f is a sample path of a stochastic process with covariance function φ f ( x 0 ) = � n i =1 w i f ( x i ) , then the weights w i are computed as − 1       w 1 φ ( x 1 , x 1 ) · · · φ ( x 1 , x n ) φ ( x 1 , x 0 ) . . . .    ...    . . . .  =       . . . .      w n φ ( x n , x 1 ) · · · φ ( x n , x n ) φ ( x n , x 0 ) Where does the above formula come from? Recall our old friend, least squares: w = ( A T A ) − 1 ( A T y ) . min w � y − Aw � = ⇒ Solving Large Dense Linear Systems with Covariance Matrices – p. 5/30 SIAM LA, June 19, 2012

Introduction We focus on Solving linear system with covariance matrix K , where K ij = φ ( x i − x j ) What is so special/challenging about covariance matrices? Can be very large Can be fully dense Can be increasingly ill-conditioned as matrix size grows Can be associated with a large number of random right-hand sides Positive definite Solving Large Dense Linear Systems with Covariance Matrices – p. 6/30 SIAM LA, June 19, 2012

Linear Solver Consider the conjugate gradient method for solving Kx = b Require: Initial guess x 0 , preconditioner M ≈ K 1: Compute r 0 = b − Kx 0 , z 0 = M − 1 r 0 and p 0 = z 0 2: for j = 0 , 1 , . . . until convergence do 3: α j = ( r j , r j ) / ( Kp j , p j ) 4: x j +1 = x j + α j p j 5: r j +1 = r j − α j Kp j Check list: z j +1 = M − 1 r j +1 6: Matrix-vector mult? 7: β j = ( r j +1 , z j +1 ) / ( r j , z j ) Preconditioner? 8: p j +1 = z j +1 + β j p j Parallelism? 9: end for Solving Large Dense Linear Systems with Covariance Matrices – p. 7/30 SIAM LA, June 19, 2012

A Simple Case: Regular Grid But not at all trivial... Consider that K ij = φ ( x i − x j ) where the x i ’s are on a regular grid K is (multilevel) Toeplitz Multiplying K with vector p requires embedding K to a larger (multilevel) circulant matrix, and padding p with zeros Multiplying (multilevel) circulant matrix needs (multi-dimensional) FFT A (multilevel) circulant preconditioner M can be constructed CG can be extended by using the seed method or block method to handle multiple right-hand sides Parallel implementation? ... is a headache ... because too many data transfers and global synchronizations Solving Large Dense Linear Systems with Covariance Matrices – p. 8/30 SIAM LA, June 19, 2012

Preconditioning In general, how do we precondition K ? Need some knowledge of φ ... Mat´ ern: � √ � √ � ν � 1 2 ν � x � 2 ν � x � φ ( x ) = K ν 2 ν − 1 Γ( ν ) θ θ 1 θ : Scale parameter. Can also make θ = 1, ν = 0.5 θ = 1, ν = 2 function anisotropic. 0.8 ν : Smoothness. Controls the 0.6 φ (x) differentiability of φ at 0 . 0.4 More flexible than Gaussian kernel 0.2 (infinitely differentiable). 0 When ν → ∞ , it is Gaussian. 0 1 2 3 4 |x| Solving Large Dense Linear Systems with Covariance Matrices – p. 9/30 SIAM LA, June 19, 2012

Spectral Density (Covariance, spectral density) pair: � R d f ( ω ) exp( i ω T x ) dω, φ ( x ) = with f ( ω ) > 0 . Spectral density of Mat´ ern kernel 2 ν + θ 2 � ω � 2 � − ( ν + d/ 2) . � f ( ω ) ∝ If regular grid, that is, x j = j/n , then write � [0 , 2 π ) d f n ( ω ) exp( i ω T j ) dω φ ( j/n ) = � ω ∈ [0 , 2 π ) d . f n ( ω ) = n f ( n ◦ ( ω + 2 πl )) , l ∈ Z d Solving Large Dense Linear Systems with Covariance Matrices – p. 10/30 SIAM LA, June 19, 2012

Spectral Density 0.016 f n , n = 10 0.014 0.012 0.01 0.008 0.006 0.004 0.002 f 0 −50 0 50 Solving Large Dense Linear Systems with Covariance Matrices – p. 11/30 SIAM LA, June 19, 2012

Spectrum Bilinear form: � � 2 � � � � � a T Ka = a j exp( i ω T x j ) � � a j a l φ ( x j − x l ) = R d f ( ω ) dω. � � � � j j,l If regular grid, that is, x j = j/n , then write � � 2 � � � � a T Ka = a j exp( i ω T j ) � � [0 , 2 π ) d f n ( ω ) dω � � � � j � � 2 ≈ (2 π ) d � � a j exp( i (2 πk/n ) T j ) � � f n (2 πk/n ) . � � n � � 0 ≤ k ≤ n − 1 0 ≤ j ≤ n − 1 Intuitively, eigenvalues of K “similar to” (2 π ) d f n (2 πk/n ) . Solving Large Dense Linear Systems with Covariance Matrices – p. 12/30 SIAM LA, June 19, 2012

Spectrum Definition: Two sets of real numbers { a ( n ) } j =1 ,...,n and { b ( n ) } j =1 ,...,n are j j equally distributed in the interval [ M 1 , M 2 ] if for any continuous function F : [ M 1 , M 2 ] → R , n 1 � [ F ( a ( n ) ) − F ( b ( n ) lim )] = 0 . j j n n →∞ j =1 Theorem: If φ ∈ L 1 ∩ L 2 , then the set of eigenvalues of K/n and the set { (2 π ) d f n (2 πj/n ) /n } are equally distributed. Message: Loosely speaking, the spectrum of K is like f n evenly sampled on [0 , 2 π ) . Solving Large Dense Linear Systems with Covariance Matrices – p. 13/30 SIAM LA, June 19, 2012

Spectrum ern kernel ( d = 2 , ν = 3 ). Blue: eigenvalues of K . Red: (2 π ) d f n (2 πj/n ) . Mat´ 2 10 1 10 0 10 −1 10 −2 10 −3 10 −4 10 −200 0 200 400 600 800 1000 1200 Solving Large Dense Linear Systems with Covariance Matrices – p. 14/30 SIAM LA, June 19, 2012

Preconditioning Preconditioning idea: suppress the variation of f . � R d −� ω � 2 f ( ω ) exp( i ω T x ) dω ∆ φ ( x ) = Covariance Spectral density 2 ν + θ 2 � ω � 2 � − ( ν + d/ 2) ≍ (1 + � ω � ) − 8 � φ f ( ω ) ∝ � ω � 2 f ( ω ) ≍ � ω � 2 (1 + � ω � ) − 8 ∆ φ ∆ 2 φ � ω � 4 f ( ω ) ≍ � ω � 4 (1 + � ω � ) − 8 ∆ 3 φ � ω � 6 (1 + � ω � ) − 8 ∆ 4 φ � ω � 8 (1 + � ω � ) − 8 . . . . . . Solving Large Dense Linear Systems with Covariance Matrices – p. 15/30 SIAM LA, June 19, 2012

� �� s p sin 2 � ω p Preconditioning − 4 � d f [ s ] p =1 n 2 n ( ω ) = f n ( ω ) 2 Discrete case: ( L , D : discrete Laplacian) Matrix Entry � [0 , 2 π ) d f n ( ω ) exp( i ω T j ) dω K φ ( j/n ) = � n ( ω ) exp( i ω T j ) dω [0 , 2 π ) d f [1] D φ ( j/n ) = � n ( ω ) exp( i ω T j ) dω K [2] = LKL T D 2 φ ( j/n ) = [0 , 2 π ) d f [2] � D 3 φ ( j/n ) = n ( ω ) exp( i ω T j ) dω [0 , 2 π ) d f [3] � n ( ω ) exp( i ω T j ) dω K [4] = LK [2] L T D 4 φ ( j/n ) = [0 , 2 π ) d f [4] Solving Large Dense Linear Systems with Covariance Matrices – p. 16/30 SIAM LA, June 19, 2012

Preconditioning Same Mat´ ern kernel as in page 14. Left: original. Right: after Laplacian. 2 10 1 10 0 10 −1 10 10 7 −2 10 10 6 −3 10 10 5 −4 10 10 4 −200 0 200 400 600 800 1000 1200 −200 0 200 400 600 800 1000 1200 Solving Large Dense Linear Systems with Covariance Matrices – p. 17/30 SIAM LA, June 19, 2012

Solving Large Dense Linear Systems with Covariance Matrices Jie - PowerPoint PPT Presentation

Solving Large Dense Linear Systems with Covariance Matrices Jie Chen Mathematics and Computer Science Division Argonne National Laboratory Solving Large Dense Linear Systems with Covariance Matrices p. 1/30 SIAM LA, June 19, 2012

Lecture 14 Covariance Functions 3/08/2018 1 More on Covariance Functions 2 Nugget Covariance

Lecture 14: Dense Linear Algebra David Bindel 18 Oct 2010 Where we are This week: dense

Integrating Problem Solving 2020 Integrating Problem Solving 2020 Integrating Problem Solving

Covariance Matrices and Covariance Operators Theory and Applications H` a Quang Minh Functional

Dense Flow Visualization Lecture 10 February 27, 2020 General Overview Dense methods in 2D

A Massively Parallel Dense Symmetric A Massively Parallel Dense Symmetric A Massively Parallel

Results for different matrices and comparisons Dense Matrices Rectangular Matrices

CS 7616 Pattern Recognition Linear, Linear, Linear Aaron Bobick School of Interactive

Recap: variance/covariance structure for linear mixed models Important features of linear mixed

Solving Systems Solving Systems by Graphing by Solving Systems by Substitution Graphing

Covariance and spectrum Repetition Covariance function: r w ( ) Ew ( t + ) w T ( t )

Dense cold mixes: Preservation of Dense cold mixes: Preservation of county roads county roads

Modelling covariance kernels for nonstationary random fields Christopher G. Small University of

Covariance & anchored t ypes 1 Covariance? Wit hin t he t ype syst em of a programming

Covariance Matrix Adaptation Covariance Matrix Adaptation Evolution Strategies Recalling New

Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence GPMC 6th February 2017

Spatial covariance-robust minimax prediction based on experimental design ideas Gunter Spoeck

Spatial Misalignment What is the issue with a single variable? Terminology "points"

Fresh Re-Keying: Security against Side-Channel and Fault Attacks for Low-Cost Devices Marcel

Design of a Privacy-preserving Document Submission and Grading System Benjamin Greschbach,

Towards an Equivalence Theorem for Computer Simulation Experiments? Werner G. Mller Department

The use of GIS in modelling exposure (practical) Kees de Hoogh Swiss TPH Environmental

Introduction Introduction to ILWIS to ILWIS Source: Cees van Westen Associated Institute of

DISRUPTING THE PATH TO VIOLENCE 1. The Power to Prevent Violence 2. Understanding More about

Solving Large Dense Linear Systems with Covariance Matrices Jie - PowerPoint PPT Presentation

Solving Large Dense Linear Systems with Covariance Matrices Jie Chen Mathematics and Computer Science Division Argonne National Laboratory Solving Large Dense Linear Systems with Covariance Matrices p. 1/30 SIAM LA, June 19, 2012

Lecture 14 Covariance Functions 3/08/2018 1 More on Covariance Functions 2 Nugget Covariance

Lecture 14: Dense Linear Algebra David Bindel 18 Oct 2010 Where we are This week: dense

Integrating Problem Solving 2020 Integrating Problem Solving 2020 Integrating Problem Solving

Covariance Matrices and Covariance Operators Theory and Applications H` a Quang Minh Functional

Dense Flow Visualization Lecture 10 February 27, 2020 General Overview Dense methods in 2D

A Massively Parallel Dense Symmetric A Massively Parallel Dense Symmetric A Massively Parallel

Results for different matrices and comparisons Dense Matrices Rectangular Matrices

CS 7616 Pattern Recognition Linear, Linear, Linear Aaron Bobick School of Interactive

Recap: variance/covariance structure for linear mixed models Important features of linear mixed

Solving Systems Solving Systems by Graphing by Solving Systems by Substitution Graphing

Covariance and spectrum Repetition Covariance function: r w ( ) Ew ( t + ) w T ( t )

Dense cold mixes: Preservation of Dense cold mixes: Preservation of county roads county roads

Modelling covariance kernels for nonstationary random fields Christopher G. Small University of

Covariance &amp; anchored t ypes 1 Covariance? Wit hin t he t ype syst em of a programming

Covariance Matrix Adaptation Covariance Matrix Adaptation Evolution Strategies Recalling New

Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence GPMC 6th February 2017

Spatial covariance-robust minimax prediction based on experimental design ideas Gunter Spoeck

Spatial Misalignment What is the issue with a single variable? Terminology &quot;points&quot;

Fresh Re-Keying: Security against Side-Channel and Fault Attacks for Low-Cost Devices Marcel

Design of a Privacy-preserving Document Submission and Grading System Benjamin Greschbach,

Towards an Equivalence Theorem for Computer Simulation Experiments? Werner G. Mller Department

The use of GIS in modelling exposure (practical) Kees de Hoogh Swiss TPH Environmental

Introduction Introduction to ILWIS to ILWIS Source: Cees van Westen Associated Institute of

DISRUPTING THE PATH TO VIOLENCE 1. The Power to Prevent Violence 2. Understanding More about

Covariance & anchored t ypes 1 Covariance? Wit hin t he t ype syst em of a programming

Spatial Misalignment What is the issue with a single variable? Terminology "points"