SLIDE 1

Lecture 22

Computational Methods for GPs

Colin Rundel 04/12/2017


SLIDE 2

GPs and Computational Complexity


SLIDE 3

The problem with GPs

Unless you are lucky (or clever), Gaussian process models are difficult to scale to large problems. For a Gaussian process y ∼ N(µ, Σ):

Want to sample y?

  µ + Chol(Σ) × Z with Zi ∼ N(0, 1)    O(n³)

Evaluate the (log) likelihood?

  −1/2 log |Σ| − 1/2 (x − µ)′ Σ⁻¹ (x − µ) − n/2 log 2π    O(n³)

Update covariance parameter?

  {Σ}ij = σ² exp(−{d}ij ϕ) + σₙ² 1i=j    O(n²)
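As a rough illustration (a minimal NumPy sketch, not code from the lecture), the snippet below shows where each of these costs appears: the O(n²) covariance construction, the O(n³) Cholesky used for sampling, and the O(n³) likelihood evaluation. The function name, covariance form, and parameter values are placeholders.

```python
# A minimal sketch (not from the lecture) of where the costs arise for a GP
# with an exponential covariance; names and parameter values are placeholders.
import numpy as np

def gp_demo(coords, mu, sigma2=1.0, phi=9.0, nugget=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = coords.shape[0]

    # "Update covariance parameter": build {Sigma}_ij element-wise, O(n^2)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    Sigma = sigma2 * np.exp(-d * phi) + nugget * np.eye(n)

    # "Sample y": mu + Chol(Sigma) Z, dominated by the O(n^3) Cholesky
    L = np.linalg.cholesky(Sigma)
    y = mu + L @ rng.standard_normal(n)

    # "Evaluate the (log) likelihood", also O(n^3) via the Cholesky factor
    alpha = np.linalg.solve(L, y - mu)            # L^{-1} (y - mu)
    loglik = (-np.sum(np.log(np.diag(L)))         # -1/2 log|Sigma|
              - 0.5 * alpha @ alpha               # -1/2 (y - mu)' Sigma^{-1} (y - mu)
              - 0.5 * n * np.log(2 * np.pi))
    return y, loglik
```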

SLIDE 7

A simple guide to computational complexity

  • O(n) - Linear complexity - Go for it
  • O(n²) - Quadratic complexity - Pray
  • O(n³) - Cubic complexity - Give up

SLIDE 13

How bad is the problem?

[Figure: time (secs) vs n (2500 to 10000) for matrix inversion via Cholesky, LU, and QR decompositions.]

SLIDE 14

Practice - Migratory Model Prediction

After fitting the GP we need to sample from the posterior predictive distribution at ∼ 3000 locations,

  yp ∼ N( µp + Σpo Σo⁻¹ (yo − µo),  Σp − Σpo Σo⁻¹ Σop )

  Step                                  CPU (secs)   CPU+GPU (secs)   Rel. Performance
  1. Calc. Σp, Σpo                           1.080            0.046               23.0
  2. Calc. chol(Σp − Σpo Σo⁻¹ Σop)           0.467            0.208                2.3
  3. Calc. µp|o + chol(Σp|o) × Z             0.049            0.052                0.9
  4. Calc. Allele Prob                       0.129            0.127                1.0
  Total                                      1.732            0.465                3.7

Total run time: CPU (28.9 min), CPU+GPU (7.8 min)
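A minimal NumPy sketch of this conditional sampling step (illustrative only; the function and argument names are stand-ins for the observed/predicted covariance blocks from a fitted model):

```python
# Sketch of sampling from the GP posterior predictive distribution,
#   yp ~ N( mup + Spo So^{-1} (yo - muo), Sp - Spo So^{-1} Sop ),
# given covariance blocks So (obs x obs), Sp (pred x pred), Spo (pred x obs).
import numpy as np

def sample_predictive(yo, muo, mup, So, Sp, Spo, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Solve against So rather than forming its inverse explicitly
    So_inv_Sop = np.linalg.solve(So, Spo.T)              # So^{-1} Sop
    mu_cond = mup + Spo @ np.linalg.solve(So, yo - muo)  # conditional mean
    Sigma_cond = Sp - Spo @ So_inv_Sop                   # conditional covariance
    # Small jitter on the diagonal for numerical stability of the Cholesky
    L = np.linalg.cholesky(Sigma_cond + 1e-10 * np.eye(len(mup)))
    return mu_cond + L @ rng.standard_normal(len(mup))
```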

SLIDE 16

Cholesky CPU vs GPU (P100)

[Figure: time (secs) vs n (2500 to 10000) for Cholesky, LU, and QR based inversion, comparing CPU and GPU (P100) implementations.]

SLIDE 17

[Figure: time (secs, log scale) vs n for Cholesky, LU, and QR based inversion, CPU vs GPU.]

SLIDE 18

Relative Performance

[Figure: relative performance vs n for Cholesky, LU, and QR based inversion.]

SLIDE 19

Aside - Matrix Multiplication

[Figure: matrix multiplication time (secs) vs n, CPU vs GPU.]

SLIDE 20

An even bigger hammer

bigGP is an R package written by Chris Paciorek (UC Berkeley), et al.

  • Specialized distributed implementation of linear algebra operations for GPs
  • Designed to run on large supercomputer clusters
  • Uses both shared and distributed memory
  • Able to fit models on the order of n = 65k (32 GB covariance matrix)

[Figure: Cholesky decomposition execution time (seconds, log scale) vs matrix dimension n (log scale) for 6, 60, 816, 12480, and 49920 cores.]

SLIDE 21

More scalable solutions?

  • Spectral domain / basis functions
  • Covariance tapering
  • GMRF approximations
  • Low-rank approximations
  • Nearest-neighbor models


SLIDE 22

Low Rank Approximations


SLIDE 23

Low rank approximations in general

Let's look at the example of the singular value decomposition of a matrix,

  M_{n×m} = U_{n×n} diag(S)_{n×m} Vᵗ_{m×m}

where U are called the left singular vectors, V the right singular vectors, and S the singular values. Usually the singular values and vectors are ordered such that the singular values are in descending order.

It turns out (Eckart–Young theorem) that we can approximate M as having rank k by defining S̃ to only have the k largest singular values (the others set to zero).

  M̃_{n×m} = U_{n×n} diag(S̃)_{n×m} Vᵗ_{m×m} = Ũ_{n×k} diag(S̃)_{k×k} Ṽᵗ_{k×m}
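A small NumPy sketch of this truncation (a generic illustration, not code from the lecture):

```python
# Rank-k approximation of a matrix via the truncated SVD (Eckart-Young).
import numpy as np

def low_rank_svd(M, k):
    # Full SVD: M = U diag(S) V^t, singular values returned in descending order
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    # Keep only the k largest singular values and the corresponding vectors
    return U[:, :k], S[:k], Vt[:k, :]

def reconstruct(U, S, Vt):
    # Rebuild the (low rank) matrix from its factors
    return U @ np.diag(S) @ Vt
```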

SLIDE 25

Example

  M = [ 1.000  0.500  0.333  0.250
        0.500  0.333  0.250  0.200
        0.333  0.250  0.200  0.167
        0.250  0.200  0.167  0.143 ]  =  U diag(S) Vᵗ

  U = V = [ −0.79   0.58  −0.18  −0.03
            −0.45  −0.37   0.74   0.33
            −0.32  −0.51  −0.10  −0.79
            −0.25  −0.51  −0.64   0.51 ]

  S = ( 1.50  0.17  0.01  0.00 )

Rank 2 approximation, using the first two columns of U and V and the two largest singular values:

  M̃ = Ũ diag(1.50, 0.17) Ṽᵗ
    = [ 1.000  0.501  0.333  0.249
        0.501  0.330  0.251  0.203
        0.333  0.251  0.200  0.166
        0.249  0.203  0.166  0.140 ]
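Using the low_rank_svd and reconstruct sketches from above, this example can be reproduced; the 4×4 matrix here is built as Mij = 1/(i + j − 1), which matches the entries shown on the slide.

```python
import numpy as np

# The example matrix: M_ij = 1 / (i + j - 1), i, j = 1..4
M = 1.0 / (np.arange(1, 5)[:, None] + np.arange(1, 5)[None, :] - 1)

U, S, Vt = low_rank_svd(M, k=2)        # defined in the previous sketch
M2 = reconstruct(U, S, Vt)             # rank-2 approximation

print(np.round(S, 2))                  # approximately [1.50, 0.17]
print(np.round(M2, 3))                 # matches the rank-2 matrix above
print(np.linalg.norm(M - M2, "fro"))   # Frobenius-norm error of the approximation
```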

SLIDE 27

Approximation Error

We can measure the error of the approximation using the Frobenius norm,

  ∥M − M̃∥F = ( Σ_{i=1}^{m} Σ_{j=1}^{n} (Mij − M̃ij)² )^{1/2}

Strong dependence (large eff. range):

[Figure: SVD singular values (left) and low-rank SVD error in Frobenius norm vs rank (right), strong dependence case.]

SLIDE 29

Weak dependence (short eff. range):

[Figure: SVD singular values (left) and low-rank SVD error in Frobenius norm vs rank (right), weak dependence case.]

SLIDE 30

How does this help? (Sherman-Morrison-Woodbury)

There is an immensely useful linear algebra identity, the Sherman-Morrison-Woodbury formula, for the inverse (and determinant) of a decomposed matrix,

  M̃⁻¹ = ( A_{n×n} + U_{n×k} S_{k×k} Vᵗ_{k×n} )⁻¹
      = A⁻¹ − A⁻¹ U ( S⁻¹ + Vᵗ A⁻¹ U )⁻¹ Vᵗ A⁻¹.

How does this help?

  • Imagine that A = diag(A), then it is trivial to find A⁻¹.
  • S⁻¹ is k × k which is hopefully small, or even better S = diag(S).
  • (S⁻¹ + Vᵗ A⁻¹ U) is k × k which is hopefully small.
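A hedged NumPy sketch of using the identity to solve a linear system, assuming a diagonal A (stored as a vector) and a low rank U S Vᵗ term; this is a generic illustration, not the lecture's code.

```python
# Solve (A + U S V^t) x = b with the Sherman-Morrison-Woodbury identity,
# where A is diagonal and U, V are n x k with k << n.
import numpy as np

def woodbury_solve(A_diag, U, S, Vt, b):
    Ainv_b = b / A_diag                       # A^{-1} b, O(n)
    Ainv_U = U / A_diag[:, None]              # A^{-1} U, O(nk)
    small = np.linalg.inv(S) + Vt @ Ainv_U    # k x k matrix
    correction = Ainv_U @ np.linalg.solve(small, Vt @ Ainv_b)
    return Ainv_b - correction                # (A + U S V^t)^{-1} b in O(n k^2)
```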

SLIDE 32

Aside - Determinant

Remember for any MVN distribution when evaluating the likelihood,

  −1/2 log |Σ| − 1/2 (x − µ)′ Σ⁻¹ (x − µ) − n/2 log 2π

we need the inverse of Σ as well as its determinant.

  • For a full rank Cholesky decomposition we get the determinant for "free",

      |M| = |L Lᵗ| = ∏_{i=1}^{n} (diag(L)i)²

  • For a low rank approximation the Sherman-Morrison-Woodbury / determinant lemma gives us,

      det(M̃) = det(A + U S Vᵗ) = det(S⁻¹ + Vᵗ A⁻¹ U) det(S) det(A)
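Continuing the sketch above, a hedged illustration of both determinant routes (again assuming a diagonal A; these helper names are my own, not the lecture's):

```python
# Log-determinants for the MVN likelihood: the full rank route via Cholesky
# and the low rank route via the matrix determinant lemma.
import numpy as np

def logdet_cholesky(M):
    L = np.linalg.cholesky(M)
    return 2.0 * np.sum(np.log(np.diag(L)))            # log|M| = sum of 2 log L_ii

def logdet_lowrank(A_diag, U, S, Vt):
    small = np.linalg.inv(S) + Vt @ (U / A_diag[:, None])
    _, ld_small = np.linalg.slogdet(small)              # log det(S^{-1} + V^t A^{-1} U)
    _, ld_S = np.linalg.slogdet(S)
    return ld_small + ld_S + np.sum(np.log(A_diag))     # log det(A + U S V^t)
```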

SLIDE 35

Low rank approximations for GPs

For a standard spatial random effects model,

  y(s) = x(s) β + w(s) + ϵ,    ϵ ∼ N(0, τ² I)

  w(s) ∼ N(0, Σ(s)),    Σ(s, s′) = σ² ρ(s, s′|θ)

if we can replace Σ(s) with a low rank approximation of the form Σ(s) ≈ U S Vᵗ where

  • U and V are n × k,
  • S is k × k, and
  • A = τ² I or a similar diagonal matrix,

then the Sherman-Morrison-Woodbury identity and the determinant lemma give the inverse and determinant of A + U S Vᵗ cheaply (a sketch combining these pieces follows below).
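Putting the pieces together, a hedged sketch of an O(nk²) marginal log likelihood for this kind of model, assuming we already have a symmetric low rank factor U S Uᵗ and reusing the woodbury_solve and logdet_lowrank helpers sketched above (all names are my own):

```python
# O(n k^2) Gaussian log likelihood for y ~ N(mu, tau2 I + U S U^t),
# using Woodbury for the quadratic form and the determinant lemma for log|Sigma|.
import numpy as np

def lowrank_gp_loglik(y, mu, tau2, U, S):
    n = len(y)
    A_diag = np.full(n, tau2)
    r = y - mu
    quad = r @ woodbury_solve(A_diag, U, S, U.T, r)   # r' Sigma^{-1} r
    logdet = logdet_lowrank(A_diag, U, S, U.T)        # log|Sigma|
    return -0.5 * (logdet + quad + n * np.log(2 * np.pi))
```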

SLIDE 36

Predictive Processes


SLIDE 37

Gaussian Predictive Processes

For a rank k approximation,

  • Pick k knot locations s⋆
  • Calculate the knot covariance, Σ(s⋆), and the knot cross-covariance, Σ(s, s⋆)
  • Approximate the full covariance using

      Σ(s) ≈ Σ(s, s⋆)_{n×k} Σ(s⋆)⁻¹_{k×k} Σ(s⋆, s)_{k×n}.

  • PPs systematically underestimate the variance (σ²) and inflate τ²; the modified predictive process corrects this (a sketch of both follows below) using

      Σ(s) ≈ Σ(s, s⋆) Σ(s⋆)⁻¹ Σ(s⋆, s) + diag( Σ(s) − Σ(s, s⋆) Σ(s⋆)⁻¹ Σ(s⋆, s) ).

Banerjee, Gelfand, Finley, Sang (2008); Finley, Sang, Banerjee, Gelfand (2008)
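A minimal NumPy sketch of the predictive process and modified predictive process covariance approximations, assuming a generic covariance function cov(X1, X2) supplied by the user (an illustration, not the authors' implementation):

```python
# Predictive process (PP) and modified PP approximations to Sigma(s),
# given knot locations s_star and a covariance function cov(X1, X2).
import numpy as np

def predictive_process_cov(cov, s, s_star, modified=True, jitter=1e-8):
    C_ss = cov(s_star, s_star) + jitter * np.eye(len(s_star))   # Sigma(s*), k x k
    C_xs = cov(s, s_star)                                        # Sigma(s, s*), n x k
    # PP approximation: Sigma(s, s*) Sigma(s*)^{-1} Sigma(s*, s)
    Sigma_pp = C_xs @ np.linalg.solve(C_ss, C_xs.T)
    if not modified:
        return Sigma_pp
    # Modified PP: add back the diagonal the PP approximation loses
    # (only the diagonal of Sigma(s) is actually needed here)
    Sigma_full_diag = np.diag(cov(s, s))
    return Sigma_pp + np.diag(Sigma_full_diag - np.diag(Sigma_pp))
```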

SLIDE 38

Example

Below we have a surface generated from a squared exponential Gaussian process where

  {Σ}ij = σ² exp(−(ϕ {d}ij)²) + τ² 1i=j

  σ² = 1,  ϕ = 9,  τ² = 0.1

[Figure: the simulated surfaces on the unit square, color scale approximately −3 to 3.]

SLIDE 39

Predictive Process Model Results

[Figure: fitted surfaces on the unit square for the True Field, the Full GP, the predictive process with 5 x 5, 10 x 10, and 15 x 15 knots, and the modified predictive process with 5 x 5, 10 x 10, and 15 x 15 knots.]

SLIDE 40

Performance

[Figure: fitting time vs error for the Full GP, PP, and Mod. PP models with 25, 100, and 225 knots.]

SLIDE 41

Parameter Estimates

[Figure: estimates of phi, sigma.sq, and tau.sq for the Full GP, PP, and Mod. PP models with 25, 100, and 225 knots, compared against the true parameter values.]

SLIDE 42

Random Projections


SLIDE 43

Low Rank Approximations via Random Projections

  1. Starting with an m × n matrix A.
  2. Draw an n × (k + p) Gaussian random matrix Ω.
  3. Form Y = A Ω and compute its QR factorization Y = Q R.
  4. Form the (k + p) × n matrix B = Q′ A.
  5. Compute the SVD of the small matrix B, B = Û S V′.
  6. Form the matrix U = Q Û.

The resulting approximation (sketched in code below) has a bounded expected error,

  E ∥A − U S V′∥ ≤ [ 1 + (4 √(k + p) / (p − 1)) √min(m, n) ] σ_{k+1}.

Halko, Martinsson, Tropp (2011)
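A hedged NumPy sketch of this procedure (a generic randomized SVD, not the authors' code):

```python
# Randomized low rank SVD (after Halko, Martinsson, Tropp 2011),
# target rank k with oversampling p.
import numpy as np

def randomized_svd(A, k, p=10, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))      # n x (k+p) Gaussian test matrix
    Y = A @ Omega                                # m x (k+p) sample of the range of A
    Q, _ = np.linalg.qr(Y)                       # orthonormal basis for range(Y)
    B = Q.T @ A                                  # (k+p) x n projected matrix
    U_hat, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat                                # lift back to the original space
    return U[:, :k], S[:k], Vt[:k, :]
```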

SLIDE 44

Random Matrix Low Rank Approximations and GPs

The preceding algorithm can be modified slightly to take advantage of the positive definite structure of a covariance matrix.

  1. Starting with an n × n covariance matrix A.
  2. Draw an n × (k + p) Gaussian random matrix Ω.
  3. Form Y = A Ω and compute its QR factorization Y = Q R.
  4. Form the (k + p) × (k + p) matrix B = Q′ A Q.
  5. Compute the eigendecomposition of the small matrix B, B = Û Ŝ Û′.
  6. Form the matrix U = Q Û.

Once again we have a bound on the error,

  E ∥A − Q (Q′ A Q) Q′∥ = E ∥A − U S U′∥ ⪅ c · σ_{k+1}.

Halko, Martinsson, Tropp (2011); Banerjee, Dunson, Tokdar (2012)
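A hedged sketch of this symmetric variant (again generic, not the papers' code):

```python
# Randomized low rank eigendecomposition for a symmetric positive definite
# covariance matrix A, giving A ~= U diag(S) U^t.
import numpy as np

def randomized_eig(A, k, p=10, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    Omega = rng.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(A @ Omega)               # orthonormal basis for the range of A
    B = Q.T @ A @ Q                              # small (k+p) x (k+p) symmetric matrix
    S, U_hat = np.linalg.eigh(B)                 # eigenvalues in ascending order
    S, U_hat = S[::-1], U_hat[:, ::-1]           # reorder to descending
    U = Q @ U_hat
    return U[:, :k], S[:k]
```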

SLIDE 45

Low Rank Approximations and GPUs

Both predictive process and random matrix low rank approximations are good candidates for acceleration using GPUs.

  • Both use Sherman-Morrison-Woodbury to calculate the inverse (involves matrix multiplication, addition, and a small matrix inverse).
  • Predictive processes involve several covariance matrix calculations (knots and cross-covariance) and a small matrix inverse.
  • Random matrix low rank approximations involve a large matrix multiplication (A Ω) and several small matrix decompositions (QR, eigen).

SLIDE 46

Comparison (n = 15,000, k = {100, ..., 4900})

[Figure: time vs approximation error for the lr1, lr1 mod, pp, and pp mod methods, under strong dependence (left) and weak dependence (right).]

SLIDE 47

Rand. Projection LR Decompositions for Prediction

This approach can also be used for prediction: if we want to sample y ∼ N(0, Σ) and

  Σ ≈ U S Uᵗ = (U S^{1/2} Uᵗ)(U S^{1/2} Uᵗ)ᵗ

then

  ypred = (U S^{1/2} Uᵗ) × Z with Zi ∼ N(0, 1),

because Uᵗ U = I (the columns of U are orthonormal). A sketch follows below.

Dehdari, Deutsch (2012)
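A short NumPy sketch of this sampling step, assuming U and S come from the randomized eigendecomposition sketched earlier (the function name is my own):

```python
# Approximate sampling y ~ N(0, Sigma) using a low rank square root
# Sigma ~= U diag(S) U^t = (U diag(sqrt(S)) U^t)(U diag(sqrt(S)) U^t)^t.
import numpy as np

def lowrank_sample(U, S, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n, k = U.shape
    Z = rng.standard_normal(n)
    # (U S^{1/2} U^t) Z, evaluated right to left so the cost stays O(nk)
    return U @ (np.sqrt(S) * (U.T @ Z))
```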

SLIDE 48

n = 1000, p = 10000
