

  1. Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure. Belyaev Mikhail (1,2,3), Burnaev Evgeny (1,2,3), Kapushev Yermek (1,2). (1) Institute for Information Transmission Problems, Moscow, Russia; (2) DATADVANCE, llc, Moscow, Russia; (3) PreMoLab, MIPT, Dolgoprudny, Russia. ICML 2014 workshop on New Learning Frameworks and Models for Big Data, Beijing, 2014. Presenter: Burnaev Evgeny.

  2. Approximation task. Problem statement: let y = g(x) be some unknown function. The training sample is given: D = {x_i, y_i}, g(x_i) = y_i, i = 1, ..., N. The task is to construct an approximation f̂(x) such that f̂(x) ≈ g(x).

  3. Factorial Design of Experiments. Factors: s_k = {x_k^{i_k} ∈ X_k, i_k = 1, ..., n_k}, X_k ⊆ R^{d_k}, k = 1, ..., K; d_k is the dimensionality of the factor s_k. Factorial Design of Experiments: S = {x_i}_{i=1}^N = s_1 × s_2 × ⋯ × s_K. Dimensionality of x ∈ S: d = Σ_{k=1}^K d_k. Sample size: N = Π_{k=1}^K n_k. (Figure: a two-dimensional factorial design on the unit square, axes x_1 and x_2.)
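The Cartesian-product construction of a factorial design can be sketched in a few lines. This is a toy illustration with made-up factor values, not code from the talk:

```python
# Sketch: building a factorial design S = s1 x s2 as the Cartesian
# product of two one-dimensional factors (illustrative values).
import itertools

s1 = [0.0, 0.5, 1.0]          # factor 1: n1 = 3 levels
s2 = [0.0, 0.25, 0.5, 0.75]   # factor 2: n2 = 4 levels

S = list(itertools.product(s1, s2))  # the full grid

N = len(S)      # sample size N = n1 * n2 = 12
d = len(S[0])   # dimensionality d = d1 + d2 = 2
```

Note how the sample size is the product of the factor sizes while the dimensionality is the sum of the factor dimensionalities, exactly as on the slide.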

  4. Example of Factorial DoE. (Figure: Factorial DoE with a multidimensional factor, shown as a grid of points in the space of x_1, x_2, x_3.)

  5. DoE in engineering problems. 1. Independent groups of variables are the factors. 2. Training sample generation procedure: fix a value of the first factor; perform experiments varying the values of the other factors; then fix another value of the first factor and perform a new series of experiments. 3. Take into account knowledge from the subject domain. Data properties: the data has a special structure; the sample size can be large; the factor sizes can differ significantly.

  6. Example of Factorial DoE. Modeling of the pressure distribution over an aircraft wing. Angle of attack: s_1 = {0, 0.8, 1.6, 2.4, 3.2, 4.0}. Mach number: s_2 = {0.77, 0.78, 0.79, 0.80, 0.81, 0.82, 0.83}. Wing point coordinates: s_3, 5000 points in R^3. The training sample: S = s_1 × s_2 × s_3. Dimensionality d = 1 + 1 + 3 = 5. Sample size N = 6 · 7 · 5000 = 210000.

  7. Existing solutions. Universal techniques. Disadvantages: they do not take the sample structure into account, which leads to low approximation quality and high computational complexity. Multivariate Adaptive Regression Splines [Friedman, 1991]. Disadvantages: discontinuous derivatives, non-physical behaviour. Tensor product of splines [Stone et al., 1997; Xiao et al., 2013]. Disadvantages: only one-dimensional factors, no accuracy evaluation procedure. Gaussian Processes on a lattice [Dietrich and Newsam, 1997; Stroud et al., 2014]. Disadvantages: restricted to a two-dimensional grid with equidistant points.

  8. The aim is to develop a computationally efficient algorithm that takes into account the special features of factorial Designs of Experiments.

  9. Gaussian Process Regression. Function model: g(x) = f(x) + ε(x), where f(x) is a Gaussian process (GP) and ε(x) is Gaussian white noise. A GP is fully defined by its mean and covariance function. The covariance function of f(x): K_f(x, x') = σ_f² exp(−Σ_{i=1}^d θ_i² (x^(i) − x'^(i))²), where x^(i) is the i-th component of the vector x and θ = (σ_f², θ_1, ..., θ_d) are the parameters of the covariance function. The covariance function of g(x): K_g(x, x') = K_f(x, x') + σ²_noise δ(x, x'), where δ(x, x') is the Kronecker delta.
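A minimal sketch of this covariance model in NumPy, with our own variable names (the talk does not give code):

```python
# Squared-exponential covariance from the slide:
#   K_f(x, x') = sigma_f^2 * exp(-sum_i theta_i^2 (x_i - x'_i)^2)
# and K_g = K_f + sigma_noise^2 * I on a finite point set.
import numpy as np

def k_f(x, xp, sigma_f, theta):
    return sigma_f**2 * np.exp(-np.sum(theta**2 * (x - xp)**2))

def cov_matrix(X, sigma_f, theta, sigma_noise):
    # X is N x d; returns the N x N matrix K_g over the sample
    N = X.shape[0]
    K = np.array([[k_f(X[i], X[j], sigma_f, theta) for j in range(N)]
                  for i in range(N)])
    return K + sigma_noise**2 * np.eye(N)

X = np.array([[0.0], [0.5], [1.0]])
Kg = cov_matrix(X, sigma_f=1.0, theta=np.array([1.0]), sigma_noise=0.1)
```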

  10. Choosing parameters: Maximum Likelihood Estimation. Loglikelihood: log p(y | X, θ) = −(1/2) y^T K_g^{-1} y − (1/2) log |K_g| − (N/2) log 2π, where |K_g| is the determinant of the matrix K_g = (K_g(x_i, x_j))_{i,j=1}^N, x_i, x_j ∈ S. The parameters θ* are chosen such that θ* = arg max_θ log p(y | X, θ).
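The loglikelihood above can be evaluated directly for a dense K_g; a Cholesky factorization is one standard way to get both the quadratic term and log |K_g| (our illustration, not the authors' code):

```python
# log p = -1/2 y^T Kg^{-1} y - 1/2 log|Kg| - N/2 log(2*pi)
import numpy as np

def log_likelihood(Kg, y):
    N = y.shape[0]
    L = np.linalg.cholesky(Kg)                           # Kg = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # Kg^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(L)))            # log|Kg|
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * N * np.log(2 * np.pi)

Kg = np.array([[1.1, 0.5],
               [0.5, 1.1]])
y = np.array([0.2, -0.1])
ll = log_likelihood(Kg, y)
```

For dense K_g this costs O(N³), which is exactly the bottleneck the talk attacks with the Kronecker structure.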

  11. Final model. Prediction of g(x) at a point x: f̂(x) = k^T K_g^{-1} y, where k = (K_f(x_1, x), ..., K_f(x_N, x))^T. Posterior variance: σ²(x) = K_f(x, x) − k^T K_g^{-1} k.
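A tiny 1-D sketch of these two formulas (all names and the test function are ours; the talk gives only the equations):

```python
# Prediction f_hat(x) = k^T Kg^{-1} y and posterior variance
# sigma^2(x) = K_f(x, x) - k^T Kg^{-1} k on a toy sample.
import numpy as np

def k_f(a, b, theta=2.0):
    return np.exp(-theta**2 * (a - b)**2)

X = np.array([0.0, 0.5, 1.0])   # training inputs
y = np.sin(X)                   # training outputs
s2_noise = 1e-4

Kg = np.array([[k_f(a, b) for b in X] for a in X]) + s2_noise * np.eye(3)

def predict(x):
    k = np.array([k_f(xi, x) for xi in X])
    alpha = np.linalg.solve(Kg, y)
    f_hat = k @ alpha
    var = k_f(x, x) - k @ np.linalg.solve(Kg, k)
    return f_hat, var

f0, v0 = predict(0.0)  # at a training point: near-interpolation, small variance
```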

  12. Gaussian Processes for factorial DoE. Issues: 1. High computational complexity, O(N³); in the case of factorial DoE the sample size N can be very large. 2. Degeneracy as a result of significantly different factor sizes.


  14. Estimation of covariance function parameters. Loglikelihood: log p(y | X, θ) = −(1/2) y^T K_g^{-1} y − (1/2) log |K_g| − (N/2) log 2π. Derivatives: ∂/∂θ log p(y | X, θ) = −(1/2) Tr(K_g^{-1} K') + (1/2) y^T K_g^{-1} K' K_g^{-1} y, where θ is one of the covariance function parameters (θ_i, i = 1, ..., d, σ_f or σ_noise) and K' = ∂K_g/∂θ.

  15. Tensors and Kronecker product. Definition: a tensor Y is a K-dimensional matrix of size n_1 × n_2 × ⋯ × n_K: Y = (y_{i_1, i_2, ..., i_K}), i_k = 1, ..., n_k, k = 1, ..., K. Definition: the Kronecker product of matrices A and B is the block matrix A ⊗ B = [a_{11} B ⋯ a_{1n} B; ⋮ ⋱ ⋮; a_{m1} B ⋯ a_{mn} B].

  16. Related operations. The vec operation: vec(Y) = [Y_{1,1,...,1}, Y_{2,1,...,1}, ..., Y_{n_1,1,...,1}, Y_{1,2,...,1}, ..., Y_{n_1,n_2,...,n_K}]. Multiplication of a tensor by a matrix along the k-th direction: Z = Y ⊗_k B ⇔ Z_{i_1,...,i_{k−1},j,i_{k+1},...,i_K} = Σ_{i_k} Y_{i_1,...,i_k,...,i_K} B_{i_k j}. Connection between tensors and the Kronecker product: vec(Y ⊗_1 B_1 ⋯ ⊗_K B_K) = (B_1 ⊗ ⋯ ⊗ B_K) vec(Y). (1) Complexity of computing the left-hand side is O(N Σ_k n_k), of the right-hand side O(N²).
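Identity (1) is easy to check numerically for K = 2. Note an ordering caveat: NumPy's C-order `ravel` lists the last index fastest, the mirror image of the slide's vec (first index fastest), so the Kronecker factors below appear in the order that matches NumPy's convention:

```python
# Numerical check of the tensor/Kronecker identity for K = 2:
# multiplying Y along each direction and vectorizing equals
# multiplying vec(Y) by the big Kronecker matrix.
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 3, 4
Y = rng.standard_normal((n1, n2))
B1 = rng.standard_normal((n1, n1))
B2 = rng.standard_normal((n2, n2))

# Left-hand side: per-direction products, O(N * (n1 + n2)) work.
lhs = (B1 @ Y @ B2.T).ravel()

# Right-hand side: explicit Kronecker matrix, O(N^2) work.
rhs = np.kron(B1, B2) @ Y.ravel()

ok = np.allclose(lhs, rhs)
```

The cost gap between the two sides is the whole point: the left side never materializes an N × N matrix.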

  17. Covariance function. Form of the covariance function: K_f(x, y) = Π_{i=1}^K k_i(x_i, y_i), x_i, y_i ∈ s_i, where k_i is an arbitrary covariance function for the i-th factor. Covariance matrix: K = ⊗_{i=1}^K K_i, where K_i is the covariance matrix of the i-th factor.
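The Kronecker factorization of the covariance matrix can be verified on a small grid. This sketch (our names, toy values) enumerates the grid with the second factor varying fastest, which pairs with `np.kron(K1, K2)`:

```python
# For a product covariance over a grid s1 x s2, the full covariance
# matrix equals the Kronecker product of the per-factor matrices.
import numpy as np
import itertools

def k_rbf(a, b, theta):
    return np.exp(-theta**2 * (a - b)**2)

s1 = np.array([0.0, 1.0])
s2 = np.array([0.0, 0.5, 1.0])
t1, t2 = 1.0, 2.0

K1 = np.array([[k_rbf(a, b, t1) for b in s1] for a in s1])
K2 = np.array([[k_rbf(a, b, t2) for b in s2] for a in s2])

grid = list(itertools.product(s1, s2))   # second factor varies fastest
K_full = np.array([[k_rbf(x[0], y[0], t1) * k_rbf(x[1], y[1], t2)
                    for y in grid] for x in grid])

same = np.allclose(K_full, np.kron(K1, K2))
```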

  18. Fast computation of the loglikelihood. Proposition. Let K_i = U_i D_i U_i^T be the Singular Value Decomposition (SVD) of the matrix K_i, where U_i is an orthogonal matrix and D_i is diagonal. Then |K_g| = Π_{i_1,...,i_K} D_{i_1,...,i_K}, K_g^{-1} = (⊗_k U_k)(⊗_k D_k + σ²_noise I)^{-1}(⊗_k U_k^T), (2) K_g^{-1} y = vec[((Y ⊗_1 U_1 ⋯ ⊗_K U_K) ∗ D^{-1}) ⊗_1 U_1^T ⋯ ⊗_K U_K^T], where D is the tensor of the diagonal elements of the matrix σ²_noise I + ⊗_k D_k, Y is the tensor with vec(Y) = y, and ∗ denotes elementwise multiplication.
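A sketch of these fast computations for K = 2, using per-factor eigendecompositions (symmetric K_i, so SVD and eigendecomposition coincide) and NumPy's C-order reshape convention; all names are ours, and the result is checked against the dense computation:

```python
# Fast log|Kg| and Kg^{-1} y for Kg = kron(K1, K2) + s2 * I,
# without ever forming the N x N matrix.
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 3, 4
s2_noise = 0.1

def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

K1, K2 = random_spd(n1), random_spd(n2)
y = rng.standard_normal(n1 * n2)

# Per-factor decompositions K_i = U_i D_i U_i^T
d1, U1 = np.linalg.eigh(K1)
d2, U2 = np.linalg.eigh(K2)

# Tensor D of eigenvalues of kron(K1, K2) + s2 * I
D = np.outer(d1, d2) + s2_noise

logdet_fast = np.sum(np.log(D))          # log|Kg| = sum of log eigenvalues

# Kg^{-1} y via tensor-matrix products (C-order reshape of y)
Y = y.reshape(n1, n2)
T = (U1.T @ Y @ U2) / D                  # rotate, divide elementwise
alpha_fast = (U1 @ T @ U2.T).ravel()     # rotate back

# Dense reference for the test below
Kg = np.kron(K1, K2) + s2_noise * np.eye(n1 * n2)
```

The per-factor work is O(Σ n_i³) for the decompositions plus O(N Σ n_i) for the tensor products, matching the complexity claim on the next slide.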

  19. Computational complexity. Proposition. Calculation of the loglikelihood using (2) has computational complexity O(N Σ_{i=1}^K n_i + Σ_{i=1}^K n_i³). Assuming n_i ≪ N (the number of factors is large and their sizes are close), we get O(N Σ_i n_i) = O(N^{1+1/K}).

  20. Fast computation of derivatives. Proposition. The following statements hold: Tr(K_g^{-1} K') = ⟨diag(D̂^{-1}), ⊗_{i=1}^K diag(U_i^T K_i' U_i)⟩, y^T K_g^{-1} K' K_g^{-1} y = ⟨A, A ⊗_1 K_1^T ⊗_2 ⋯ ⊗_{i−1} K_{i−1}^T ⊗_i (∂K_i/∂θ)^T ⊗_{i+1} K_{i+1}^T ⊗_{i+2} ⋯ ⊗_K K_K^T⟩, where D̂ = σ²_noise I + ⊗_k D_k and vec(A) = K_g^{-1} y. The computational complexity is O(N Σ_{i=1}^K n_i + Σ_{i=1}^K n_i³).

  21. Gaussian Processes for factorial DoE. Issues: 1. High computational complexity, O(N³); in the case of factorial DoE the sample size N can be very large. 2. Degeneracy as a result of significantly different factor sizes.


  23. Degeneracy. Example of degeneracy. (Figures: the approximation obtained using GP regression, and the original function from the GPML toolbox, both shown together with the training set.)

  24. Regularization. Prior distribution: (θ_k^(i) − a_k^(i)) / (b_k^(i) − a_k^(i)) ∼ Be(α, β), i = 1, ..., d_k, k = 1, ..., K, with a_k^(i) = c_k / max_{x,y ∈ s_k} (x^(i) − y^(i)), b_k^(i) = C_k / min_{x,y ∈ s_k, x ≠ y} (x^(i) − y^(i)), where Be(α, β) is the Beta distribution and c_k, C_k are parameters of the algorithm (we use c_k = 0.01 and C_k = 2). Initialization: θ_k^(i) = ((1/n_k)(max_{x ∈ s_k} x^(i) − min_{x ∈ s_k} x^(i)))^{-1}.
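The bounds and the initialization for a single one-dimensional factor can be sketched as follows (toy factor values; c_k and C_k are the values quoted on the slide, everything else is our naming):

```python
# Bounds a_k, b_k for the prior on theta_k, and the initial theta:
#   a_k = c_k / (largest pairwise difference in the factor)
#   b_k = C_k / (smallest nonzero pairwise difference)
#   theta0 = inverse of the mean spacing (range / n_k)
import numpy as np

s_k = np.array([0.0, 0.1, 0.3, 0.7, 1.0])   # toy factor values
c_k, C_k = 0.01, 2.0
n_k = len(s_k)

diffs = np.abs(s_k[:, None] - s_k[None, :])
a_k = c_k / diffs.max()               # lower bound on theta
b_k = C_k / diffs[diffs > 0].min()    # upper bound on theta

theta0 = 1.0 / ((s_k.max() - s_k.min()) / n_k)
```

The bounds tie the admissible inverse length-scales to the actual spacing of the factor's points, which is what prevents the degenerate fits shown on the previous slide.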
