Sparse Gaussian Processes with Spherical Harmonic Features

Sparse Gaussian Processes with Spherical Harmonic Features. Vincent Dutordoir¹, Nicolas Durrande¹ and James Hensman². ¹PROWLER.io, ²Amazon (work completed while JH was at PROWLER.io). International Conference on Machine Learning (ICML) 2020.


  1. Sparse Gaussian Processes with Spherical Harmonic Features
     Vincent Dutordoir¹, Nicolas Durrande¹ and James Hensman²
     ¹PROWLER.io, ²Amazon (work completed while JH was at PROWLER.io)
     International Conference on Machine Learning (ICML) 2020

  2. Contribution
     We improve the scaling of sparse GPs with the number of datapoints and the number of input dimensions.
     Airline dataset: regression problem, $6 \cdot 10^6$ datapoints, 8 input dimensions. Setup: GTX 1070 GPU.
     [Figure: bar charts for the SVGP and VISH models. Wall-clock time (seconds) bar labels: 918.77 and 41.32. NLPD (error, lower is better) bar labels: 1.31 and 1.29.]

  3. Variational Inference with Spherical Harmonics (VISH)
     Gist of the method:
     - make the inputs $(d+1)$-dimensional by appending a bias
     - project the data radially onto $\mathbb{S}^d$
     - fit a fast SVGP on the sphere
     - map the predictions on $\mathbb{S}^d$ back to the original space
     The efficiency of VISH comes from using spherical harmonics as inducing functions for the SVGP on the sphere.
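
The first two steps of the gist can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `project_to_sphere` and the default bias value of 1.0 are assumptions made here for the example.

```python
import math

def project_to_sphere(x, bias=1.0):
    """Append a constant bias to x, then rescale the result to unit norm.

    Returns the point on the unit hypersphere together with the original
    radius, which is needed to map predictions back to the input space.
    The name and the default bias are illustrative assumptions.
    """
    augmented = list(x) + [bias]
    radius = math.sqrt(sum(v * v for v in augmented))
    return [v / radius for v in augmented], radius

# A 2-dimensional input becomes a 3-dimensional point with unit norm.
point, radius = project_to_sphere([3.0, 4.0])
```

Because the bias is non-zero, the augmented vector never has zero norm, so the radial projection is always defined.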

  4. From inducing points to inducing features
     - Inducing points: $u_m = f(z_m)$; $K_{uu}$ is dense, so computing $K_{uu}^{-1}$ is $O(M^3)$.
     - VISH: $u_m = \langle f, \phi_m \rangle_{\mathcal{H}}$; $K_{uu}$ is diagonal, so computing $K_{uu}^{-1}$ is $O(M)$.
     Orthogonality of the basis functions $\phi$ leads to a diagonal $K_{uu}$ and $O(M)$ inversion.
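
The cost difference comes down to how a linear system in $K_{uu}$ is solved. The sketch below contrasts the two cases using NumPy; the matrices are random stand-ins chosen for illustration, not kernel matrices from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 5
b = rng.normal(size=M)

# Dense case (inducing points): a generic symmetric positive-definite K_uu.
# A general solve factorises the matrix, which costs O(M^3).
A = rng.normal(size=(M, M))
K_dense = A @ A.T + M * np.eye(M)
x_dense = np.linalg.solve(K_dense, b)

# Diagonal case (orthogonal inducing features): only the M diagonal
# entries are stored, and the solve is an elementwise division, O(M).
k_diag = rng.uniform(0.5, 2.0, size=M)
x_diag = b / k_diag
```

In the diagonal case the full matrix never has to be formed, which also cuts memory from $O(M^2)$ to $O(M)$.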

  5. Deep-dive

  9. Sparse Variational Gaussian processes: scalable and flexible
     Capture the GP by a set of inducing variables $u = f(Z)$, at locations $z_1, \dots, z_M$.
     Minimise the KL divergence between the posterior $p(f(\cdot) \mid y)$ and the approximation $q(f(\cdot)) = \mathcal{GP}(\mu(\cdot), \nu(\cdot, \cdot'))$, with
     $$\mu(\cdot) = k_u(\cdot)^\top K_{uu}^{-1} m,$$
     $$\nu(\cdot, \cdot') = k(\cdot, \cdot') - k_u(\cdot)^\top K_{uu}^{-1} (K_{uu} - S) K_{uu}^{-1} k_u(\cdot'),$$
     where $[K_{uu}]_{m,m'} = \mathrm{Cov}(u_m, u_{m'})$ and $[k_u(\cdot)]_m = \mathrm{Cov}(u_m, f(\cdot))$.
     A more flexible (e.g. non-Gaussian likelihoods) and scalable (e.g. mini-batching) model at a cost of $O(M^3 + M^2 N)$.
     Speedup through structure in the $K_{uu}$ matrix (e.g. Hensman et al. 2017, VFF).
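
The expressions for $\mu$ and $\nu$ translate directly into code. The following is a minimal NumPy sketch under stated assumptions: one-dimensional inputs, a squared-exponential kernel, and a small jitter on $K_{uu}$ for numerical stability. The helper names `rbf` and `svgp_predict` are hypothetical, and $m$ and $S$ are taken to be the mean and covariance of the variational distribution over $u$.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def svgp_predict(x_new, Z, m, S, jitter=1e-8):
    """Evaluate mu and nu of q(f) at x_new for inducing points Z."""
    Kuu = rbf(Z, Z) + jitter * np.eye(len(Z))
    Kuf = rbf(Z, x_new)                  # column n holds k_u(x_n)
    A = np.linalg.solve(Kuu, Kuf)        # K_uu^{-1} k_u(x): the O(M^3) step
    mean = A.T @ m                       # k_u^T K_uu^{-1} m
    cov = rbf(x_new, x_new) - A.T @ (Kuu - S) @ A
    return mean, cov

Z = np.linspace(-2.0, 2.0, 5)            # M = 5 inducing locations
x_new = np.linspace(-3.0, 3.0, 7)
m = np.zeros(len(Z))
S = rbf(Z, Z) + 1e-8 * np.eye(len(Z))    # S = K_uu recovers the prior
mean, cov = svgp_predict(x_new, Z, m, S)
```

Setting $m = 0$ and $S = K_{uu}$ makes the correction term vanish, so $q$ falls back to the prior; this is a convenient sanity check on the formulas.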

  10. Outline
      - Gaussian processes on the circle and hypersphere
      - Spherical harmonics as inducing features
      - Linear projection of the data onto the hypersphere

  11. Gaussian processes on the circle
      $$f = \sum_i \xi_i \phi_i(\theta), \quad \text{with } \xi_i \sim \mathcal{N}(0, \lambda_i),$$
      $$\Phi(\theta) = [\cos(i\theta), \sin(i\theta)]_{i=0}^{\infty},$$
      $$k(\theta_1, \theta_2) = \sum_{i=0}^{\infty} \lambda_i \phi_i(\theta_1) \phi_i(\theta_2).$$
      [Figure: a 3D panel with axes x, y, z and a second panel plotted against the angle.]
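
A truncated version of this construction is easy to write down. The sketch below is an illustration under assumptions that are not on the slide: the spectrum $\lambda_i = e^{-i}$ is hypothetical, the cosine and sine at each frequency share one eigenvalue, the basis is left unnormalised, and the sums stop at a finite frequency.

```python
import math
import random

def features(theta, n_freq):
    """Return [cos(0*t), sin(0*t), cos(1*t), sin(1*t), ..., sin(n_freq*t)]."""
    feats = []
    for i in range(n_freq + 1):
        feats.extend([math.cos(i * theta), math.sin(i * theta)])
    return feats

def eigenvalues(n_freq, decay=1.0):
    """Hypothetical spectrum: exp(-decay * i), shared by cos and sin."""
    lams = []
    for i in range(n_freq + 1):
        lams.extend([math.exp(-decay * i)] * 2)
    return lams

def kernel(theta1, theta2, n_freq=10):
    """Truncated Mercer sum: sum_i lambda_i * phi_i(theta1) * phi_i(theta2)."""
    lams = eigenvalues(n_freq)
    f1, f2 = features(theta1, n_freq), features(theta2, n_freq)
    return sum(l * a * b for l, a, b in zip(lams, f1, f2))

def sample_function(n_freq=10, seed=0):
    """Draw xi_i ~ N(0, lambda_i); return f(theta) = sum_i xi_i * phi_i(theta)."""
    rng = random.Random(seed)
    xi = [rng.gauss(0.0, math.sqrt(l)) for l in eigenvalues(n_freq)]
    return lambda theta: sum(w * p for w, p in zip(xi, features(theta, n_freq)))

f = sample_function()
```

Since $\cos(i\theta_1)\cos(i\theta_2) + \sin(i\theta_1)\sin(i\theta_2) = \cos(i(\theta_1 - \theta_2))$, the resulting kernel depends only on the angle difference, and every sample is $2\pi$-periodic.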

  12. Spherical Harmonics
      - Orthonormal basis on the hypersphere
      - Eigenfunctions of the Laplace-Beltrami operator: $\Delta_{\mathbb{S}^{d-1}} \phi_i = \lambda_i \phi_i$
      - Eigenfunctions of zonal kernels

  15. Mercer's theorem for zonal kernels on the sphere
      Zonal kernels are the spherical counterpart of stationary kernels: $k(x, x') = k'(\mathrm{distance}(x, x'))$.
      Mercer's decomposition: any zonal kernel $k$ on the hypersphere can be decomposed as
      $$k(x, x') = \sum_{i=0}^{\infty} \lambda_i \phi_i(x) \phi_i(x').$$
      Karhunen–Loève expansion: a GP $f$ on the hypersphere with zonal covariance $k$ can be written $f = \sum_i \xi_i \phi_i$ with $\xi_i \sim \mathcal{N}(0, \lambda_i)$.

  20. Spherical harmonics as inducing features in SVGPs
      Define the kernel's RKHS $\mathcal{H}$ with reproducing inner product: $\langle k(x, \cdot), h(\cdot) \rangle_{\mathcal{H}} = h(x)$.
      Approximate posterior constructed out of inducing features $u_m = \langle f, \phi_m \rangle_{\mathcal{H}}$
      ⇒ Diagonal covariance matrix: $[K_{uu}]_{m,m'} = \mathrm{Cov}(u_m, u_{m'}) = \langle \phi_m, \phi_{m'} \rangle_{\mathcal{H}} = \lambda_m^{-1} \delta_{mm'}$
      ⇒ Spherical harmonics as features: $[k_u(\cdot)]_m = \mathrm{Cov}(u_m, f(\cdot)) = \phi_m(\cdot)$
      ⇒ An $O(M^2 N)$ approximate GP
      $$q(f(\cdot)) = \mathcal{GP}\big(\Phi^\top(\cdot)\, m;\; k(\cdot, \cdot') - \Phi^\top(\cdot)(\Lambda - S)\Phi(\cdot')\big),$$
      where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_M)$ and $\Phi(\cdot) = [\phi_1(\cdot), \dots, \phi_M(\cdot)]$.
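
A short numerical check shows how the diagonal $K_{uu}$ removes the matrix inverse. The sketch plugs $K_{uu} = \Lambda^{-1}$ and $k_u = \Phi$ into the generic SVGP expressions and compares them with the compact form. Two things here are assumptions rather than statements from the slides: the feature matrix is filled with random stand-in values, and the compact form is matched by rescaling the variational parameters as $\tilde m = \Lambda m$ and $\tilde S = \Lambda S \Lambda$, which is one possible reading of how the two forms relate.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 6
Phi = rng.normal(size=(M, N))           # stand-in for phi_m(x_n)
lam = rng.uniform(0.1, 1.0, size=M)     # eigenvalues lambda_m
m = rng.normal(size=M)
L = rng.normal(size=(M, M))
S = L @ L.T                             # a dense PSD variational covariance

# Generic SVGP terms with K_uu = diag(1 / lambda) and k_u = Phi.
Kuu = np.diag(1.0 / lam)
Kuu_inv = np.diag(lam)                  # inverse of a diagonal matrix: O(M)
mean_generic = Phi.T @ Kuu_inv @ m
reduction_generic = Phi.T @ Kuu_inv @ (Kuu - S) @ Kuu_inv @ Phi

# Compact form with rescaled parameters (an assumed reparameterisation).
m_tilde = lam * m
S_tilde = lam[:, None] * S * lam[None, :]
mean_compact = Phi.T @ m_tilde
reduction_compact = Phi.T @ (np.diag(lam) - S_tilde) @ Phi
```

The remaining $O(M^2 N)$ cost comes from the product with the dense $M \times M$ covariance $S$; the $O(M^3)$ factorisation of $K_{uu}$ is gone.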

  21. Linear mapping to the hypersphere
      Most datasets do not correspond to data on a hypersphere. The proposed solution is to augment the inputs with a constant variable (bias) before projecting them radially onto the hypersphere.
      Although such a construction may seem arbitrary, it is used implicitly in the Arc-Cosine kernel [Cho & Saul, 2009]:
      $$k(x, x') = \underbrace{\|x\| \|x'\|}_{\text{radial}} \; \underbrace{(\sin\theta + (\pi - \theta)\cos\theta)}_{\text{angular}}, \quad \text{with } \theta = \arccos\frac{x^\top x'}{\|x\| \|x'\|}.$$
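
The split into a radial and an angular factor can be made explicit in code. This is a minimal sketch of the formula as written on the slide; the function name is hypothetical, and note that Cho & Saul's definition of the first-order arc-cosine kernel carries an additional constant factor of $1/\pi$, which is omitted here to match the slide.

```python
import math

def arc_cosine_kernel(x, y):
    """Arc-cosine kernel written as radial part times angular part."""
    norm_x = math.sqrt(sum(v * v for v in x))
    norm_y = math.sqrt(sum(v * v for v in y))
    cos_theta = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding
    theta = math.acos(cos_theta)
    radial = norm_x * norm_y
    angular = math.sin(theta) + (math.pi - theta) * math.cos(theta)
    return radial * angular

value = arc_cosine_kernel([1.0, 2.0, 1.0], [0.5, -1.0, 1.0])
```

The angular factor depends only on the directions of the two inputs, so it is a zonal kernel on the sphere, while the radial factor rescales it by the input norms.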

  22. Experiment
      Airline dataset: regression task with 6,000,000 datapoints, fitted in 40 seconds on a single cheap GTX 1070 GPU.
      [Figure: bar charts of NLPD (error) and wall-clock time (seconds) for the SVGP, Additive-VFF and VISH models. NLPD bar labels: 1.31, 1.32, 1.29. Wall-clock time bar labels: 918.77, 41.32, 75.61.]

  23. Conclusion
      Summary of the advantages:
      - It is the fastest SVGP model to date ⇒ no need for expensive hardware.
      - The natural ordering of spherical harmonics makes our model scale nicely with the input dimension ⇒ it does not suffer from the curse of dimensionality as VFF does.
      - Similarities with the Arc-Cosine kernel make its extrapolation properties similar to those of neural networks.
      Reach out to have a chat if you want to know more!
