Constraining Gaussian Processes by Variational Fourier Features

Arno Solin, Aalto University. Joint work with Manon Kok (and earlier work with Nicolas Durrande, James Hensman, and Simo Särkkä). September 12, 2019. @arnosolin


  1. Constraining Gaussian Processes by Variational Fourier Features
Arno Solin, Aalto University
Joint work with Manon Kok (and earlier work with Nicolas Durrande, James Hensman, and Simo Särkkä)
September 12, 2019 · @arnosolin · arno.solin.fi

  2. Outline
Motivation · The model · Low-rank representation · Non-Gaussian likelihoods · How this relates to SLAM · Examples · Conclusion
Constraining Gaussian processes by variational Fourier features, Arno Solin, 2/35

  3. The idea

  4. What?
◮ Gaussian processes (GPs) provide a powerful framework for extrapolation, interpolation, and noise removal in regression and classification.
◮ We constrain GPs to arbitrarily-shaped domains with boundary conditions.
◮ Applications in, e.g., imaging, spatial analysis, robotics, and general ML tasks.

  5. Why is this non-trivial?
GPs provide convenient ways for model specification and inference, but...
◮ Issue #1: How to represent this prior?
◮ Issue #2: Limitations in scaling to large data sets.
◮ Issue #3: Limitations in dealing with non-Gaussian likelihoods.

  6. Hilbert Space Methods for Reduced-Rank GPs

  7. Problem formulation
◮ Gaussian process (GP) regression problem:
      f(x) ∼ GP(0, κ(x, x′)),   yᵢ = f(xᵢ) + εᵢ.
◮ GP regression has cubic computational complexity O(n³) in the number of measurements.
◮ This results from the inversion of an n × n matrix:
      E[f(x∗)] = κ(x∗, x₁:ₙ) (κ(x₁:ₙ, x₁:ₙ) + σₙ² I)⁻¹ y,
      V[f(x∗)] = κ(x∗, x∗) − κ(x∗, x₁:ₙ) (κ(x₁:ₙ, x₁:ₙ) + σₙ² I)⁻¹ κ(x₁:ₙ, x∗).
◮ Various sparse, reduced-rank, and related approximations have been developed to mitigate this problem.
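The cubic cost is easy to see in code. Below is a minimal sketch (my own illustration, not from the slides) of exact GP regression with an assumed squared-exponential kernel and made-up hyperparameters; the n × n linear solve is the O(n³) step:

```python
import numpy as np

def se_kernel(x1, x2, ell=1.0, sigma=1.0):
    """Squared-exponential covariance k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2))."""
    d = x1[:, None] - x2[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_predict(x_train, y, x_test, noise=0.1):
    """Exact GP regression: the (K + s_n^2 I) solve costs O(n^3)."""
    K = se_kernel(x_train, x_train)
    Ks = se_kernel(x_test, x_train)
    Kss = se_kernel(x_test, x_test)
    A = K + noise**2 * np.eye(len(x_train))
    alpha = np.linalg.solve(A, y)                     # the O(n^3) step
    mean = Ks @ alpha                                 # E[f(x*)]
    var = np.diag(Kss - Ks @ np.linalg.solve(A, Ks.T))  # V[f(x*)]
    return mean, var

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(50)
mean, var = gp_predict(x, y, np.array([0.0]))
```

With 50 noisy samples of sin(x), the posterior mean at x∗ = 0 lands near sin(0) = 0.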

  8. Covariance operator
◮ For a covariance function κ(x, x′) we can define the covariance operator
      (K φ)(x) = ∫ κ(x, x′) φ(x′) dx′.
◮ For a stationary covariance function κ(x, x′) ≜ κ(‖r‖), r = x − x′, we get the spectral density
      S(ω) = ∫ κ(r) e^{−i ωᵀ r} dr.
◮ The transfer function corresponding to the operator K is S(ω) = F[K].
◮ The spectral density S(ω) also gives the approximate eigenvalues of the operator K.
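As a quick numerical sanity check (my own illustration, not from the slides): the spectral density of the squared-exponential covariance κ(r) = exp(−r²/(2ℓ²)) is S(ω) = √(2π) ℓ exp(−ℓ²ω²/2), and a direct quadrature of the Fourier integral reproduces it:

```python
import numpy as np

ell = 0.7
kappa = lambda r: np.exp(-0.5 * (r / ell) ** 2)                   # SE covariance, unit magnitude
S_closed = lambda w: np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * ell**2 * w**2)

# S(w) = ∫ κ(r) e^{-i w r} dr; κ is even, so the integral reduces to a cosine transform.
r = np.linspace(-10.0, 10.0, 20001)
dr = r[1] - r[0]
w = np.array([0.0, 1.0, 2.5])
S_num = np.array([np.sum(kappa(r) * np.cos(wi * r)) * dr for wi in w])
# S_num matches S_closed(w) to quadrature accuracy.
```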

  9. Laplacian operator series
◮ In the isotropic case S(ω) ≜ S(‖ω‖), we can expand
      S(‖ω‖) = a₀ + a₁ ‖ω‖² + a₂ (‖ω‖²)² + a₃ (‖ω‖²)³ + · · ·
◮ The Fourier transform of the Laplace operator ∇² is −‖ω‖², i.e.,
      K = a₀ + a₁ (−∇²) + a₂ (−∇²)² + a₃ (−∇²)³ + · · ·
◮ This defines a pseudo-differential operator as a series of differential operators.
◮ Let us now approximate the Laplacian operators with a Hilbert-space method...

 10. Series expansions of GPs
◮ Assume a covariance function κ(x, x′) and an inner product, say,
      ⟨f, g⟩ = ∫_Ω f(x) g(x) w(x) dx.
◮ The inner product induces a Hilbert space of (random) functions.
◮ If we fix a basis {φⱼ(x)}, a Gaussian process f(x) can be expanded into a series
      f(x) = Σ_{j=1}^∞ fⱼ φⱼ(x),
  where the fⱼ are jointly Gaussian.
◮ If we select the φⱼ to be the eigenfunctions of κ(x, x′) w.r.t. ⟨·, ·⟩, this becomes a Karhunen–Loève series.
◮ In the Karhunen–Loève case the coefficients fⱼ are independent Gaussians.

 11. Hilbert-space approximation of the Laplacian
◮ Consider the eigenvalue problem for the Laplacian operator:
      −∇² φⱼ(x) = λⱼ² φⱼ(x),  x ∈ Ω,
      φⱼ(x) = 0,               x ∈ ∂Ω.
◮ The eigenfunctions φⱼ(·) are orthonormal w.r.t. the inner product ⟨f, g⟩ = ∫_Ω f(x) g(x) dx:
      ∫_Ω φᵢ(x) φⱼ(x) dx = δᵢⱼ.
◮ The negative Laplacian has the formal kernel
      ℓ(x, x′) = Σⱼ λⱼ² φⱼ(x) φⱼ(x′)
  in the sense that
      −∇² f(x) = ∫ ℓ(x, x′) f(x′) dx′.
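In 1-D on Ω = [0, L], the Dirichlet eigenpairs are available in closed form: φⱼ(x) = √(2/L) sin(πjx/L) with λⱼ = πj/L. The sketch below (my own illustration, with an arbitrary L) checks orthonormality and the boundary condition numerically:

```python
import numpy as np

L = 2.0
x = np.linspace(0.0, L, 4001)
dx = x[1] - x[0]

def phi(j, x):
    """Dirichlet eigenfunctions of -d2/dx2 on [0, L]: eigenvalues (pi j / L)^2."""
    return np.sqrt(2.0 / L) * np.sin(np.pi * j * x / L)

# Orthonormality: the Gram matrix of quadrature inner products is the identity.
G = np.array([[np.sum(phi(i, x) * phi(j, x)) * dx for j in range(1, 6)]
              for i in range(1, 6)])
# Boundary condition: phi_j(0) = phi_j(L) = 0 by construction.
```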

 12. Approximation of the covariance function
◮ Recall that we have the expansion
      K = a₀ + a₁ (−∇²) + a₂ (−∇²)² + a₃ (−∇²)³ + · · ·
◮ Substituting the formal kernel gives
      κ(x, x′) ≈ a₀ + a₁ ℓ¹(x, x′) + a₂ ℓ²(x, x′) + a₃ ℓ³(x, x′) + · · ·
               = Σⱼ (a₀ + a₁ λⱼ² + a₂ λⱼ⁴ + a₃ λⱼ⁶ + · · ·) φⱼ(x) φⱼ(x′).
◮ Evaluating the spectral density series at ‖ω‖² = λⱼ² gives
      S(λⱼ) = a₀ + a₁ λⱼ² + a₂ λⱼ⁴ + a₃ λⱼ⁶ + · · ·
◮ This leads to the final approximation
      κ(x, x′) ≈ Σⱼ S(λⱼ) φⱼ(x) φⱼ(x′).
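The quality of the final approximation is easy to check numerically. The sketch below (my own illustration; the expansion half-width Lb, lengthscale, and basis size are arbitrary choices) compares Σⱼ S(λⱼ) φⱼ(x) φⱼ(x′) against the exact squared-exponential kernel well inside the domain:

```python
import numpy as np

ell, Lb, m = 0.5, 4.0, 128
j = np.arange(1, m + 1)
lam = np.pi * j / (2 * Lb)                                       # sqrt-eigenvalues of -d2/dx2 on [-Lb, Lb]
S = np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * ell**2 * lam**2)    # SE spectral density at lambda_j

def Phi(x):
    """Dirichlet eigenfunctions on [-Lb, Lb], evaluated at x (shape: len(x) x m)."""
    return np.sin(np.pi * np.outer(x + Lb, j) / (2 * Lb)) / np.sqrt(Lb)

x = np.linspace(-1.0, 1.0, 101)            # evaluate well away from the artificial boundary
K_approx = (Phi(x) * S) @ Phi(x).T         # sum_j S(lambda_j) phi_j(x) phi_j(x')
K_exact = np.exp(-0.5 * ((x[:, None] - x[None, :]) / ell) ** 2)
err = np.max(np.abs(K_approx - K_exact))   # small when Lb >> ell and m is large enough
```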

 13. Accuracy of the approximation
[Figure: approximations with m = 12, 32, 64, 128 basis functions, panels for ν = 1/2, 3/2, 5/2, 7/2, and ν → ∞, compared against the exact covariance as a function of distance.]
Approximations to covariance functions of the Matérn class of various degrees of smoothness; ν = 1/2 corresponds to the exponential (Ornstein–Uhlenbeck) covariance function, and ν → ∞ to the squared exponential (exponentiated quadratic) covariance function.

 14. Gaussian processes on a sphere
Easy to apply in simple domains (hyper-spheres, hyper-cubes, ...)

 15. Reduced-rank method for GP regression
◮ Recall the GP regression problem
      f(x) ∼ GP(0, κ(x, x′)),   yᵢ = f(xᵢ) + εᵢ.
◮ Let us now approximate
      f(x) ≈ Σ_{j=1}^m fⱼ φⱼ(x),  where fⱼ ∼ N(0, S(λⱼ)).
◮ Via the matrix inversion lemma we then get
      E[f(x∗)] ≈ φ∗ᵀ (ΦᵀΦ + σₙ² Λ⁻¹)⁻¹ Φᵀ y,
      V[f(x∗)] ≈ σₙ² φ∗ᵀ (ΦᵀΦ + σₙ² Λ⁻¹)⁻¹ φ∗,
  where Φᵢⱼ = φⱼ(xᵢ) and Λ = diag(S(λ₁), ..., S(λₘ)).
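A minimal end-to-end sketch of these equations (my own illustration; the 1-D squared-exponential setup, data, and hyperparameters are made up, with the Dirichlet basis on an enlarged interval [−Lb, Lb]):

```python
import numpy as np

ell, sn, Lb, m = 0.5, 0.1, 4.0, 32   # lengthscale, noise std, expansion half-width, basis size
j = np.arange(1, m + 1)
lam = np.pi * j / (2 * Lb)
S = np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * ell**2 * lam**2)   # Lambda = diag(S(lambda_j))

def Phi(x):
    """Basis matrix: Phi[i, j] = phi_j(x_i)."""
    return np.sin(np.pi * np.outer(x + Lb, j) / (2 * Lb)) / np.sqrt(Lb)

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 500)
y = np.sin(3 * x) + sn * rng.standard_normal(x.size)

P = Phi(x)
A = P.T @ P + sn**2 * np.diag(1.0 / S)   # m x m system: O(n m^2) to form, O(m^3) to solve
w = np.linalg.solve(A, P.T @ y)

xs = np.array([0.0, 0.5])
mean = Phi(xs) @ w                        # E[f(x*)] ~ phi_*^T (Phi^T Phi + s_n^2 Lambda^-1)^-1 Phi^T y
var = sn**2 * np.einsum('ij,jk,ik->i', Phi(xs), np.linalg.inv(A), Phi(xs))
```

With 500 noisy samples of sin(3x), the predictive mean tracks the true function at the test points.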

 16. Computational complexity
◮ The computation of ΦᵀΦ takes O(nm²) operations.
◮ The covariance function parameters do not enter Φ, so ΦᵀΦ needs to be evaluated only once (convenient in parameter estimation).
◮ The scaling in input dimensionality can be quite bad, but depends on the chosen domain.

 17. Airline delay example
◮ Every commercial flight in the US for 2008 (n ≈ 6 M).
◮ Inputs, x: age of the aircraft, route distance, airtime, departure time, arrival time, day of the week, day of the month, and month.
◮ Target, y: delay at landing (in minutes).
◮ Additive model:
      f(x) ∼ GP(0, Σ_{d=1}^8 κ_se(x_d, x′_d)),
      yᵢ = f(xᵢ) + εᵢ,  εᵢ ∼ N(0, σₙ²).

 18. Airline delay example: results
[Results figure; not recoverable from the extraction.]

 19. Constraining Gaussian Processes by Variational Fourier Features

 20. The model
In terms of a GP prior and a likelihood, this can be written as
      f(x) ∼ GP(0, κ(x, x′)),  x ∈ Ω,  s.t. f(x) = 0,  x ∈ ∂Ω,
      y | f ∼ Π_{i=1}^n p(yᵢ | f(xᵢ)),
where (xᵢ, yᵢ) are the n input–output pairs.
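Because every basis function φⱼ vanishes on ∂Ω, any finite expansion f = Σⱼ fⱼ φⱼ with fⱼ ∼ N(0, S(λⱼ)) satisfies the boundary constraint exactly. A sketch of drawing such constrained prior samples in 1-D (my own illustration; Ω = [−1, 1], the squared-exponential spectral density, and the hyperparameters are arbitrary choices):

```python
import numpy as np

ell, Lb, m = 0.5, 1.0, 64     # here the expansion domain IS the constraint domain [-1, 1]
j = np.arange(1, m + 1)
lam = np.pi * j / (2 * Lb)
S = np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * ell**2 * lam**2)   # SE spectral density at lambda_j

def Phi(x):
    """Dirichlet eigenfunctions on [-Lb, Lb]: each one is exactly zero on the boundary."""
    return np.sin(np.pi * np.outer(x + Lb, j) / (2 * Lb)) / np.sqrt(Lb)

# A prior draw f = sum_j f_j phi_j with f_j ~ N(0, S(lambda_j)).
rng = np.random.default_rng(2)
f = Phi(np.linspace(-1.0, 1.0, 201)) @ (np.sqrt(S) * rng.standard_normal(m))
# f is zero at both endpoints (up to floating point) yet varies freely in the interior.
```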

 21. Why is this non-trivial? (recap)
GPs provide convenient ways for model specification and inference, but...
◮ Issue #1: How to represent this prior?
◮ Issue #2: Limitations in scaling to large data sets.
◮ Issue #3: Limitations in dealing with non-Gaussian likelihoods.

 22. Addressing the three issues
◮ As a pre-processing step, we solve for a Fourier-like generalised harmonic feature representation of the GP prior in the domain of interest.
◮ This both constrains the GP and yields a low-rank representation that is used for speeding up inference.
◮ The method scales as O(nm²) in prediction and O(m³) in hyperparameter learning (n data points, m features).
◮ A variational approach allows the method to deal with non-Gaussian likelihoods.
