Constraining Gaussian Processes by Variational Fourier Features - - PowerPoint PPT Presentation

constraining gaussian processes by variational fourier
SMART_READER_LITE
LIVE PREVIEW

Constraining Gaussian Processes by Variational Fourier Features - - PowerPoint PPT Presentation

Constraining Gaussian Processes by Variational Fourier Features Arno Solin Aalto University Joint work with Manon Kok (and earlier work with Nicolas Durrande, James Hensman, and Simo S arkk a) September 12, 2019 @arnosolin


slide-1
SLIDE 1

Constraining Gaussian Processes by Variational Fourier Features

Arno Solin Aalto University

Joint work with Manon Kok (and earlier work with Nicolas Durrande, James Hensman, and Simo S¨ arkk¨ a)

September 12, 2019

@arnosolin

arno.solin.fi

slide-2
SLIDE 2

Constraining Gaussian processes by variational Fourier features Arno Solin 2/35

Outline

Motivation Model Low-rank representation Non-Gaussian likelihoods Examples How this relates to SLAM Conclusion

slide-3
SLIDE 3

Constraining Gaussian processes by variational Fourier features Arno Solin 3/35

The idea

slide-4
SLIDE 4

Constraining Gaussian processes by variational Fourier features Arno Solin 4/35

What?

◮ Gaussian processes (GPs) provide a powerful framework for extrapolation, interpolation, and noise removal in regression and classification ◮ We constrain GPs to arbitrarily-shaped domains with boundary conditions ◮ Applications in, e.g., imaging, spatial analysis, robotics, or general ML tasks

slide-5
SLIDE 5

Constraining Gaussian processes by variational Fourier features Arno Solin 5/35

Why is this non-trivial?

GPs provide convenient ways for model specification and inference, but . . . ◮ Issue #1: How to represent this prior? ◮ Issue #2: Limitations in scaling do large data sets ◮ Issue #3: Limitations in dealing with non-Gaussian likelihoods

slide-6
SLIDE 6

Constraining Gaussian processes by variational Fourier features Arno Solin 6/35

Hilbert Space Methods for Reduced-Rank GPs

slide-7
SLIDE 7

Constraining Gaussian processes by variational Fourier features Arno Solin 7/35

Problem formulation

◮ Gaussian process (GP) regression problem: f(x) ∼ GP(0, κ(x, x′)), yi = f(xi) + εi. ◮ The GP-regression has cubic computational complexity O(n3) in the number of measurements. ◮ This results from the inversion of an n × n matrix: E[f(x∗)] = κ(x∗, x1:n) (κ(x1:n, x1:n) + σ2

nI)−1 y

V[f(x∗)] = κ(x∗, x∗) − κ(x∗, x1:n) (κ(x1:n, x1:n) + σ2

nI)−1 κ(x1:n, x∗).

◮ Various sparse, reduced-rank, and related approximations have been developed for mitigating this problem.

slide-8
SLIDE 8

Constraining Gaussian processes by variational Fourier features Arno Solin 8/35

Covariance operator

◮ For covariance function κ(x, x′) we can define covariance

  • perator:

K φ =

  • κ(·, x′) φ(x′) dx′.

◮ For stationary covariance function κ(x, x′) κ(r); r = x − x′ we get S(ω) =

  • κ(r) e−i ωTr dr.

◮ The transfer function corresponding to the operator K is S(ω) = F[K]. ◮ The spectral density S(ω) also gives the approximate eigenvalues of the operator K.

slide-9
SLIDE 9

Constraining Gaussian processes by variational Fourier features Arno Solin 9/35

Laplacian operator series

◮ In isotropic case S(ω) S(ω), we can expand S(ω) = a0 + a1ω2 + a2(ω2)2 + a3(ω2)3 + · · · . ◮ The Fourier transform of the Laplace operator ∇2 is −ω2, i.e., K = a0 + a1(−∇2) + a2(−∇2)2 + a3(−∇2)3 + · · · . ◮ Defines a pseudo-differential operator as a series of differential operators. ◮ Let us now approximate the Laplacian operators with a Hilbert method...

slide-10
SLIDE 10

Constraining Gaussian processes by variational Fourier features Arno Solin 10/35

Series expansions of GPs

◮ Assume a covariance function κ(x, x′) and an inner product, say, f, g =

f(x) g(x) w(x) dx. ◮ The inner product induces a Hilbert-space of (random) functions. ◮ If we fix a basis {φj(x)}, a Gaussian process f(x) can be expanded into a series f(x) =

  • j=1

fj φj(x), where fj are jointly Gaussian. ◮ If we select φj to be the eigenfunctions of κ(x, x′) w.r.t. ·, ·, then this becomes a Karhunen–Lo` eve series. ◮ In the Karhunen–Lo` eve case the coefficients fj are independent Gaussian.

slide-11
SLIDE 11

Constraining Gaussian processes by variational Fourier features Arno Solin 11/35

Hilbert-space approximation of the Laplacian

◮ Consider the eigenvalue problem for the Laplacian operators:

  • −∇2φj(x) = λ2

j φj(x),

x ∈ Ω, φj(x) = 0, x ∈ ∂Ω. ◮ The eigenfunctions φj(·) are orthonormal w.r.t. inner product f, g =

f(x) g(x) dx,

φi(x) φj(x) dx = δij. ◮ The negative Laplacian has the formal kernel ℓ(x, x′) =

  • j

λ2

j φj(x) φj(x′)

in the sense that −∇2f(x) =

  • ℓ(x, x′) f(x′) dx′.
slide-12
SLIDE 12

Constraining Gaussian processes by variational Fourier features Arno Solin 12/35

Approximation of the covariance function

◮ Recall that we have the expansion K = a0 + a1(−∇2) + a2(−∇2)2 + a3(−∇2)3 + · · · . ◮ Substituting the formal kernel gives κ(x, x′) ≈ a0 + a1 ℓ1(x, x′) + a2 ℓ2(x, x′) + a3 ℓ3(x, x′) + · · · =

  • j
  • a0 + a1 λ2

j + a2 λ4 j + a3 λ6 j + · · ·

  • φj(x) φj(x′).

◮ Evaluating the spectral density series at ω2 = λ2

j gives

S(λj) = a0 + a1λ2

j + a2λ4 j + a3λ6 j + · · · .

◮ This leads to the final approximation κ(x, x′) ≈

  • j

S(λj) φj(x) φj(x′).

slide-13
SLIDE 13

Constraining Gaussian processes by variational Fourier features Arno Solin 13/35

Accuracy of the approximation

5ℓ ν → ∞ ν = 7

2

ν = 5

2

ν = 3

2

ν = 1

2

Exact m = 12 m = 32 m = 64 m = 128

Approximations to covariance functions of the Mat´ ern class of various degrees of smoothness; ν = 1/2 corresponds to the exponential Ornstein–Uhlenbeck covariance function, and ν → ∞ to the squared exponential (exponentiated quadratic) covariance function.

slide-14
SLIDE 14

Constraining Gaussian processes by variational Fourier features Arno Solin 14/35

Gaussian processes on a sphere

Easy to apply in simple domains (hyper-spheres, hyper-cubes, . . . )

slide-15
SLIDE 15

Constraining Gaussian processes by variational Fourier features Arno Solin 15/35

Reduced-rank method for GP regression

◮ Recall the GP-regression problem f(x) ∼ GP(0, κ(x, x′)) yi = f(xi) + εi. ◮ Let us now approximate f(x) ≈

m

  • j=1

fj φj(x), where fj ∼ N(0, S(λj)). ◮ Via the matrix inversion lemma we then get E[f(x∗)] ≈ φT

∗(ΦTΦ + σ2 nΛ−1)−1ΦTy,

V[f(x∗)] ≈ σ2

nφT ∗(ΦTΦ + σ2 nΛ−1)−1φ∗.

slide-16
SLIDE 16

Constraining Gaussian processes by variational Fourier features Arno Solin 16/35

Computational complexity

◮ The computation of ΦTΦ takes O(nm2) operations. ◮ The covariance function parameters do not enter Φ and we need to evaluate ΦTΦ only once (nice in parameter estimation). ◮ The scaling in input dimensionality can be quite bad—but depends on the chosen domain.

slide-17
SLIDE 17

Constraining Gaussian processes by variational Fourier features Arno Solin 17/35

Airline delay example

◮ Every commercial flight in the US for 2008 (n ≈ 6 M). ◮ Inputs, x: Age of the aircraft, route distance, airtime, departure time, arrival time, day of the week, day of the month, and month. ◮ Target, y: Delay at landing (in minutes). ◮ Additive model: f(x) ∼ GP(0,

8

  • d=1

κse(xd, x′

d))

yi = f(xi) + εi, εi ∼ N(0, σ2

n)

slide-18
SLIDE 18

Constraining Gaussian processes by variational Fourier features Arno Solin 17/35

Airline delay example

◮ Every commercial flight in the US for 2008 (n ≈ 6 M). ◮ Inputs, x: Age of the aircraft, route distance, airtime, departure time, arrival time, day of the week, day of the month, and month. ◮ Target, y: Delay at landing (in minutes). ◮ Additive model: f(x) ∼ GP(0,

8

  • d=1

κse(xd, x′

d))

yi = f(xi) + εi, εi ∼ N(0, σ2

n)

Results

slide-19
SLIDE 19

Constraining Gaussian processes by variational Fourier features Arno Solin 18/35

slide-20
SLIDE 20

Constraining Gaussian processes by variational Fourier features Arno Solin 19/35

The model

In terms of a GP prior and a likelihood, this can be written as

  • f(x) ∼ GP(0, κ(x, x′)),

x ∈ Ω s.t. f(x) = 0, x ∈ ∂Ω y | f ∼

n

  • i=1

p(yi | f(xi)) where (xi, yi) are the n input–output pairs

slide-21
SLIDE 21

Constraining Gaussian processes by variational Fourier features Arno Solin 20/35

Why is this non-trivial?

GPs provide convenient ways for model specification and inference, but . . . ◮ Issue #1: How to represent this prior? ◮ Issue #2: Limitations in scaling do large data sets ◮ Issue #3: Limitations in dealing with non-Gaussian likelihoods

slide-22
SLIDE 22

Constraining Gaussian processes by variational Fourier features Arno Solin 21/35

Addressing the three issues

◮ As a pre-processing step, we solve a Fourier-like generalised harmonic feature representation of the GP prior in the domain of interest ◮ Both constrains the GP and attains a low-rank representation that is used for speeding up inference ◮ The method scales as O(nm2) in prediction and O(m3) in hyperparameter learning (n number of data, m features) ◮ A variational approach to allow the method to deal with non-Gaussian likelihoods

slide-23
SLIDE 23

Constraining Gaussian processes by variational Fourier features Arno Solin 22/35

Low-rank representation

◮ Given a domain Ω ⊂ Rd (d typically 1–3), we project the GP onto the eigenbasis of the Laplace operator, ∇2, that solves the eigenvalue problem:

  • −∇2φj(x) = λ2

j φj(x),

x ∈ Ω, φj(x) = 0, x ∈ ∂Ω. ◮ The approximate eigenvalues and eigenfunctions of the Laplacian in Ω (s.t. the the boundary conditions) can be solved numerically

slide-24
SLIDE 24

Constraining Gaussian processes by variational Fourier features Arno Solin 23/35

Domain and discrete Laplacian

+ −∇2 ≈

Finite difference approximation of the operator in a discrete grid of the image.

slide-25
SLIDE 25

Constraining Gaussian processes by variational Fourier features Arno Solin 24/35

Harmonic basis functions

slide-26
SLIDE 26

Constraining Gaussian processes by variational Fourier features Arno Solin 25/35

Representation of the GP prior

◮ We require the covariance function κ(·, ·) to be stationary ◮ Leverage the link between stationary covariance functions and the Laplacian for approximating the covariance function by the eigendecomposition and the spectral density function: κ(x, x′) ≈

m

  • j=1

S(λj) φj(x) φj(x′) = ΦΛΦT, where s(·) is the spectral density function of κ(·, ·) ◮ As Φ does not depend on the hyperparameters and Λ is diagonal, we also get a computational boost

slide-27
SLIDE 27

Constraining Gaussian processes by variational Fourier features Arno Solin 26/35

Samples from the GP prior

slide-28
SLIDE 28

Constraining Gaussian processes by variational Fourier features Arno Solin 27/35

Non-Gaussian likelihoods

◮ For non-Gaussian likelihoods, we set up a variational approach and maximize the ELBO ◮ In practice, we form a Gaussian approximation to the posterior q(u), for the set of m harmonic basis functions ◮ Optimise the ELBO with respect to the mean and variance

  • f the approximation
slide-29
SLIDE 29

Constraining Gaussian processes by variational Fourier features Arno Solin 28/35

Regression example

◮ Alternative approaches: Zero-noise measurements along the boundary for constraining the GP , and applying general-purpose approximations

slide-30
SLIDE 30

Constraining Gaussian processes by variational Fourier features Arno Solin 29/35

Regression example

◮ Naive full GP (baseline) ◮ Our method ◮ Fully independent training conditional (FITC) ◮ Variational Fourier features (VFF)

slide-31
SLIDE 31

Constraining Gaussian processes by variational Fourier features Arno Solin 30/35

Banana classification example

◮ The outermost decision boundary comes form the prior (know boundary of uncertainty) ◮ The posterior improves with the number of harmonic basis functions

slide-32
SLIDE 32

Constraining Gaussian processes by variational Fourier features Arno Solin 31/35

Modelling tick density in the Netherlands

◮ 9 months of tick bites from https://tekenradar.nl ◮ 4,446 data points ◮ A log-Gaussian Cox process model (Poisson likelihood) ◮ Modelling the log intensity as a GP with boundary conditions

slide-33
SLIDE 33

Constraining Gaussian processes by variational Fourier features Arno Solin 32/35

Simultaneous localisation and mapping (SLAM)

View on YouTube: https://youtu.be/pbwWLoh6mvI

Kok and Solin. Scalable Magnetic Field SLAM in 3D Using Gaussian Process Maps. FUSION’18.

slide-34
SLIDE 34

Constraining Gaussian processes by variational Fourier features Arno Solin 33/35

Recap

◮ Constraining GPs to arbitrarily-shaped domains with boundary conditions ◮ Utilizes the link between the stationary covariance functions and the Laplace operator ◮ Applications in, e.g., imaging, spatial analysis, robotics, or general ML tasks

slide-35
SLIDE 35

Constraining Gaussian processes by variational Fourier features Arno Solin 34/35

Bibliography

  • A. Solin and M. Kok (2019). Know your boundaries: Constraining

Gaussian processes by variational harmonic features. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR 89:2193–2202.

  • A. Solin and S. S¨

arkk¨ a (2019). Hilbert space methods for reduced-rank Gaussian process regression. Statistics and Computing.

  • J. Hensman, N. Durrande, and A. Solin (2018). Variational Fourier

features for Gaussian processes. Journal of Machine Learning Research (JMLR), 18(151):1–52.

  • M. Kok and A. Solin (2018). Scalable magnetic field SLAM in 3D using

Gaussian process maps. Proceedings of the International Conference

  • n Information Fusion (FUSION), pages 1353–1360.
slide-36
SLIDE 36

Constraining Gaussian processes by variational Fourier features Arno Solin 35/35

◮ Homepage: http://arno.solin.fi ◮ Twitter: @arnosolin