SLIDE 1 Gaussian Processes for Big Data
James Hensman
joint work with Nicolò Fusi and Neil Lawrence
SLIDE 2
Overview
Motivation
Sparse Gaussian Processes
Stochastic Variational Inference
Examples
SLIDE 4
Motivation
Inference in a GP has the following demands:
Complexity: O(n³)   Storage: O(n²)
Inference in a sparse GP has the following demands:
Complexity: O(nm²)   Storage: O(nm)
where we get to pick m!
SLIDE 5 Still not good enough!
Big Data
◮ In parametric models, stochastic optimisation is used.
◮ This allows for application to Big Data.
This work
◮ Show how to use Stochastic Variational Inference in GPs
◮ Stochastic optimisation scheme: each step requires O(m³)
SLIDE 7 Computational savings
Knn ≈ Qnn = Knm Kmm⁻¹ Kmn
Instead of inverting Knn, we make a low-rank (or Nyström) approximation and invert Kmm instead.
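The low-rank construction above can be sketched in a few lines of NumPy. Everything here is an illustrative assumption (an RBF kernel, evenly spaced inducing inputs Z, and small sizes), not the paper's settings; the point is that only the m × m matrix Kmm is ever factorised.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential (RBF) covariance between the rows of A and B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))   # n = 500 inputs
Z = np.linspace(0, 10, 30)[:, None]     # m = 30 inducing inputs (we pick m!)

Knn = rbf(X, X)
Knm = rbf(X, Z)
Kmm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))  # jitter for numerical stability

# Nystrom / low-rank approximation: Qnn = Knm Kmm^{-1} Kmn.
# Only Kmm (m x m) is factorised, never Knn (n x n).
Qnn = Knm @ np.linalg.solve(Kmm, Knm.T)

print(abs(Knn - Qnn).max())  # small when Z covers the input region
```

With the inducing inputs covering the input region densely relative to the lengthscale, Qnn is a close approximation to Knn.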
SLIDE 8 Information capture
Everything we want to do with a GP involves marginalising f
◮ Predictions
◮ Marginal likelihood
◮ Estimating covariance parameters
The posterior of f is the central object. This means inverting Knn.
SLIDE 9 X, y
[Figure: observed data; axes: input space (X) vs. function values]
SLIDE 10 X, y; f(x) ∼ GP
[Figure: axes: input space (X) vs. function values]
SLIDE 11 X, y; f(x) ∼ GP; p(f) = N(0, Knn)
[Figure: axes: input space (X) vs. function values]
SLIDE 12 X, y; f(x) ∼ GP; p(f) = N(0, Knn); p(f | y, X)
[Figure: axes: input space (X) vs. function values]
SLIDE 13
Introducing u
Take an extra M points on the function, u = f(Z).
p(y, f, u) = p(y | f) p(f | u) p(u)
SLIDE 14
Introducing u
SLIDE 15 Introducing u
Take an extra M points on the function, u = f(Z).
p(y, f, u) = p(y | f) p(f | u) p(u)
p(y | f) = N(y | f, σ²I)
p(f | u) = N(f | Knm Kmm⁻¹ u, Knn − Knm Kmm⁻¹ Kmn)
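A small NumPy sketch of this construction (the kernel, sizes, and inducing locations are illustrative assumptions): draw u = f(Z) from the prior p(u) = N(0, Kmm), then form the conditional p(f | u). At an inducing input the conditional variance collapses to (numerically) zero, since f(Z) = u there.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

rng = np.random.default_rng(1)
X = np.linspace(0, 10, 200)[:, None]  # n inputs
Z = np.linspace(0, 10, 12)[:, None]   # m = 12 inducing inputs

Kmm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))  # jitter for stability
Knm = rbf(X, Z)
Knn = rbf(X, X)

# u = f(Z): one draw from the prior p(u) = N(0, Kmm)
u = np.linalg.cholesky(Kmm) @ rng.standard_normal(len(Z))

# p(f | u) = N(Knm Kmm^{-1} u, Knn - Knm Kmm^{-1} Kmn)
cond_mean = Knm @ np.linalg.solve(Kmm, u)
cond_cov = Knn - Knm @ np.linalg.solve(Kmm, Knm.T)

# X[0] coincides with Z[0], so the conditional variance there is ~ jitter:
print(cond_cov[0, 0])
```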
SLIDE 16 X, y f(x) ∼ GP p(f) = N (0, Knn) p(f | y, X) Z, u p(u) = N (0, Kmm)
input space (X) f u n c t i
v a l u e s
SLIDE 17 X, y f(x) ∼ GP p(f) = N (0, Knn) p(f | y, X) p(u) = N (0, Kmm)
input space (X) f u n c t i
v a l u e s
SLIDE 18 The alternative posterior
Instead of doing
p(f | y, X) ∝ p(y | f) p(f | X)
We’ll do
p(u | y, Z) ∝ p(y | u) p(u | Z)
SLIDE 19 The alternative posterior
Instead of doing
p(f | y, X) ∝ p(y | f) p(f | X)
We’ll do
p(u | y, Z) ∝ p(y | u) p(u | Z)
but p(y | u) involves inverting Knn
SLIDE 20 Variational marginalisation of f
ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df
SLIDE 21 Variational marginalisation of f
ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df
ln p(y | u) = ln Ep(f | u,X)[p(y | f)]
SLIDE 22 Variational marginalisation of f
ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df
ln p(y | u) = ln Ep(f | u,X)[p(y | f)]
ln p(y | u) ≥ Ep(f | u,X)[ln p(y | f)] ≜ L1  (Jensen's inequality)
SLIDE 23 Variational marginalisation of f
ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df
ln p(y | u) = ln Ep(f | u,X)[p(y | f)]
ln p(y | u) ≥ Ep(f | u,X)[ln p(y | f)] ≜ L1  (Jensen's inequality)
No inversion of Knn required
SLIDE 24 An approximate likelihood
exp(L1) = ∏n N(yn | kmn⊤ Kmm⁻¹ u, σ²) exp(−(kn,n − kmn⊤ Kmm⁻¹ kmn) / (2σ²))
◮ A straightforward likelihood approximation, and a penalty term
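The bound can be checked numerically: with Gaussian noise, p(y | u) is available in closed form, so we can verify L1 ≤ ln p(y | u) on a toy problem. The kernel, sizes, noise level, and the draws of u and y below are illustrative assumptions; the inequality itself holds for any of them.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def gauss_logpdf(y, mu, cov):
    # Log-density of a multivariate Gaussian N(y | mu, cov).
    _, logdet = np.linalg.slogdet(cov)
    r = y - mu
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(cov, r))

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(100, 1))
Z = np.linspace(0, 10, 15)[:, None]
sigma2 = 0.1  # noise variance

Kmm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
Knm = rbf(X, Z)
Knn = rbf(X, X)

u = np.linalg.cholesky(Kmm) @ rng.standard_normal(len(Z))  # u ~ N(0, Kmm)
y = rng.standard_normal(len(X))                            # bound holds for any y

mu = Knm @ np.linalg.solve(Kmm, u)                 # conditional mean of f given u
Ktilde = Knn - Knm @ np.linalg.solve(Kmm, Knm.T)   # conditional covariance

# Exact: ln p(y | u) = ln N(y | mu, Ktilde + sigma2 I)
exact = gauss_logpdf(y, mu, Ktilde + sigma2 * np.eye(len(X)))
# Bound: L1 = ln N(y | mu, sigma2 I) - tr(Ktilde) / (2 sigma2)
L1 = gauss_logpdf(y, mu, sigma2 * np.eye(len(X))) - np.trace(Ktilde) / (2 * sigma2)

print(L1 <= exact)  # True, by Jensen's inequality
```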
SLIDE 26
log p(y | X) ≥ Eq(u)[ L1 + log p(u) − log q(u) ] ≜ L3   (1)
With q(u) = N(m, S) and β = σ⁻²:
L3 = Σn [ log N(yn | kmn⊤ Kmm⁻¹ m, β⁻¹) − (1/2)β k̃n,n − (1/2) tr(S Λn) ] − KL(q(u) ‖ p(u))   (2)
where k̃n,n = kn,n − kmn⊤ Kmm⁻¹ kmn and Λn = β Kmm⁻¹ kmn kmn⊤ Kmm⁻¹.
SLIDE 27 Optimisation
The variational objective L3 is a function of
◮ the parameters of the covariance function
◮ the parameters of q(u)
◮ the inducing inputs, Z
Strategy: fix Z. Take the data in small minibatches; at each step, take stochastic gradient steps in the covariance function parameters and stochastic natural gradient steps in the parameters of q(u).
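The minibatch loop itself is simple. The sketch below shows only that loop structure: a toy least-squares objective stands in for the bound L3, and `theta` stands in for the parameters being optimised; sizes, step size, and the objective are all illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 3))          # "Big Data" stand-in
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(10000)

theta = np.zeros(3)          # parameters to optimise
batch_size, step = 500, 0.1
for _ in range(200):
    idx = rng.choice(len(y), size=batch_size, replace=False)  # draw a minibatch
    Xb, yb = X[idx], y[idx]
    # Unbiased (noisy) estimate of the full objective's gradient,
    # computed from the minibatch only:
    grad = 2.0 * Xb.T @ (Xb @ theta - yb) / batch_size
    theta -= step * grad     # stochastic gradient step

print(theta)  # approaches w_true
```

Each step touches only `batch_size` points, so the per-step cost is independent of n; in the GP case the analogous per-step cost is O(m³).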
SLIDE 29
SLIDE 30 UK apartment prices
◮ Monthly price paid data for February to October 2012 (England and Wales)
◮ From http://data.gov.uk/dataset/land-registry-monthly-price-paid-data/
◮ 75,000 entries
◮ Cross-referenced against a postcode database to get latitude and longitude
◮ Regressed the normalised logarithm of the apartment prices
SLIDE 31
SLIDE 32
SLIDE 33 Airline data
◮ Flight delays for every commercial flight in the USA from January to April 2008.
◮ Average delay was 30 minutes.
◮ We randomly selected 800,000 datapoints (we have limited memory!)
◮ 700,000 train, 100,000 test
[Figure: inverse lengthscale for each input: Month, DayOfMonth, DayOfWeek, DepTime, ArrTime, AirTime, Distance, PlaneAge]
SLIDE 34
[Figure: RMSE of GPs trained on subsets (N=800, 1000, 1200) vs. RMSE of the SVI GP over iterations]
SLIDE 35
SLIDE 36
Download the code!
github.com/SheffieldML/GPy
Cite our paper!
Hensman, Fusi and Lawrence. Gaussian Processes for Big Data. Proceedings of UAI 2013.