SLIDE 1

Gaussian Processes for Big Data

James Hensman

joint work with

Nicolò Fusi, Neil D. Lawrence
SLIDE 2

Overview

◮ Motivation
◮ Sparse Gaussian Processes
◮ Stochastic Variational Inference
◮ Examples

SLIDE 3

Overview

◮ Motivation
◮ Sparse Gaussian Processes
◮ Stochastic Variational Inference
◮ Examples

SLIDE 4

Motivation

Inference in a GP has the following demands:

Complexity: O(n³)   Storage: O(n²)

Inference in a sparse GP has the following demands:

Complexity: O(nm²)   Storage: O(nm), where we get to pick m!
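To make the scaling concrete, here is a minimal numpy sketch (my illustration, not from the talk; the function names and the noise parameter are mine): the full GP solves an n × n system, while the sparse version only ever solves m × m systems via the Woodbury identity.

```python
import numpy as np

def full_gp_solve(Knn, y, noise=1.0):
    """Full GP: solve an n x n system -- O(n^3) time, O(n^2) storage."""
    n = Knn.shape[0]
    return np.linalg.solve(Knn + noise * np.eye(n), y)

def sparse_gp_solve(Knm, Kmm, y, noise=1.0):
    """Sparse GP: (Knm Kmm^-1 Kmn + noise*I)^-1 y via the Woodbury
    identity -- only m x m factorisations, so O(n m^2) time, O(n m) storage."""
    A = noise * Kmm + Knm.T @ Knm          # m x m
    return (y - Knm @ np.linalg.solve(A, Knm.T @ y)) / noise
```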

SLIDE 5

Still not good enough!

Big Data

◮ In parametric models, stochastic optimisation is used.
◮ This allows for application to Big Data.

This work

◮ Show how to use Stochastic Variational Inference in GPs
◮ Stochastic optimisation scheme: each step requires O(m³)

SLIDE 6

Overview

◮ Motivation
◮ Sparse Gaussian Processes
◮ Stochastic Variational Inference
◮ Examples

SLIDE 7

Computational savings

Knn ≈ Qnn = Knm Kmm⁻¹ Kmn

Instead of inverting Knn, we make a low rank (or Nyström) approximation, and invert Kmm instead.
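A small numpy sketch of this Nyström construction (my illustration; the RBF kernel, the random inducing-point choice, and the jitter term are assumptions):

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

n, m = 500, 20
X = np.random.randn(n, 1)
Z = X[np.random.choice(n, m, replace=False)]   # m inducing inputs

Knn, Knm, Kmm = rbf(X, X), rbf(X, Z), rbf(Z, Z)

# Nystrom approximation: only the m x m matrix Kmm is ever inverted
Qnn = Knm @ np.linalg.solve(Kmm + 1e-8 * np.eye(m), Knm.T)
print(np.abs(Knn - Qnn).max())                 # approximation error
```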

SLIDE 8

Information capture

Everything we want to do with a GP involves marginalising f

◮ Predictions
◮ Marginal likelihood
◮ Estimating covariance parameters

The posterior of f is the central object. This means inverting Knn.

SLIDE 9

[figure: the data X, y plotted over the input space (X) against function values]

SLIDE 10

[figure: X, y with a GP prior, f(x) ∼ GP; axes: input space (X) vs. function values]

SLIDE 11

[figure: X, y; f(x) ∼ GP; prior p(f) = N(0, Knn); axes: input space (X) vs. function values]

SLIDE 12

[figure: X, y; f(x) ∼ GP; prior p(f) = N(0, Knn); posterior p(f | y, X); axes: input space (X) vs. function values]

SLIDE 13

Introducing u

Take an extra M points on the function, u = f(Z).

p(y, f, u) = p(y | f) p(f | u) p(u)

SLIDE 14

Introducing u

SLIDE 15

Introducing u

Take an extra M points on the function, u = f(Z).

p(y, f, u) = p(y | f) p(f | u) p(u)

p(y | f) = N(y | f, σ²I)
p(f | u) = N(f | Knm Kmm⁻¹ u, K̃)
p(u) = N(u | 0, Kmm)

where K̃ = Knn − Knm Kmm⁻¹ Kmn.
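In code, these conditionals look like this (my illustration; it continues the Nyström sketch above, reusing n, m, X, Z, Knn, Knm, Kmm, and assumes a noise level of 0.1):

```python
# u ~ p(u) = N(0, Kmm): values of the function at the inducing inputs Z
u = np.random.multivariate_normal(np.zeros(m), Kmm)

# p(f | u) = N(f | Knm Kmm^-1 u, Ktilde), Ktilde = Knn - Knm Kmm^-1 Kmn
jitter = 1e-8 * np.eye(m)
mean_f = Knm @ np.linalg.solve(Kmm + jitter, u)
Ktilde = Knn - Knm @ np.linalg.solve(Kmm + jitter, Knm.T)
f = np.random.multivariate_normal(mean_f, Ktilde + 1e-8 * np.eye(n))

# p(y | f) = N(y | f, sigma^2 I)
y = f + 0.1 * np.random.randn(n)
```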
SLIDE 16

[figure: as above, now with inducing points Z, u and prior p(u) = N(0, Kmm); axes: input space (X) vs. function values]

SLIDE 17

[figure: as above, with the inducing-point posterior p(u | y, X); axes: input space (X) vs. function values]

SLIDE 18

The alternative posterior

Instead of doing

p(f | y, X) = p(y | f) p(f | X) / ∫ p(y | f) p(f | X) df

we'll do

p(u | y, Z) = p(y | u) p(u | Z) / ∫ p(y | u) p(u | Z) du
SLIDE 19

The alternative posterior

Instead of doing

p(f | y, X) = p(y | f) p(f | X) / ∫ p(y | f) p(f | X) df

we'll do

p(u | y, Z) = p(y | u) p(u | Z) / ∫ p(y | u) p(u | Z) du

but p(y | u) involves inverting Knn

SLIDE 20

Variational marginalisation of f

ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df
SLIDE 21

Variational marginalisation of f

ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df

ln p(y | u) = ln E_{p(f | u,X)}[ p(y | f) ]

SLIDE 22

Variational marginalisation of f

ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df

ln p(y | u) = ln E_{p(f | u,X)}[ p(y | f) ]

ln p(y | u) ≥ E_{p(f | u,X)}[ ln p(y | f) ] ≜ ln p̃(y | u)   (by Jensen's inequality)

SLIDE 23

Variational marginalisation of f

ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df

ln p(y | u) = ln E_{p(f | u,X)}[ p(y | f) ]

ln p(y | u) ≥ E_{p(f | u,X)}[ ln p(y | f) ] ≜ ln p̃(y | u)

No inversion of Knn required

SLIDE 24

An approximate likelihood

p̃(y | u) = ∏_{i=1}^{n} N( yi | ki⊤ Kmm⁻¹ u, σ² ) exp( −(1/(2σ²)) ( ki,i − ki⊤ Kmm⁻¹ ki ) )

where ki is the i-th column of Kmn and ki,i the i-th diagonal element of Knn.

A straightforward likelihood approximation, and a penalty term
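A numpy sketch of this factorised approximation (my illustration; `log_ptilde` and its argument names are mine):

```python
import numpy as np

def log_ptilde(y, u, Knm, Kmm, knn_diag, sigma2, jitter=1e-8):
    """log p~(y | u): per-point Gaussian terms plus the variance penalty."""
    m = Kmm.shape[0]
    K = Kmm + jitter * np.eye(m)
    mu = Knm @ np.linalg.solve(K, u)           # k_i^T Kmm^-1 u, for each i
    V = np.linalg.solve(K, Knm.T)              # Kmm^-1 Kmn
    qnn_diag = np.sum(Knm.T * V, axis=0)       # k_i^T Kmm^-1 k_i, for each i
    log_gauss = -0.5 * np.log(2 * np.pi * sigma2) - (y - mu) ** 2 / (2 * sigma2)
    penalty = (knn_diag - qnn_diag) / (2 * sigma2)
    return np.sum(log_gauss - penalty)
```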

SLIDE 25

Overview

◮ Motivation
◮ Sparse Gaussian Processes
◮ Stochastic Variational Inference
◮ Examples

SLIDE 26

log p(y | X) ≥ E_{q(u)}[ L1 + log p(u) − log q(u) ] ≜ L3    (1)

L3 = ∑_{i=1}^{n} [ log N( yi | ki⊤ Kmm⁻¹ m, β⁻¹ ) − (β/2) k̃i,i − (1/2) tr(S Λi) ] − KL( q(u) ‖ p(u) )    (2)

where L1 = ln p̃(y | u) from the previous slides, q(u) = N(u | m, S), β = σ⁻² is the noise precision, k̃i,i = ki,i − ki⊤ Kmm⁻¹ ki, and Λi = β Kmm⁻¹ ki ki⊤ Kmm⁻¹.
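For reference, a numpy sketch of the KL term (my illustration): the closed form for KL( N(m, S) ‖ N(0, Kmm) ).

```python
import numpy as np

def gauss_kl(m_vec, S, Kmm, jitter=1e-8):
    """KL( N(m, S) || N(0, Kmm) ) for an M-dimensional Gaussian."""
    M = Kmm.shape[0]
    K = Kmm + jitter * np.eye(M)
    trace_term = np.trace(np.linalg.solve(K, S))
    mahalanobis = m_vec @ np.linalg.solve(K, m_vec)
    logdet_K = np.linalg.slogdet(K)[1]
    logdet_S = np.linalg.slogdet(S)[1]
    return 0.5 * (trace_term + mahalanobis - M + logdet_K - logdet_S)
```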

SLIDE 27

Optimisation

The variational objective L3 is a function of

◮ the parameters of the covariance function
◮ the parameters of q(u)
◮ the inducing inputs, Z

Strategy: fix Z. Take the data in small minibatches; take stochastic gradient steps in the covariance function parameters and stochastic natural gradient steps in the parameters of q(u), as sketched below.
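A self-contained sketch of the q(u) updates in that loop (my illustration, with the covariance parameters held fixed for brevity; the canonical-parameter form of the natural-gradient step and the n/batch rescaling follow the paper, but all variable names and settings here are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, batch, lr = 2000, 15, 50, 0.05
beta = 1.0 / 0.01                                # noise precision

X = rng.uniform(-3, 3, (n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
Z = np.linspace(-3, 3, M)[:, None]               # fixed inducing inputs

def rbf(A, B):
    return np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))

Kmm_inv = np.linalg.inv(rbf(Z, Z) + 1e-6 * np.eye(M))

# canonical parameters of q(u) = N(m, S): eta1 = S^-1 m, eta2 = -S^-1 / 2
eta1, eta2 = np.zeros(M), -0.5 * Kmm_inv         # start from q(u) = p(u)
for step in range(2000):
    idx = rng.choice(n, batch, replace=False)    # minibatch
    A = Kmm_inv @ rbf(Z, X[idx])                 # Kmm^-1 Kmb
    scale = n / batch                            # unbiased data-term rescaling
    eta1_hat = scale * beta * A @ y[idx]
    eta2_hat = -0.5 * (Kmm_inv + scale * beta * A @ A.T)
    eta1 = (1 - lr) * eta1 + lr * eta1_hat       # natural-gradient steps
    eta2 = (1 - lr) * eta2 + lr * eta2_hat

S = np.linalg.inv(-2 * eta2)                     # recover the moments of q(u)
m = S @ eta1
```

Each iteration touches only the minibatch and m × m matrices, giving the O(m³) per-step cost quoted earlier.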

SLIDE 28

Overview

◮ Motivation
◮ Sparse Gaussian Processes
◮ Stochastic Variational Inference
◮ Examples

SLIDE 29

[figure-only slide]

SLIDE 30

UK apartment prices

◮ Monthly price paid data for February to October 2012 (England and Wales)
◮ from http://data.gov.uk/dataset/land-registry-monthly-price-paid-data/
◮ 75,000 entries
◮ Cross-referenced against a postcode database to get latitude and longitude
◮ Regressed the normalised logarithm of the apartment prices

SLIDE 31

[figure-only slide]

SLIDE 32

[figure-only slide]

SLIDE 33

Airline data

◮ Flight delays for every commercial flight in the USA from January to April 2008.
◮ Average delay was 30 minutes.
◮ We randomly selected 800,000 datapoints (we have limited memory!)
◮ 700,000 train, 100,000 test

[figure: bar chart of inverse lengthscales (0.0 to 0.9) for the features Month, DayOfMonth, DayOfWeek, DepTime, ArrTime, AirTime, Distance, PlaneAge]

SLIDE 34

[figure: RMSE on the airline data, 32 to 37. Left panel, "GPs on subsets": full GPs trained on subsets of size N = 800, 1000, 1200. Right panel, "SVI GP": RMSE against iteration, 200 to 1200.]

SLIDE 35

[figure-only slide]

SLIDE 36

Download the code!

github.com/SheffieldML/GPy
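A hedged usage sketch (my illustration; the class and argument names follow my recollection of GPy's SVGP interface and may differ between versions — check the repository's documentation, which pairs SVGP with a stochastic optimiser from the climin library):

```python
import numpy as np
import GPy  # github.com/SheffieldML/GPy

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (1000, 1))
y = np.sin(6 * X) + 0.1 * rng.standard_normal((1000, 1))
Z = rng.uniform(0, 1, (20, 1))                   # 20 inducing inputs

# stochastic variational GP of this talk; batchsize => minibatch gradients
model = GPy.core.SVGP(X, y, Z,
                      kernel=GPy.kern.RBF(1),
                      likelihood=GPy.likelihoods.Gaussian(),
                      batchsize=100)
model.optimize(max_iters=1000)
```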

Cite our paper!

Hensman, Fusi and Lawrence. Gaussian Processes for Big Data. Proceedings of UAI, 2013.