SLIDE 1 Gaussian Processes for Big Data
James Hensman
joint work with Nicolò Fusi and Neil Lawrence
SLIDE 2
Overview
Motivation
Sparse Gaussian Processes
Stochastic Variational Inference
Examples
SLIDE 4
Motivation
Inference in a GP has the following demands:
Complexity: O(n³)   Storage: O(n²)
Inference in a sparse GP has the following demands:
Complexity: O(nm²)   Storage: O(nm)
where we get to pick m!
SLIDE 5 Still not good enough!
Big Data
◮ In parametric models, stochastic optimisation is used.
◮ This allows for application to Big Data.
This work
◮ Show how to use Stochastic Variational Inference in GPs
◮ Stochastic optimisation scheme: each step requires O(m³)
SLIDE 7 Computational savings
Knn ≈ Qnn = Knm Kmm⁻¹ Kmn
Instead of inverting Knn, we make a low-rank (or Nyström) approximation and invert Kmm instead.
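The low-rank construction above can be sketched in a few lines of NumPy. Everything here is an illustrative assumption (an RBF kernel, evenly spaced inducing inputs Z, and small sizes), not the paper's settings; the point is that only the m × m matrix Kmm is ever factorised.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential (RBF) covariance between the rows of A and B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))   # n = 500 inputs
Z = np.linspace(0, 10, 30)[:, None]     # m = 30 inducing inputs (we pick m!)

Knn = rbf(X, X)
Knm = rbf(X, Z)
Kmm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))  # jitter for numerical stability

# Nystrom / low-rank approximation: Qnn = Knm Kmm^{-1} Kmn.
# Only Kmm (m x m) is factorised, never Knn (n x n).
Qnn = Knm @ np.linalg.solve(Kmm, Knm.T)

print(abs(Knn - Qnn).max())  # small when Z covers the input region
```

With the inducing inputs covering the input region densely relative to the lengthscale, Qnn is a close approximation to Knn.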
SLIDE 8 Information capture
Everything we want to do with a GP involves marginalising f
◮ Predictions
◮ Marginal likelihood
◮ Estimating covariance parameters
The posterior of f is the central object. This means inverting Knn.
SLIDE 9 X, y
[Figure: observed data; axes: input space (X) vs. function values]
SLIDE 10 X, y; f(x) ∼ GP
[Figure: axes: input space (X) vs. function values]
SLIDE 11 X, y; f(x) ∼ GP; p(f) = N(0, Knn)
[Figure: axes: input space (X) vs. function values]
SLIDE 12 X, y; f(x) ∼ GP; p(f) = N(0, Knn); p(f | y, X)
[Figure: axes: input space (X) vs. function values]
SLIDE 13
Introducing u
Take an extra M points on the function, u = f(Z).
p(y, f, u) = p(y | f) p(f | u) p(u)
SLIDE 14
Introducing u
SLIDE 15 Introducing u
Take an extra M points on the function, u = f(Z).
p(y, f, u) = p(y | f) p(f | u) p(u)
p(y | f) = N(y | f, σ²I)
p(f | u) = N(f | Knm Kmm⁻¹ u, Knn − Knm Kmm⁻¹ Kmn)
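A small NumPy sketch of this construction (the kernel, sizes, and inducing locations are illustrative assumptions): draw u = f(Z) from the prior p(u) = N(0, Kmm), then form the conditional p(f | u). At an inducing input the conditional variance collapses to (numerically) zero, since f(Z) = u there.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

rng = np.random.default_rng(1)
X = np.linspace(0, 10, 200)[:, None]  # n inputs
Z = np.linspace(0, 10, 12)[:, None]   # m = 12 inducing inputs

Kmm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))  # jitter for stability
Knm = rbf(X, Z)
Knn = rbf(X, X)

# u = f(Z): one draw from the prior p(u) = N(0, Kmm)
u = np.linalg.cholesky(Kmm) @ rng.standard_normal(len(Z))

# p(f | u) = N(Knm Kmm^{-1} u, Knn - Knm Kmm^{-1} Kmn)
cond_mean = Knm @ np.linalg.solve(Kmm, u)
cond_cov = Knn - Knm @ np.linalg.solve(Kmm, Knm.T)

# X[0] coincides with Z[0], so the conditional variance there is ~ jitter:
print(cond_cov[0, 0])
```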
SLIDE 16 X, y f(x) ∼ GP p(f) = N (0, Knn) p(f | y, X) Z, u p(u) = N (0, Kmm)
input space (X) f u n c t i
v a l u e s
SLIDE 17 X, y f(x) ∼ GP p(f) = N (0, Knn) p(f | y, X) p(u) = N (0, Kmm)
input space (X) f u n c t i
v a l u e s
SLIDE 18 The alternative posterior
Instead of doing
p(f | y, X) ∝ p(y | f) p(f | X)
We’ll do
p(u | y, Z) ∝ p(y | u) p(u | Z)
SLIDE 19 The alternative posterior
Instead of doing
p(f | y, X) ∝ p(y | f) p(f | X)
We’ll do
p(u | y, Z) ∝ p(y | u) p(u | Z)
but p(y | u) involves inverting Knn
SLIDE 20 Variational marginalisation of f
ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df
SLIDE 21 Variational marginalisation of f
ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df
ln p(y | u) = ln Ep(f | u,X)[p(y | f)]
SLIDE 22 Variational marginalisation of f
ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df
ln p(y | u) = ln Ep(f | u,X)[p(y | f)]
ln p(y | u) ≥ Ep(f | u,X)[ln p(y | f)] ≜ L1  (Jensen's inequality)
SLIDE 23 Variational marginalisation of f
ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df
ln p(y | u) = ln Ep(f | u,X)[p(y | f)]
ln p(y | u) ≥ Ep(f | u,X)[ln p(y | f)] ≜ L1  (Jensen's inequality)
No inversion of Knn required
SLIDE 24 An approximate likelihood
exp(L1) = ∏n N(yn | kmn⊤ Kmm⁻¹ u, σ²) exp(−(kn,n − kmn⊤ Kmm⁻¹ kmn) / (2σ²))
◮ A straightforward likelihood approximation, and a penalty term
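The bound can be checked numerically: with Gaussian noise, p(y | u) is available in closed form, so we can verify L1 ≤ ln p(y | u) on a toy problem. The kernel, sizes, noise level, and the draws of u and y below are illustrative assumptions; the inequality itself holds for any of them.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def gauss_logpdf(y, mu, cov):
    # Log-density of a multivariate Gaussian N(y | mu, cov).
    _, logdet = np.linalg.slogdet(cov)
    r = y - mu
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(cov, r))

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(100, 1))
Z = np.linspace(0, 10, 15)[:, None]
sigma2 = 0.1  # noise variance

Kmm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
Knm = rbf(X, Z)
Knn = rbf(X, X)

u = np.linalg.cholesky(Kmm) @ rng.standard_normal(len(Z))  # u ~ N(0, Kmm)
y = rng.standard_normal(len(X))                            # bound holds for any y

mu = Knm @ np.linalg.solve(Kmm, u)                 # conditional mean of f given u
Ktilde = Knn - Knm @ np.linalg.solve(Kmm, Knm.T)   # conditional covariance

# Exact: ln p(y | u) = ln N(y | mu, Ktilde + sigma2 I)
exact = gauss_logpdf(y, mu, Ktilde + sigma2 * np.eye(len(X)))
# Bound: L1 = ln N(y | mu, sigma2 I) - tr(Ktilde) / (2 sigma2)
L1 = gauss_logpdf(y, mu, sigma2 * np.eye(len(X))) - np.trace(Ktilde) / (2 * sigma2)

print(L1 <= exact)  # True, by Jensen's inequality
```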
SLIDE 26
log p(y | X) ≥ Eq(u)[ L1 + log p(u) − log q(u) ] ≜ L3   (1)
With q(u) = N(m, S) and β = σ⁻²:
L3 = Σn [ log N(yn | kmn⊤ Kmm⁻¹ m, β⁻¹) − (1/2)β k̃n,n − (1/2) tr(S Λn) ] − KL(q(u) ‖ p(u))   (2)
where k̃n,n = kn,n − kmn⊤ Kmm⁻¹ kmn and Λn = β Kmm⁻¹ kmn kmn⊤ Kmm⁻¹.
SLIDE 27 Optimisation
The variational objective L3 is a function of
◮ the parameters of the covariance function
◮ the parameters of q(u)
◮ the inducing inputs, Z
Strategy: fix Z. Take the data in small minibatches; at each step, take stochastic gradient steps in the covariance function parameters and stochastic natural gradient steps in the parameters of q(u).
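The minibatch loop itself is simple. The sketch below shows only that loop structure: a toy least-squares objective stands in for the bound L3, and `theta` stands in for the parameters being optimised; sizes, step size, and the objective are all illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 3))          # "Big Data" stand-in
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(10000)

theta = np.zeros(3)          # parameters to optimise
batch_size, step = 500, 0.1
for _ in range(200):
    idx = rng.choice(len(y), size=batch_size, replace=False)  # draw a minibatch
    Xb, yb = X[idx], y[idx]
    # Unbiased (noisy) estimate of the full objective's gradient,
    # computed from the minibatch only:
    grad = 2.0 * Xb.T @ (Xb @ theta - yb) / batch_size
    theta -= step * grad     # stochastic gradient step

print(theta)  # approaches w_true
```

Each step touches only `batch_size` points, so the per-step cost is independent of n; in the GP case the analogous per-step cost is O(m³).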
SLIDE 29
SLIDE 30 UK apartment prices
◮ Monthly price paid data for February to October 2012 (England and Wales)
◮ From http://data.gov.uk/dataset/land-registry-monthly-price-paid-data/
◮ 75,000 entries
◮ Cross-referenced against a postcode database to get latitude and longitude
◮ Regressed the normalised logarithm of the apartment prices
SLIDE 31
SLIDE 32
SLIDE 33 Airline data
◮ Flight delays for every commercial flight in the USA from January to April 2008.
◮ Average delay was 30 minutes.
◮ We randomly selected 800,000 datapoints (we have limited memory!)
◮ 700,000 train, 100,000 test
[Figure: inverse lengthscale for each input: Month, DayOfMonth, DayOfWeek, DepTime, ArrTime, AirTime, Distance, PlaneAge]
SLIDE 34
[Figure: RMSE of GPs trained on subsets (N=800, 1000, 1200) vs. RMSE of the SVI GP over iterations]
SLIDE 35
SLIDE 36
Download the code!
github.com/SheffieldML/GPy
Cite our paper!
Hensman, Fusi and Lawrence. Gaussian Processes for Big Data. Proceedings of UAI 2013.