50 Ways with GPs
Richard Wilkinson
School of Maths and Statistics University of Sheffield
Emulator workshop June 2017
Recap

A Gaussian process is a random process indexed by some variable (x ∈ X, say), such that for every finite set of indices x1, . . . , xn, the vector f = (f(x1), . . . , f(xn)) has a multivariate Gaussian distribution.

Why would we want to use this very restricted model?
The class of models is closed under various operations.

Closed under addition: if f1(·), f2(·) ∼ GP, then (f1 + f2)(·) ∼ GP.

Closed under Bayesian conditioning: if we observe D = (f(x1), . . . , f(xn)), then f | D ∼ GP, but with updated mean and covariance functions.

Closed under any linear operation: if L is a linear operator, then L ◦ f ∼ GP(L ◦ m, L2 ◦ k), e.g. df/dx.
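A minimal numpy sketch of the closure-under-addition property; the kernels and lengthscales here are illustrative choices, not from the slides:

```python
import numpy as np

# The sum of two independent GPs is a GP whose mean and covariance are the
# sums of the individual means and covariances.
def sq_exp(x, xp, ell):
    return np.exp(-0.5 * (x[:, None] - xp[None, :]) ** 2 / ell ** 2)

x = np.linspace(0, 1, 50)
K1 = sq_exp(x, x, ell=0.3)      # covariance of f1
K2 = sq_exp(x, x, ell=0.05)     # covariance of f2
K_sum = K1 + K2                 # covariance of f1 + f2

rng = np.random.default_rng(0)
jitter = 1e-8 * np.eye(len(x))
f1 = rng.multivariate_normal(np.zeros(len(x)), K1 + jitter)
f2 = rng.multivariate_normal(np.zeros(len(x)), K2 + jitter)
f_sum = f1 + f2                 # a draw from GP(0, K1 + K2)
```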
k determines the space of functions that sample paths live in.

Linear regression y = x⊤β + ε can be written solely in terms of inner products x⊤x:

β̂ = arg min ||y − Xβ||² + σ²||β||²
  = (X⊤X + σ²I)⁻¹X⊤y
  = X⊤(XX⊤ + σ²I)⁻¹y   (the dual form)

So the prediction at a new location x′ is

ŷ′ = x′⊤β̂ = x′⊤X⊤(XX⊤ + σ²I)⁻¹y = k(x′)(K + σ²I)⁻¹y

where k(x′) := (x′⊤x1, . . . , x′⊤xn) and Kij := xi⊤xj.
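A minimal sketch of prediction in the dual form; with the linear kernel this reproduces ridge regression, and swapping in any positive semi-definite kernel gives the kernel trick. The data and noise level are illustrative:

```python
import numpy as np

def linear_kernel(A, B):
    return A @ B.T

def dual_predict(X, y, X_new, kernel, sigma2):
    K = kernel(X, X)                              # K_ij = k(x_i, x_j)
    k_new = kernel(X_new, X)                      # k(x') = (k(x', x_1), ..., k(x', x_n))
    alpha = np.linalg.solve(K + sigma2 * np.eye(len(y)), y)
    return k_new @ alpha                          # y' = k(x') (K + sigma^2 I)^{-1} y

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
print(dual_predict(X, y, X[:3], linear_kernel, sigma2=0.01))
```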
We know that we can replace x by a feature vector in linear regression, e.g. φ(x) = (1, x, x²), etc. Then Kij = φ(xi)⊤φ(xj), etc.

For some sets of features, the inner product is equivalent to evaluating a kernel function

φ(x)⊤φ(x′) ≡ k(x, x′)

where k : X × X → R is a positive semi-definite function.

We can use an infinite dimensional feature vector φ(x), and because linear regression can be done solely in terms of inner products (inverting an n × n matrix in the dual form) we never need to evaluate the feature vector, only the kernel.

Kernel trick: lift x into feature space by replacing inner products x⊤x′ by k(x, x′).

Kernel regression, non-parametric regression and GP regression are all closely related:

ŷ′ = m(x′) = Σi αi k(x′, xi)   (sum over i = 1, . . . , n)
Generally, we don't think about these features, we just choose a kernel. But any kernel is implicitly choosing a set of features, and our model only includes functions that are linear combinations of this set of features (this space is called the Reproducing Kernel Hilbert Space (RKHS) of k).

Example: if (modulo some detail)

φ(x) = (e^(−(x−c1)²/2λ²), . . . , e^(−(x−cN)²/2λ²))

then as N → ∞

φ(x)⊤φ(x′) = exp(−(x − x′)²/2λ²).
The resulting space of functions is much richer than any parametric regression model (and can be dense in some sets of continuous bounded functions), and is thus more likely to contain an element close to the simulator than any class of models that contains only a finite number of features. This is the motivation for non-parametric methods.
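A small numerical sketch of the limiting behaviour above; the constants and the rescaled lengthscale that appear are part of the "modulo some detail", and the grid of centres is an illustrative choice:

```python
import numpy as np

# Inner products of Gaussian "bump" features approach a squared-exponential
# kernel as the number of centres N grows (up to a constant and lengthscale).
lam = 0.5
centres = np.linspace(-5, 5, 2000)            # c_1, ..., c_N on a dense grid
dc = centres[1] - centres[0]

def phi(x):
    return np.exp(-(x - centres) ** 2 / (2 * lam ** 2))

x, xp = 0.3, 1.1
inner = phi(x) @ phi(xp) * dc                  # Riemann sum over the centres
limit = np.sqrt(np.pi) * lam * np.exp(-(x - xp) ** 2 / (4 * lam ** 2))
print(inner, limit)                            # the two agree closely
```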
Why use Gaussian processes as non-parametric models?

One answer might come from Bayes linear methods¹. If we only knew the expectation and variance of some random variables, X and Y, then how should we best do statistics? It has been shown, using coherency arguments or geometric arguments, that the best second-order judgement about X given Y is

E(X|Y) = E(X) + Cov(X, Y)Var(Y)⁻¹(Y − E(Y))

i.e., exactly the Gaussian process update for the posterior mean. So GPs are in some sense second-order optimal.

¹Some crazy cats think we should do statistics without probability.
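A minimal sketch of the second-order update, which is exactly the GP posterior-mean formula with X = f(x*) and Y the observed outputs; the numbers are illustrative:

```python
import numpy as np

# E(X|Y) = E(X) + Cov(X, Y) Var(Y)^{-1} (Y - E(Y))
def adjusted_expectation(EX, EY, cov_XY, var_Y, y_obs):
    return EX + cov_XY @ np.linalg.solve(var_Y, y_obs - EY)

EX = np.array([0.0])                          # prior mean of X
EY = np.array([0.0, 0.0])                     # prior mean of Y
cov_XY = np.array([[0.8, 0.3]])               # Cov(X, Y)
var_Y = np.array([[1.0, 0.2], [0.2, 1.0]])    # Var(Y)
print(adjusted_expectation(EX, EY, cov_XY, var_Y, np.array([1.5, -0.4])))
```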
We often think of our prediction as consisting of two parts: a point estimate, and the uncertainty in that estimate. That GPs come equipped with the uncertainty in their prediction is seen as one of their main advantages. It is important to check both aspects (see Lindsay's talk).

Warning: the uncertainty estimates from a GP can be flawed. Note that given data D = (X, y),

Var(f(x)|X, y) = k(x, x) − k(x, X)k(X, X)⁻¹k(X, x)

so the posterior variance of f(x) does not depend upon y! The variance estimates are particularly sensitive to the hyper-parameter estimates.
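A short sketch of that point: for fixed hyper-parameters the posterior variance is a function of the input locations only. The kernel and locations are illustrative:

```python
import numpy as np

# Posterior variance k(x,x) - k(x,X) k(X,X)^{-1} k(X,x): y never appears.
def sq_exp(A, B, ell=0.3):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ell ** 2)

X = np.array([0.1, 0.4, 0.9])                 # design points
xs = np.array([0.5])                          # prediction point
K = sq_exp(X, X) + 1e-8 * np.eye(len(X))
k_s = sq_exp(xs, X)
post_var = sq_exp(xs, xs) - k_s @ np.linalg.solve(K, k_s.T)
print(post_var)   # identical whatever y was observed (for fixed hyper-parameters)
```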
PLASIM-ENTS: Holden, Edwards, Garthwaite, W 2015

Emulate spatially resolved precipitation as a function of astronomical parameters: eccentricity, precession, obliquity.

Using a linear regression emulator (on the EOFs/principal components), selecting terms using stepwise regression etc., we got an accuracy of 63%.

After much thought and playing around, we realised we could improve the accuracy by using trigonometric transformations of the inputs. This gave an accuracy of 81%.

A GP gave us 82% accuracy straight away, without needing to find these transformations.
Cresswell, Wheatley, W., Graham 2016
PV = nRT is an idealised law that holds in the limit. It doesn't apply when the gas is near its critical point. Gases are most easily transported in the super-critical region. Impurities in the CO2 (SO2 etc.) change the fluid behaviour. We only have a few measurements of fluid behaviour for impure CO2.
[Figure: pressure–volume isotherm with the liquid and gas volumes vl and vg marked.]
∫ P(v) dv = Ps(vg − vl) (from vl to vg), and ∂P/∂v = ∂²P/∂v² = 0 at P = Pc, T = Tc. By incorporating this information we were able to make more accurate predictions.
Suppose we are modelling a function that is invariant under the single permutation σ, where σ² = e, e.g.

f(x1, x2) = f(x2, x1)

If we assume f(x1, x2) = g(x1, x2) + g(x2, x1) for some arbitrary function g, then f has the required symmetry.

If we model g(·) ∼ GP(0, k(·, ·)), then the covariance function for f is

kf(x, x′) = Cov(f(x), f(x′)) = k(x, x′) + k(σx, x′) + k(x, σx′) + k(σx, σx′)

If k is an isotropic kernel (we only actually require isotropy for each pair of swapped coordinates), then k(x, σx′) = k(σx, x′), as swaps only occur in pairs (σ² = e). So we can use kf(x, x′) = k(x, x′) + k(σx, x′), saving half the computation.
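A small sketch of the symmetrised kernel for the swap permutation σ(x1, x2) = (x2, x1); the base kernel is an illustrative squared-exponential:

```python
import numpy as np

# k_f(x, x') = k(x, x') + k(sigma x, x') is invariant under sigma.
def sq_exp(a, b, ell=1.0):
    return np.exp(-0.5 * np.sum((a - b) ** 2) / ell ** 2)

def swap(x):
    return x[::-1]                     # sigma: (x1, x2) -> (x2, x1)

def k_sym(x, xp):
    return sq_exp(x, xp) + sq_exp(swap(x), xp)

x = np.array([0.2, 0.7])
xp = np.array([1.0, -0.3])
print(k_sym(x, xp), k_sym(swap(x), xp))   # equal: the kernel is invariant under sigma
```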
Uteva, Graham, W, Wheatley 2017

[Figure: RMSE [Eh] against Latin hypercube size for the basic model, the non-symmetric kernel and the symmetric kernel.]
Lindgren, Rue, Lindström

The GP viewpoint is somewhat limited in that it relies upon us specifying a positive definite covariance function. How can we build boutique covariance functions? E.g. emulating SST.

The SPDE-INLA approach of Lindgren, Rue, Lindström: a Gauss–Markov random field (somewhat like a GP) can be written as the solution to an SPDE, which we can solve on a finite mesh. This gives us more modelling power, but at the cost of much more complex mathematics/algorithms.
Carbon capture and storage
Knowledge of the physical problem is encoded in a simulator f.
Inputs: permeability field K (2d field).
Outputs: stream function (2d field), concentration (2d field), surface flux (1d scalar), . . .

[Figure: a permeability field K mapped by f to a true truncated stream field, a true truncated concentration field, and a surface flux of 6.43.]
The simulator maps from a permeability field K to outputs such as the surface flux S. Let f(K) denote this mapping, f : K → S. For most problems the permeability K is unknown.

If we assume a distribution K ∼ π(K), we can quantify our uncertainty about S = f(K), e.g. by finding the cumulative distribution function (CDF) of S: F(s) = P(f(K) ≤ s).
Gold standard approach: Monte Carlo simulation. Draw K1, . . . , KN ∼ π(K), and evaluate the simulator at each, giving fluxes s1 = f(K1), . . . , sN = f(KN). Estimate the empirical CDF

F̂(s) = (1/N) Σi I(si ≤ s)

[Figure: empirical CDF F(x) obtained with 57 simulator runs.]
Note that N = 10³ is not large if we want quantiles in the tail of the distribution. However, the cost of the simulator means we are limited to ∼100 evaluations.
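A minimal Monte Carlo sketch of the empirical CDF; `toy_simulator` is a hypothetical cheap stand-in for the expensive CCS simulator:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_simulator(k):
    return 5.5 + 0.3 * k + 0.1 * k ** 2        # stand-in for f(K)

K = rng.normal(size=1000)                      # K_1, ..., K_N ~ pi(K)
s = toy_simulator(K)                           # s_i = f(K_i)

def ecdf(s_grid, samples):
    # F_hat(s) = (1/N) sum_i I(s_i <= s)
    return np.mean(samples[None, :] <= s_grid[:, None], axis=1)

grid = np.linspace(s.min(), s.max(), 50)
print(ecdf(grid, s)[:5])
```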
Wilkinson 2010

How can we deal with multivariate output? Build independent or separable multivariate emulators, or a linear model of coregionalization?

Instead, if the outputs are highly correlated we can reduce the dimension, i.e., assume y = Wypc + e where dim(y) ≫ dim(ypc), and emulate from Θ to the reduced dimensional output space Ypc.

[Diagram: instead of emulating η(·) : Θ → Y directly, map Y to Ypc by PCA, emulate ηpc(·) : Θ → Ypc, and reconstruct with PCA⁻¹.]
1 Find the singular value decomposition of Y: Y = UΓV∗. Γ contains the singular values (square roots of the eigenvalues), and V the principal components (eigenvectors of Y⊤Y).

2 Decide on the dimension of the principal subspace, n∗ say, and throw away all but the n∗ leading principal components. An orthonormal basis for the principal subspace is given by the first n∗ columns of V, denoted V1. Let V2 be the matrix of discarded columns.

3 Project Y onto the principal subspace to find Ypc = YV1 (a sketch of these steps is given below).
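A minimal numpy sketch of steps 1–3 together with the noisy reconstruction described later; the independent GP emulators that would be fitted to each column of Ypc are omitted, and the eigenvalue scaling used for the discarded directions is an assumption:

```python
import numpy as np

def pca_reduce(Y, n_star):
    # Step 1: SVD of the ensemble matrix, Y = U Gamma V*
    U, sing_vals, Vt = np.linalg.svd(Y, full_matrices=False)
    # Step 2: keep the n* leading principal components (columns of V)
    V1, V2 = Vt[:n_star].T, Vt[n_star:].T
    # Step 3: project onto the principal subspace
    Y_pc = Y @ V1
    return Y_pc, V1, V2, sing_vals

def reconstruct(y_pc, V1, V2, sing_vals, n_runs, rng):
    # Discarded directions modelled as Gaussian noise with variance equal to the
    # corresponding eigenvalue, here approximated by sing_val^2 / n_runs.
    eigvals = sing_vals[V1.shape[1]:] ** 2 / n_runs
    return V1 @ y_pc + V2 @ rng.normal(scale=np.sqrt(eigvals))

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 2048))        # e.g. an ensemble of 50 runs of a 64 x 32 field
Y_pc, V1, V2, sv = pca_reduce(Y, n_star=10)
y_rec = reconstruct(Y_pc[0], V1, V2, sv, n_runs=50, rng=rng)
print(Y_pc.shape, y_rec.shape)         # (50, 10) (2048,)
```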
Why use PCA here? The n∗ directions are chosen to maximize the variance captured. The approximation is the best possible rank-n∗ approximation in terms of the Frobenius norm.
Holden, Edwards, Garthwaite, Wilkinson 2015
Planet Simulator coupled to the terrestrial carbon model ENTS. Inputs are eccentricity, obliquity and precession, describing Earth's orbit around the sun. Model climate (annual average surface temperature and rainfall) and vegetation (annual average vegetation carbon density) spatial fields (on a 64 × 32 grid). We used an ensemble of 50 simulations.
We then emulate the reduced dimension model ηpc(·) = (η1pc(·), . . . , ηn∗pc(·)).

Each component ηi,pc will be uncorrelated (in the ensemble) but not necessarily independent. We use independent Gaussian processes for each component. The output can be reconstructed (accounting for reconstruction error) by modelling the discarded components as Gaussian noise with variance equal to the corresponding eigenvalue:

η(θ) = V1ηpc(θ) + V2 diag(Λ) where Λi ∼ N(0, Γii) (Γii = ith eigenvalue).
We can then use the PC-emulator to do sensitivity analysis.
This approach (PCA emulation) requires that the outputs are highly correlated. We are assuming that the output Dsim is really a linear combination

η(θ) = v1 η1pc(θ) + · · · + vn∗ ηn∗pc(θ)

which may be a reasonable assumption in many situations, e.g. temporal or spatial fields. Although PCA is a linear method (we could use kernel-PCA instead), the method can be used on highly non-linear models as we are still using non-linear Gaussian processes to map from Θ to Ypc; the linear assumption applies only to the dimension reduction (and can be generalised). The method accounts for the reconstruction error from reducing the dimension of the data.
Crevillén-García, W., Shah, Power 2016

For the CCS simulator, the input is a permeability field which only needs to be known at a finite but large number of locations, e.g. if we use a 100 × 100 grid in the solver, K contains 10⁴ entries. It is impossible to directly model f : R¹⁰⁰⁰⁰ → R.

We can use the same idea to reduce the dimension of the inputs. However, because we know the distribution of K, it is more efficient to use the Karhunen–Loève (KL) expansion of K (rather than learn it empirically as in PCA):

K = exp(Z) where Z ∼ GP(m, C)

Z can be represented as

Z(·) = Σi √λi ξi φi(·)   (sum over i = 1, . . . , ∞)

where λi and φi are the eigenvalues and eigenfunctions of the covariance function of Z and ξi ∼ N(0, 1).
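A minimal sketch of a truncated KL expansion on a 1-d grid; the squared-exponential covariance, zero mean and grid are illustrative assumptions, and the continuous eigenfunctions are approximated by eigenvectors of the covariance matrix:

```python
import numpy as np

grid = np.linspace(0, 1, 100)
C = np.exp(-0.5 * (grid[:, None] - grid[None, :]) ** 2 / 0.2 ** 2)   # covariance of Z

eigvals, eigvecs = np.linalg.eigh(C)           # eigenvalues/eigenfunctions of C
order = np.argsort(eigvals)[::-1]
lam, phi = eigvals[order], eigvecs[:, order]

n_kl = 10                                       # number of KL modes retained
rng = np.random.default_rng(0)
xi = rng.normal(size=n_kl)                      # xi_i ~ N(0, 1)
Z = phi[:, :n_kl] @ (np.sqrt(lam[:n_kl]) * xi)  # Z = sum_i sqrt(lambda_i) xi_i phi_i
K = np.exp(Z)                                   # permeability field K = exp(Z)
```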
Left = true, right = emulated; 118 training runs, held-out test set.

[Figure: true and emulated stream fields and concentration fields for three held-out permeability fields.]
We can assess the accuracy of the emulator by examining the prediction error on a held-out test set. Plotting predicted vs true values indicates the accuracy of the GP emulator. We can also choose the number of KL components to retain using numerical scores.
Blue line = CDF from using 10³ Monte Carlo samples from the simulator. Red line = CDF obtained using the emulator (trained with 20 simulator runs, rational quadratic covariance function).
The optimal output dimension reduction method is probably something like PCA, at least if what we care about is building a good global emulator. PCA may be a poor dimension reduction for the inputs. Mathews and Vial 2017 describe a very interesting new approach for problems of the form

d = f(x),  y = g(x)

where d are the observations, x the unknown (high dimensional) field, and y the quantity you want to predict.

There is a trade-off in the dimension reduction.
◮ The more we reduce the dimension of the input, the easier the regression becomes, but we lose more information in the compression.
◮ Less dimension reduction leads to less information loss, but the regression becomes harder.

Using global sensitivity analysis to select the most influential inputs is a way of doing dimension reduction focused on the important information for regression. However, it is limited to projections onto the original coordinate axes.
Kennedy and O'Hagan 2001

Let's acknowledge that most models are imperfect. Can we expand the class of models by adding a GP to our simulator? If f(x) is our simulator and d the observation, then perhaps we can correct f by modelling

y = f(x) + δ(x) where δ ∼ GP
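A minimal sketch of fitting such a discrepancy term: a zero-mean GP is fitted to the residuals y − f(x) and used to correct predictions. Here θ is held fixed and the simulator, data and kernel are illustrative; the harder problem of jointly inferring θ and δ is what the next slide is about:

```python
import numpy as np

def f_sim(x, theta=0.65):
    return theta * x                               # toy simulator f(x) = theta x

def sq_exp(A, B, ell=2.0, var=0.05):
    return var * np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ell ** 2)

x_obs = np.linspace(0, 5, 8)
y_obs = 0.65 * x_obs / (1 + x_obs / 20) + 0.01 * np.random.default_rng(0).normal(size=8)

resid = y_obs - f_sim(x_obs)                       # observed discrepancy values
K = sq_exp(x_obs, x_obs) + 1e-4 * np.eye(8)        # GP prior on delta + noise

x_new = np.array([2.5, 6.0])
delta_mean = sq_exp(x_new, x_obs) @ np.linalg.solve(K, resid)
y_pred = f_sim(x_new) + delta_mean                 # corrected prediction y = f(x) + delta(x)
```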
Kennedy and O’Hagan 2001, Brynjarsdottir and O’Hagan 2014
Simulator: f(x) = xθ.  Reality: g(x) = θx/(1 + x/a), with θ = 0.65, a = 20.

[Figure: posterior histograms for θ under four treatments of the model discrepancy (MD): no MD, a GP prior on the MD, uniform MD on [−1, 1], and uniform MD on [−0.5, 0.5].]
Bolting on a GP can correct your predictions, but won’t necessarily fix your inference.
We build GPs using data {xi, yi}, i = 1, . . . , n. Call the collection Xn = {xi} ⊂ Rd the design.

For observational studies we have no control over the design, but we do for computer experiments! GP predictions made using a good design will be better than those using a poor design (cf. the location of inducing points for sparse GPs).

What are we designing for? Global prediction, calibration, or optimization (minimizing f via the Expected Improvement (EI)?).
e.g. Zhu and Stein 2006

For a GP with known hyper-parameters, space-filling designs are good as they minimize the average prediction variance: Latin hypercubes, maximin/minimax, maximum entropy. However, if we only want to estimate hyper-parameters then maximize det I(θ) = −det E[∂²f(X; θ)/∂θ²]. Different designs are optimal for prediction and for estimating the hyper-parameters, and a trade-off between these two criteria has been proposed.
The designs above are all 'one-shot' designs and can be wasteful. Instead we can use adaptive/sequential designs (active learning) and add a point at a time:

Choose location xn+1 to maximize some criterion/acquisition rule C(x) ≡ C(x | {xi, yi}, i = 1, . . . , n).
Generate yn+1 = f(xn+1).

For optimization, we've seen that a good criterion for minimizing f(x) is to choose x to maximize the expected improvement criterion

C(x) = E[(min i=1,...,n yi − f(x))+]
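A small sketch of the expected-improvement acquisition under a Gaussian predictive distribution with mean mu and standard deviation s; the posterior summaries shown are illustrative numbers, not from the slides:

```python
import numpy as np
from scipy.stats import norm

# EI(x) = E[(min_i y_i - f(x))_+] has a closed form for a Gaussian posterior.
def expected_improvement(mu, s, y_best):
    z = (y_best - mu) / s
    return (y_best - mu) * norm.cdf(z) + s * norm.pdf(z)

mu = np.array([0.2, -0.1, 0.4])     # posterior means at three candidate points
s = np.array([0.3, 0.05, 0.5])      # posterior standard deviations
print(expected_improvement(mu, s, y_best=0.0))   # pick the candidate with largest EI
```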
Gramacy and Lee 2009, Beck and Guillas 2015

Many designs work on minimizing some function of the predictive variance/MSE

s²n(x) = Var(f(x) | Dn)

Active learning MacKay (ALM): choose x at the point with largest predictive variance,

Cn(x) = s²n(x)

This tends to locate points on the edge of the domain.

Active learning Cohn (ALC): choose x to give the largest expected reduction in predictive variance,

Cn(x) = ∫ s²n(x′) − s²n∪x(x′) dx′

ALC tends to give better designs than ALM, but has cost O(n³ + Nref Ncand n²) for each new design point.
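A minimal sketch comparing the ALM and ALC acquisition rules, reusing the fact that the GP posterior variance does not depend on y; the kernel, candidate set and reference grid are illustrative:

```python
import numpy as np

def sq_exp(A, B, ell=0.2):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ell ** 2)

def post_var(X_design, X_eval, jitter=1e-8):
    # s^2_n(x) = k(x,x) - k(x,X) k(X,X)^{-1} k(X,x), with k(x,x) = 1 here
    K = sq_exp(X_design, X_design) + jitter * np.eye(len(X_design))
    k = sq_exp(X_eval, X_design)
    return 1.0 - np.sum(k * np.linalg.solve(K, k.T).T, axis=1)

X_n = np.array([0.2, 0.5, 0.8])                 # current design
X_cand = np.linspace(0, 1, 21)                  # candidate points
X_ref = np.linspace(0, 1, 101)                  # reference grid for the ALC integral

alm = post_var(X_n, X_cand)                     # ALM: largest predictive variance
alc = [np.mean(post_var(X_n, X_ref) - post_var(np.append(X_n, x), X_ref))
       for x in X_cand]                          # ALC: expected variance reduction
print(X_cand[np.argmax(alm)], X_cand[np.argmax(alc)])
```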
MICE: Beck and Guillas 2015
The mutual information between Y and Y′ is

I(Y; Y′) = H(Y) − H(Y | Y′) = KL(p(y, y′) || p(y)p(y′))

Choose the design Xn to maximize the mutual information between f(Xn) and f(Xcand \ Xn), where Xcand is a set of candidate design points. A sequential version for GPs reduces to choosing x to maximize

Cn(x) = s²n(x) / s²cand\(n∪x)(x; τ²)
[Figure: normalized RMSPE against number of design points for ALM-1000, ALC-150, MICE-150 (τs² = 1 and τs² = 10⁻¹²), and MmLHD/mMLHD designs.]
[Figure: runtime (s) against number of design points, broken into Tcand, Tmle and Tselect, for MICE-300, MICE-150 and ALC-150.]
You can do lots of stuff with GPs. Thank you for listening!