Gaussian Process Regression with Noisy Inputs Dan Cervone Harvard - PowerPoint PPT Presentation

Gaussian Process Regression with Noisy Inputs Dan Cervone Harvard Statistics Department March 3, 2015 Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Gaussian process regression Introduction A smooth response x over a surface S ⊂ R p . For s 1 , . . . , s n ∈ S ,   x ( s 1 ) . .  ∼ N ( 0 , C ( s n , s n ))   .  x ( s n ) [ C ( s n , s n )] ij = c ( s i , s j ), where c is the covariance function . Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Gaussian process regression Introduction A smooth response x over a surface S ⊂ R p . For s 1 , . . . , s n ∈ S ,   x ( s 1 ) . .  ∼ N ( 0 , C ( s n , s n ))   .  x ( s n ) [ C ( s n , s n )] ij = c ( s i , s j ), where c is the covariance function . Interpolation/prediction at unobserved locations in input space Observe x n = ( x ( s 1 ) . . . x ( s n )) ′ . Predict x ∗ k = ( x ( s ∗ 1 ) . . . x ( s ∗ k )) ′ . � k , s n ) C ( s n , s n ) − 1 x n , x ∗ k | x n ∼ N C ( s ∗ k , s n ) C ( s n , s n ) − 1 C ( s n , s ∗ C ( s ∗ k , s ∗ k ) − C ( s ∗ � k ) Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Gaussian process regression Example 2 1 value (x(s)) 0 −1 −2 0 2 4 6 8 10 location (s) Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

GPs with noisy inputs Scientific examples Location error model Instead of observing x , we observe the process y ( s ) = x ( s + u ), where u ∼ g ( u ) are errors in the input space S . Note: We observe s n , y n , but wish to predict x ( s ∗ ). Note: y is never a GP. Location errors (e.g. geocoding error, map positional error) is a problem in many scientific domains. Epidemiology [3, 10, 2]. Environmental sciences [1, 16]. Object tracking/computer vision [9, 15]. Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Measurement error GP location errors vs errors-in-variables GP input/location errors: y ( s ) = x ( s + u ) + ǫ. Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Measurement error GP location errors vs errors-in-variables GP input/location errors: y ( s ) = x ( s + u ) + ǫ. Traditional errors-in-variables model [5]: x ∗ = f θ ( x ) + ǫ. x ( s ∗ ) = f θ, s n ( x n ) + ǫ (GP regression). Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Measurement error GP location errors vs errors-in-variables GP input/location errors: y ( s ) = x ( s + u ) + ǫ. Traditional errors-in-variables model [5]: x ∗ = f θ ( x ) + ǫ. x ( s ∗ ) = f θ, s n ( x n ) + ǫ (GP regression). Observe y = x + η , ie y n = x n + η n . Common to assume η ⊥ x (classical) or η ⊥ y (Berkson). Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Measurement error GP location errors vs errors-in-variables GP input/location errors: y ( s ) = x ( s + u ) + ǫ. Traditional errors-in-variables model [5]: x ∗ = f θ ( x ) + ǫ. x ( s ∗ ) = f θ, s n ( x n ) + ǫ (GP regression). Observe y = x + η , ie y n = x n + η n . Common to assume η ⊥ x (classical) or η ⊥ y (Berkson). GP input errors does not yield a traditional errors-in-variables regression problem: Errors y ( s ) − x ( s ) depend on x ( s ). True regression function is unknown: x ( s ∗ ) = f θ, s n + u n ( y n ) + ǫ . Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Measurement error Methodology Methods to properly accounting for noisy inputs are essential for reliable inference in this regime. We seek: Optimal (MSE) point prediction, and interval predictions with correct coverage. Consistent/efficient parameter estimation. The location-error regime can actually deliver more precise predictions than the error-free regime. Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Measurement error Methodology Methods to properly accounting for noisy inputs are essential for reliable inference in this regime. We seek: Optimal (MSE) point prediction, and interval predictions with correct coverage. Consistent/efficient parameter estimation. The location-error regime can actually deliver more precise predictions than the error-free regime. We discuss three methods: Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Measurement error Methodology Methods to properly accounting for noisy inputs are essential for reliable inference in this regime. We seek: Optimal (MSE) point prediction, and interval predictions with correct coverage. Consistent/efficient parameter estimation. The location-error regime can actually deliver more precise predictions than the error-free regime. We discuss three methods: Ignoring location errors. Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Measurement error Methodology Methods to properly accounting for noisy inputs are essential for reliable inference in this regime. We seek: Optimal (MSE) point prediction, and interval predictions with correct coverage. Consistent/efficient parameter estimation. The location-error regime can actually deliver more precise predictions than the error-free regime. We discuss three methods: Ignoring location errors. Kriging (BLUP), using moment properties of error-induced process y . Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Measurement error Methodology Methods to properly accounting for noisy inputs are essential for reliable inference in this regime. We seek: Optimal (MSE) point prediction, and interval predictions with correct coverage. Consistent/efficient parameter estimation. The location-error regime can actually deliver more precise predictions than the error-free regime. We discuss three methods: Ignoring location errors. Kriging (BLUP), using moment properties of error-induced process y . MCMC on the space ( x ∗ k , u n ). Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Ignoring location errors Sometimes, you can get lucky Analyst just assumes y n = x n : “Kriging Ignoring Location Errors” (KILE) [6]: x KILE ( s ∗ ) = C ( s ∗ , s n ) C ( s n , s n ) − 1 y n . ˆ Parameter inference based on assuming y n = x n ∼ N ( 0 , C ( s n , s n )). Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Ignoring location errors Sometimes, you can get lucky Analyst just assumes y n = x n : “Kriging Ignoring Location Errors” (KILE) [6]: x KILE ( s ∗ ) = C ( s ∗ , s n ) C ( s n , s n ) − 1 y n . ˆ Parameter inference based on assuming y n = x n ∼ N ( 0 , C ( s n , s n )). Example: c ( s 1 , s 2 ) = exp( − ( s 1 − s 2 ) 2 ), and u ∼ N (0 , σ 2 u ). Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Ignoring location errors Sometimes, you can get lucky Analyst just assumes y n = x n : “Kriging Ignoring Location Errors” (KILE) [6]: x KILE ( s ∗ ) = C ( s ∗ , s n ) C ( s n , s n ) − 1 y n . ˆ Parameter inference based on assuming y n = x n ∼ N ( 0 , C ( s n , s n )). Example: c ( s 1 , s 2 ) = exp( − ( s 1 − s 2 ) 2 ), and u ∼ N (0 , σ 2 u ). 1 0 x(s) −1 observed location predictive location −2 sample GP paths 0 2 4 6 8 10 location (s) Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Ignoring location errors Sometimes, you can get lucky Analyst just assumes y n = x n : “Kriging Ignoring Location Errors” (KILE) [6]: x KILE ( s ∗ ) = C ( s ∗ , s n ) C ( s n , s n ) − 1 y n . ˆ Parameter inference based on assuming y n = x n ∼ N ( 0 , C ( s n , s n )). Example: c ( s 1 , s 2 ) = exp( − ( s 1 − s 2 ) 2 ), and u ∼ N (0 , σ 2 u ). 1.40 1.05 0.70 MSE 0.35 0.00 −4 −2 0 2 4 log ( σ u 2 ) Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Ignoring location errors Sometimes, disaster strikes Assuming known covariance funciton, KILE is not a self-efficient procedure. Self-efficiency [12]: estimator cannot be improved by removing/subsampling data. Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Ignoring location errors Sometimes, disaster strikes Assuming known covariance funciton, KILE is not a self-efficient procedure. Self-efficiency [12]: estimator cannot be improved by removing/subsampling data. Theorem Assume covariance function c and error model u ∼ g ( u ) satisfy regularity x n conditions. Let ˆ KILE ( s ∗ ) be the KILE estimator for x ( s ∗ ) given x n . Then for any s n and s ∗ , there exists s n +1 such that x n +1 KILE ( s ∗ )) 2 ] ≥ E [( x ( s ∗ ) − ˆ x n KILE ( s ∗ )) 2 ] . E [( x ( s ∗ ) − ˆ Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

Gaussian Process Regression with Noisy Inputs Dan Cervone Harvard - PowerPoint PPT Presentation

Gaussian Process Regression with Noisy Inputs Dan Cervone Harvard Statistics Department March 3, 2015 Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015 Gaussian process regression Introduction A smooth

Kernel Methods for Regression Support Vector Regression Gaussian Mixture Regression Gaussian

Gaussian Processes Seung-Hoon Na Chonbuk National University Gaussian Process Regression

Gaussian Filter The Gaussian filter 1 2 1 A Gaussian kernel gives less 1 2 4 2 weight to

Formal Modeling in Cognitive Science 1 Noisy Channel Model Channel Capacity Lecture 29: Noisy

Inputs and Outputs Objects with Analog Inputs A io.sch analog.sch Objects with Digital Inputs

Performance Evaluation for Text Processing of Noisy Inputs Daniel Lopresti Computer Science

Lecture 3 Capacity of Multiuser Gaussian Channels The Gaussian uplink: 6.1 The fading

Machine Learning Regression Where we are Inputs Prob- Density ability Estimator Inputs

Non-Gaussian likelihoods for Gaussian Processes Alan Saul Outline Motivation Non-Gaussian

Gaussian Process Lei Tang Arizona State University Jul. 31th, 2007 Lei Tang (ASU) Gaussian

Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models

Introduction to Nonparametric Bayesian Modeling and Gaussian Process Regression Piyush Rai Dept.

Variational Model Selection for Sparse Gaussian Process Regression Michalis K. Titsias School of

Time Series Regression A regression model relates a response x t to inputs z t, 1 , z t, 2 , .

Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic

Regression Methods 1. Linear Regression and Logistic Regression: definitions, and a common

A Quantum Interior Point Method for LPs and SDPs Iordanis Kerenidis 1 Anupam Prakash 1 1 CNRS,

Highly Scalable Parallel Sorting Edgar Solomonik and Laxmikant Kale University of Illinois at

Waiting for 6+ years Pete Beckman Argonne National Laboratory 2 Data from Peter

The Sample-Computational Tradeoff Shai Shalev-Shwartz School of Computer Science and Engineering

A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

Michael Spillane President, Product & Categories Good morning, and thank you for joining

Scalable performance analysis with Projections Sanjay Kale, http://charm.cs.illinois.edu Based

Correlations between Parallel Patterns and Multi-core Benchmarks Vivek Kale IWMSE workshop May

Gaussian Process Regression with Noisy Inputs Dan Cervone Harvard - PowerPoint PPT Presentation

Gaussian Process Regression with Noisy Inputs Dan Cervone Harvard Statistics Department March 3, 2015 Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015 Gaussian process regression Introduction A smooth

Kernel Methods for Regression Support Vector Regression Gaussian Mixture Regression Gaussian

Gaussian Processes Seung-Hoon Na Chonbuk National University Gaussian Process Regression

Gaussian Filter The Gaussian filter 1 2 1 A Gaussian kernel gives less 1 2 4 2 weight to

Formal Modeling in Cognitive Science 1 Noisy Channel Model Channel Capacity Lecture 29: Noisy

Inputs and Outputs Objects with Analog Inputs A io.sch analog.sch Objects with Digital Inputs

Performance Evaluation for Text Processing of Noisy Inputs Daniel Lopresti Computer Science

Lecture 3 Capacity of Multiuser Gaussian Channels The Gaussian uplink: 6.1 The fading

Machine Learning Regression Where we are Inputs Prob- Density ability Estimator Inputs

Non-Gaussian likelihoods for Gaussian Processes Alan Saul Outline Motivation Non-Gaussian

Gaussian Process Lei Tang Arizona State University Jul. 31th, 2007 Lei Tang (ASU) Gaussian

Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models

Introduction to Nonparametric Bayesian Modeling and Gaussian Process Regression Piyush Rai Dept.

Variational Model Selection for Sparse Gaussian Process Regression Michalis K. Titsias School of

Time Series Regression A regression model relates a response x t to inputs z t, 1 , z t, 2 , .

Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic

Regression Methods 1. Linear Regression and Logistic Regression: definitions, and a common

A Quantum Interior Point Method for LPs and SDPs Iordanis Kerenidis 1 Anupam Prakash 1 1 CNRS,

Highly Scalable Parallel Sorting Edgar Solomonik and Laxmikant Kale University of Illinois at

Waiting for 6+ years Pete Beckman Argonne National Laboratory 2 Data from Peter

The Sample-Computational Tradeoff Shai Shalev-Shwartz School of Computer Science and Engineering

A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

Michael Spillane President, Product &amp; Categories Good morning, and thank you for joining

Scalable performance analysis with Projections Sanjay Kale, http://charm.cs.illinois.edu Based

Correlations between Parallel Patterns and Multi-core Benchmarks Vivek Kale IWMSE workshop May

Michael Spillane President, Product & Categories Good morning, and thank you for joining