Gaussian Process Regression with Noisy Inputs Dan Cervone Harvard - - PowerPoint PPT Presentation

gaussian process regression with noisy inputs
SMART_READER_LITE
LIVE PREVIEW

Gaussian Process Regression with Noisy Inputs Dan Cervone Harvard - - PowerPoint PPT Presentation

Gaussian Process Regression with Noisy Inputs Dan Cervone Harvard Statistics Department March 3, 2015 Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015 Gaussian process regression Introduction A smooth


slide-1
SLIDE 1

Gaussian Process Regression with Noisy Inputs

Dan Cervone

Harvard Statistics Department

March 3, 2015

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-2
SLIDE 2

Gaussian process regression

Introduction

A smooth response x over a surface S ⊂ Rp. For s1, . . . , sn ∈ S,    x(s1) . . . x(sn)    ∼ N(0, C(sn, sn)) [C(sn, sn)]ij = c(si, sj), where c is the covariance function.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-3
SLIDE 3

Gaussian process regression

Introduction

A smooth response x over a surface S ⊂ Rp. For s1, . . . , sn ∈ S,    x(s1) . . . x(sn)    ∼ N(0, C(sn, sn)) [C(sn, sn)]ij = c(si, sj), where c is the covariance function.

Interpolation/prediction at unobserved locations in input space

Observe xn = (x(s1) . . . x(sn))′. Predict x∗

k = (x(s∗ 1 ) . . . x(s∗ k ))′.

x∗

k|xn ∼ N

  • C(s∗

k, sn)C(sn, sn)−1xn,

C(s∗

k, s∗ k) − C(s∗ k, sn)C(sn, sn)−1C(sn, s∗ k)

  • Dan Cervone (Harvard Statistics Department)

GP Regression with Noisy Inputs March 3, 2015

slide-4
SLIDE 4

Gaussian process regression

Example

2 4 6 8 10 −2 −1 1 2 location (s) value (x(s))

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-5
SLIDE 5

Gaussian process regression

Example

2 4 6 8 10 −2 −1 1 2 location (s) value (x(s))

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-6
SLIDE 6

Gaussian process regression

Example

2 4 6 8 10 −2 −1 1 2 location (s) value (x(s))

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-7
SLIDE 7

GPs with noisy inputs

Scientific examples

Location error model

Instead of observing x, we observe the process y(s) = x(s + u), where u ∼ g(u) are errors in the input space S. Note: We observe sn, yn, but wish to predict x(s∗). Note: y is never a GP. Location errors (e.g. geocoding error, map positional error) is a problem in many scientific domains. Epidemiology [3, 10, 2]. Environmental sciences [1, 16]. Object tracking/computer vision [9, 15].

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-8
SLIDE 8

Measurement error

GP location errors vs errors-in-variables

GP input/location errors: y(s) = x(s + u) + ǫ.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-9
SLIDE 9

Measurement error

GP location errors vs errors-in-variables

GP input/location errors: y(s) = x(s + u) + ǫ. Traditional errors-in-variables model [5]: x∗ = fθ(x) + ǫ. x(s∗) = fθ,sn(xn) + ǫ (GP regression).

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-10
SLIDE 10

Measurement error

GP location errors vs errors-in-variables

GP input/location errors: y(s) = x(s + u) + ǫ. Traditional errors-in-variables model [5]: x∗ = fθ(x) + ǫ. x(s∗) = fθ,sn(xn) + ǫ (GP regression). Observe y = x + η, ie yn = xn + ηn. Common to assume η ⊥ x (classical) or η ⊥ y (Berkson).

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-11
SLIDE 11

Measurement error

GP location errors vs errors-in-variables

GP input/location errors: y(s) = x(s + u) + ǫ. Traditional errors-in-variables model [5]: x∗ = fθ(x) + ǫ. x(s∗) = fθ,sn(xn) + ǫ (GP regression). Observe y = x + η, ie yn = xn + ηn. Common to assume η ⊥ x (classical) or η ⊥ y (Berkson). GP input errors does not yield a traditional errors-in-variables regression problem: Errors y(s) − x(s) depend on x(s). True regression function is unknown: x(s∗) = fθ,sn+un(yn) + ǫ.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-12
SLIDE 12

Measurement error

Methodology

Methods to properly accounting for noisy inputs are essential for reliable inference in this regime. We seek: Optimal (MSE) point prediction, and interval predictions with correct coverage. Consistent/efficient parameter estimation. The location-error regime can actually deliver more precise predictions than the error-free regime.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-13
SLIDE 13

Measurement error

Methodology

Methods to properly accounting for noisy inputs are essential for reliable inference in this regime. We seek: Optimal (MSE) point prediction, and interval predictions with correct coverage. Consistent/efficient parameter estimation. The location-error regime can actually deliver more precise predictions than the error-free regime. We discuss three methods:

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-14
SLIDE 14

Measurement error

Methodology

Methods to properly accounting for noisy inputs are essential for reliable inference in this regime. We seek: Optimal (MSE) point prediction, and interval predictions with correct coverage. Consistent/efficient parameter estimation. The location-error regime can actually deliver more precise predictions than the error-free regime. We discuss three methods: Ignoring location errors.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-15
SLIDE 15

Measurement error

Methodology

Methods to properly accounting for noisy inputs are essential for reliable inference in this regime. We seek: Optimal (MSE) point prediction, and interval predictions with correct coverage. Consistent/efficient parameter estimation. The location-error regime can actually deliver more precise predictions than the error-free regime. We discuss three methods: Ignoring location errors. Kriging (BLUP), using moment properties of error-induced process y.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-16
SLIDE 16

Measurement error

Methodology

Methods to properly accounting for noisy inputs are essential for reliable inference in this regime. We seek: Optimal (MSE) point prediction, and interval predictions with correct coverage. Consistent/efficient parameter estimation. The location-error regime can actually deliver more precise predictions than the error-free regime. We discuss three methods: Ignoring location errors. Kriging (BLUP), using moment properties of error-induced process y. MCMC on the space (x∗

k, un).

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-17
SLIDE 17

Ignoring location errors

Sometimes, you can get lucky

Analyst just assumes yn = xn: “Kriging Ignoring Location Errors” (KILE) [6]: ˆ xKILE(s∗) = C(s∗, sn)C(sn, sn)−1yn. Parameter inference based on assuming yn = xn ∼ N(0, C(sn, sn)).

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-18
SLIDE 18

Ignoring location errors

Sometimes, you can get lucky

Analyst just assumes yn = xn: “Kriging Ignoring Location Errors” (KILE) [6]: ˆ xKILE(s∗) = C(s∗, sn)C(sn, sn)−1yn. Parameter inference based on assuming yn = xn ∼ N(0, C(sn, sn)). Example: c(s1, s2) = exp(−(s1 − s2)2), and u ∼ N(0, σ2

u).

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-19
SLIDE 19

Ignoring location errors

Sometimes, you can get lucky

Analyst just assumes yn = xn: “Kriging Ignoring Location Errors” (KILE) [6]: ˆ xKILE(s∗) = C(s∗, sn)C(sn, sn)−1yn. Parameter inference based on assuming yn = xn ∼ N(0, C(sn, sn)). Example: c(s1, s2) = exp(−(s1 − s2)2), and u ∼ N(0, σ2

u).

x(s) −2 −1 1 location (s) 2 4 6 8 10

  • bserved location

predictive location sample GP paths

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-20
SLIDE 20

Ignoring location errors

Sometimes, you can get lucky

Analyst just assumes yn = xn: “Kriging Ignoring Location Errors” (KILE) [6]: ˆ xKILE(s∗) = C(s∗, sn)C(sn, sn)−1yn. Parameter inference based on assuming yn = xn ∼ N(0, C(sn, sn)). Example: c(s1, s2) = exp(−(s1 − s2)2), and u ∼ N(0, σ2

u).

MSE 0.00 0.35 0.70 1.05 1.40 log(σu

2)

−4 −2 2 4

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-21
SLIDE 21

Ignoring location errors

Sometimes, disaster strikes

Assuming known covariance funciton, KILE is not a self-efficient procedure. Self-efficiency [12]: estimator cannot be improved by removing/subsampling data.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-22
SLIDE 22

Ignoring location errors

Sometimes, disaster strikes

Assuming known covariance funciton, KILE is not a self-efficient procedure. Self-efficiency [12]: estimator cannot be improved by removing/subsampling data.

Theorem

Assume covariance function c and error model u ∼ g(u) satisfy regularity

  • conditions. Let ˆ

xn

KILE(s∗) be the KILE estimator for x(s∗) given xn. Then for any

sn and s∗, there exists sn+1 such that E[(x(s∗) − ˆ xn+1

KILE (s∗))2] ≥ E[(x(s∗) − ˆ

xn

KILE(s∗))2]. Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-23
SLIDE 23

Ignoring location errors

Sometimes, disaster strikes

Assuming known covariance funciton, KILE is not a self-efficient procedure. Self-efficiency [12]: estimator cannot be improved by removing/subsampling data.

Theorem

Assume covariance function c and error model u ∼ g(u) satisfy regularity

  • conditions. Let ˆ

xn

KILE(s∗) be the KILE estimator for x(s∗) given xn. Then for any

sn and s∗, there exists sn+1 such that E[(x(s∗) − ˆ xn+1

KILE (s∗))2] ≥ E[(x(s∗) − ˆ

xn

KILE(s∗))2].

Regularity conditions: c twice differentiable everywhere. k(s1, s2) = E[c(s1 + u1, s2 + u2)] twice differentiable everywhere except s1 = s2.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-24
SLIDE 24

Ignoring location errors

Sometimes, disaster strikes

Example: c(s1, s2) = exp(−(s1 − s2)2), and u ∼ N(0, 0.04).

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-25
SLIDE 25

Ignoring location errors

Sometimes, disaster strikes

Example: c(s1, s2) = exp(−(s1 − s2)2), and u ∼ N(0, 0.04).

x(s) −2 −1 1 location (s) 2 4 6 8 10

  • bserved location

predictive location sample GP paths

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-26
SLIDE 26

Ignoring location errors

Sometimes, disaster strikes

Example: c(s1, s2) = exp(−(s1 − s2)2), and u ∼ N(0, 0.04).

log MSE −2 2 4 6 8 new location s11 2 4 6 8 10

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-27
SLIDE 27

Kriging (Best Linear Unbiased Prediction)

KALE [6]

Second moment properties of y(s) and (x(s∗), y(s)): k(s1, s2) = C[y(s1), y(s2)] = E[c(s1 + u1, s2 + u2)] for s1 = s2 k(s, s) = C[y(s), y(s)] = E[c(s + u, s + u)] k∗(s, s∗) = C[y(s), x(s∗)] = E[c(s + u, s∗)].

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-28
SLIDE 28

Kriging (Best Linear Unbiased Prediction)

KALE [6]

Second moment properties of y(s) and (x(s∗), y(s)): k(s1, s2) = C[y(s1), y(s2)] = E[c(s1 + u1, s2 + u2)] for s1 = s2 k(s, s) = C[y(s), y(s)] = E[c(s + u, s + u)] k∗(s, s∗) = C[y(s), x(s∗)] = E[c(s + u, s∗)]. k is the covariance function for y, and we can use it for Kriging adjusting for location error (KALE) [6]: ˆ xKALE(s∗) = K∗(s∗, sn)K(sn, sn)−1yn.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-29
SLIDE 29

Kriging (Best Linear Unbiased Prediction)

KALE [6]

Second moment properties of y(s) and (x(s∗), y(s)): k(s1, s2) = C[y(s1), y(s2)] = E[c(s1 + u1, s2 + u2)] for s1 = s2 k(s, s) = C[y(s), y(s)] = E[c(s + u, s + u)] k∗(s, s∗) = C[y(s), x(s∗)] = E[c(s + u, s∗)]. k is the covariance function for y, and we can use it for Kriging adjusting for location error (KALE) [6]: ˆ xKALE(s∗) = K∗(s∗, sn)K(sn, sn)−1yn. For any error structure u, k is a valid covariance function if and only if c is. If c is known, then KALE dominates KILE in MSE.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-30
SLIDE 30

Kriging (Best Linear Unbiased Prediction)

Covariance function k

Sometimes, k is available in closed form: Example: for c(s1, s2) = τ 2 exp(−β||s1 − s2||2) and ui ∼ N(0, σ2

uIp),

k(s1, s2) = τ 2 (1 + 4βσ2

u)p/2 exp

β 1 + 4βσ2

u

||s1 − s2||2

  • .

Not generally true that c and k have same functional form.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-31
SLIDE 31

Kriging (Best Linear Unbiased Prediction)

Covariance function k

Sometimes, k is available in closed form: Example: for c(s1, s2) = τ 2 exp(−β||s1 − s2||2) and ui ∼ N(0, σ2

uIp),

k(s1, s2) = τ 2 (1 + 4βσ2

u)p/2 exp

β 1 + 4βσ2

u

||s1 − s2||2

  • .

Not generally true that c and k have same functional form. Most commonly, k is computed by Monte Carlo. k(s1, s2) ≈ 1 M

M

  • i=1

c(s1 + u1i, s2 + u2i) uji

iid

∼ g(uj) for i = 1, . . . , M.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-32
SLIDE 32

Kriging (Best Linear Unbiased Prediction)

Interval estimation

We get interval estimates for KALE by deriving the distribution function of prediction errors:

Proposition

Let W (un) = V[x(s∗)] + γ′C(sn + un, sn + un)γ − 2γ′C(sn + un, s∗) where γ = K(sn, sn)−1K∗(sn, s∗). Then P(x(s∗) − ˆ xKALE(s∗) < z) = E

  • Φ
  • z
  • W (un)
  • ,

where Φ is the standard normal distribution function.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-33
SLIDE 33

Kriging (Best Linear Unbiased Prediction)

Interval estimation

We get interval estimates for KALE by deriving the distribution function of prediction errors:

Proposition

Let W (un) = V[x(s∗)] + γ′C(sn + un, sn + un)γ − 2γ′C(sn + un, s∗) where γ = K(sn, sn)−1K∗(sn, s∗). Then P(x(s∗) − ˆ xKALE(s∗) < z) = E

  • Φ
  • z
  • W (un)
  • ,

where Φ is the standard normal distribution function. These yield confidence intervals, not conditional probability intervals.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-34
SLIDE 34

Kriging (Best Linear Unbiased Prediction)

Parameter estimation

Inferring parameters of covariance function: Likelihood: L(θ; yn) ∝

  • |Cθ(sn+un, sn+un)| exp
  • −1

2y′

nCθ(sn + un, sn + un)−1yn

  • dun.

Stochastic EM.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-35
SLIDE 35

Kriging (Best Linear Unbiased Prediction)

Parameter estimation

Inferring parameters of covariance function: Likelihood: L(θ; yn) ∝

  • |Cθ(sn+un, sn+un)| exp
  • −1

2y′

nCθ(sn + un, sn + un)−1yn

  • dun.

Stochastic EM.

Pseudo-likelihood, based on Gaussian approximation to first two moments [6, 5]: ˜ L(θ; yn) ∝ |Kθ(sn, sn)|−1/2 exp

  • −1

2y′

nKθ(sn, sn)−1yn

  • .

Pseudo-score is an unbiased estimating equation. Maximum pseudo-likelihood estimator is asymptotically normal under proper domain conditions.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-36
SLIDE 36

MCMC

yn contains information about location errors un: ˆ x(s∗) = E[x(s∗)|yn] = C(s∗, sn + un)[C(sn + un, sn + un)]−1yn

  • π(un|yn)dun.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-37
SLIDE 37

MCMC

yn contains information about location errors un: ˆ x(s∗) = E[x(s∗)|yn] = C(s∗, sn + un)[C(sn + un, sn + un)]−1yn

  • π(un|yn)dun.

Dominates KALE in MSE. x(s∗)|yn yields conditional probability intervals. Naturally incorporates parameter estimation/uncertainty.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-38
SLIDE 38

MCMC

Distributional assumptions

KALE/KILE does not require Gaussian assumption for x to obtain point predictions, but MCMC procedure does.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-39
SLIDE 39

MCMC

Distributional assumptions

KALE/KILE does not require Gaussian assumption for x to obtain point predictions, but MCMC procedure does. Similar to arguments in favor of conjugate priors when using squared error loss [14], it makes sense to assume normality in x when Kriging.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-40
SLIDE 40

MCMC

Distributional assumptions

KALE/KILE does not require Gaussian assumption for x to obtain point predictions, but MCMC procedure does. Similar to arguments in favor of conjugate priors when using squared error loss [14], it makes sense to assume normality in x when Kriging. Let Π0,C be the family of joint distributions for xn with first two moments 0, C. For π1, π2 ∈ Π0,C, let Rπ1(π2) = Eπ1[(Eπ2[x(s∗)|xn] − x(s∗))2].

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-41
SLIDE 41

MCMC

Distributional assumptions

KALE/KILE does not require Gaussian assumption for x to obtain point predictions, but MCMC procedure does. Similar to arguments in favor of conjugate priors when using squared error loss [14], it makes sense to assume normality in x when Kriging. Let Π0,C be the family of joint distributions for xn with first two moments 0, C. For π1, π2 ∈ Π0,C, let Rπ1(π2) = Eπ1[(Eπ2[x(s∗)|xn] − x(s∗))2]. If π0 is Gaussian, then for all π ∈ Π0,C, Rπ(π) ≤ Rπ(π0) = Rπ0(π0) ≤ Rπ0(π).

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-42
SLIDE 42

MCMC

Distributional assumptions

KALE/KILE does not require Gaussian assumption for x to obtain point predictions, but MCMC procedure does. Similar to arguments in favor of conjugate priors when using squared error loss [14], it makes sense to assume normality in x when Kriging. Let Π0,C be the family of joint distributions for xn with first two moments 0, C. For π1, π2 ∈ Π0,C, let Rπ1(π2) = Eπ1[(Eπ2[x(s∗)|xn] − x(s∗))2]. If π0 is Gaussian, then for all π ∈ Π0,C, Rπ(π) ≤ Rπ(π0) = Rπ0(π0) ≤ Rπ0(π). Rπ0(π) − Rπ0(π0) is the cost of incorrectly assuming π when x is Gaussian. Rπ(π0) − Rπ(π) is the opportunity cost of a Gaussian assumption.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-43
SLIDE 43

MCMC

Gradient methods

This problem favors gradient-based MCMC samplers (HMC, MALA): log(π(θ, un|yn)) = −1 2 log(|Cθ(un)|) − 1 2y′

nCθ(un)−1yn + const.

∂ ∂ui log(π(θ, un|yn)) = 1 2Tr

  • Cθ(un)−1

∂ ∂ui Cθ(un) Cθ(un)−1yny′

n − In

  • + ∂

∂ui log(π(un)) ∂ ∂θi log(π(θ, un|yn)) = 1 2Tr

  • Cθ(un)−1

∂ ∂θi Cθ(un) Cθ(un)−1yny′

n − In

  • + ∂

∂θi log(π(θ)) where Cθ(un) = Cθ(sn + un, sn + un). Computational complexity of both log-likelihood and gradient dominated by Cθ(un)−1.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-44
SLIDE 44

MCMC

Multimodality

Multimodality is a common problem. In error-free regime, likelihood for θ can be multimodal [20]. In isotropic model with location errors un, π(yn|un, θ) constant for un across contours preserving pairwise distances.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-45
SLIDE 45

MCMC

Multimodality

Multimodality is a common problem. In error-free regime, likelihood for θ can be multimodal [20]. In isotropic model with location errors un, π(yn|un, θ) constant for un across contours preserving pairwise distances. Example of isolated modes: n = 2, p = 1. c(s1, s2) = exp(−(s1 − s2)2) + σ2

x1[s1 = s2].

ui ∼ N(0, σ2

u).

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-46
SLIDE 46

MCMC

Multimodality

Multimodality is a common problem. In error-free regime, likelihood for θ can be multimodal [20]. In isotropic model with location errors un, π(yn|un, θ) constant for un across contours preserving pairwise distances. Example of isolated modes: n = 2, p = 1. c(s1, s2) = exp(−(s1 − s2)2) + σ2

x1[s1 = s2].

ui ∼ N(0, σ2

u).

−4 −2 2 4 −4 −2 2 4 u1 u2

σ2

u = 2, σ2 x = 0.0001 −4 −2 2 4 −4 −2 2 4 u1 u2

σ2

u = 2, σ2 x = 1 −0.5 0.0 0.5 −0.5 0.0 0.5 u1 u2

σ2

u = 0.1, σ2 x = 0.0001

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-47
SLIDE 47

Simulation study

Simulation study compares data: c(s1, s2) = τ 2 exp(−β||s1 − s2||2) + σ2

x1[s1 = s2].

location errors: ui

iid

∼ N(0, σ2

uI2).

methods: KILE, KALE, HMC. tasks: parameter inference, point prediction, interval prediction. scenarios: parameters assumed known, parameters first estimated.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-48
SLIDE 48

Simulation study

Simulation study compares data: c(s1, s2) = τ 2 exp(−β||s1 − s2||2) + σ2

x1[s1 = s2].

location errors: ui

iid

∼ N(0, σ2

uI2).

methods: KILE, KALE, HMC. tasks: parameter inference, point prediction, interval prediction. scenarios: parameters assumed known, parameters first estimated. Parameter Values used τ 2 1 β 0.001, 0.01, 0.1, 0.5, 1, 2 σ2

x

0.0001, 0.01, 0.1, 0.5, 1 σ2

u

0.0001, 0.01, 0.1, 0.5, 1

Parameter values used in simulation study.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-49
SLIDE 49

Simulation study

Simulation study compares data: c(s1, s2) = τ 2 exp(−β||s1 − s2||2) + σ2

x1[s1 = s2].

location errors: ui

iid

∼ N(0, σ2

uI2).

methods: KILE, KALE, HMC. tasks: parameter inference, point prediction, interval prediction. scenarios: parameters assumed known, parameters first estimated. Parameter Values used τ 2 1 β 0.001, 0.01, 0.1, 0.5, 1, 2 σ2

x

0.0001, 0.01, 0.1, 0.5, 1 σ2

u

0.0001, 0.01, 0.1, 0.5, 1

Parameter values used in simulation study. 2 4 6 8 2 4 6 8

β = 0.001 black = observed locations; white = predicted locations.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-50
SLIDE 50

Simulation study

Simulation study compares data: c(s1, s2) = τ 2 exp(−β||s1 − s2||2) + σ2

x1[s1 = s2].

location errors: ui

iid

∼ N(0, σ2

uI2).

methods: KILE, KALE, HMC. tasks: parameter inference, point prediction, interval prediction. scenarios: parameters assumed known, parameters first estimated. Parameter Values used τ 2 1 β 0.001, 0.01, 0.1, 0.5, 1, 2 σ2

x

0.0001, 0.01, 0.1, 0.5, 1 σ2

u

0.0001, 0.01, 0.1, 0.5, 1

Parameter values used in simulation study. 2 4 6 8 2 4 6 8

β = 0.1 black = observed locations; white = predicted locations.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-51
SLIDE 51

Simulation study

Simulation study compares data: c(s1, s2) = τ 2 exp(−β||s1 − s2||2) + σ2

x1[s1 = s2].

location errors: ui

iid

∼ N(0, σ2

uI2).

methods: KILE, KALE, HMC. tasks: parameter inference, point prediction, interval prediction. scenarios: parameters assumed known, parameters first estimated. Parameter Values used τ 2 1 β 0.001, 0.01, 0.1, 0.5, 1, 2 σ2

x

0.0001, 0.01, 0.1, 0.5, 1 σ2

u

0.0001, 0.01, 0.1, 0.5, 1

Parameter values used in simulation study. 2 4 6 8 2 4 6 8

β = 2 black = observed locations; white = predicted locations.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-52
SLIDE 52

Simulation study (known parameters)

MSE ratios

Parameters assumed known. Nugget: σ2

x = 0.0001. Relative MSPE for KALE/KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.73 0.77 0.95 1.00 1.00 0.60 0.58 0.78 0.93 1.00 0.22 0.24 0.32 0.49 1.00 0.28 0.31 0.54 0.95 1.00 0.75 0.83 0.98 1.00 1.00 1.02 1.03 1.01 1.00 1.00 Relative MSPE for HMC/KALE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.87 0.83 0.90 1.00 1.00 0.78 0.69 0.81 0.91 1.00 0.71 0.78 0.72 0.82 0.99 0.93 0.89 0.94 0.98 1.00 0.95 0.95 0.97 1.00 1.00 0.96 0.96 0.99 1.00 1.00

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-53
SLIDE 53

Simulation study (known parameters)

MSE ratios

Parameters assumed known. Nugget: σ2

x = 0.01. Relative MSPE for KALE/KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 1.00 1.00 1.00 1.00 1.00 0.93 0.94 0.99 1.00 1.00 0.64 0.68 0.87 1.00 1.00 0.41 0.48 0.68 0.98 1.00 0.79 0.79 0.97 1.00 1.00 1.02 1.02 1.01 1.00 1.00 Relative MSPE for HMC/KALE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.96 0.99 1.00 1.00 1.00 0.92 0.93 0.99 1.00 1.00 0.77 0.71 0.87 0.99 1.00 0.91 0.94 0.93 0.99 1.00 0.95 0.95 0.98 1.00 1.00 0.96 0.96 0.99 1.00 1.00

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-54
SLIDE 54

Simulation study (known parameters)

MSE ratios

Parameters assumed known. Nugget: σ2

x = 0.1. Relative MSPE for KALE/KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 1.00 1.00 1.00 1.00 1.00 0.96 0.99 1.00 1.00 1.00 0.86 0.94 0.99 1.00 1.00 0.67 0.77 0.92 1.00 1.00 0.84 0.89 0.99 1.00 1.00 1.03 1.01 1.00 1.00 1.00 Relative MSPE for HMC/KALE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 1.00 1.00 1.00 1.00 1.00 0.95 0.98 1.00 1.00 1.00 0.81 0.89 0.98 1.00 1.00 0.94 0.92 0.97 1.00 1.00 0.95 0.96 0.98 1.00 1.00 0.96 0.97 0.99 1.00 1.00

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-55
SLIDE 55

Simulation study (known parameters)

MSE ratios

Parameters assumed known. Nugget: σ2

x = 1. Relative MSPE for KALE/KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 1.00 1.00 1.00 1.00 0.92 0.96 1.00 1.00 1.00 0.97 0.98 1.00 1.00 1.00 1.01 1.01 1.00 1.00 1.00 Relative MSPE for HMC/KALE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 0.98 1.00 1.00 1.00 0.98 0.99 0.99 1.00 1.00 0.98 0.99 1.00 1.00 1.00 0.98 0.98 1.00 1.00 1.00

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-56
SLIDE 56

Simulation study (known parameters)

Interval coverage (KILE only)

Parameters assumed known.

95% interval coverage for KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.33 0.44 0.66 0.91 0.95 0.11 0.15 0.31 0.71 0.95 0.04 0.07 0.12 0.32 0.92 0.41 0.45 0.66 0.90 0.95 0.84 0.88 0.93 0.95 0.95 0.95 0.95 0.95 0.95 0.95

σ2

x = 0.0001 95% interval coverage for KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.91 0.93 0.95 0.95 0.95 0.66 0.77 0.91 0.94 0.95 0.31 0.42 0.70 0.91 0.95 0.48 0.55 0.74 0.92 0.95 0.85 0.88 0.93 0.95 0.95 0.95 0.95 0.95 0.95 0.95

σ2

x = 0.01

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-57
SLIDE 57

Simulation study (known parameters)

Interval coverage (KILE only)

Parameters assumed known.

95% interval coverage for KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.95 0.95 0.95 0.95 0.95 0.89 0.92 0.95 0.95 0.95 0.66 0.79 0.91 0.95 0.95 0.70 0.77 0.88 0.94 0.95 0.88 0.90 0.94 0.95 0.95 0.95 0.95 0.95 0.95 0.95

σ2

x = 0.1 95% interval coverage for KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.95 0.95 0.95 0.95 0.95 0.94 0.95 0.95 0.95 0.95 0.89 0.91 0.94 0.95 0.95 0.89 0.92 0.94 0.95 0.95 0.93 0.94 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95

σ2

x = 1

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-58
SLIDE 58

Simulation study

Estimating parameters

When parameters τ 2, β, σ2

x are unknown:

KILE: estimated with maximum likelihood. KALE: estimated with maximum pseudo-likelihood. HMC: given flat priors over reasonable range and sampled jointly with un.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-59
SLIDE 59

Simulation study

Estimating parameters

When parameters τ 2, β, σ2

x are unknown:

KILE: estimated with maximum likelihood. KALE: estimated with maximum pseudo-likelihood. HMC: given flat priors over reasonable range and sampled jointly with un. Recall the form of k for this simulation: k(s1, s2) = τ 2 (1 + 4βσ2

u)p/2 exp

β 1 + 4βσ2

u

||s1 − s2||2

  • .

τ 2, β, σ2

u not identifiable.

MLE invariance yields same estimated covariance function for KALE/KILE, though Kriging equations will be different.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-60
SLIDE 60

Simulation study (unknown parameters)

MSE ratios

Parameters unknown and first estimated. Nugget: σ2

x = 0.0001. Relative MSPE for KALE/KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.98 0.97 1.00 1.00 1.00 1.02 0.97 1.00 1.00 1.00 0.88 0.95 0.97 1.00 1.00 1.00 0.97 0.98 0.99 1.00 0.98 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 Relative MSPE for HMC/KALE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.92 0.97 0.96 0.99 0.99 0.77 0.74 0.83 0.92 1.00 0.82 0.80 0.77 0.85 0.98 1.00 0.94 1.00 1.09 1.14 0.98 0.96 0.98 1.02 1.03 0.98 0.98 0.97 0.98 0.97

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-61
SLIDE 61

Simulation study (unknown parameters)

MSE ratios

Parameters unknown and first estimated. Nugget: σ2

x = 0.01. Relative MSPE for KALE/KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 1.03 1.03 1.00 1.00 1.00 1.01 0.97 1.00 1.00 1.00 0.90 0.98 0.99 1.00 1.00 0.99 0.97 0.98 0.99 1.00 0.99 1.00 1.00 1.00 1.00 1.00 1.00 0.99 1.00 1.00 Relative MSPE for HMC/KALE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.98 1.06 1.05 1.05 1.05 0.77 0.92 0.95 0.99 0.99 0.79 0.75 0.88 0.98 0.99 0.93 0.98 0.99 1.03 1.06 0.98 0.96 0.99 1.01 1.05 0.99 0.97 0.98 0.97 0.97

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-62
SLIDE 62

Simulation study (unknown parameters)

MSE ratios

Parameters unknown and first estimated. Nugget: σ2

x = 0.1. Relative MSPE for KALE/KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 1.05 1.02 1.01 1.00 1.00 1.02 0.86 1.01 1.00 1.00 0.94 0.98 1.01 1.00 1.00 0.97 0.99 0.98 1.00 1.00 0.99 1.00 0.99 1.00 1.00 0.99 1.00 0.99 1.00 1.00 Relative MSPE for HMC/KALE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 1.31 1.14 1.16 1.18 1.15 0.85 0.97 0.98 0.99 0.99 0.82 0.87 0.97 0.99 0.98 0.99 0.97 0.98 0.98 0.98 0.97 0.95 0.97 1.00 1.00 0.98 0.95 0.97 0.98 0.96

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-63
SLIDE 63

Simulation study (unknown parameters)

MSE ratios

Parameters unknown and first estimated. Nugget: σ2

x = 1. Relative MSPE for KALE/KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 1.03 0.90 0.92 1.00 1.00 1.08 1.05 1.00 1.00 1.00 1.01 1.02 1.01 1.00 1.00 0.99 1.00 1.01 1.00 1.00 0.99 1.00 1.00 1.00 1.00 0.99 0.99 0.99 1.00 1.00 Relative MSPE for HMC/KALE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 1.58 1.35 1.35 1.21 1.56 1.17 1.10 1.28 1.11 1.13 0.91 0.95 1.05 1.06 1.01 0.96 0.94 0.96 0.96 0.98 0.95 0.94 0.94 0.94 0.95 0.94 0.95 0.96 0.95 0.96

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-64
SLIDE 64

Simulation study (unknown parameters)

Interval coverage

Nugget: σ2

x = 0.0001.

95% interval coverage for KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.92 0.94 0.93 0.94 0.94 0.87 0.89 0.91 0.95 0.94 0.75 0.82 0.88 0.93 0.92 0.57 0.68 0.87 0.91 0.93 0.54 0.63 0.82 0.87 0.89 0.46 0.57 0.68 0.68 0.58 95% interval coverage for KALE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.97 0.99 0.96 0.95 0.94 0.95 0.95 0.96 0.98 0.94 0.93 0.94 0.95 0.97 0.94 0.75 0.82 0.93 0.94 0.93 0.62 0.71 0.87 0.88 0.89 0.46 0.57 0.69 0.68 0.58 95% interval coverage for HMC σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.98 0.98 0.96 0.96 0.96 0.94 0.96 0.96 0.97 0.96 0.94 0.94 0.95 0.97 0.96 0.90 0.92 0.94 0.95 0.95 0.88 0.89 0.91 0.90 0.92 0.87 0.87 0.87 0.86 0.86 Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-65
SLIDE 65

Simulation study (unknown parameters)

Interval coverage

Nugget: σ2

x = 0.01.

95% interval coverage for KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.92 0.93 0.92 0.93 0.93 0.88 0.91 0.92 0.93 0.93 0.76 0.85 0.91 0.92 0.93 0.55 0.71 0.86 0.89 0.90 0.56 0.59 0.80 0.85 0.89 0.48 0.57 0.67 0.70 0.73 95% interval coverage for KALE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.94 0.94 0.93 0.93 0.93 0.94 0.95 0.94 0.94 0.93 0.92 0.94 0.96 0.94 0.93 0.70 0.84 0.92 0.92 0.90 0.58 0.64 0.85 0.86 0.89 0.43 0.57 0.68 0.70 0.73 95% interval coverage for HMC σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.97 0.97 0.97 0.97 0.97 0.96 0.96 0.96 0.96 0.96 0.94 0.95 0.96 0.95 0.95 0.91 0.93 0.93 0.94 0.94 0.88 0.89 0.91 0.91 0.90 0.87 0.87 0.87 0.87 0.86 Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-66
SLIDE 66

Simulation study (unknown parameters)

Interval coverage

Nugget: σ2

x = 0.1.

95% interval coverage for KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.92 0.91 0.92 0.91 0.93 0.87 0.90 0.92 0.92 0.91 0.74 0.85 0.92 0.92 0.93 0.61 0.71 0.88 0.89 0.92 0.53 0.66 0.79 0.83 0.87 0.45 0.55 0.66 0.64 0.62 95% interval coverage for KALE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.93 0.92 0.92 0.91 0.93 0.91 0.92 0.92 0.92 0.91 0.88 0.93 0.94 0.93 0.93 0.71 0.82 0.93 0.90 0.92 0.53 0.71 0.82 0.84 0.87 0.45 0.56 0.66 0.63 0.62 95% interval coverage for HMC σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.98 0.98 0.98 0.97 0.97 0.98 0.97 0.97 0.96 0.96 0.95 0.96 0.96 0.96 0.96 0.92 0.93 0.95 0.94 0.95 0.90 0.90 0.92 0.91 0.92 0.88 0.88 0.88 0.87 0.87 Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-67
SLIDE 67

Simulation study (unknown parameters)

Interval coverage

Nugget: σ2

x = 1.

95% interval coverage for KILE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.88 0.89 0.89 0.86 0.86 0.88 0.85 0.91 0.87 0.88 0.77 0.83 0.88 0.87 0.90 0.62 0.70 0.85 0.88 0.88 0.59 0.62 0.72 0.79 0.80 0.54 0.60 0.57 0.57 0.56 95% interval coverage for KALE σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 0.87 0.87 0.89 0.86 0.86 0.89 0.86 0.89 0.87 0.88 0.82 0.86 0.89 0.87 0.90 0.65 0.70 0.86 0.88 0.88 0.53 0.58 0.72 0.79 0.80 0.43 0.52 0.56 0.57 0.56 95% interval coverage for HMC σu

2

1e−04 0.01 0.1 0.5 1 0.001 0.01 0.1 0.5 1 2 β 1.00 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.98 0.98 0.98 0.98 0.98 0.98 0.98 0.97 0.96 0.97 0.97 0.96 0.96 0.96 0.94 0.95 0.95 Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-68
SLIDE 68

Data example

Interpolating northern hemisphere temperature anomolies for summer 20111

50 60 70 80 −100 100

longitude latitude −2 −1 1 temp

1Data available: http://www.cru.uea.ac.uk/cru/data/temperature/

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-69
SLIDE 69

Data example

Interpolating northern hemisphere temperature anomolies for summer 20111

50 60 70 80 −100 100

longitude latitude −2 −1 1 temp

Temperatures are averaged over April–September time window and 5◦ × 5◦ long–lat grid cell. Values expressed as anomolies relative to 1860-2010 average [18]. We further subtract the 2011 mean. Numerous pre-processing steps and adjustments to data [4, 13, 7].

1Data available: http://www.cru.uea.ac.uk/cru/data/temperature/

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-70
SLIDE 70

Data example

Interpolating northern hemisphere temperature anomolies for summer 20111

50 60 70 80 −100 100

longitude latitude −2 −1 1 temp

Temperatures are averaged over April–September time window and 5◦ × 5◦ long–lat grid cell. Values expressed as anomolies relative to 1860-2010 average [18]. We further subtract the 2011 mean. Numerous pre-processing steps and adjustments to data [4, 13, 7]. Geo-referencing by grid cells is a location error problem.

1Data available: http://www.cru.uea.ac.uk/cru/data/temperature/

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-71
SLIDE 71

Data example

Interpolating northern hemisphere temperature anomolies for summer 2011

Covariance function is based on distance along the Earth’s surface [19]: c(s1, s2) = τ 2 exp(−β∆) + σ2

x1[s1 = s2]

∆ = 2r arcsin

  • sin2

φ2 − φ1 2

  • + cos(φ1) cos(φ2) sin2

ψ2 − ψ1 2

  • ,

s = (ψ, φ) are longitude, latitude pairs. r = 6371 is the Earth’s radius in km.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-72
SLIDE 72

Data example

Interpolating northern hemisphere temperature anomolies for summer 2011

Covariance function is based on distance along the Earth’s surface [19]: c(s1, s2) = τ 2 exp(−β∆) + σ2

x1[s1 = s2]

∆ = 2r arcsin

  • sin2

φ2 − φ1 2

  • + cos(φ1) cos(φ2) sin2

ψ2 − ψ1 2

  • ,

s = (ψ, φ) are longitude, latitude pairs. r = 6371 is the Earth’s radius in km. We assume location errors are i.i.d. in terms of distance on the Earth’s surface: ui ∼ N

  • 0, σ2

u

180 πr 2

1 cos2(φi)

1

  • .

σ2

u = 500 yields a 28km expected distance between s + u and s.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-73
SLIDE 73

Data example

Interpolating northern hemisphere temperature anomolies for summer 2011

KALE/KILE approach:

50 60 70 80 −100 100

longitude latitude −1 1 temp

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-74
SLIDE 74

Data example

Interpolating northern hemisphere temperature anomolies for summer 2011

KALE/KILE approach: ˆ τ 2 ˆ β ˆ σ2

x

KILE 1.167 1.428 × 10−4 0.075 KALE 1.167 1.430 × 10−4 0.074

Parameter estimates

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-75
SLIDE 75

Data example

Interpolating northern hemisphere temperature anomolies for summer 2011

KALE/KILE approach: ˆ τ 2 ˆ β ˆ σ2

x

KILE 1.167 1.428 × 10−4 0.075 KALE 1.167 1.430 × 10−4 0.074

Parameter estimates

KALE - KILE for point predictions:

50 60 70 80 −100 100

longitude latitude −0.001 0.000 0.001 0.002 temp

KALE - KILE for interval length:

50 60 70 80 −100 100

longitude latitude −0.002 −0.001 0.000 0.001 0.002 temp

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-76
SLIDE 76

Data example

Interpolating northern hemisphere temperature anomolies for summer 2011

HMC approach

50 60 70 80 −100 100

longitude latitude −1 1 temp

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-77
SLIDE 77

Data example

Interpolating northern hemisphere temperature anomolies for summer 2011

HMC approach This looks different from Kriging estimates HMC also averages over posterior parameter uncertainty. More meaningful comparison is against σ2

u = 0 model using HMC.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-78
SLIDE 78

Data example

Interpolating northern hemisphere temperature anomolies for summer 2011

HMC approach This looks different from Kriging estimates HMC also averages over posterior parameter uncertainty. More meaningful comparison is against σ2

u = 0 model using HMC.

{σ2

u = 500} − {σ2 u = 0} point predictions:

50 60 70 80 −100 100

longitude latitude −0.04 0.00 0.04 temp

{σ2

u = 500} − {σ2 u = 0} interval lengths:

50 60 70 80 −100 100

longitude latitude −0.15 −0.12 −0.09 temp

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-79
SLIDE 79

Data example

Interpolating northern hemisphere temperature anomolies for summer 2011

HMC differs in parameter inference for {σ2

u = 500} and {σ2 u = 0} models:

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-80
SLIDE 80

Data example

Interpolating northern hemisphere temperature anomolies for summer 2011

HMC differs in parameter inference for {σ2

u = 500} and {σ2 u = 0} models: 10000 20000 30000 40000 50000 0e+00 1e−04 2e−04 3e−04

beta density

sig2u 500

HMC with {σ2

u = 0} agrees with parameter inference from KALE/KILE.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-81
SLIDE 81

Data example

Interpolating northern hemisphere temperature anomolies for summer 2011

HMC differs in parameter inference for {σ2

u = 500} and {σ2 u = 0} models: 25 50 75 100 0.00 0.05 0.10 0.15

sig2x density

sig2u 500

HMC with {σ2

u = 0} agrees with parameter inference from KALE/KILE.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-82
SLIDE 82

Conclusions

Location errors and noisy inputs are a common (but often ignored) problem in GP regression.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-83
SLIDE 83

Conclusions

Location errors and noisy inputs are a common (but often ignored) problem in GP regression. Analyst can get away with ignoring location errors when: Spatial correlations are either very strong or very weak and location errors are sufficiently small. Nugget variance σ2

x is large.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-84
SLIDE 84

Conclusions

Location errors and noisy inputs are a common (but often ignored) problem in GP regression. Analyst can get away with ignoring location errors when: Spatial correlations are either very strong or very weak and location errors are sufficiently small. Nugget variance σ2

x is large.

Kriging using moment properties of y (KALE) is an acceptable solution in some situations: Dominates KILE in MSE when covariance parameters are known. Provides correct confidence intervals when covariance parameters are known.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-85
SLIDE 85

Conclusions

Location errors and noisy inputs are a common (but often ignored) problem in GP regression. Analyst can get away with ignoring location errors when: Spatial correlations are either very strong or very weak and location errors are sufficiently small. Nugget variance σ2

x is large.

Kriging using moment properties of y (KALE) is an acceptable solution in some situations: Dominates KILE in MSE when covariance parameters are known. Provides correct confidence intervals when covariance parameters are known. Using MCMC, we gain several additional advantages: Dominates KALE in MSE when covariance parameters are known. Provides correct confidence intervals when covariance parameters are known. Naturally incorporates uncertainty from estimated parameters.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-86
SLIDE 86

Conclusions

Location errors and noisy inputs are a common (but often ignored) problem in GP regression. Analyst can get away with ignoring location errors when: Spatial correlations are either very strong or very weak and location errors are sufficiently small. Nugget variance σ2

x is large.

Kriging using moment properties of y (KALE) is an acceptable solution in some situations: Dominates KILE in MSE when covariance parameters are known. Provides correct confidence intervals when covariance parameters are known. Using MCMC, we gain several additional advantages: Dominates KALE in MSE when covariance parameters are known. Provides correct confidence intervals when covariance parameters are known. Naturally incorporates uncertainty from estimated parameters. Difficulties that remain: Prior sensitivity is an issue, particularly for spatial problems. MCMC covergence issues due to multiple (isolated) modes. Coverage guarantees when parameters are estimated.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-87
SLIDE 87

Future work

Climate reconstruction

temperature site ice core tree ring varve Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-88
SLIDE 88

Future work

Climate reconstruction

temperature site ice core tree ring varve

Incorporating proxy data, with location uncertainties [11]. Spatiotemporal heteroskedasticity in location errors. Nonstationary covariance behavior [17, 8].

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-89
SLIDE 89

Thanks to

This work: Natesh Pillai Peter Huybers Luke Bornn My dissertation committee: Natesh Pillai Carl Morris Luke Bornn Faculty, classmates, and friends in the Statistics Department.

Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-90
SLIDE 90

References I

[1]

  • J. J. Barber, A. E. Gelfand, and J. A. Silander.

Modelling map positional error to infer true feature location. Canadian Journal of Statistics, 34(4):659–676, 2006. [2]

  • L. Beale, J. J. Abellan, S. Hodgson, and L. Jarup.

Methodologic issues and approaches to spatial epidemiology. Environ Health Perspect, 116(8):1105–1110, 2008. [3]

  • M. R. Bonner, D. Han, J. Nie, P. Rogerson, J. E. Vena, and J. L. Freudenheim.

Positional accuracy of geocoded addresses in epidemiologic research. Epidemiology, 14(4):408–412, 2003. [4]

  • P. Brohan, J. J. Kennedy, I. Harris, S. F. Tett, and P. D. Jones.

Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850. Journal of Geophysical Research: Atmospheres (1984–2012), 111(D12), 2006. [5]

  • R. J. Carroll, D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu.

Measurement error in nonlinear models: a modern perspective. CRC press, 2006. [6]

  • N. Cressie and J. Kornak.

Spatial statistics in the presence of location error with an application to remote sensing of the environment. Statistical science, 18(4):436–456, 2003. [7]

  • P. Jones, D. Lister, T. Osborn, C. Harpham, M. Salmon, and C. Morice.

Hemispheric and large-scale land-surface air temperature variations: An extensive revision and an update to 2010. Journal of Geophysical Research: Atmospheres (1984–2012), 117(D5), 2012. [8]

  • M. Jun and M. L. Stein.

Nonstationary covariance models for global data. The Annals of Applied Statistics, pages 1271–1289, 2008. [9]

  • D. Koller, K. Daniilidis, T. Thorhallson, and H.-H. Nagel.

Model-based object tracking in traffic scenes. In Computer VisionECCV’92, pages 437–452. Springer, 1992. Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-91
SLIDE 91

References II

[10]

  • N. Krieger, J. T. Chen, P. D. Waterman, M.-J. Soobader, S. Subramanian, and R. Carson.

Geocoding and monitoring of us socioeconomic inequalities in mortality and cancer incidence: does the choice of area-based measure and geographic level matter? the public health disparities geocoding project. American journal of epidemiology, 156(5):471–482, 2002. [11]

  • M. E. Mann, Z. Zhang, M. K. Hughes, R. S. Bradley, S. K. Miller, S. Rutherford, and F. Ni.

Proxy-based reconstructions of hemispheric and global surface temperature variations over the past two millennia. Proceedings of the National Academy of Sciences, 105(36):13252–13257, 2008. [12] X.-L. Meng. Multiple-imputation inferences with uncongenial sources of input. Statistical Science, pages 538–558, 1994. [13]

  • C. P. Morice, J. J. Kennedy, N. A. Rayner, and P. D. Jones.

Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: The hadcrut4 data set. Journal of Geophysical Research: Atmospheres (1984–2012), 117(D8), 2012. [14]

  • C. N. Morris.

Natural exponential families with quadratic variance functions: statistical theory. The Annals of Statistics, pages 515–529, 1983. [15]

  • D. Pfoser and C. S. Jensen.

Capturing the uncertainty of moving-object representations. In Advances in Spatial Databases, pages 111–131. Springer, 1999. [16]

  • D. Rocchini, J. Hortal, S. Lengyel, J. M. Lobo, A. Jimenez-Valverde, C. Ricotta, G. Bacaro, and A. Chiarucci.

Accounting for uncertainty when mapping species distributions: the need for maps of ignorance. Progress in Physical Geography, 35(2):211–226, 2011. [17]

  • M. L. Stein.

Space–time covariance functions. Journal of the American Statistical Association, 100(469):310–321, 2005. [18]

  • M. P. Tingley.

A bayesian anova scheme for calculating climate anomalies, with applications to the instrumental temperature record. Journal of Climate, 25(2):777–791, 2012. Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015

slide-92
SLIDE 92

References III

[19]

  • M. P. Tingley and P. Huybers.

A bayesian algorithm for reconstructing climate anomalies in space and time. part i: Development and applications to paleoclimate reconstruction problems. Journal of Climate, 23(10):2759–2781, 2010. [20]

  • J. Warnes and B. Ripley.

Problems with likelihood estimation of covariance functions of spatial gaussian processes. Biometrika, 74(3):640–642, 1987. Dan Cervone (Harvard Statistics Department) GP Regression with Noisy Inputs March 3, 2015