BAYESIAN CALIBRATION OF COMPUTER MODELS


slide-1
SLIDE 1

BAYESIAN CALIBRATION OF COMPUTER MODELS
Bayesian inference & Markov chain Monte Carlo; Gaussian processes; computer model calibration and prediction
Dave Higdon, Virginia Tech
Brian Williams & Jim Gattiker, Statistical Sciences Group, LANL

[Figure: six panels, h = 3.5 cm, 3.8 cm, 4 cm, 4 cm, 4.1 cm, 4.4 cm]

slide-2
SLIDE 2

BRIEF INTRO TO BAYESIAN COMPUTER MODEL CALIBRATION

slide-3
SLIDE 3

Motivation

[Figure: three panels of drop time vs. drop height (floor)]

Data generated from model:  d²z/dt² = −1 − 0.3 dz/dt + ε

Simulation model:  d²z/dt² = −1

Statistical model:  y(z) = η(z) + δ(z) + ε

Improved physics model:  d²z/dt² = −1 − θ dz/dt + ε

Statistical model:  y(z) = η(z, θ) + δ(z) + ε

slide-4
SLIDE 4

An Example

[Figure panels: (a) model runs η(x,t); (b) data & prior uncertainty; (c) posterior mean for η(x,t); (d) calibrated simulator prediction; (e) posterior model discrepancy δ(x); (f) calibrated prediction ζ(x)]

slide-5
SLIDE 5

An Example - with discrepancy

[Figure panels: (a) model runs η(x,t); (b) data & prior uncertainty; (c) posterior mean for η(x,t); (d) calibrated simulator prediction; (e) posterior model discrepancy δ(x); (f) calibrated prediction ζ(x)]

slide-6
SLIDE 6

BRIEF INTRO TO BAYESIAN SOLUTIONS FOR INVERSE PROBLEMS

slide-7
SLIDE 7

Bayesian analysis of an inverse problem

[Figure: draws of η(x,θ) against the data y(x), under prior uncertainty (left) and posterior uncertainty (right)]

• A simple example...

x : experimental conditions
θ : model calibration parameters
ζ(x) : true physical system response given inputs x
η(x, θ) : forward simulator response at x and θ
y(x) : experimental observation of the physical system
e(x) : observation error of the experimental data

Assume:  y(x) = ζ(x) + e(x) = η(x, θ) + e(x),  θ unknown.

slide-8
SLIDE 8

Data for the toy inverse problem

[Figure: draws of η(x,θ) against the data y(x), under prior uncertainty (left) and posterior uncertainty (right)]

n = 5 physical observations:

  i    1        2        3        4         5
  x    0.05     0.25     0.52     0.65      0.91
  y    2.2731   0.7371   0.1138   −0.2254   −0.6807

slide-9
SLIDE 9

Likelihood

[Figure: the data y(x) with draws of η(x,θ) (left); the likelihood L(y|θ) as a function of θ (right)]

L(y|η(x, θ)) ∝ λy^{n/2} exp{ −(λy / (2 · 0.25²)) (y − η(x, θ))ᵀ (y − η(x, θ)) }

  • L(y|θ) is the probability model for the data y given θ
  • tells us which values of θ are likely given the observed data y
  • can combine with the prior π(θ) to describe posterior uncertainty for θ
slide-10
SLIDE 10

Bayes’ Rule

π(θ|y) ∝ L(y|θ) × π(θ)

  • pointwise multiplication over the support of θ
  • very general approach for inference
  • prior pdf for θ is required
  • normalizing π(θ|y) is generally difficult, but rarely necessary
  • high dimensional θ can lead to computational challenges
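A quick numerical sketch of this pointwise multiplication on a grid. The uniform prior, the fixed error precision, and the stand-in simulator eta are all illustrative assumptions, not the example behind the figures:

```python
import numpy as np

# Hypothetical setup: uniform prior on [0, 1], Gaussian likelihood, and a
# stand-in forward model eta, all evaluated pointwise on a grid of theta.
theta_grid = np.linspace(0, 1, 500)                 # support of theta
prior = np.ones_like(theta_grid)                    # pi(theta) ~ U[0, 1]

y_obs, x_obs = 0.74, 0.25                           # one illustrative datum
eta = lambda x, theta: np.exp(-theta * x)           # stand-in simulator
lam_y = 4.0                                         # fixed error precision

# pointwise multiplication: posterior is proportional to likelihood * prior
like = np.exp(-0.5 * lam_y * (y_obs - eta(x_obs, theta_grid)) ** 2)
post = like * prior
post /= np.trapz(post, theta_grid)                  # normalize numerically

print("posterior mean:", np.trapz(theta_grid * post, theta_grid))
```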
slide-11
SLIDE 11

Bayes’ Rule (independent components)

[Figure: the prior π(θ), the five likelihood components L(y1|θ), ..., L(y5|θ), and the resulting posterior π(θ|y)]

• With independent data, the likelihood is a product of independent components:

  L(y|θ) = ∏_{i=1}^n L(yi|θ) = ∏_{i=1}^n exp{ −(λy/2) (yi − η(xi, θ))² }   (here we fix λy = 4)

• A central limit theorem: if
  0. yi ~ L(yi|θ), i = 1, ..., n, independent,
  1. regularity conditions hold on the L(yi|θ)'s, and
  2. the prior support of π(θ) covers the true θ,
  then π(θ|y) → N(θ̂, λn⁻¹), where λn = −(d²/dθ²) log π(θ|y), evaluated at the posterior mode θ̂.

slide-12
SLIDE 12

Exploring the posterior distribution

• Use Markov chain Monte Carlo to build a Markov chain with stationary distribution π(θ|y)
• Realizations are a (correlated) sample from π(θ|y)
• π(θ|y) need not be normalized
slide-13
SLIDE 13

Metropolis recipe for MCMC

[Figure: a few Metropolis iterations of θ, with accepted and rejected proposals, against π(θ|y)]

Initialize the chain at θ⁰.

1. Given the current realization θᵗ, generate θ* from a symmetric kernel q(θᵗ → θ*), i.e. q(θᵗ → θ*) = q(θ* → θᵗ)
2. Compute the acceptance probability α = min{ 1, π(θ*|y) / π(θᵗ|y) }
3. Set θᵗ⁺¹ = θ* with probability α; otherwise θᵗ⁺¹ = θᵗ
4. Iterate steps 1 – 3
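A minimal sketch of this recipe, run on the toy data from the earlier slide. The forward model eta below is a hypothetical stand-in (the actual simulator behind the example is not given in closed form), and λy = 4 follows the fixed-precision setting above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data from the slides; eta is an assumed stand-in simulator.
x = np.array([0.05, 0.25, 0.52, 0.65, 0.91])
y = np.array([2.2731, 0.7371, 0.1138, -0.2254, -0.6807])
lam_y = 4.0                                   # fixed error precision

def eta(x, theta):
    return 4.0 * np.exp(-5.0 * theta * x) - 1.0   # hypothetical eta(x, theta)

def log_post(theta):
    # log pi(theta|y) up to a constant: Gaussian likelihood x U[0,1] prior
    if not 0.0 <= theta <= 1.0:
        return -np.inf
    return -0.5 * lam_y * np.sum((y - eta(x, theta)) ** 2)

theta, chain, width = 0.5, [], 0.1            # init, storage, proposal width
for t in range(4000):
    theta_star = theta + rng.uniform(-width, width)   # symmetric kernel q
    # accept with probability alpha = min{1, pi(theta*|y)/pi(theta|y)}
    if np.log(rng.uniform()) < log_post(theta_star) - log_post(theta):
        theta = theta_star
    chain.append(theta)

burned = np.array(chain[500:])                # discard burn-in
print("posterior mean, sd:", burned.mean(), burned.std())
```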
slide-14
SLIDE 14

Metropolis sampling for the inverse problem

[Figure: Metropolis trace of θ over 4000 iterations, posterior density estimates, and the corresponding prior and posterior uncertainty bands for η(x,θ)]

• the chain θ⁰, θ¹, ..., θ⁴⁰⁰⁰ is a draw from π(θ|y)
• use the Monte Carlo sample to estimate expectations, variances, probabilities, etc.

slide-15
SLIDE 15

Treating error precision λy as unknown

Sampling model:  yi = η(xi, θ) + ei,  where ei ~iid N(0, 1/λy), which gives likelihood

  L(y|θ, λy) ∝ λy^{n/2} exp{ −(λy / (2 · 0.25²)) Σ_{i=1}^n (yi − η(xi, θ))² }

Priors:
  π(θ) ∝ I[0 ≤ θ ≤ 1]
  π(λy) ∝ λy^{ay−1} exp{−by λy},  ay = 5, by = 5

[Figure: Gamma prior density for λy]

Posterior:
  π(θ, λy|y) ∝ L(y|η(x, θ), λy) × π(θ) × π(λy)
            ∝ λy^{n/2} exp{ −(λy / (2 · 0.25²)) Σ_{i=1}^n (yi − η(xi, θ))² } × I[0 ≤ θ ≤ 1] × λy^{ay−1} exp{−by λy}

slide-16
SLIDE 16

Use single-site Metropolis to sample from π(θ, λy|y)

π(θ, λy|y) ∝ L(y|η(x, θ), λy) × π(θ) × π(λy)
          ∝ λy^{n/2} exp{ −(λy / (2 · 0.25²)) Σ_{i=1}^n (yi − η(xi, θ))² } × I[0 ≤ θ ≤ 1] × λy^{ay−1} exp{−by λy}

Initialize (θ, λy)⁰.

1. Given the current realization (θ, λy)ᵗ, generate θ* from a symmetric kernel q((θᵗ, λyᵗ) → (θ*, λyᵗ))
2. Compute the acceptance probability α = min{ 1, π(θ*, λyᵗ|y) / π(θᵗ, λyᵗ|y) }
3. Set θᵗ⁺¹ = θ* with probability α; otherwise θᵗ⁺¹ = θᵗ
4. Do the same for λy, using q((θᵗ⁺¹, λyᵗ) → (θᵗ⁺¹, λy*)) and α = min{ 1, π(θᵗ⁺¹, λy*|y) / π(θᵗ⁺¹, λyᵗ|y) }
5. Iterate steps 1 – 4

Such an approach may require many evaluations of η(xi, θ)!
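A sketch of these single-site updates under the same assumptions (stand-in eta, toy data). Note that every call to log_post re-evaluates η, which is exactly the expense flagged above:

```python
import numpy as np

rng = np.random.default_rng(2)

# Single-site Metropolis over (theta, lam_y); data and eta as before.
x = np.array([0.05, 0.25, 0.52, 0.65, 0.91])
y = np.array([2.2731, 0.7371, 0.1138, -0.2254, -0.6807])
a_y, b_y = 5.0, 5.0                                 # Gamma prior on lam_y

def eta(x, theta):
    return 4.0 * np.exp(-5.0 * theta * x) - 1.0     # hypothetical simulator

def log_post(theta, lam):
    if not (0.0 <= theta <= 1.0) or lam <= 0.0:
        return -np.inf
    # each call re-runs the simulator eta -- the cost the slide warns about
    loglik = (len(y) / 2) * np.log(lam) \
        - 0.5 * (lam / 0.25**2) * np.sum((y - eta(x, theta)) ** 2)
    return loglik + (a_y - 1.0) * np.log(lam) - b_y * lam

theta, lam, draws = 0.5, 1.0, []
for t in range(4000):
    th_s = theta + rng.uniform(-0.1, 0.1)           # step 1: propose theta*
    if np.log(rng.uniform()) < log_post(th_s, lam) - log_post(theta, lam):
        theta = th_s                                # steps 2-3
    lam_s = lam + rng.uniform(-0.5, 0.5)            # step 4: same for lam_y
    if np.log(rng.uniform()) < log_post(theta, lam_s) - log_post(theta, lam):
        lam = lam_s
    draws.append((theta, lam))
```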

slide-17
SLIDE 17

The resulting MCMC draws from π(θ, λy|y)

[Figure: scatter of the MCMC draws of (λy, θ)]

slide-18
SLIDE 18

The resulting MCMC draws from π(θ, λy|y)

[Figure: trace plots and histograms of θ and λy, and a scatter of the (θ, λy) draws]

slide-19
SLIDE 19

Posterior for η(x, θ)

[Figure: prior and posterior uncertainty bands for η(x,θ)]

The sample η(x, θ¹), ..., η(x, θᴹ) is a set of draws from π(η(x, θ)|y); these empirical draws give prediction uncertainty.

slide-20
SLIDE 20

Using Importance Sampling (IS) to construct π(θ|y)

Importance sampling:

• draw θ¹, ..., θᵀ ~ π(θ)
• compute IS weights wt = L(y|θᵗ), t = 1, ..., T
• estimate π(θ|y) by the empirical (pdf)

    value   θ¹      · · ·   θᵀ
    prob    w1/w+   · · ·   wT/w+

It is then straightforward to estimate the predictive pdf for η(x′, θ)|y:

    value   η(x′, θ¹)   · · ·   η(x′, θᵀ)
    prob    w1/w+       · · ·   wT/w+

[Figure: prior and posterior uncertainty bands for η(x,θ)]
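A short sketch of this IS recipe under the same stand-in eta and toy data as in the earlier sketches; the weight normalization and the effective-sample-size diagnostic are standard, everything else is illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Importance sampling for pi(theta|y): draw from the prior, weight by the
# likelihood.  x, y, lam_y and the stand-in eta follow the earlier sketches.
x = np.array([0.05, 0.25, 0.52, 0.65, 0.91])
y = np.array([2.2731, 0.7371, 0.1138, -0.2254, -0.6807])
lam_y = 4.0
eta = lambda x, th: 4.0 * np.exp(-5.0 * th * x) - 1.0

T = 20000
theta = rng.uniform(0.0, 1.0, size=T)                 # theta_t ~ pi(theta)
resid = y[None, :] - eta(x[None, :], theta[:, None])  # (T, n) residuals
logw = -0.5 * lam_y * np.sum(resid ** 2, axis=1)      # log L(y|theta_t)
w = np.exp(logw - logw.max())                         # stabilized weights
w /= w.sum()                                          # w_t / w_+

print("E[theta|y] ~", np.sum(w * theta))
print("predictive mean of eta(0.4, theta)|y ~", np.sum(w * eta(0.4, theta)))
print("effective sample size ~", 1.0 / np.sum(w ** 2))
```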

slide-21
SLIDE 21

An Inverse Problem in Hydrology

[Figure: drawdown time series at wells NW, N, NE, W, E, SW, S, SE]

L(y|η(z)) ∝ |Σ|^{−1/2} exp{ −½ (y − η(z))ᵀ Σ⁻¹ (y − η(z)) }

π(z|λz) ∝ λz^{m/2} exp{ −(λz/2) zᵀ Wz z }

π(λz) ∝ λz^{az−1} exp{−bz λz}

π(z, λz|y) ∝ L(y|η(z)) × π(z|λz) × π(λz)

slide-22
SLIDE 22

Posterior realizations of z with a MRF prior

[Figure: true permeability field, four MCMC realizations, and the posterior mean permeability field]

slide-23
SLIDE 23

Bayesian inference - iid N(µ, 1) example

Observed data y are noisy versions of µ:  yi = µ + εi, with εi ~iid N(0, 1), i = 1, ..., n

[Figure: the n observations yi]

Sampling model and prior for µ:

  L(y|µ) ∝ ∏_{i=1}^n exp{ −½ (yi − µ)² }
  π(µ) = N(0, 1/λµ),  λµ small

Posterior density for µ:

  π(µ|y) ∝ L(y|µ) × π(µ)
         ∝ ∏_{i=1}^n exp{ −½ (yi − µ)² } × exp{ −½ λµ µ² }
         ∝ exp{ −½ [ n(µ − ȳn)² + λµ(µ − 0)² ] } × f(y)

  ⇒ µ|y ~ N( (n ȳn + 0 · λµ) / (n + λµ),  1 / (n + λµ) )  →  N(ȳn, 1/n) as λµ → 0

slide-24
SLIDE 24

Bayesian inference - iid N(µ, λy⁻¹) example

Observed data y are noisy versions of µ:  yi = µ + εi, with εi ~iid N(0, λy⁻¹), i = 1, ..., n

[Figure: the n observations yi]

Sampling model and priors for µ, λy:

  L(y|µ, λy) ∝ ∏_{i=1}^n λy^{1/2} exp{ −½ λy (yi − µ)² }
  π(µ, λy) = π(µ) × π(λy),  π(µ) ∝ 1,  π(λy) ∝ λy^{ay−1} e^{−by λy}

Posterior density for (µ, λy):

  π(µ, λy|y) ∝ L(y|µ, λy) × π(µ) × π(λy) ∝ λy^{n/2} exp{ −½ λy Σ_{i=1}^n (yi − µ)² } × λy^{ay−1} exp{−by λy}

π(µ, λy|y) is not so easy to recognize. One can explore π(µ, λy|y) numerically or via Monte Carlo.

slide-25
SLIDE 25

Full conditional distributions for π(µ, λy|y)

π(µ, λy|y) ∝ λy^{n/2} exp{ −½ λy Σ_{i=1}^n (yi − µ)² } × λy^{ay−1} exp{−by λy}

Though π(µ, λy|y) is not of a simple form, its full conditional distributions are:

  π(µ|λy, y) ∝ exp{ −½ λy Σ_{i=1}^n (yi − µ)² }  ⇒  µ|λy, y ~ N( ȳn, 1/(n λy) )

  π(λy|µ, y) ∝ λy^{ay + n/2 − 1} exp{ −λy [ by + ½ Σ_{i=1}^n (yi − µ)² ] }
  ⇒  λy|µ, y ~ Γ( ay + n/2,  by + ½ Σ_{i=1}^n (yi − µ)² ).

slide-26
SLIDE 26

Markov Chain Monte Carlo – Gibbs sampling

Given full conditionals for π(µ, λy|y), one can use Markov chain Monte Carlo (MCMC) to obtain draws from the posterior. The Gibbs sampler is an MCMC scheme which iteratively replaces each parameter, in turn, by a draw from its full conditional:

  initialize parameters at (µ, λy)⁰
  for t = 1, ..., niter {
    set µ = a draw from N( ȳn, 1/(n λy) )
    set λy = a draw from Γ( ay + n/2,  by + ½ Σ_{i=1}^n (yi − µ)² )
  }

(Be sure to use the newly updated µ when updating λy.) The draws (µ, λy)¹, ..., (µ, λy)^niter are a dependent sample from π(µ, λy|y). In practice, an initial portion of the sample is discarded to remove the effect of the initialization values (µ⁰, λy⁰).
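A compact sketch of this Gibbs sampler; the synthetic data and the Gamma hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Gibbs sampler for the iid N(mu, 1/lam_y) example; data and (a_y, b_y)
# below are illustrative, not taken from the slides.
y = rng.normal(1.0, 2.0, size=8)
n, ybar = len(y), y.mean()
a_y, b_y = 1.0, 0.005

mu, lam, draws = 0.0, 1.0, []                  # initialize (mu, lam_y)^0
for t in range(1000):
    # mu | lam_y, y ~ N(ybar, 1/(n lam_y))
    mu = rng.normal(ybar, np.sqrt(1.0 / (n * lam)))
    # lam_y | mu, y ~ Gamma(a_y + n/2, rate = b_y + .5 sum (y_i - mu)^2),
    # using the freshly updated mu (numpy's gamma takes scale = 1/rate)
    lam = rng.gamma(a_y + n / 2, 1.0 / (b_y + 0.5 * np.sum((y - mu) ** 2)))
    draws.append((mu, lam))

burned = np.array(draws[100:])                 # drop initialization effects
print("posterior means (mu, lam_y):", burned.mean(axis=0))
```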

slide-27
SLIDE 27

Gibbs sampling for π(µ, λy|y)

[Figure: Gibbs draws of (λy, µ)]

slide-28
SLIDE 28

posterior summary for π(µ, λy|y)

[Figure: data, the (λy, µ) draws, trace plots of µ and λy, and posterior draws of µ against the data]

slide-29
SLIDE 29

Gibbs sampler: intuition

Gibbs sampler for a bivariate normal density:

  π(z) = π(z1, z2) ∝ |Σ|^{−1/2} exp{ −½ zᵀ Σ⁻¹ z },  Σ = [ 1 ρ ; ρ 1 ]

Full conditionals of π(z):  z1|z2 ~ N(ρ z2, 1 − ρ²),  z2|z1 ~ N(ρ z1, 1 − ρ²)

• initialize the chain with z⁰ ~ N(0, Σ)
• draw z1¹ ~ N(ρ z2⁰, 1 − ρ²); now (z1¹, z2⁰)ᵀ ~ π(z)

[Diagram: the move from z⁰ = (z1⁰, z2⁰) to (z1¹, z2⁰) to (z1¹, z2¹) = z¹ in the (z1, z2) plane]

slide-30
SLIDE 30

Gibbs sampler: intuition

The Gibbs sampler gives z⁰, z¹, ..., zᵀ, which can be treated as dependent draws from π(z). If z⁰ is not a draw from π(z), then the initial realizations will not have the correct distribution; in practice, the first 100?, 1000? realizations are discarded. The draws can be used to make inference about π(z):

• The posterior mean of z is estimated by (µ̂1, µ̂2)ᵀ = (1/T) Σ_{k=1}^T (z1ᵏ, z2ᵏ)ᵀ
• Posterior probabilities:  P(z1 > 1) = (1/T) Σ_{k=1}^T I[z1ᵏ > 1];  P(z1 > z2) = (1/T) Σ_{k=1}^T I[z1ᵏ > z2ᵏ]
• 90% interval: (z1^[5%], z1^[95%])

slide-31
SLIDE 31

Sampling of π(µ, λ|y) via Metropolis

Initialize parameters at some setting (µ, λy)⁰. For t = 1, ..., T {

  update µ|λy, y:
  • generate proposal µ* ~ U[µ − rµ, µ + rµ]
  • compute acceptance probability α = min{ 1, π(µ*, λy|y) / π(µ, λy|y) }
  • update µ to its new value: µnew = µ* with probability α, µ with probability 1 − α

  update λy|µ, y analogously
}

Here we ran T = 1000 scans, giving realizations (µ, λy)¹, ..., (µ, λy)ᵀ from the posterior, and discarded the first 100 for burn-in.

Note: the proposal width rµ is tuned so that µ* is accepted about half the time; the proposal width rλy is tuned so that λy* is accepted about half the time.

slide-32
SLIDE 32

Metropolis sampling for π(µ, λy|y)

[Figure: Metropolis draws of (λy, µ)]

slide-33
SLIDE 33

posterior summary for π(µ, λy|y)

[Figure: data, the (λy, µ) draws, trace plots of µ and λy, and posterior draws of µ against the data]

slide-34
SLIDE 34

Sampling from non-standard multivariate distributions

Nick Metropolis, computing pioneer at Los Alamos National Laboratory:
• inventor of the Monte Carlo method
• inventor of Markov chain Monte Carlo: “Equation of State Calculations by Fast Computing Machines” (1953) by N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller and E. Teller, Journal of Chemical Physics

Originally implemented on the MANIAC 1 computer at LANL. The algorithm constructs a Markov chain whose realizations are draws from the target (posterior) distribution, using steps that maintain detailed balance.

slide-35
SLIDE 35

Gibbs Sampling and Metropolis for a bivariate normal density

π(z1, z2) ∝ |Σ|^{−1/2} exp{ −½ zᵀ Σ⁻¹ z },  Σ = [ 1 ρ ; ρ 1 ]

Sampling from the full conditionals (also called the heat bath):

  z1|z2 ~ N(ρ z2, 1 − ρ²)
  z2|z1 ~ N(ρ z1, 1 − ρ²)

Metropolis updating:

  generate z1* ~ U[z1 − r, z1 + r]
  calculate α = min{ 1, π(z1*, z2)/π(z1, z2) } = min{ 1, π(z1*|z2)/π(z1|z2) }
  set z1new = z1* with probability α, z1 with probability 1 − α

slide-36
SLIDE 36

Kernel basis representation for spatial processes z(s)

Define m basis functions k1(s), ..., km(s).

[Figure: normal-density basis functions over s]

Here kj(s) is a normal density centered at spatial location ωj:

  kj(s) = (1/√(2π)) exp{ −½ (s − ωj)² }

Set z(s) = Σ_{j=1}^m kj(s) xj, where x ~ N(0, Im).

Can represent z = (z(s1), ..., z(sn))ᵀ as z = Kx, where Kij = kj(si).
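A minimal sketch of this construction; the knot locations and the evaluation grid are illustrative:

```python
import numpy as np

# Build the kernel basis matrix K and a process realization z = K x.
omega = np.linspace(-2, 12, 8)                # m = 8 knot locations w_j
s = np.linspace(0, 10, 200)                   # locations where z(s) is wanted

# K_ij = k_j(s_i): normal density centered at each knot
K = np.exp(-0.5 * (s[:, None] - omega[None, :]) ** 2) / np.sqrt(2 * np.pi)
x = np.random.default_rng(5).normal(size=omega.size)   # x ~ N(0, I_m)
z = K @ x                                     # z(s) = sum_j k_j(s) x_j
```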

slide-37
SLIDE 37

x and k(s) determine spatial processes z(s)

[Figure: the scaled kernels kj(s)xj and the resulting process z(s)]

Continuous representation:  z(s) = Σ_{j=1}^m kj(s) xj, where x ~ N(0, Im).

Discrete representation:  for z = (z(s1), ..., z(sn))ᵀ, z = Kx, where Kij = kj(si).

slide-38
SLIDE 38

Formulation for the 1-d example

Data y = (y(s1), ..., y(sn))ᵀ observed at locations s1, ..., sn. Once the knot locations ωj, j = 1, ..., m, and the kernel choice k(s) are specified, the remaining model formulation is trivial:

Likelihood:
  L(y|x, λy) ∝ λy^{n/2} exp{ −(λy/2) (y − Kx)ᵀ(y − Kx) },  where Kij = k(ωj − si)

Priors:
  π(x|λx) ∝ λx^{m/2} exp{ −(λx/2) xᵀx }
  π(λx) ∝ λx^{ax−1} exp{−bx λx}
  π(λy) ∝ λy^{ay−1} exp{−by λy}

Posterior:
  π(x, λx, λy|y) ∝ λy^{ay + n/2 − 1} exp{ −λy [by + .5(y − Kx)ᵀ(y − Kx)] } × λx^{ax + m/2 − 1} exp{ −λx [bx + .5 xᵀx] }
slide-39
SLIDE 39

Posterior and full conditionals

Posterior:
  π(x, λx, λy|y) ∝ λy^{ay + n/2 − 1} exp{ −λy [by + .5(y − Kx)ᵀ(y − Kx)] } × λx^{ax + m/2 − 1} exp{ −λx [bx + .5 xᵀx] }

Full conditionals:
  π(x| · · ·) ∝ exp{ −½ [ λy xᵀKᵀKx − 2 λy xᵀKᵀy + λx xᵀx ] }
  π(λx| · · ·) ∝ λx^{ax + m/2 − 1} exp{ −λx [bx + .5 xᵀx] }
  π(λy| · · ·) ∝ λy^{ay + n/2 − 1} exp{ −λy [by + .5(y − Kx)ᵀ(y − Kx)] }

Gibbs sampler implementation:
  x| · · · ~ N( (λy KᵀK + λx Im)⁻¹ λy Kᵀy,  (λy KᵀK + λx Im)⁻¹ )
  λx| · · · ~ Γ( ax + m/2,  bx + .5 xᵀx )
  λy| · · · ~ Γ( ay + n/2,  by + .5(y − Kx)ᵀ(y − Kx) )
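A sketch of the Gibbs sampler implied by these full conditionals; K and y are assumed built as in the 1-d example that follows, and the hyperparameter defaults mirror it:

```python
import numpy as np

rng = np.random.default_rng(6)

# Gibbs sampler for (x, lam_x, lam_y) in the kernel-basis model.
def gibbs(K, y, a_x=1.0, b_x=0.001, a_y=10.0, b_y=10 * 0.25**2, n_iter=1000):
    n, m = K.shape
    x, lam_x, lam_y = np.zeros(m), 1.0, 16.0
    KtK, Kty = K.T @ K, K.T @ y
    draws = []
    for t in range(n_iter):
        # x | ... ~ N((lam_y K'K + lam_x I)^-1 lam_y K'y, same inverse)
        cov = np.linalg.inv(lam_y * KtK + lam_x * np.eye(m))
        x = rng.multivariate_normal(cov @ (lam_y * Kty), cov)
        # lam_x | ... ~ Gamma(a_x + m/2, b_x + .5 x'x)
        lam_x = rng.gamma(a_x + m / 2, 1.0 / (b_x + 0.5 * x @ x))
        # lam_y | ... ~ Gamma(a_y + n/2, b_y + .5 (y - Kx)'(y - Kx))
        r = y - K @ x
        lam_y = rng.gamma(a_y + n / 2, 1.0 / (b_y + 0.5 * r @ r))
        draws.append((x.copy(), lam_x, lam_y))
    return draws
```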

slide-40
SLIDE 40

1-d example

m = 6 knots evenly spaced between −.3 and 1.2. n = 5 data points at s = .05, .25, .52, .65, .91. k(s) is N(0, sd = .3). ay = 10, by = 10 · (.25²) ⇒ strong prior at λy = 1/.25²; ax = 1, bx = .001.

[Figure: data y(s), the basis functions kj(s), and trace plots of the xj and of (λx, λy)]

slide-41
SLIDE 41

1-d example

From posterior realizations of the knot weights x, one can construct posterior realizations of the smooth fitted function z(s) = Σ_{j=1}^m kj(s) xj.

Note that the strong prior on λy is required since n is small.

[Figure: posterior realizations of z(s); mean & pointwise 90% bounds]

slide-42
SLIDE 42

References

• D. Gamerman and H. F. Lopes (2006) Markov Chain Monte Carlo, Chapman & Hall.
• A. Gelman, J. B. Carlin, H. S. Stern and D. B. Rubin (1995) Bayesian Data Analysis, Chapman & Hall.
• M. Schervish (1995) Theory of Statistics, Springer-Verlag.
• J. Besag, P. J. Green, D. Higdon and K. Mengersen (1995) Bayesian computation and stochastic systems (with discussion), Statistical Science, 10, 3-66.
• D. Higdon (2002) Space and space-time modeling using process convolutions, in Quantitative Methods for Current Environmental Issues (C. Anderson, V. Barnett, P. C. Chatwin and A. H. El-Shaarawi, eds), 37-56.
• D. Higdon, H. Lee and C. Holloman (2003) Markov chain Monte Carlo approaches for inference in computationally intensive inverse problems, in Bayesian Statistics 7, Proceedings of the Seventh Valencia International Meeting (Bernardo, Bayarri, Berger, Dawid, Heckerman, Smith and West, eds).

slide-43
SLIDE 43

GAUSSIAN PROCESSES 1

slide-44
SLIDE 44

Gaussian process models for spatial phenomena

[Figure: a realization z(s) at locations s1, ..., s7]

An example of a realization z(s) of a Gaussian process model on s1, ..., sn:

  z = (z(s1), ..., z(sn))ᵀ ~ N( 0, Σ ),  with Σij = exp{ −||si − sj||² },

where ||si − sj|| denotes the distance between locations si and sj. z has density

  π(z) = (2π)^{−n/2} |Σ|^{−1/2} exp{ −½ zᵀ Σ⁻¹ z }.

slide-45
SLIDE 45

Realizations from π(z) = (2π)^{−n/2} |Σ|^{−1/2} exp{ −½ zᵀ Σ⁻¹ z }

[Figure: three realizations z(s)]

The model for z(s) can be extended to continuous s.

slide-46
SLIDE 46

Generating multivariate normal realizations

Independent normals are standard in any computer package: u ~ N(0, In). A well-known property of normals: if u ~ N(µ, Σ), then z = Ku ~ N(Kµ, KΣKᵀ). Use this to construct correlated realizations from iid ones. Want z ~ N(0, Σ):

1. compute a square root matrix L such that LLᵀ = Σ;
2. generate u ~ N(0, In);
3. set z = Lu ~ N(0, L In Lᵀ = Σ).

• Any square root matrix L will do here.
• The columns of L are basis functions for representing realizations z.
• L need not be square; see over- or under-specified bases.
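A minimal numpy sketch of this recipe. The small jitter added to Σ is a practical assumption, guarding against the near-singularity of the Gaussian covariance that motivates the pivoting and SVD variants on the next slides:

```python
import numpy as np

# Generate a correlated GP realization z ~ N(0, Sigma) from iid normals
# using a Cholesky square root L with L L^T = Sigma.
s = np.linspace(0, 10, 20)
Sigma = np.exp(-(s[:, None] - s[None, :]) ** 2)        # Sigma_ij = exp{-|si-sj|^2}
L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(len(s)))  # jitter for stability
u = np.random.default_rng(7).normal(size=len(s))       # u ~ N(0, I_n)
z = L @ u                                              # z ~ N(0, Sigma)
```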
slide-47
SLIDE 47

Standard Cholesky decomposition

z ~ N(0, Σ), Σ = LLᵀ, z = Lu where u ~ N(0, In), L lower triangular.
Σij = exp{−||si − sj||²}, s1, ..., s20 equally spaced between 0 and 10:

[Figure: sparsity pattern of L and the corresponding basis columns over s]

slide-48
SLIDE 48

Cholesky decomposition with pivoting

z ~ N(0, Σ), Σ = LLᵀ, z = Lu where u ~ N(0, In), L permuted lower triangular.
Σij = exp{−||si − sj||²}, s1, ..., s20 equally spaced between 0 and 10:

[Figure: sparsity pattern of the pivoted L and the corresponding basis columns over s]

slide-49
SLIDE 49

Singular value decomposition

z ~ N(0, Σ), Σ = UΛUᵀ = LLᵀ, z = Lu where u ~ N(0, In).
Σij = exp{−||si − sj||²}, s1, ..., s20 equally spaced between 0 and 10:

[Figure: structure of the SVD-based L and the corresponding basis columns over s]

slide-50
SLIDE 50

Conditioning on some observations of z(s)

[Figure: a realization z(s) with z(s2) and z(s5) observed]

We observe z(s2) and z(s5); what do we now know about {z(s1), z(s3), z(s4), z(s6), z(s7), z(s8)}?

Reordering z so the observed components come first,

  (z(s2), z(s5), z(s1), z(s3), z(s4), z(s6), z(s7), z(s8))ᵀ ~ N( 0, Σ ),

where Σ is filled in using Σij = exp{−||si − sj||²} (e.g. correlation .3679 at distance 1, and .0001 at distance 3).

slide-51
SLIDE 51

Conditioning on some observations of z(s)

  (z1; z2) ~ N( 0, [ Σ11 Σ12 ; Σ21 Σ22 ] ),   z2|z1 ~ N( Σ21 Σ11⁻¹ z1,  Σ22 − Σ21 Σ11⁻¹ Σ12 )

[Figure: conditional mean and conditional realizations of z(s)]
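A sketch of this conditioning rule for the eight-point example above; the observed values in z1 are hypothetical:

```python
import numpy as np

# Condition a GP on observed components using
# z2|z1 ~ N(S21 S11^-1 z1, S22 - S21 S11^-1 S12).
s = np.arange(1.0, 9.0)                      # locations s1..s8
obs = np.array([1, 4])                       # indices of s2 and s5 (0-based)
pred = np.setdiff1d(np.arange(len(s)), obs)

Sigma = np.exp(-(s[:, None] - s[None, :]) ** 2)
S11 = Sigma[np.ix_(obs, obs)]
S12 = Sigma[np.ix_(obs, pred)]
S22 = Sigma[np.ix_(pred, pred)]

z1 = np.array([0.5, -1.0])                   # hypothetical observed values
A = np.linalg.solve(S11, S12).T              # A = S21 S11^-1
cond_mean = A @ z1
cond_cov = S22 - A @ S12

# one conditional realization, with jitter for numerical safety
Lc = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(len(pred)))
z2_draw = cond_mean + Lc @ np.random.default_rng(9).normal(size=len(pred))
```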

slide-52
SLIDE 52

More examples with various covariance functions and spatial scales

Σij = exp{−(||si − sj||/scale)²} (Gaussian)    Σij = exp{−(||si − sj||/scale)¹} (exponential)

[Figure: nine panels of realizations z(x): Gaussian C(r) with scale = 2, 3, 5; exponential C(r) with scale = 1, 10, 20; Brownian motion C(r) with p = 1.5 and scale = 1, 3, 5]
slide-53
SLIDE 53

More examples with various covariance functions and spatial scales

Σij = exp{−(||si − sj||/scale)²} (Gaussian)    Σij = exp{−(||si − sj||/scale)¹} (exponential)

[Figure: the same nine covariance/scale combinations, shown on a larger domain]
slide-54
SLIDE 54

A 2-d example, conditioning on the edge

Σij = exp{−(||si − sj||/5)²}

[Figure: a 2-d realization Z over (X, Y); the mean conditional on the Y = 1 points; two realizations conditional on the Y = 1 points]

slide-55
SLIDE 55

Soft Conditioning (Bayes Rule)

[Figure: noisy observations y(si) of the process z(s)]

Observed data y are a noisy version of z:  y(si) = z(si) + ε(si), with ε(si) ~iid N(0, σy²), i = 1, ..., n

Data:  y = (y1, ..., yn)ᵀ with Σy = σy² In.   Spatial process prior for z(s): mean µz = 0 and covariance Σz.

  L(y|z) ∝ |Σy|^{−1/2} exp{ −½ (y − z)ᵀ Σy⁻¹ (y − z) }
  π(z) ∝ |Σz|^{−1/2} exp{ −½ zᵀ Σz⁻¹ z }

slide-56
SLIDE 56

Soft Conditioning (Bayes Rule) ... continued

Sampling model and spatial prior:

  L(y|z) ∝ |Σy|^{−1/2} exp{ −½ (y − z)ᵀ Σy⁻¹ (y − z) }
  π(z) ∝ |Σz|^{−1/2} exp{ −½ zᵀ Σz⁻¹ z }

⇒ π(z|y) ∝ L(y|z) × π(z)
⇒ π(z|y) ∝ exp{ −½ [ zᵀ(Σy⁻¹ + Σz⁻¹)z − 2 zᵀ Σy⁻¹ y ] + f(y) }
⇒ z|y ~ N( V Σy⁻¹ y, V ),  where V = (Σy⁻¹ + Σz⁻¹)⁻¹

[Figure: conditional realizations of z(s)]

π(z|y) describes the updated uncertainty about z given the observations.

slide-57
SLIDE 57

Updated predictions for unobserved z(s)’s

[Figure: conditional realizations of z(s)]

data locations:        yd = (y(s1), ..., y(sn))ᵀ,  zd = (z(s1), ..., z(sn))ᵀ
prediction locations:  y* = (y(s1*), ..., y(sm*))ᵀ,  z* = (z(s1*), ..., z(sm*))ᵀ

Define y = (yd; y*) and z = (zd; z*). The data and the spatial process prior for z(s) are then

  y = (yd; 0m),   Σy = [ σy² In  0 ; 0  ∞ Im ]
  µz = (0n; 0m),  Σz = the covariance rule applied to (s, s*)

Define Σy⁻ = [ (1/σy²) In  0 ; 0  0 ].

Now the posterior distribution for z = (zd, z*) is

  z|y ~ N( V Σy⁻ y, V ),  where V = (Σy⁻ + Σz⁻¹)⁻¹

slide-58
SLIDE 58

Updated predictions for unobserved z(s)’s,

Alternative: use the conditional normal rules.

[Figure: conditional realizations of z(s)]

data locations:        y = (y(s1), ..., y(sn))ᵀ = (z(s1) + ε(s1), ..., z(sn) + ε(sn))ᵀ
prediction locations:  z* = (z(s1*), ..., z(sm*))ᵀ

Jointly,

  (y; z*) ~ N( 0,  [ σy² In  0 ; 0  0 ] + Σz ),

where Σz = [ Σz(s, s)  Σz(s, s*) ; Σz(s*, s)  Σz(s*, s*) ] is the (n+m)×(n+m) covariance rule applied to (s, s*).

Therefore z*|y ~ N(µ*, Σ*), where

  µ* = Σz(s*, s) [ σy² In + Σz(s, s) ]⁻¹ y
  Σ* = Σz(s*, s*) − Σz(s*, s) [ σy² In + Σz(s, s) ]⁻¹ Σz(s, s*)

slide-59
SLIDE 59

Example: Dioxin concentration at Piazza Road Superfund Site

[Figure: dioxin data over the site, the posterior mean of z*, and the pointwise posterior sd]

slide-60
SLIDE 60

Bonus topic: constructing simultaneous intervals

• generate a large sample of m-vectors z* from π(z*|y)
• compute the m-vector ẑ* that is the mean of the generated z*'s
• compute the m-vector σ̂ that is the pointwise sd of the generated z*'s
• find the constant a such that 80% of the generated z*'s are completely contained within ẑ* ± a σ̂

[Figure: posterior realizations and mean; pointwise estimated sd; simultaneous 80% credible interval]
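A sketch of this construction, assuming `draws` is an (N, m) array of posterior realizations of z*:

```python
import numpy as np

# Simultaneous band from posterior draws: find a so that the chosen fraction
# of draws lie entirely inside zhat +/- a * sigma_hat.
def simultaneous_band(draws, level=0.80):
    zhat = draws.mean(axis=0)                   # pointwise posterior mean
    sig = draws.std(axis=0)                     # pointwise posterior sd
    # largest standardized deviation of each draw from the mean curve;
    # a draw lies inside zhat +/- a*sig everywhere iff its max deviation <= a
    dev = np.max(np.abs(draws - zhat) / sig, axis=1)
    a = np.quantile(dev, level)                 # smallest a covering `level`
    return zhat - a * sig, zhat + a * sig
```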

slide-61
SLIDE 61

References

• Ripley, B. (1989) Spatial Statistics, Wiley.
• Cressie, N. (1992) Statistics for Spatial Data, Wiley.
• Stein, M. (1999) Interpolation of Spatial Data: Some Theory for Kriging, Springer.

slide-62
SLIDE 62

GAUSSIAN PROCESSES 2

slide-63
SLIDE 63

Gaussian process models revisited

Application: finding the resonance frequency of a rod of material.

[Figure: scaled acceleration vs. scaled frequency]

• for various driving frequencies, the acceleration of the rod is recorded
• the true frequency-acceleration curve is smooth
• we have noisy measurements of acceleration
• the goal is to estimate the resonance frequency
• use a GP model for the frequency-acceleration curve
• the smoothness of the GP model is important here
slide-64
SLIDE 64

Gaussian process models formulation

Take the response y to be acceleration and the spatial value s to be frequency.

[Figure: scaled acceleration vs. scaled frequency]

Data: y = (y1, ..., yn)ᵀ at spatial locations s1, ..., sn. z(s) is a mean-0 Gaussian process with covariance function

  Cov(z(s), z(s′)) = (1/λz) exp{ −β (s − s′)² },

where β controls the strength of dependence. Take z = (z(s1), ..., z(sn))ᵀ to be z(s) restricted to the data observations, and model the data as

  y = z + ε,  where ε ~ N(0, (1/λy) In).

We want to find the posterior distribution for the frequency s⋆ where z(s) is maximal.

slide-65
SLIDE 65

Reparameterizing the spatial dependence parameter β

It is convenient to reparameterize β as ρ = exp{−β (1/2)²} ⇔ β = −4 log(ρ), so ρ is the correlation between two points on z(s) separated by 1/2.

Hence z has spatial prior z|ρ, λz ~ N(0, (1/λz) R(ρ; s)), where R(ρ; s) is the correlation matrix with ij elements Rij = ρ^{4(si − sj)²}.

The prior specification for z(s) is completed by specifying priors for λz and ρ:

  π(λz) ∝ λz^{az−1} exp{−bz λz}   (if y is standardized, encourage λz to be close to 1, e.g. az = bz = 5)
  π(ρ) ∝ (1 − ρ)^{−.5}   (encourages ρ to be large if possible)
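A small sketch of this parameterization; the grid and the value of ρ are illustrative:

```python
import numpy as np

# Correlation matrix under the rho parameterization: R_ij = rho^{4 (si-sj)^2},
# equivalent to exp{-beta (si-sj)^2} with beta = -4 log(rho).
def corr_matrix(s, rho):
    d2 = (s[:, None] - s[None, :]) ** 2
    return rho ** (4.0 * d2)

s = np.linspace(0, 1, 50)
R = corr_matrix(s, rho=0.7)          # rho = correlation at separation 1/2
lam_z = 1.0
# one draw z | rho, lam_z ~ N(0, R / lam_z), with jitter for stability
L = np.linalg.cholesky(R / lam_z + 1e-10 * np.eye(len(s)))
z = L @ np.random.default_rng(8).normal(size=len(s))
```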

slide-66
SLIDE 66

Bayesian model formulation

Likelihood:
  L(y|z, λy) ∝ λy^{n/2} exp{ −½ λy (y − z)ᵀ(y − z) }

Priors:
  π(z|λz, ρ) ∝ λz^{n/2} |R(ρ; s)|^{−1/2} exp{ −½ λz zᵀ R(ρ; s)⁻¹ z }
  π(λy) ∝ λy^{ay−1} e^{−by λy}   (uninformative here: ay = 1, by = .005)
  π(λz) ∝ λz^{az−1} e^{−bz λz}   (fairly informative: az = 5, bz = 5)
  π(ρ) ∝ (1 − ρ)^{−.5}

Marginal likelihood (integrating out z):
  L(y|λy, λz, ρ) ∝ |Λ|^{1/2} exp{ −½ yᵀ Λ y },  where Λ⁻¹ = (1/λy) In + (1/λz) R(ρ; s)

Posterior:
  π(λy, λz, ρ|y) ∝ |Λ|^{1/2} exp{ −½ yᵀ Λ y } × λy^{ay−1} e^{−by λy} × λz^{az−1} e^{−bz λz} × (1 − ρ)^{−.5}

slide-67
SLIDE 67

Posterior Simulation

Use Metropolis to simulate from the posterior

  π(λy, λz, ρ|y) ∝ |Λ|^{1/2} exp{ −½ yᵀ Λ y } × λy^{ay−1} e^{−by λy} × λz^{az−1} e^{−bz λz} × (1 − ρ)^{−.5},

giving (after burn-in) (λy, λz, ρ)¹, ..., (λy, λz, ρ)ᵀ.

For any given realization (λy, λz, ρ)ᵗ, one can generate z* = (z(s1*), ..., z(sm*))ᵀ for any set of prediction locations s1*, ..., sm*. From the previous GP material we know

  (z; z*) | · · · ~ N( V Σy⁻ (y; 0m), V ),

where Σy⁻ = [ λy In  0 ; 0  0 ] and V⁻¹ = Σy⁻ + λz R(ρ, (s, s*))⁻¹.

Hence one can generate corresponding z*'s for each posterior realization at a fine grid around the apparent resonance frequency z⋆.

slide-68
SLIDE 68

Or use the conditional normal formula with

  (y; z*) | · · · ~ N( (0n; 0m),  [ λy⁻¹ In  0 ; 0  0 ] + λz⁻¹ R(ρ, (s, s*)) ),

where R(ρ, (s, s*)) = [ R(ρ, (s, s))  R(ρ, (s, s*)) ; R(ρ, (s*, s))  R(ρ, (s*, s*)) ] is the (n+m)×(n+m) correlation rule applied to (s, s*).

Therefore z*|y ~ N(µ*, Σ*), where

  µ* = λz⁻¹ R(ρ, (s*, s)) [ λy⁻¹ In + λz⁻¹ R(ρ, (s, s)) ]⁻¹ y
  Σ* = λz⁻¹ R(ρ, (s*, s*)) − λz⁻¹ R(ρ, (s*, s)) [ λy⁻¹ In + λz⁻¹ R(ρ, (s, s)) ]⁻¹ λz⁻¹ R(ρ, (s, s*))

slide-69
SLIDE 69

MCMC output for (λy, λz, ρ)

[Figure: MCMC traces of ρ, λz, and λy]

slide-70
SLIDE 70

Posterior realizations for z(s) near z⋆

[Figure: posterior realizations of z(s) near the resonance frequency, scaled acceleration vs. scaled frequency]

slide-71
SLIDE 71

Posterior for resonance frequency z⋆

[Figure: posterior distribution for the scaled resonance frequency]

slide-72
SLIDE 72

Gaussian Processes for modeling complex computer simulators

Data y = (y1, ..., yn)ᵀ at input settings (spatial locations)

  S = [ s1; ...; sn ] = [ s11 s12 · · · s1p ; ... ; sn1 sn2 · · · snp ]

Model the responses y as a (stochastic) function of s:  y(s) = z(s) + ε(s).

Vector form, restricting to the n data points:  y = z + ε.

slide-73
SLIDE 73

Model response as a Gaussian processes

y(s) = z(s) + ε

Likelihood:
  L(y|z, λ) ∝ λ^{n/2} exp{ −½ λ (y − z)ᵀ(y − z) }

Priors:
  π(z|λz, β) ∝ λz^{n/2} |R(β)|^{−1/2} exp{ −½ λz zᵀ R(β)⁻¹ z }
  π(λ) ∝ λ^{a−1} e^{−bλ}   (perhaps quite informative)
  π(λz) ∝ λz^{az−1} e^{−bz λz}   (fairly informative if the data have been standardized)
  π(ρ) ∝ ∏_{k=1}^p (1 − ρk)^{−.5}

Marginal likelihood (integrating out z):
  L(y|λ, λz, β) ∝ |Λ|^{1/2} exp{ −½ yᵀ Λ y },  where Λ⁻¹ = (1/λ) In + (1/λz) R(β)

slide-74
SLIDE 74

GASP Covariance model for z(s)

  Cov(z(si), z(sj)) = (1/λz) R(β)ij = (1/λz) ∏_{k=1}^p exp{ −βk (sik − sjk)^α }

• Typically α = 2 ⇒ z(s) is smooth.
• Separable covariance: a product of componentwise covariances.
• Can handle a large number of covariates/inputs p.
• Can allow for multiway interactions.
• βk = 0 ⇒ input k is “inactive” ⇒ variable selection.
• Reparameterize: ρk = exp{−βk d0^α}; typically d0 is a halfwidth.
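A minimal sketch of this separable covariance in the ρ parameterization (with α = 2 and d0 = 1/2, matching the earlier slides):

```python
import numpy as np

# Separable GASP covariance: a product of componentwise terms,
# Cov(z(si), z(sj)) = (1/lam_z) * prod_k rho_k^{4 (s_ik - s_jk)^2}.
def gasp_cov(S, rho, lam_z):
    # S: (n, p) matrix of inputs; rho: (p,) correlations at distance 1/2
    d2 = (S[:, None, :] - S[None, :, :]) ** 2      # (n, n, p) squared dists
    R = np.prod(rho[None, None, :] ** (4.0 * d2), axis=2)
    return R / lam_z
```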

slide-75
SLIDE 75

Posterior Distribution and MCMC

  π(λ, λz, ρ|y) ∝ |Λλ,ρ|^{1/2} exp{ −½ yᵀ Λλ,ρ y } × λ^{a−1} e^{−bλ} × λz^{az−1} e^{−bz λz} × ∏_{k=1}^p (1 − ρk)^{−.5}

• MCMC implementation requires Metropolis updates.
• Realizations of z(s)|λ, ρ, y can be obtained post hoc: define z* = (z(s1*), ..., z(sm*))ᵀ to be predictions at locations s1*, ..., sm*; then

  (z; z*) | · · · ~ N( V Σy⁻ (y; 0m), V ),

  where Σy⁻ = [ λ In  0 ; 0  0 ] and V⁻¹ = Σy⁻ + λz R(ρ, (s, s*))⁻¹.

slide-76
SLIDE 76

Example: Solar collector Code (Schonlau, Hamada and Welch, 1995)

• n = 98 model runs, varying 6 independent variables.
• The response is the increase in heat exchange effectiveness.
• A Latin hypercube (LHC) design was used with 2-d space filling.

[Figure: pairwise design plot of wind.vel, slot.width, Rey.num, admittance, plate.thickness, Nusselt.num]

slide-77
SLIDE 77

Example: Solar collector Code

• Fit of the GASP model and predictions of 10 holdout points.
• The two most active covariates are shown here.

[Figure: data and posterior-mean response surfaces over (x4, x5); MCMC traces of beta(1:p), lamz, and lame]

slide-78
SLIDE 78

Example: Solar collector Code

• Visualizing a 6-d response surface is difficult.
• 1-d marginal effects are shown here.

[Figure: 1-d marginal effects of x1, ..., x6]

slide-79
SLIDE 79

References

• J. Sacks, W. J. Welch, T. J. Mitchell and H. P. Wynn (1989) Design and analysis of computer experiments, Statistical Science, 4:409-435.

slide-80
SLIDE 80

COMPUTER MODEL CALIBRATION 1

slide-81
SLIDE 81

Inference combining a physics model with experimental data

[Figure: three panels of drop time vs. drop height (floor)]

Data generated from model:  d²z/dt² = −1 − 0.3 dz/dt + ε

Simulation model:  d²z/dt² = −1

Statistical model:  y(z) = η(z) + δ(z) + ε

Improved physics model:  d²z/dt² = −1 − θ dz/dt + ε

Statistical model:  y(z) = η(z, θ) + δ(z) + ε

slide-82
SLIDE 82

Accounting for limited simulator runs

[Figure: data & simulations y(x), η(x,θ); model runs η(x,θ) over (x, θ)]

• Borrows from Kennedy and O'Hagan (2001).

x : model or system inputs
θ : calibration parameters
ζ(x) : true physical system response given inputs x
η(x, θ) : simulator response at x and θ; the simulator is run at limited input settings, η = (η(x1*, θ1*), ..., η(xm*, θm*))ᵀ; treat η(·, ·) as a random function and use a GP prior for η(·, ·)
y(x) : experimental observation of the physical system
e(x) : observation error of the experimental data

y(x) = ζ(x) + e(x)
y(x) = η(x, θ) + e(x)

slide-83
SLIDE 83

OA designs for simulator runs

Example: N = 16, 3 factors each at 4 levels; an OA(16, 4³) design and its 2-d projections.

[Figure: the design points in (x1, x2, x3) and their 2-d projections]

The OA design ensures the importance measures R² can be accurately estimated for low dimensions. One can spread out the design for building a response surface emulator of η(x).

slide-84
SLIDE 84

Gaussian Process models for combining field data and complex computer simulators

field data and input settings (spatial locations):

  y = (y(x1), ..., y(xn))ᵀ,  with xi = (xi1, xi2, ..., xipx)

sim data, input settings x and parameters θ*:

  η = (η(x1*, θ1*), ..., η(xm*, θm*))ᵀ,  with rows (xi1*, ..., xipx*, θi1*, ..., θipθ*)

Model the sim response η(x, θ) as a Gaussian process:

  y(x) = η(x, θ) + ε
  η(x, θ) ~ GP( 0, Cη((x, θ), (x′, θ′)) )
  ε ~iid N(0, 1/λ)

Cη depends on the (px + pθ)-vector ρη and on λη.

slide-85
SLIDE 85

Vector form, restricting to the n field obs and m simulation runs: y = η(θ) + ε, η ~ Nm(0m, Cη(ρη, λη)) ⇒

  (y; η) ~ Nn+m( (0n; 0m),  Cyη ),  Cyη = Cη + [ (1/λ) In  0 ; 0  (1/λs) Im ],

where Cη = (1/λη) Rη( (x; x*), (1θ; θ*); ρη ) and the correlation matrix Rη is given by

  Rη((x, θ), (x′, θ′); ρη) = ∏_{k=1}^{px} ρηk^{4(xk − xk′)²} × ∏_{k=1}^{pθ} ρη(k+px)^{4(θk − θk′)²}

λs is typically set to something large, like 10⁶, to stabilize matrix computations and allow for numerical fluctuation in η(x, θ).

Note: the covariance matrix Cη depends on θ through its “distance”-based correlation function Rη((x, θ), (x′, θ′); ρη). We use a 0 mean for η(x, θ); an alternative is to use a linear regression mean model.

slide-86
SLIDE 86

Likelihood:

  L(y, η | λ, ρη, λη, λs, θ) ∝ |Cyη|^{−1/2} exp{ −½ (y; η)ᵀ Cyη⁻¹ (y; η) }

Priors:
  π(λ) ∝ λ^{a−1} e^{−bλ}   (perhaps well known from the observation process)
  π(ρη) ∝ ∏_{k=1}^{px+pθ} (1 − ρηk)^{−.5},  where ρηk = e^{−.5² βηk} is the correlation at distance .5; this prior is Beta(1, .5)
  π(λη) ∝ λη^{aη−1} e^{−bη λη}
  π(λs) ∝ λs^{as−1} e^{−bs λs}
  π(θ) ∝ I[θ ∈ C]

• One could fix ρη, λη from a prior GASP run on the model output.
• Many prefer to reparameterize ρ as β = −log(ρ)/.5² in the likelihood term.
slide-87
SLIDE 87

Posterior Density

π(λ, ρη, λη, λs, θ | y, η) ∝ |Cyη|^{−1/2} exp{ −½ (y; η)ᵀ Cyη⁻¹ (y; η) } × ∏_{k=1}^{px+pθ} (1 − ρηk)^{−.5} × λη^{aη−1} e^{−bη λη} × λs^{as−1} e^{−bs λs} × λ^{a−1} e^{−bλ} × I[θ ∈ C]

If ρη, λη, and λs are fixed from a previous analysis of the simulator data, then

  π(λ, θ | y, η, ρη, λη, λs) ∝ |Cyη|^{−1/2} exp{ −½ (y; η)ᵀ Cyη⁻¹ (y; η) } × λ^{a−1} e^{−bλ} × I[θ ∈ C]
slide-88
SLIDE 88

Accounting for limited simulation runs

[Figure: prior uncertainty; posterior realizations of η(x, θ*); posterior uncertainty]

Again, standard Bayesian estimation gives:

  π(θ, η(·, ·), λ, ρη, λη | y(x)) ∝ L(y(x) | η(x, θ), λ) × π(θ) × π(η(·, ·) | λη, ρη) × π(λ) × π(ρη) × π(λη)

• Posterior means and quantiles shown.
• Uncertainty in θ, η(·, ·), and the nuisance parameters is incorporated into the forecast.
• Gaussian process models for η(·, ·).
slide-89
SLIDE 89

Predicting a new outcome: ζ = ζ(x′) = η(x′, θ)

Given an MCMC realization (θ, λ, ρη, λη), a realization for ζ(x′) can be produced using Bayes rule. With data v = (y; η; ζ) and the GP prior for η(x, θ):

  Σv⁻ = diag( λ In,  λs Im,  0 ),   µ = (0n; 0m; 0)

  Cη = λη⁻¹ Rη( (x; x*; x′), (1θ; θ*; θ); ρη )

Now the posterior distribution for v = (y, η, ζ)ᵀ is

  v | y, η ~ N( µv|yη = V Σv⁻ v, V ),  where V = (Σv⁻ + Cη⁻¹)⁻¹.

Restricting to ζ we have  ζ | y, η ~ N( (µv|yη)_{n+m+1},  V_{n+m+1, n+m+1} ).

Alternatively, one can apply the conditional normal formula to

  (y; η; ζ) ~ N( 0,  diag(λ⁻¹ In, λs⁻¹ Im, 0) + Cη ),

so that  ζ | y, η ~ N( Σ21 Σ11⁻¹ (y; η),  Σ22 − Σ21 Σ11⁻¹ Σ12 ).

slide-90
SLIDE 90

Accounting for model discrepancy

[Figure: prior uncertainty for y(x), η(x,θ)]

• Borrows from Kennedy and O'Hagan (2001).

x : model or system inputs
θ : calibration parameters
ζ(x) : true physical system response given inputs x
η(x, θ) : simulator response at x and θ
y(x) : experimental observation of the physical system
δ(x) : discrepancy between ζ(x) and η(x, θ); may be decomposed into numerical error and bias
e(x) : observation error of the experimental data

y(x) = ζ(x) + e(x)
y(x) = η(x, θ) + δ(x) + e(x)
y(x) = η(x, θ) + δn(x) + δb(x) + e(x)

slide-91
SLIDE 91

Accounting for model discrepancy

[Figure: prior uncertainty; posterior model uncertainty; posterior model discrepancy δ(x); calibrated forecast for ζ(x)]

Again, standard Bayesian estimation gives:

  π(θ, η, δ | y(x)) ∝ L(y(x) | η(x, θ), δ(x)) × π(θ) × π(η) × π(δ)

• Posterior means and 90% CI's shown.
• The posterior prediction for ζ(x) is obtained by computing the posterior distribution for η(x, θ) + δ(x).
• Uncertainty in θ, η(x, t), and δ(x) is incorporated into the forecast.
• Gaussian process models for η(x, t) and δ(x).

slide-92
SLIDE 92

Gaussian Process models for combining field data and complex computer simulators

field data and input settings (spatial locations):

  y = (y(x1), ..., y(xn))ᵀ,  with xi = (xi1, ..., xipx)

sim data, input settings x and parameters θ*:

  η = (η(x1*, θ1*), ..., η(xm*, θm*))ᵀ,  with rows (xi1*, ..., xipx*, θi1*, ..., θipθ*)

Model the sim response η(x, θ) as a Gaussian process:

  y(x) = η(x, θ) + δ(x) + ε
  η(x, θ) ~ GP( 0, Cη((x, θ), (x′, θ′)) )
  δ(x) ~ GP( 0, Cδ(x, x′) )
  ε ~iid N(0, 1/λ)

Cη depends on the (px + pθ)-vector ρη and on λη; Cδ depends on the px-vector ρδ and on λδ.

slide-93
SLIDE 93

Vector form, restricting to the n field obs and m simulation runs: y = η(θ) + δ + ε, η ~ Nm(0m, Cη(ρη, λη)),

  (y; η) ~ Nn+m( (0n; 0m),  Cyη ),  Cyη = Cη + [ Cδ  0 ; 0  0 ],

where

  Cη = (1/λη) Rη( (x; x*), (1θ; θ*); ρη ) + (1/λs) Im+n
  Cδ = (1/λδ) Rδ(x; ρδ) + (1/λ) In

and the correlation matrices Rη and Rδ are given by

  Rη((x, θ), (x′, θ′); ρη) = ∏_{k=1}^{px} ρηk^{4(xk − xk′)²} × ∏_{k=1}^{pθ} ρη(k+px)^{4(θk − θk′)²}
  Rδ(x, x′; ρδ) = ∏_{k=1}^{px} ρδk^{4(xk − xk′)²}

λs is typically set to something large, like 10⁶, to stabilize matrix computations and allow for numerical fluctuation in η(x, θ). We use a 0 mean for η(x, θ); an alternative is to use a linear regression mean model.

slide-94
SLIDE 94

Likelihood:

  L(y, η | λ, ρη, λη, λs, ρδ, λδ, θ) ∝ |Cyη|^{−1/2} exp{ −½ (y; η)ᵀ Cyη⁻¹ (y; η) }

Priors:
  π(λ) ∝ λ^{a−1} e^{−bλ}   (perhaps well known from the observation process)
  π(ρη) ∝ ∏_{k=1}^{px+pθ} (1 − ρηk)^{−.5},  where ρηk = e^{−.5² βηk} is the correlation at distance .5; this prior is Beta(1, .5)
  π(λη) ∝ λη^{aη−1} e^{−bη λη}
  π(λs) ∝ λs^{as−1} e^{−bs λs}
  π(ρδ) ∝ ∏_{k=1}^{px} (1 − ρδk)^{−.5},  where ρδk = e^{−.5² βδk}
  π(λδ) ∝ λδ^{aδ−1} e^{−bδ λδ}
  π(θ) ∝ I[θ ∈ C]

• One could fix ρη, λη from a prior GASP run on the model output.
• Again, many choose to reparameterize the correlation parameters: β = −log(ρ)/.5² in the likelihood term.

slide-95
SLIDE 95

Posterior Density

π(λ, ρη, λη, λs, ρδ, λδ, θ | y, η) ∝ |Cyη|^{−1/2} exp{ −½ (y; η)ᵀ Cyη⁻¹ (y; η) } × ∏_{k=1}^{px+pθ} (1 − ρηk)^{−.5} × λη^{aη−1} e^{−bη λη} × λs^{as−1} e^{−bs λs} × ∏_{k=1}^{px} (1 − ρδk)^{−.5} × λδ^{aδ−1} e^{−bδ λδ} × λ^{a−1} e^{−bλ} × I[θ ∈ C]

If ρη, λη, and λs are fixed from a previous analysis of the simulator data, then

  π(λ, ρδ, λδ, θ | y, η, ρη, λη, λs) ∝ |Cyη|^{−1/2} exp{ −½ (y; η)ᵀ Cyη⁻¹ (y; η) } × ∏_{k=1}^{px} (1 − ρδk)^{−.5} × λδ^{aδ−1} e^{−bδ λδ} × λ^{a−1} e^{−bλ} × I[θ ∈ C]
slide-96
SLIDE 96

Predicting a new outcome: ζ = ζ(x′) = η(x′, θ) + δ(x′)

y = η(x, θ) + δ(x) + ε(x)
η = η(x*, θ*) + εs,  εs small or 0
ζ = η(x′, θ) + δ(x′),  x′ univariate or multivariate

⇒ (y; η; ζ) ~ Nn+m+1( (0n; 0m; 0),  diag(λ⁻¹ In, λs⁻¹ Im, 0) + Cη + Cδ )    (1)

where

  Cη = (1/λη) Rη( (x; x*; x′), (1θ; θ*; θ); ρη )
  Cδ = (1/λδ) Rδ( (x; x′); ρδ ), on indices 1, ..., n, n + m + 1; zeros elsewhere.

Given an MCMC realization (θ, λ, ρη, λη, ρδ, λδ), a realization for ζ(x′) can be produced using (1) and the conditional normal formula:

  ζ | y, η ~ N( Σ21 Σ11⁻¹ (y; η),  Σ22 − Σ21 Σ11⁻¹ Σ12 )

slide-97
SLIDE 97

Accounting for model discrepancy

[Figure: prior uncertainty; posterior model uncertainty; posterior model discrepancy δ(x); calibrated forecast for ζ(x)]

Again, standard Bayesian estimation gives:

  π(θ, η, δ | y(x)) ∝ L(y(x) | η(x, θ), δ(x)) × π(θ) × π(η) × π(δ)

• Posterior means and 90% CI's shown.
• The posterior prediction for ζ(x) is obtained by computing the posterior distribution for η(x, θ) + δ(x).
• Uncertainty in θ, η(x, t), and δ(x) is incorporated into the forecast.
• Gaussian process models for η(x, t) and δ(x).

slide-98
SLIDE 98

References

• T. Santner, B. J. Williams and W. I. Notz (2003) The Design and Analysis of Computer Experiments, Springer.
• M. Kennedy and A. O'Hagan (2001) Bayesian calibration of computer models (with discussion), Journal of the Royal Statistical Society B, 63, 425-464.
• J. Sacks, W. J. Welch, T. J. Mitchell and H. P. Wynn (1989) Design and analysis of computer experiments (with discussion), Statistical Science, 4, 409-423.
• Higdon, D., Kennedy, M., Cavendish, J., Cafeo, J. and Ryne, R. D. (2004) Combining field observations and simulations for calibration and prediction, SIAM Journal of Scientific Computing, 26, 448-466.

slide-99
SLIDE 99

COMPUTER MODEL CALIBRATION 2 DEALING WITH MULTIVARIATE OUTPUT

slide-100
SLIDE 100

Carry out simulated implosions using Neddermeyer’s model

A sequence of runs is carried out at m input settings (x*, θ1*, θ2*) = (me/m, s, u0), varying over predefined ranges using an OA(32, 4³)-based LH design:

  [ x1*  θ11*  θ12* ; ... ; xm*  θm1*  θm2* ]

[Figure: simulated inner radius by time, and by time and angle φ]

Each simulation produces an nη = 22 · 26 vector of radii, for 22 times × 26 angles.

slide-101
SLIDE 101

Application: implosions of steel cylinders – Neddermeyer ’43

• Initial work on implosion for Fat Man.
• Use high explosive (HE) to crush steel cylindrical shells.
• Investigate the feasibility of a controlled implosion.
slide-102
SLIDE 102

Some History

Early work on cylinders was called the “beer can experiments,” and it was not encouraging:

  “...I question Dr. Neddermeyer's seriousness...” – Deke Parsons
  “It stinks.” – R. Feynman

Teller and von Neumann were quite supportive of the implosion idea. Data on the collapsing cylinders came from high-speed photography. A symmetrical implosion was eventually accomplished using HE lenses by Kistiakowsky.

Implosion played a key role in early computer experiments: Feynman worked on implosion calculations with IBM accounting machines, and eventually the first computer with addressable memory (MANIAC 1) was developed.
slide-103
SLIDE 103

The Experiments

slide-104
SLIDE 104

Neddermeyer’s Model

[Figure: simulated cylinder cross-sections at t = 0, 1, 2, 3, 4 × 10⁻⁵ s]

Energy from the HE imparts an initial inward velocity to the cylinder:

  v0 = √( 2 u0 (me/m) / (1 + me/m) ),

with mass ratio me/m of HE to steel, and u0 the energy per unit mass from the HE. The energy converts to work done on the cylinder:

  work per unit mass = w = ( s / (2ρ(1 − λ)) ) [ ri² log ri² − ro² log ro² + λ² log λ² ]

ri = scaled inner radius; ro = scaled outer radius; λ = initial ri/ro; s = steel yielding stress; ρ = density of steel.

slide-105
SLIDE 105

Neddermeyer’s Model

[Figure: simulated cylinder cross-sections at t = 0, 1, 2, 3, 4 × 10⁻⁵ s]

ODE:

  dr/dt = −[ (1 / (R1² f(r))) ( v0² − (s/ρ) g(r) ) ]^{1/2}

where
  r = inner radius of the cylinder (varies with time)
  R1 = initial outer radius of the cylinder
  f(r) = ( r² / (1 − λ²) ) ln( (r² + 1 − λ²) / r² )
  g(r) = (1 − λ²)⁻¹ [ r² ln r² − (r² + 1 − λ²) ln(r² + 1 − λ²) − λ² ln λ² ]
  λ = initial ratio r(t = 0)/R1

Constant volume condition:  router² − r² = 1 − λ²
slide-106
SLIDE 106

Goal: use experimental data to calibrate s and u0; obtain prediction uncertainty for new experiment

[Figure: hypothetical photographs of experiments 1-3 at t = 10 µs, 25 µs, and 45 µs]

me/m ≈ .32 (expt 1), me/m ≈ .17 (expt 2), me/m ≈ .36 (expt 3). Hypothetical data obtained from photos at different times during the 3 experimental implosions. All cylinders had a 1.5 in outer and a 1.0 in inner radius (λ = 2/3).

slide-107
SLIDE 107

Carry out simulated implosions using Neddermeyer’s model

A sequence of runs is carried out at m input settings (x*, θ1*, θ2*) = (me/m, s, u0), varying over predefined ranges using an OA(32, 4³)-based LH design:

  [ x1*  θ11*  θ12* ; ... ; xm*  θm1*  θm2* ]

[Figure: simulated inner radius by time, and by time and angle φ]

Each simulation produces an nη = 22 · 26 vector of radii, for 22 times × 26 angles.

slide-108
SLIDE 108

A 1-d implementation of the cylinder application

[Figure panels over x = scaled time: (a) model runs η(x,t); (b) data & prior uncertainty; (c) posterior mean for η(x,t); (d) calibrated simulator prediction; (e) posterior model discrepancy δ(x); (f) calibrated prediction]

The experimental data are collapsed radially.

slide-109
SLIDE 109

Features of this basic formulation

• Scales well with the input dimension, dim(x, θ).
• Treats the simulation model as a “black box”: no need to get inside the simulator.
• Can model complicated and indirect observation processes.

Limitations of this basic formulation

• Does not easily deal with highly multivariate data.
• Inefficient use of multivariate simulation output.
• Can miss important features in the physical process.

We need an extension of the basic approach to handle multivariate experimental observations and simulation output.

slide-110
SLIDE 110

Carry out simulated implosions using Neddermeyer’s model

A sequence of runs is carried out at m input settings (x*, θ1*, θ2*) = (me/m, s, u0), varying over predefined ranges using an OA(32, 4³)-based LH design:

  [ x1*  θ11*  θ12* ; ... ; xm*  θm1*  θm2* ]

[Figure: simulated inner radius by time, and by time and angle φ]

Each simulation produces an nη = 22 · 26 vector of radii, for 22 times × 26 angles.

slide-111
SLIDE 111

Basis representation of simulation output

η(x, θ) = Σ_{i=1}^{pη} ki(t, φ) wi(x, θ)

Here we construct the bases ki(t, φ) via principal components (EOFs):

[Figure: PC 1 (98.9% of variation), PC 2 (0.9%), PC 3 (0.1%), over time and angle]

The basis elements do not change with φ, from the symmetry of Neddermeyer's model. Model untried settings with a GP model on the weights:

  wi(x, θ1, θ2) ~ GP( 0, λwi⁻¹ R((x, θ), (x′, θ′); ρwi) )

[Figure: weight surfaces w1(x,θ1), w2(x,θ1), w3(x,θ1)]
slide-112
SLIDE 112

PC representation of simulation output

Ξ = [η1; · · · ; ηm] : an nη × m matrix that holds the output of the m simulations.

SVD decomposition: Ξ = UDVᵀ. Kη is the first pη columns of (1/√m) UD; the columns of [√m Vᵀ] have variance 1.

Cylinder example:

[Figure: PC 1 (98.9% of variation), PC 2 (0.9%), PC 3 (0.1%)]

pη = 3 PC's: Kη = [k1; k2; k3], where each vector ki holds the trace of PC i. The ki's do not change with φ, from the symmetry of Neddermeyer's model.

Simulated trace:  η(xi*, θi1*, θi2*) = Kη w(xi*, θi1*, θi2*) + εi, with the εi's iid ~ N(0, λη⁻¹), for any set of tried simulation inputs (xi*, θi1*, θi2*).
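A short sketch of this PC construction; centering/scaling of Ξ is assumed done beforehand:

```python
import numpy as np

# Principal-component basis for multivariate simulator output: Xi holds one
# simulated trace per column; K_eta spans the first p_eta components.
def pc_basis(Xi, p_eta):
    # Xi: (n_eta, m) matrix of m simulated traces, centered/scaled already
    m = Xi.shape[1]
    U, d, Vt = np.linalg.svd(Xi, full_matrices=False)
    K_eta = (U * d)[:, :p_eta] / np.sqrt(m)   # first p_eta columns of UD/sqrt(m)
    w_hat = np.sqrt(m) * Vt[:p_eta, :]        # weights; rows have variance ~1
    return K_eta, w_hat
```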

slide-113
SLIDE 113

Gaussian process models for PC weights

We want to evaluate η(x, θ1, θ2) at an arbitrary input setting (x, θ1, θ2), and we want the analysis to account for the uncertainty in doing so. Approach: model each PC weight as a Gaussian process,

  wi(x, θ1, θ2) ~ GP( 0, λwi⁻¹ R((x, θ), (x′, θ′); ρwi) ),

where

  R((x, θ), (x′, θ′); ρwi) = ∏_{k=1}^{px} ρwik^{4(xk − xk′)²} × ∏_{k=1}^{pθ} ρwi(k+px)^{4(θk − θk′)²}    (1)

Restricting to the design settings (xi*, θi1*, θi2*), i = 1, ..., m, and specifying

  wi = (wi(x1*, θ11*, θ12*), ..., wi(xm*, θm1*, θm2*))ᵀ

gives  wi ~ind N( 0, λwi⁻¹ R((x*, θ*); ρwi) ),  i = 1, ..., pη, where the m × m matrix R((x*, θ*); ρwi) is given by (1).

*Note: an additional nugget term,  wi ~ N( 0, λwi⁻¹ R((x*, θ*); ρwi) + λεi⁻¹ Im ),  i = 1, ..., pη, may be useful.
slide-114
SLIDE 114

Gaussian process models for PC weights

At the m simulation input settings, the mpη-vector w = (w1; ...; wpη) has prior distribution

  w ~ N( 0, Σw ),  Σw = blockdiag( λw1⁻¹ R((x*, θ*); ρw1), ..., λwpη⁻¹ R((x*, θ*); ρwpη) );

note: Σw = Ipη ⊗ λw⁻¹ R((x*, θ*); ρw) can break down.

Emulator likelihood, with η = vec([η(x1*, θ11*, θ12*); · · · ; η(xm*, θm1*, θm2*)]):

  L(η|w, λη) ∝ λη^{m nη / 2} exp{ −½ λη (η − Kw)ᵀ(η − Kw) },  λη ~ Γ(aη, bη),

where nη is the number of observations in a simulated trace and K = [Im ⊗ k1; · · · ; Im ⊗ kpη]. Equivalently,

  L(η|w, λη) ∝ λη^{m pη / 2} exp{ −½ λη (w − ŵ)ᵀ (KᵀK) (w − ŵ) } × λη^{m(nη − pη)/2} exp{ −½ λη ηᵀ (I − K(KᵀK)⁻¹Kᵀ) η }
             ∝ λη^{m pη / 2} exp{ −½ λη (w − ŵ)ᵀ (KᵀK) (w − ŵ) },  λη ~ Γ(aη′, bη′),

with  aη′ = aη + m(nη − pη)/2,  bη′ = bη + ½ ηᵀ (I − K(KᵀK)⁻¹Kᵀ) η,  ŵ = (KᵀK)⁻¹Kᵀη.

slide-115
SLIDE 115

Gaussian process models for PC weights

The resulting posterior can then be based on the computed PC weights ŵ:

\hat w \mid w, \lambda_\eta \sim N\left(w,\; (\lambda_\eta K^T K)^{-1}\right)
w \mid \lambda_w, \rho_w \sim N(0, \Sigma_w)
\Rightarrow \hat w \mid \lambda_\eta, \lambda_w, \rho_w \sim N\left(0,\; (\lambda_\eta K^T K)^{-1} + \Sigma_w\right)

The resulting posterior is then

\pi(\lambda_\eta, \lambda_w, \rho_w \mid \hat w) \propto \left| (\lambda_\eta K^T K)^{-1} + \Sigma_w \right|^{-\frac{1}{2}} \exp\left\{ -\tfrac{1}{2}\, \hat w^T \left( (\lambda_\eta K^T K)^{-1} + \Sigma_w \right)^{-1} \hat w \right\} \times \lambda_\eta^{a'_\eta - 1} e^{-b'_\eta \lambda_\eta} \times \prod_{i=1}^{p_\eta} \lambda_{w_i}^{a_w - 1} e^{-b_w \lambda_{w_i}} \times \prod_{i=1}^{p_\eta} \left\{ \prod_{j=1}^{p_x} (1 - \rho_{w_i j})^{b_\rho - 1} \prod_{j=1}^{p_\theta} (1 - \rho_{w_i (j + p_x)})^{b_\rho - 1} \right\}

MCMC via Metropolis works fine here; the bounded range of the ρ_{w_ij}'s facilitates MCMC.
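A minimal random-walk Metropolis sketch for sampling the hyperparameters (λ_η, λ_w, ρ_w); log_post is assumed to evaluate the log of the unnormalized posterior above and to return -inf outside the support, and the step sizes are illustrative:

    import numpy as np

    def metropolis(log_post, theta0, step, n_iter=5000, seed=0):
        rng = np.random.default_rng(seed)
        theta = np.array(theta0, dtype=float)
        lp = log_post(theta)
        draws = np.empty((n_iter, theta.size))
        for it in range(n_iter):
            for j in range(theta.size):              # single-site updates
                prop = theta.copy()
                prop[j] += step[j] * rng.normal()    # random-walk proposal
                lp_prop = log_post(prop)
                if np.log(rng.uniform()) < lp_prop - lp:
                    theta, lp = prop, lp_prop        # accept
            draws[it] = theta
        return draws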

slide-116
SLIDE 116

Posterior distribution of ρw

[Figure: posterior distributions of the ρ_w components over the inputs [x θ] for PC 1, PC 2, and PC 3]

Separate models for each PC give more opportunity to take advantage of effect sparsity.

slide-117
SLIDE 117

Predicting simulator output at an untried (x⋆, θ⋆_1, θ⋆_2)

Want η(x⋆, θ⋆_1, θ⋆_2) = K w(x⋆, θ⋆_1, θ⋆_2). For a given draw (λ_η, λ_w, ρ_w), a draw of w⋆ can be produced:

\begin{pmatrix} \hat w \\ w^\star \end{pmatrix} \sim N\left( 0,\; \begin{pmatrix} (\lambda_\eta K^T K)^{-1} & 0 \\ 0 & 0 \end{pmatrix} + \Sigma_{w, w^\star}(\lambda_w, \rho_w) \right)

Define

V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix} = \begin{pmatrix} (\lambda_\eta K^T K)^{-1} & 0 \\ 0 & 0 \end{pmatrix} + \Sigma_{w, w^\star}(\lambda_w, \rho_w)

Then

w^\star \mid \hat w \sim N\left( V_{21} V_{11}^{-1} \hat w,\; V_{22} - V_{21} V_{11}^{-1} V_{12} \right)

Realizations can be generated from a sample of the MCMC output. When the simulations are plentiful and informative, conditioning on a point estimate (λ̂_η, λ̂_w, ρ̂_w) is a good approximation to the full posterior; the posterior mean or median works well for (λ̂_η, λ̂_w, ρ̂_w).
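A minimal sketch of this conditional Gaussian draw, with the partition V11, V12, V21, V22 as above:

    import numpy as np

    def draw_w_star(w_hat, V11, V12, V21, V22, rng):
        # w* | w_hat ~ N(V21 V11^{-1} w_hat, V22 - V21 V11^{-1} V12)
        A = V21 @ np.linalg.inv(V11)
        return rng.multivariate_normal(A @ w_hat, V22 - A @ V12)

In practice one would solve against V11 (e.g., via a Cholesky factorization) rather than invert it; the explicit inverse here simply mirrors the formula on the slide.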

slide-118
SLIDE 118

Comparing emulator predictions to holdout simulations

Emulator 90% prediction bands and actual (holdout) simulations:

[Figure: inner radius (cm) vs time (µs); dots mark holdout simulations, lines mark 90% emulator bounds]

slide-119
SLIDE 119

Exploring sensitivity of simulator output to model inputs

Simulator predictions varying one input at a time, holding the others at their nominal values.

slide-120
SLIDE 120

Basic formulation – borrows from Kennedy and O’Hagan (2001)

[Figure: Experiment 1 over (time, φ): radius r with simulator fit η, discrepancy δ, and combined η + δ]

x = m_e/m ≈ .32, θ_1 = s ≈ ?, θ_2 = u_0 ≈ ?

(t, φ) simulation output space
x experimental conditions
θ calibration parameters
ζ(x) true physical system response given conditions x
η(x, θ) simulator response at x and θ
y(x) experimental observation of the physical system
δ(x) discrepancy between ζ(x) and η(x, θ); may be decomposed into numerical error and bias
e(x) observation error of the experimental data

y(x) = ζ(x) + e(x)
y(x) = η(x, θ) + δ(x) + e(x)

slide-121
SLIDE 121

Kernel basis representation for spatial processes δ(s)

Define pδ basis functions d1(s), . . . , dpδ(s).

[Figure: normal-density basis functions d_j(s) over s]

Here d_j(s) is a normal density centered at spatial location ω_j:

d_j(s) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\tfrac{1}{2} (s - \omega_j)^2 \right\}

Set

\delta(s) = \sum_{j=1}^{p_\delta} d_j(s)\, v_j, \quad \text{where } v \sim N(0, \lambda_v^{-1} I_{p_\delta}).

We can represent δ = (δ(s_1), …, δ(s_n))^T as δ = Dv, where D_{ij} = d_j(s_i).
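A minimal sketch of building D, assuming unit-width kernels as on the slide (the width argument is an added knob, not part of the slide's formulation):

    import numpy as np

    def kernel_basis(s, omega, width=1.0):
        # D[i, j] = d_j(s_i): normal density centered at knot omega[j]
        z = (s[:, None] - omega[None, :]) / width
        return np.exp(-0.5 * z**2) / (np.sqrt(2 * np.pi) * width)

Then delta = D @ v realizes the discrete representation δ = Dv.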

slide-122
SLIDE 122

v and d(s) determine the spatial process δ(s)

[Figure: the scaled kernels d_j(s)v_j and the resulting process δ(s) over s]

Continuous representation: \delta(s) = \sum_{j=1}^{p_\delta} d_j(s)\, v_j, where v \sim N(0, \lambda_v^{-1} I_{p_\delta}).

Discrete representation: for \delta = (\delta(s_1), \ldots, \delta(s_n))^T, \delta = D v, where D_{ij} = d_j(s_i).

slide-123
SLIDE 123

Basis representation of discrepancy

[Figure: discrepancy basis functions over (time, φ)]

Represent the discrepancy δ(x) using basis functions and weights: p_δ = 24 basis functions over (t, φ); D = [d_1; · · · ; d_{p_δ}], where the d_k's hold the basis. Then

\delta(x) = D v(x), \quad \text{where } v(x) \sim \mathrm{GP}\left( 0,\; \lambda_v^{-1} I_{p_\delta} \otimes R(x, x'; \rho_v) \right)

with

R(x, x'; \rho_v) = \prod_{k=1}^{p_x} \rho_{v_k}^{4(x_k - x'_k)^2} \quad (2)

slide-124
SLIDE 124

Integrated model formulation

Data y(x_1), …, y(x_n) are collected for n experiments at input conditions x_1, …, x_n. Each y(x_i) is a collection of n_{y_i} measurements over points indexed by (t, φ):

y(x_i) = \eta(x_i, \theta) + \delta(x_i) + e_i = K_i\, w(x_i, \theta) + D_i\, v(x_i) + e_i

y(x_i) \mid w(x_i, \theta), v(x_i), \lambda_y \sim N\left( [D_i; K_i] \begin{pmatrix} v(x_i) \\ w(x_i, \theta) \end{pmatrix},\; (\lambda_y W_i)^{-1} \right)

Since the support of each y(x_i) varies and doesn't match that of the simulations, the basis vectors in K_i must be interpolated from K_η; similarly, D_i must be computed on the support of y(x_i):

[Figure: basis vectors mapped onto the (time, angle) support of the experimental data]

*note: cubic spline interpolation over (time, φ) is used here; a sketch of this step follows.
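A minimal sketch of the interpolation, assuming the simulation output grid (t_grid, phi_grid) and SciPy ≥ 1.9 for the cubic option of RegularGridInterpolator; names are illustrative:

    import numpy as np
    from scipy.interpolate import RegularGridInterpolator

    def interp_basis_column(k_col, t_grid, phi_grid, pts_i):
        # reshape a simulation-grid basis column into a (time, angle) surface
        surf = k_col.reshape(len(t_grid), len(phi_grid))
        f = RegularGridInterpolator((t_grid, phi_grid), surf, method="cubic")
        # evaluate at the (t, phi) locations of y(x_i): one column of K_i
        return f(pts_i)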

slide-125
SLIDE 125

Integrated model formulation

Define n_y = n_{y_1} + · · · + n_{y_n}, the total number of experimental data points; y, the n_y-vector formed by concatenating the y(x_i)'s; v = vec([v(x_1); · · · ; v(x_n)]^T); and u(θ) = vec([w(x_1, θ_1, θ_2); · · · ; w(x_n, θ_1, θ_2)]^T). Then

y \mid v, u(\theta), \lambda_y \sim N\left( B \begin{pmatrix} v \\ u(\theta) \end{pmatrix},\; (\lambda_y W_y)^{-1} \right), \quad \lambda_y \sim \Gamma(a_y, b_y) \quad (3)

where W_y = diag(W_1, …, W_n) and

B = \left[ \mathrm{diag}(D_1, \ldots, D_n)\, P_D^T \;\; \mathrm{diag}(K_1, \ldots, K_n)\, P_K^T \right]

P_D and P_K are permutation matrices whose rows are given by:

P_D(j + n(i - 1), \cdot) = e^T_{(j-1) p_\delta + i}, \quad i = 1, \ldots, p_\delta;\; j = 1, \ldots, n
P_K(j + n(i - 1), \cdot) = e^T_{(j-1) p_\eta + i}, \quad i = 1, \ldots, p_\eta;\; j = 1, \ldots, n

slide-126
SLIDE 126

Integrated model formulation (continued)

Equivalently, (3) can be represented as

\begin{pmatrix} \hat v \\ \hat u \end{pmatrix} \Big|\, v, u(\theta), \lambda_y \sim N\left( \begin{pmatrix} v \\ u(\theta) \end{pmatrix},\; (\lambda_y B^T W_y B)^{-1} \right), \quad \lambda_y \sim \Gamma(a'_y, b'_y)

with n_y = n_{y_1} + · · · + n_{y_n}, the total number of experimental data points,

\begin{pmatrix} \hat v \\ \hat u \end{pmatrix} = (B^T W_y B)^{-1} B^T W_y\, y

a'_y = a_y + \tfrac{1}{2}\left[ n_y - n(p_\delta + p_\eta) \right]

b'_y = b_y + \tfrac{1}{2} \left( y - B \begin{pmatrix} \hat v \\ \hat u \end{pmatrix} \right)^T W_y \left( y - B \begin{pmatrix} \hat v \\ \hat u \end{pmatrix} \right)

Dimension reduction:

model    | simulator | data and discrepancy
standard | n_η · m   | n_y
basis    | p_η · m   | n · (p_δ + p_η)

The basis approach is particularly efficient when n_η and n_y are large.
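A minimal sketch of the weighted projection (v̂; û) = (B^T W_y B)^{-1} B^T W_y y, assuming W_y is passed as the 1-D vector of its diagonal:

    import numpy as np

    def weighted_projection(B, w_diag, y):
        # (B^T W B)^{-1} B^T W y with W = diag(w_diag)
        BtW = B.T * w_diag                # scales each column of B^T by a weight
        return np.linalg.solve(BtW @ B, BtW @ y)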

slide-127
SLIDE 127

Marginal likelihood

The (marginal) likelihood L(v̂, û, ŵ | λ_η, λ_w, ρ_w, λ_y, λ_v, ρ_v, θ) has the form

\begin{pmatrix} \hat v \\ \hat u \\ \hat w \end{pmatrix} \sim N\left( 0,\; \begin{pmatrix} \Lambda_y^{-1} & 0 \\ 0 & \Lambda_\eta^{-1} \end{pmatrix} + \begin{pmatrix} \Sigma_v & 0 \\ 0 & \Sigma_{uw} \end{pmatrix} \right)

where

\Lambda_y = \lambda_y B^T W_y B, \quad \Lambda_\eta = \lambda_\eta K^T K, \quad \Sigma_v = \lambda_v^{-1} I_{p_\delta} \otimes R(x, x; \rho_v)

and R(x, x; ρ_v) is the n × n correlation matrix from applying (2) to the conditions x_1, …, x_n corresponding to the n experiments. Σ_{uw} collects the GP covariances of the weights at the experimental and simulation inputs:

\Sigma_{uw} = \begin{pmatrix} \mathrm{diag}_i\left\{ \lambda_{w_i}^{-1} R((x, \theta), (x, \theta); \rho_{w_i}) \right\} & \mathrm{diag}_i\left\{ \lambda_{w_i}^{-1} R((x, \theta), (x^*, \theta^*); \rho_{w_i}) \right\} \\ \mathrm{diag}_i\left\{ \lambda_{w_i}^{-1} R((x^*, \theta^*), (x, \theta); \rho_{w_i}) \right\} & \mathrm{diag}_i\left\{ \lambda_{w_i}^{-1} R((x^*, \theta^*), (x^*, \theta^*); \rho_{w_i}) \right\} \end{pmatrix}, \quad i = 1, \ldots, p_\eta

A permutation of Σ_{uw} is block diagonal ⇒ computations can be sped up. Only the off-diagonal blocks of Σ_{uw} depend on θ.
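Evaluating this marginal likelihood is a standard zero-mean Gaussian log-density computation; a minimal Cholesky-based sketch (z stacks (v̂, û, ŵ) and Sigma is the covariance assembled from the blocks above):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def mvn_logpdf_zero_mean(z, Sigma):
        c, low = cho_factor(Sigma, lower=True)
        quad = z @ cho_solve((c, low), z)              # z^T Sigma^{-1} z
        logdet = 2.0 * np.sum(np.log(np.diag(c)))      # log |Sigma|
        return -0.5 * (quad + logdet + z.size * np.log(2 * np.pi))

Exploiting the block-diagonal structure noted above, rather than factoring the full Sigma each time, is what makes repeated MCMC evaluations affordable.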

slide-128
SLIDE 128

Posterior distribution

Likelihood: L(v̂, û, ŵ | λ_η, λ_w, ρ_w, λ_y, λ_v, ρ_v, θ)
Prior: π(λ_η, λ_w, ρ_w, λ_y, λ_v, ρ_v, θ)
⇒ Posterior:

\pi(\lambda_\eta, \lambda_w, \rho_w, \lambda_y, \lambda_v, \rho_v, \theta \mid \hat v, \hat u, \hat w) \propto L(\hat v, \hat u, \hat w \mid \lambda_\eta, \lambda_w, \rho_w, \lambda_y, \lambda_v, \rho_v, \theta) \times \pi(\lambda_\eta, \lambda_w, \rho_w, \lambda_y, \lambda_v, \rho_v, \theta)

Posterior exploration proceeds via MCMC; one can take advantage of structure and sparsity to speed up sampling. A useful approximation to speed up posterior evaluation:

\pi(\lambda_\eta, \lambda_w, \rho_w, \lambda_y, \lambda_v, \rho_v, \theta \mid \hat v, \hat u, \hat w) \propto L(\hat w \mid \lambda_\eta, \lambda_w, \rho_w) \times \pi(\lambda_\eta, \lambda_w, \rho_w) \times L(\hat v, \hat u \mid \lambda_\eta, \lambda_w, \rho_w, \lambda_y, \lambda_v, \rho_v, \theta) \times \pi(\lambda_y, \lambda_v, \rho_v, \theta)

In this approximation, the experimental data are not used to inform the parameters λ_η, λ_w, ρ_w, which govern the simulator process η(x, θ).

slide-129
SLIDE 129

Posterior distribution of model parameters (θ1, θ2)

slide-130
SLIDE 130

Posterior mean decomposition for each experiment

[Figure: posterior mean decomposition for Experiments 1, 2, and 3 over (time, φ): radius r with simulator fit η, discrepancy δ, and combined η + δ]

slide-131
SLIDE 131

Posterior prediction for implosion in each experiment

[Figure: posterior predictions of the implosion radius r over (time, φ) for Experiments 1, 2, and 3]

slide-132
SLIDE 132

90% prediction intervals for implosions at exposure times

[Figure: 90% prediction intervals for the implosion shape at exposure times between 15 and 45 µs, for Experiments 1, 2, and 3]

Predictions come from separate analyses that hold out the data from the experiment being predicted.

slide-133
SLIDE 133

References

  • Heitmann, K., Higdon, D., Nakhleh, C. and Habib, S. (2006). Cosmic Calibration. Astrophysical Journal Letters.
  • Williams, B., Higdon, D., Moore, L., McKay, M. and Keller-McNulty, S. (2006). Combining Experimental Data and Computer Simulations, with an Application to Flyer Plate Experiments. Bayesian Analysis.
  • Higdon, D., Gattiker, J., Williams, B. and Rightley, M. (2008). Computer Model Calibration Using High-Dimensional Output. Journal of the American Statistical Association.
  • Bayarri, Berger, Garcia-Donato, Liu, Palomo, Paulo, Sacks, Walsh, Cafeo and Parthasarathy (2007). Computer Model Validation with Functional Output. Annals of Statistics, 1874-1906.
  • Rougier, J. (2007). Lightweight Emulators for Multivariate Deterministic Functions. Unpublished, available at http://www.maths.bris.ac.uk/~mazjcr/lightweight1.pdf.