

SLIDE 1

Bayesian parameter estimation in predictive engineering

Damon McDougall

Institute for Computational Engineering and Sciences, UT Austin

14th August 2014


SLIDE 2

Motivation

  • Understand physical phenomena
  • Observations of phenomena
  • Mathematical model of phenomena (includes some parameters that characterise behaviour)
  • Numerical model approximating mathematical model
  • Find parameters in a situation of interest
  • Use the parameters to do something cool


SLIDE 3

Understanding errors

(Diagram: Reality, the mathematical model and the numerical model, with errors between each pair; the reality-model gap is addressed by validation, the model-numerics gap by verification.)


SLIDE 4

Setup

Model (usually a PDE): G(u, θ), where u is the initial condition and θ are model parameters.

  • u: perhaps an initial condition
  • θ: perhaps some interesting model parameters (diffusion, convection speed, permeability field, material properties)

Observations:

y_{j,k} = u(x_j, t_k) + η_{j,k},  η_{j,k} i.i.d. ∼ N(0, σ²)  ⇝  y = G(θ) + η,  η ∼ N(0, σ²I)

Want: P(θ|y) ∝ P(y|θ)P(θ). Why?


SLIDE 5

Do we need Bayes’ theorem?

Is Bayes’ theorem really necessary? We could minimise

J(θ) = (1/2σ²)‖G(θ) − y‖² + (1/2λ²)‖θ‖²

to get θ* = argmin_θ J(θ).
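To make this concrete, here is a minimal sketch of that minimisation for a hypothetical scalar forward model; G, σ, λ and the synthetic data are illustrative assumptions, not taken from the talk.

```python
# A sketch of the deterministic approach: minimise J(theta) directly.
# The forward model G and the values of sigma and lam are assumptions
# made purely for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
sigma, lam = 0.1, 1.0

def G(theta):
    return theta**2  # hypothetical forward model

y = G(1.5) + sigma * rng.normal()  # synthetic observation

def J(theta):
    misfit = (G(theta[0]) - y) ** 2 / (2 * sigma**2)
    regularisation = theta[0] ** 2 / (2 * lam**2)
    return misfit + regularisation

theta_star = minimize(J, x0=np.array([1.0])).x[0]  # theta* = argmin_theta J(theta)
print("theta* =", theta_star)
```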



SLIDE 8

Do we need Bayes’ theorem?

Bayesian methods estimate uncertainty as well as the mean; otherwise the two formulations are equivalent.

Deterministic optimisation:

J(θ) = (1/2σ²)‖G(θ) − y‖² (misfit) + (1/2λ²)‖θ‖² (regularisation)

Bayesian framework:

exp(−J(θ)) = exp(−(1/2σ²)‖G(θ) − y‖²) (likelihood) · exp(−(1/2λ²)‖θ‖²) (prior) = P(y|θ)P(θ) ∝ P(θ|y)
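A small numerical check of this equivalence on the same toy scalar problem (all values are illustrative assumptions): exp(−J) is exactly the likelihood times the prior, so the minimiser of J is the posterior mode.

```python
# exp(-J) factorises into the Gaussian likelihood times the Gaussian prior,
# so argmin J coincides with the mode of the (unnormalised) posterior.
import numpy as np

sigma, lam, y = 0.1, 1.0, 2.3  # illustrative values
G = lambda theta: theta**2     # hypothetical forward model

thetas = np.linspace(-3.0, 3.0, 2001)
J = (G(thetas) - y) ** 2 / (2 * sigma**2) + thetas**2 / (2 * lam**2)
likelihood = np.exp(-((G(thetas) - y) ** 2) / (2 * sigma**2))
prior = np.exp(-(thetas**2) / (2 * lam**2))

assert np.allclose(np.exp(-J), likelihood * prior)  # the same object
print(thetas[np.argmin(J)], thetas[np.argmax(likelihood * prior)])  # same point
```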


SLIDE 9

Method for solving Bayesian inverse problems

  • Kalman filtering/smoothing methods
      • Kalman filter (Kalman)
      • Ensemble Kalman filter (Evensen)
  • Variational methods
      • 3D VAR (Lorenc)
      • 4D VAR (Courtier, Talagrand, Lawless)
  • Particle methods
      • Particle filter (Doucet)
  • Sampling methods
      • Markov chain Monte Carlo (Metropolis, Hastings)

This list is not exhaustive. The body of work is prodigious.


SLIDE 10

QUESO

Nutshell: QUESO gives samples from P(θ|y), via a method called MCMC.

  • Library for Quantifying Uncertainty in Estimation, Simulation and Optimisation
  • Born in 2008 as part of the PECOS PSAAP programme
  • Provides robust and scalable sampling algorithms for UQ in computational models
  • Open source
  • C++
  • MPI for communication
  • Parallel chains; each chain can house several processes
  • Dependencies are MPI, Boost and GSL. Other optional features exist
  • https://github.com/libqueso/queso


SLIDES 11–27

What does MCMC look like?

(Figure: successive frames of an animation showing MCMC samples accumulating in parameter space.)

SLIDE 28

What does MCMC look like?

The sample average approximates the posterior mean:

E(θ|y) ≈ (1/N) Σ_{k=1}^{N} θ_k



SLIDES 29–34

How to do MCMC? Sampling P(θ|y)

  • Idea: construct {θ_k}_{k=1}^∞ cleverly such that {θ_k}_{k=1}^∞ i.i.d. ∼ P(θ|y)
  • 1. Let θ_j be the ‘current’ state in the sequence. Make a draw ξ ∼ P(θ) and construct a proposal z = (1 − β²)^{1/2} θ_j + βξ, for some β ∈ (0, 1)
  • 2. Define Φ(·) := (1/2σ²)‖G(·) − y‖²
  • 3. Compute α(θ_j, z) = 1 ∧ exp(Φ(θ_j) − Φ(z))
  • 4. Set θ_{j+1} = z with probability α(θ_j, z), and θ_{j+1} = θ_j with probability 1 − α(θ_j, z)
  • Take θ_1 to be a draw from P(θ)

A sketch of this sampler follows.
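Below is a minimal, self-contained sketch of this sampler for a scalar θ with prior N(0, λ²); the forward model G and the values of β, σ and λ are illustrative assumptions, and this is not QUESO code.

```python
# A sketch of the pCN-style Metropolis sampler on the slide, for scalar theta
# with prior N(0, lam^2). G, beta, sigma, lam and the data are illustrative.
import numpy as np

rng = np.random.default_rng(0)
sigma, lam, beta = 0.1, 1.0, 0.3

def G(theta):
    return theta**2  # hypothetical forward model

y = G(1.5) + sigma * rng.normal()  # synthetic observation

def Phi(theta):
    return (G(theta) - y) ** 2 / (2 * sigma**2)  # misfit functional

theta = lam * rng.normal()  # theta_1: a draw from the prior P(theta)
samples = []
for _ in range(10_000):
    xi = lam * rng.normal()                          # step 1: draw xi ~ P(theta)
    z = np.sqrt(1.0 - beta**2) * theta + beta * xi   # ... and form the proposal
    log_alpha = min(0.0, Phi(theta) - Phi(z))        # steps 2-3: 1 ^ exp(...) in log space
    if np.log(rng.uniform()) < log_alpha:            # step 4: accept with probability alpha
        theta = z
    samples.append(theta)

print("E(theta|y) ~", np.mean(samples))  # the sample average from slide 28
```

Taking the 1 ∧ exp(·) in log space avoids overflow; otherwise the loop follows the four steps on the slide verbatim.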

SLIDE 35

Why use QUESO?

Other solutions are available, e.g. R, PyMC, emcee, MICA, Stan. QUESO solves the same problem, but:

  • Has been designed to be used with large forward problems
  • Has been used successfully with 5000+ cores
  • Leverages parallel MCMC algorithms
  • Supports finite- and infinite-dimensional problems

(Diagram: the statistical application sits on top of QUESO, which drives the user’s forward code.)


SLIDE 36

Why use QUESO?

(Diagram: three MCMC chains running concurrently, each chain distributed over its own set of compute nodes.)


SLIDE 37

Example 1: convection-diffusion

We are given a convection-diffusion model:

(uc)_x − (νc_x)_x = s,  x ∈ [0, 1],  c(0) = c(1) = 0.

Functions of x are u, c and s. The constant is ν (viscosity). The unknown is c, typically a concentration. The underlying convection velocity is u. The forward problem: given u and s, find c. A finite-difference solver sketch follows.
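A minimal finite-difference sketch of this forward problem, assuming a second-order centred discretisation (the talk’s own solver is not shown; the grid size and ν below are illustrative):

```python
# Solve (u c)_x - (nu c_x)_x = s on [0, 1] with c(0) = c(1) = 0 by
# second-order centred finite differences. Discretisation choice, grid
# size and nu are illustrative assumptions.
import numpy as np

def solve_forward(u, s, nu, n=200):
    """Return grid x and solution c; u and s are callables of x."""
    x = np.linspace(0.0, 1.0, n + 1)
    h = x[1] - x[0]
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    for i in range(1, n):
        # centred differences for ((u c)_x)_i and -(nu c_xx)_i
        A[i, i - 1] = -u(x[i - 1]) / (2 * h) - nu / h**2
        A[i, i] = 2 * nu / h**2
        A[i, i + 1] = u(x[i + 1]) / (2 * h) - nu / h**2
        b[i] = s(x[i])
    A[0, 0] = A[n, n] = 1.0  # Dirichlet rows enforce c(0) = c(1) = 0
    return x, np.linalg.solve(A, b)

# The talk's "true" u and s (given on the next slide); nu is arbitrary here.
nu = 0.1
u = lambda x: 1.0 - np.cos(2 * np.pi * x)
s = lambda x: (2 * np.pi * (1 - np.cos(2 * np.pi * x)) * np.cos(2 * np.pi * x)
               + 2 * np.pi * np.sin(2 * np.pi * x) ** 2
               + 4 * np.pi**2 * nu * np.sin(2 * np.pi * x))
x, c = solve_forward(u, s, nu)
```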


SLIDE 38

Example 1: convection-diffusion

We are also given an observation model:

  • Model: (uc)_x − (νc_x)_x = s,  x ∈ [0, 1],  c(0) = c(1) = 0.
  • Observations: y_j = c(x_j) + η_j,  η_j i.i.d. ∼ N(0, σ²)  ⇝  y = G(u) + η,  η ∼ N(0, σ²I).

The observations are of c. We wish to learn about u. We will use Bayes’s theorem: P(u|y) ∝ P(y|u)P(u).

True u = 1 − cos(2πx)
True s = 2π(1 − cos(2πx)) cos(2πx) + 2π sin²(2πx) + 4π²ν sin(2πx)
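Putting the two pieces together, a sketch of the observation map for this example; the observation locations and σ are illustrative assumptions, and it reuses solve_forward, u, s and nu from the sketch above.

```python
# Synthetic data y = G(u) + eta: solve the forward problem, sample c at the
# observation points, add i.i.d. Gaussian noise. x_obs and sigma are
# illustrative assumptions; solve_forward, u, s, nu come from the sketch above.
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.01
x_obs = np.linspace(0.1, 0.9, 9)  # hypothetical observation locations

x, c = solve_forward(u, s, nu)
y = np.interp(x_obs, x, c) + sigma * rng.normal(size=x_obs.size)
```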


SLIDE 39

Example 1: convection-diffusion

How do we know we are solving the right PDE (G) to begin with?

(Figure: L² and H¹ errors of the numerical solution versus number of grid points, 2¹ through 2¹⁵, on a log scale from 10⁰ down to 10⁻¹⁰, with order-1 and order-2 reference slopes.)

Note: Use the MASA [1] library to verify your forward problem.

[1] Malaya et al., MASA: a library for verification using manufactured and analytical solutions, Engineering with Computers (2012)
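In that spirit, a manufactured-solution check with plain NumPy (not MASA itself): for the true u and s above, direct substitution shows the exact solution is c(x) = sin(2πx), so the L² error of the centred scheme should fall at second order.

```python
# Manufactured-solution convergence check for the solver sketched earlier.
# The exact solution for the "true" u and s is c(x) = sin(2*pi*x).
# Reuses solve_forward, u, s and nu from the earlier sketch.
import numpy as np

for n in [32, 64, 128, 256, 512]:
    x, c = solve_forward(u, s, nu, n=n)
    err = np.sqrt(np.trapz((c - np.sin(2 * np.pi * x)) ** 2, x))  # discrete L2 error
    print(f"n = {n:4d}   L2 error = {err:.3e}")  # should shrink ~4x per doubling
```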

SLIDE 40

Example 1: convection-diffusion

Recap Bayes’s theorem: P(u|y) ∝ P(y|u)P(u). Remember, we don’t know u, but we have observations and a model: y = G(u) + η, η ∼ N(0, σ²I). We also need a prior on u: P(u) = N(0, (−Δ)^{−α}). The aim is to extract information from the posterior. A sketch of sampling this prior follows.
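For intuition, here is one way to draw from the prior N(0, (−Δ)^{−α}) on [0, 1] with Dirichlet boundary conditions, via a truncated Karhunen-Loève expansion; α and the truncation level are illustrative assumptions.

```python
# Draw from N(0, (-Laplacian)^(-alpha)) on [0, 1] with Dirichlet BCs.
# Eigenpairs of -Laplacian: (k*pi)^2 with sqrt(2)*sin(k*pi*x), so mode k of a
# prior draw has standard deviation (k*pi)^(-alpha). alpha, n_modes illustrative.
import numpy as np

rng = np.random.default_rng(1)

def sample_prior(x, alpha=2.0, n_modes=100):
    xi = rng.normal(size=n_modes)                           # i.i.d. N(0, 1) coefficients
    k = np.arange(1, n_modes + 1)
    modes = np.sqrt(2.0) * np.sin(np.outer(k, np.pi * x))   # eigenfunctions
    return ((k * np.pi) ** (-alpha) * xi) @ modes           # truncated KL sum

x = np.linspace(0.0, 1.0, 201)
u_draw = sample_prior(x)  # one prior sample for u
```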


SLIDE 41

Example 1: convection-diffusion

(Figure: ‖u_k‖_{L²} and the running average (1/k) Σ_{i=0}^{k} ‖u_i‖_{L²} plotted against hundreds of iterations k.)

SLIDE 42

Example 1: convection-diffusion

(Figure: the running sample variance (1/(k−1)) Σ_{i=0}^{k} ‖u_i − ū_k‖²_{L²} plotted against hundreds of iterations k.)

SLIDE 43

Example 1: convection-diffusion

Suppose we got the source term wrong:

s = 2π(1 − cos(2πx)) cos(2πx) + 2π sin²(2πx) + 4π²ν sin(2πx)
ŝ = 4π²ν sin(2πx)

(Figure: s and ŝ plotted over [0, 1].)


SLIDE 44

Example 1: convection-diffusion

(Figure: variance in u across the PDE domain [0, 1], on the order of 10⁻⁴, without and with model error.)


SLIDE 45

Example 1: convection-diffusion

(Figure: the quantity of interest dc_k/dx at x = 1 plotted against hundreds of iterations, without and with model error, compared against the truth.)


SLIDE 46

Example 2: Teleseismic earthquake model

  • G computes teleseismic earthquake wave phases P and SH
  • θ are rupture constraint parameters: θ = (slip magnitude, slip direction, start time, rise time) ∈ R⁴
  • The posterior P(θ|y) is a density on a four-dimensional space
  • Observations y are of the produced waveform, with noise: y_k = G_k(θ) + η_k,  η_k i.i.d. ∼ N(0, σ²)  ⇝  y = G(θ) + η,  η ∼ N(0, σ²I)
  • The prior P(θ) is a uniform distribution on R⁴ (improper)


SLIDE 47

Example 2: Kikuchi and Kanamori model

(Figure: marginal posterior histograms for slip, rake, rise time and start time, each annotated with the truth, the posterior mean and the uncertainty.)

SLIDE 48

Summary

  • Regularised optimisation ⇔ Bayesian inversion
  • ∴ Bayesian inversion is not scary
  • Uncertainty quantification is crucial for prediction
  • Wealth of methods; pick your poison
      • My go-to is MCMC, but a different method may suit you better
  • Predictive validation
      • The role of experiments and their effect on prediction
      • There is a framework for this (Moser, Oliver, Terejanu, Simmons)
  • I’ll be at SIAM CSE


SLIDE 49

Questions?
