SLIDE 1

Bayesian Inverse Problems and Uncertainty Quantification

Hanne Kekkonen

Centre for Mathematical Sciences University of Cambridge

June 4, 2019

SLIDE 2

Inverse problems arise naturally from applications

SLIDE 4

Inverse problems are ill-posed

We want to recover the unknown u from a noisy measurement m,

m = Au + noise,

where A is a forward operator that usually causes loss of information. Well-posedness as defined by Jacques Hadamard:

1. Existence: There exists at least one solution.
2. Uniqueness: There is at most one solution.
3. Stability: The solution depends continuously on the data.

Inverse problems are ill-posed: they break at least one of the above conditions.

SLIDE 5

The naive inversion does not produce stable solutions

We want to approximate u from a measurement m = Au + n, where A : X → Y is linear and n is noise. One approach is to use the least squares method

û = arg min_{u ∈ X} ‖Au − m‖²_Y.

Problem: Multiple minima and sensitive dependence on the data m.
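To make the instability concrete, here is a minimal numerical sketch (the synthetic forward operator, sizes, and noise level are all illustrative assumptions, not from the slides): a forward map with rapidly decaying singular values turns tiny measurement noise into a large reconstruction error.

```python
# Minimal sketch of ill-posedness (all names and sizes are illustrative):
# a forward map with rapidly decaying singular values amplifies noise.
import numpy as np

rng = np.random.default_rng(0)
k = 30
# Synthetic forward operator A = U diag(s) V^T with singular values j^{-6},
# mimicking the smoothing behaviour typical of inverse problems.
U, _ = np.linalg.qr(rng.standard_normal((k, k)))
V, _ = np.linalg.qr(rng.standard_normal((k, k)))
s = np.arange(1.0, k + 1.0) ** -6.0
A = U @ np.diag(s) @ V.T

u_true = rng.standard_normal(k)
m = A @ u_true + 1e-6 * rng.standard_normal(k)   # tiny measurement noise

u_naive = np.linalg.solve(A, m)                  # naive least-squares inversion
# The smallest singular value is 30^{-6} ≈ 1.4e-9, so the noise is blown
# up by a factor of roughly 1e-6 / 1.4e-9 ≈ 700 in the worst direction.
print(np.linalg.norm(u_naive - u_true))
```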

SLIDE 6

Tikhonov regularisation is a classical method for solving ill-posed problems

We want to approximate u from a measurement m = Au + n, where A : X → Y is linear and n is noise. The problem is ill-posed, so we add a regularising term and get

u_α = arg min_{u ∈ E ⊂ X} ( ‖Au − m‖²_Y + α‖u‖²_E ).

Regularisation gives a stable approximate solution for the inverse problem.
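A minimal self-contained sketch of Tikhonov regularisation on the same kind of synthetic problem as above (α and all sizes are hand-picked assumptions): with E = R^k and the Euclidean norm, the penalised minimiser solves the normal equations and is far more stable than the naive inverse.

```python
# Tikhonov regularisation sketch (synthetic operator as before; alpha is
# hand-picked here, in practice it must be tuned to the noise level).
import numpy as np

rng = np.random.default_rng(0)
k = 30
U, _ = np.linalg.qr(rng.standard_normal((k, k)))
V, _ = np.linalg.qr(rng.standard_normal((k, k)))
s = np.arange(1.0, k + 1.0) ** -6.0
A = U @ np.diag(s) @ V.T

u_true = rng.standard_normal(k)
m = A @ u_true + 1e-6 * rng.standard_normal(k)

# u_alpha = arg min ||Au - m||^2 + alpha ||u||^2 solves the normal
# equations (A^T A + alpha I) u = A^T m.
alpha = 1e-6
u_tik = np.linalg.solve(A.T @ A + alpha * np.eye(k), A.T @ m)
u_naive = np.linalg.solve(A, m)
print(np.linalg.norm(u_tik - u_true), np.linalg.norm(u_naive - u_true))
```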

SLIDE 7

Bayes formula combines data and a priori information

We want to reconstruct the most probable u ∈ R^k in light of

Measurement information: M | u ∼ P_u with Lebesgue density ρ(m | u) = ρ_ε(m − Au).
A priori information: U ∼ Π_pr with Lebesgue density π_pr(u).

Bayes' formula

We can update the prior, given a measurement, to a posterior distribution using Bayes' formula:

π(u | m) ∝ π_pr(u) ρ(m | u).

The result of Bayesian inversion is the posterior distribution π(u | m).

SLIDE 8

The result of Bayesian inversion is the posterior distribution, but typically one looks at estimates

Maximum a posteriori (MAP) estimate: arg max_{u ∈ R^n} π(u | m).

Conditional mean (CM) estimate: ∫_{R^n} u π(u | m) du.
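A toy sketch of the Bayes update and both point estimates for a scalar unknown (A is the identity and the prior, noise level, and grid are illustrative assumptions, not from the slides): the posterior is computed on a grid, then MAP and CM are read off it.

```python
# Grid-based Bayes update for scalar u with m = u + noise (A = identity).
import numpy as np

u_grid = np.linspace(-5.0, 5.0, 2001)
du = u_grid[1] - u_grid[0]

prior = np.exp(-0.5 * u_grid**2)                       # N(0, 1) prior density
m, sigma = 1.3, 0.5                                    # one noisy observation
likelihood = np.exp(-0.5 * ((m - u_grid) / sigma) ** 2)

posterior = prior * likelihood                         # pi(u|m) ∝ pi_pr(u) rho(m|u)
posterior /= posterior.sum() * du                      # normalise to a density

u_map = u_grid[np.argmax(posterior)]                   # MAP estimate
u_cm = (u_grid * posterior).sum() * du                 # conditional mean
print(u_map, u_cm)   # both ≈ m / (1 + sigma^2) ≈ 1.04 for this Gaussian pair
```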

SLIDE 9

Gaussian example

Assume we are interested in the measurement model M = AU + N, where:

  • A : X → Y, with X = R^d and Y = R^k.
  • N is white Gaussian noise.
  • U follows a Gaussian prior.

The posterior has density

π^m(u) = π(u | m) ∝ exp( −(1/2)‖m − Au‖²_{R^k} − (1/2)‖u‖²_Σ ).

We can use the mean of the posterior as a point estimator, but having the whole posterior allows uncertainty quantification.
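A minimal sketch of this Gaussian posterior in coordinates (the dimensions and prior covariance below are illustrative assumptions): with N ∼ N(0, I) and U ∼ N(0, Σ), completing the square in the density above gives the standard conjugate formulas used here.

```python
# Gaussian posterior for M = AU + N, N ~ N(0, I), U ~ N(0, Sigma):
#   cov_post = (A^T A + Sigma^{-1})^{-1},   mean_post = cov_post A^T m.
import numpy as np

rng = np.random.default_rng(1)
d, k = 20, 10                                    # unknown in R^d, data in R^k
A = rng.standard_normal((k, d))
Sigma = np.diag(1.0 / np.arange(1, d + 1) ** 2)  # prior covariance

u_true = rng.multivariate_normal(np.zeros(d), Sigma)
m = A @ u_true + rng.standard_normal(k)

cov_post = np.linalg.inv(A.T @ A + np.linalg.inv(Sigma))
mean_post = cov_post @ A.T @ m                   # = MAP = CM for a Gaussian posterior

# The posterior covariance quantifies uncertainty, e.g. coordinatewise sd:
print(mean_post[:3], np.sqrt(np.diag(cov_post))[:3])
```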

SLIDE 10

Why are we interested in uncertainty quantification?

SLIDE 11

Uncertainty quantification has many applications

Studying the whole posterior distribution instead of just a point estimate offers us more information. Uncertainty quantification, via confidence and credible sets, appears e.g. in weather and climate predictions, geological sensing, and Bayesian search theory.

Figure: Search for the wreckage of Air France flight AF 447, Stone et al.

SLIDE 12

What do we mean by uncertainty quantification?

"I'm going to die?"
"POSSIBLY."
"Possibly? You turn up when people are possibly going to die?"
"OH, YES. IT'S QUITE THE NEW THING. IT'S BECAUSE OF THE UNCERTAINTY PRINCIPLE."
"What's that?"
"I'M NOT SURE."
"That's very helpful."
"I THINK IT MEANS PEOPLE MAY OR MAY NOT DIE. I HAVE TO SAY IT'S PLAYING HOB WITH MY SCHEDULE, BUT I TRY TO KEEP UP WITH MODERN THOUGHT."

  • Terry Pratchett, The Fifth Elephant

SLIDE 13

Bayesian credible set

A Bayesian credible set is a region in the posterior distribution that contains a large fraction of the posterior mass.
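A toy sketch (assuming posterior samples are available, e.g. from MCMC; the stand-in Gaussian posterior below is a hypothetical choice): an equal-tailed 95% credible interval is read off the sample quantiles.

```python
# Equal-tailed 95% credible interval from posterior samples.
import numpy as np

rng = np.random.default_rng(2)
samples = rng.normal(loc=1.0, scale=0.3, size=100_000)  # stand-in posterior

lo, hi = np.quantile(samples, [0.025, 0.975])
print(f"95% credible interval: [{lo:.3f}, {hi:.3f}]")   # ≈ [0.41, 1.59]
```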

SLIDE 14

Frequentist confidence region

SLIDE 15

Consistency of a Bayesian solution

Once we have obtained a Bayesian solution, the natural next step is to consider its consistency:

  • Convergence of a point estimator to the 'true' u†.
  • Contraction of the posterior distribution: do we have Π(u : d(u, u†) > δ_n | m) → 0 in P_{u†}-probability, for some δ_n → 0, as the sample size n → ∞?
  • Is an optimal contraction rate enough to guarantee that the Bayesian credible sets have the correct frequentist coverage?

SLIDE 16

Credible sets do not necessarily cover the truth well

Figure: Credible sets for a correctly specified prior and for a prior misspecified on the boundary. Monard, Nickl & Paternain, The Annals of Statistics, 2019.

SLIDE 18

Do credible sets quantify frequentist uncertainty?

Do we have, for C = C(m),

Π(u ∈ C | m) ≈ 0.95  ⇔  P_{u†}(u† ∈ C(m†)) ≈ 0.95?

Bernstein–von Mises Theorem (BvM)

For large sample size n, with û_MLE the maximum likelihood estimator,

Π(· | m) ≈ N(û_MLE, (1/n) I(u†)^{-1}),  for M ∼ P_{u†},

whenever u† ∈ O ⊂ R^d, the prior Π has positive density on O, and the Fisher information I(u†) is invertible.
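A toy numerical check of the BvM statement in a textbook model (the Bernoulli/Beta setup is an illustrative choice, not from the slides): for Bernoulli(u) data the posterior is Beta, and its quantiles approach those of N(û_MLE, I(u†)^{-1}/n) with Fisher information I(u) = 1/(u(1 − u)).

```python
# BvM sanity check: Beta posterior vs. its Gaussian BvM approximation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
u_true, n = 0.3, 5000
data = rng.binomial(1, u_true, size=n)

a, b = 2 + data.sum(), 2 + n - data.sum()    # Beta(2, 2) prior -> Beta posterior
u_mle = data.mean()
bvm_sd = np.sqrt(u_mle * (1 - u_mle) / n)    # sqrt of I(u_MLE)^{-1} / n

for q in (0.025, 0.5, 0.975):                # quantiles nearly coincide
    print(q, stats.beta.ppf(q, a, b), stats.norm.ppf(q, loc=u_mle, scale=bvm_sd))
```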

SLIDE 20

BvM guarantees confident credible sets

The contraction rate of the posterior distribution near u† is

Π(u : ‖u − u†‖_{R^d} ≥ L_n/√n | m) → 0 in P_{u†}-probability, as L_n, n → ∞.

For fixed d and large n, computing posterior probabilities is roughly the same as computing them from N(û_MLE, (1/n) I(u†)^{-1}):

C_n s.t. Π(u ∈ C_n | M) = 0.95 (Bayesian credible set)
⇒ P_{u†}(u† ∈ C_n) → 0.95 (frequentist confidence set),
|C_n|_{R^d} = O_{P_{u†}}(1/√n) (optimal diameter).

SLIDE 21

Asymptotic normality of the Tikhonov regulariser

We return to the Gaussian example, where the posterior is also Gaussian. The posterior mean ū equals the MAP estimate, which equals the Tikhonov regulariser

ū = arg min_u ( ‖Au − m‖²_{R^k} + ‖u‖²_Σ ).

Then the following convergence holds under P_{u†}:

√n(ū − u†) → Z ∼ N(0, I(u†)^{-1}) as n → ∞.

SLIDE 22

Confident credible sets

We can now construct a confidence set from the Tikhonov regulariser. Consider a credible set

C_n = { u ∈ R^d : ‖u − ū‖ ≤ R_n/√n },  with R_n s.t. Π(C_n | m) = 0.95.

Then the frequentist coverage probability of C_n satisfies

P_{u†}(u† ∈ C_n) → 0.95  and  R_n → Φ^{-1}(0.95) in P_{u†}-probability, as n → ∞.

Here Φ^{-1} is a continuous inverse of Φ = P(‖Z‖ ≤ ·) with Z ∼ N(0, I(u†)^{-1}).
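A toy coverage experiment (with A the identity and d = 1; all values are illustrative assumptions, not from the slides): for large n, the 95% credible interval around the posterior mean covers the true u† in roughly 95% of repetitions, as the slide asserts.

```python
# Empirical frequentist coverage of a Bayesian credible interval.
import numpy as np

rng = np.random.default_rng(4)
u_dagger, n, reps, hits = 0.7, 2000, 1000, 0

for _ in range(reps):
    m = u_dagger + rng.standard_normal(n)        # n noisy observations of u†
    # Posterior for u under a N(0, 1) prior: precision n + 1.
    post_mean = m.sum() / (n + 1)
    post_sd = 1.0 / np.sqrt(n + 1)
    lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
    hits += (lo <= u_dagger <= hi)

print("empirical coverage:", hits / reps)        # close to 0.95
```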

SLIDE 23

Discretisation of m is given by the measurement device, but the discretisation of u can be chosen freely.

Figure: m ∈ R^k, u ∈ R^n, with k = 4 and n = 48.

SLIDE 24

The discretisations are independent.

Figure: m ∈ R^k, u ∈ R^n, with k = 8 and n = 156.

SLIDE 25

The discretisations are independent.

Figure: m ∈ R^k, u ∈ R^n, with k = 24 and n = 440.

SLIDE 26

The measurement is always discrete, but the unknown is usually a continuous function.

Figure: m ∈ R^4, u ∈ L².

SLIDE 27

We often want to use a continuous model for the theory:

m = Au + ε.

SLIDE 29

Nonparametric models

In many applications it is natural to use the statistical regression model

M_i = (AU)(x_i) + N_i,  i = 1, ..., n,  N_i ∼ N(0, 1),

where x_i ∈ O are measurement points and A is a forward operator. The goal is to infer U from the data (M_i). For the theory we use a continuous model, which corresponds to the points (x_i) growing dense in the domain O. If W is a Gaussian white noise process in the Hilbert space H, then

M = AU + εW,  where ε = 1/√n is the noise level and M ∼ P_{u†}.

Note that usually Au ∈ L², but W ∈ H^{-s} only with s > d/2.
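A sketch of simulating data from this regression model (the forward operator, taken here to be integration over [0, x_i], and the true U are illustrative assumptions, not from the slides):

```python
# Simulate M_i = (AU)(x_i) + N_i with A = integration from 0 to x (a toy
# smoothing forward operator) and N_i ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = np.linspace(0.0, 1.0, n)              # measurement points x_i in O = [0, 1]
u = np.sin(2 * np.pi * x)                 # the unknown function U

Au = np.cumsum(u) / n                     # (AU)(x_i) ≈ ∫_0^{x_i} u(t) dt
M = Au + rng.standard_normal(n)           # noisy data

eps = 1.0 / np.sqrt(n)                    # noise level of the continuous model
print(eps, M[:5])
```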

SLIDE 32

Gaussian priors are often used for inverse problems

Gaussian priors Π are often used in practice: see e.g. Kaipio & Somersalo (2005), Stuart (2010), Dashti & Stuart (2016). Using the Cameron-Martin theorem we can formally write

dΠ(· | m) ∝ e^{ℓ(u)} dΠ(u) ∝ e^{ℓ(u) − (1/2)‖u‖²_{V_Π}},

where ℓ(u) = (1/ε²)⟨m, Au⟩ − (1/(2ε²))‖Au‖², and V_Π denotes the Cameron-Martin space of Π.

The Cameron-Martin space characterises the directions in which a Gaussian measure can be shifted to obtain an equivalent Gaussian measure. If U ∼ N(0, Σ) then ‖u‖²_{V_Π} = ‖Σ^{-1/2}u‖²_{L²}.
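A quick finite-dimensional sanity check of this identity (the diagonal Σ and sizes are illustrative stand-ins for the infinite-dimensional setting):

```python
# For U ~ N(0, Sigma) with Sigma = diag(j^{-2}), check that
# u^T Sigma^{-1} u equals ||Sigma^{-1/2} u||^2.
import numpy as np

rng = np.random.default_rng(6)
d = 5
j = np.arange(1, d + 1).astype(float)
Sigma = np.diag(1.0 / j**2)
u = rng.standard_normal(d)

cm_norm_sq = u @ np.linalg.inv(Sigma) @ u       # ||u||_{V_Pi}^2
alt = np.linalg.norm(np.diag(j) @ u) ** 2       # ||Sigma^{-1/2} u||^2
print(np.isclose(cm_norm_sq, alt))              # True
```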

SLIDE 33

If u is a function the classical BvM theorem does not hold

Semi-parametric approach

For a fixed test function ψ ∈ Ψ we study the induced posterior distribution of the one-dimensional variable ⟨U, ψ⟩, U ∼ Π(· | m). The idea is to determine possibly maximal families Ψ of functions ψ for which the Gaussian asymptotics

(1/ε)(⟨U, ψ⟩ − û(m)) | m → Z ∼ N(0, I(u†, ψ)^{-1})

can be obtained, as ε → 0. Above, û(m) is an efficient estimator of ⟨u†, ψ⟩. This approach has been used in the recent papers Castillo & Nickl (2013, 2014), Monard, Nickl & Paternain (2019), Nickl (2018) and Giordano & Kekkonen (2018).

SLIDE 35

Example: elliptic boundary value problem

Let O ⊂ R^d be bounded with C^∞ boundary ∂O. We are interested in recovering the unknown source f ∈ L²(O) in

Lv = −∇ · (σ∇v) = f  on O,
v = 0  on ∂O,

from noisy observations M = L^{-1}f + εW.

The forward operator A = L^{-1} : L² → L² is smoothing of order 2. We assign f a "correctly specified" centred Gaussian prior Π on L². That is, f† ∈ H^α, α ≥ 0, and Π has Cameron-Martin space V_Π = H^r with d/2 ≤ r ≤ α + d/2.
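A sketch of the forward map in the simplest setting (assumptions: d = 1 and σ ≡ 1, so L = −d²/dx² with Dirichlet boundary conditions; the grid-level Gaussian noise is a crude stand-in for εW):

```python
# Solve Lv = -v'' = f on (0, 1), v(0) = v(1) = 0, by finite differences,
# then observe M = L^{-1} f + eps * (white noise stand-in).
import numpy as np

rng = np.random.default_rng(7)
N = 100
h = 1.0 / (N + 1)
x = np.linspace(h, 1.0 - h, N)            # interior grid points

f = np.sin(3 * np.pi * x)                 # unknown source f in L^2(O)
# Tridiagonal discretisation of L with Dirichlet boundary conditions.
L = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2

v = np.linalg.solve(L, f)                 # v = L^{-1} f, two orders smoother
eps = 0.01
M = v + eps * rng.standard_normal(N)      # noisy observation
print(M[:5])
```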

SLIDE 36

Semi-parametric BvM theorem for elliptic BVP

Let Π be the Gaussian prior described above and Π(· | M) the resulting posterior distribution.

Theorem 1 (Giordano and K. 2018)

If f ∼ Π(· | M) and f̄ = E^Π(f | M), then for all ψ ∈ H^β_c, β > 2 + d/2,

L( (1/ε)⟨f − f̄, ψ⟩_{L²} | M ) → N(0, ‖Lψ‖²_{L²})

in P^M_{f†}-probability as ε → 0.

Here we have denoted H^β_c = {ψ ∈ H^β : supp(ψ) compactly contained in O}.
