Bayesian inference and mathematical imaging. Part II: Markov chain - - PowerPoint PPT Presentation

bayesian inference and mathematical imaging part ii
SMART_READER_LITE
LIVE PREVIEW

Bayesian inference and mathematical imaging. Part II: Markov chain - - PowerPoint PPT Presentation

Bayesian inference and mathematical imaging. Part II: Markov chain Monte Carlo. Dr. Marcelo Pereyra http://www.macs.hw.ac.uk/ mp71/ Maxwell Institute for Mathematical Sciences, Heriot-Watt University January 2019, CIRM, Marseille. M.


slide-1
SLIDE 1

Bayesian inference and mathematical imaging. Part II: Markov chain Monte Carlo.

  • Dr. Marcelo Pereyra

http://www.macs.hw.ac.uk/∼mp71/

Maxwell Institute for Mathematical Sciences, Heriot-Watt University

January 2019, CIRM, Marseille.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 0 / 47

slide-2
SLIDE 2

Outline

1

Bayesian inference in imaging inverse problems

2

Proximal Markov chain Monte Carlo

3

Uncertainty quantification in astronomical and medical imaging

4

Image model selection and model calibration

5

Conclusion

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 1 / 47

slide-3
SLIDE 3

Imaging inverse problems

We are interested in an unknown image x ∈ Rd. We measure y, related to x by a statistical model p(y∣x). The recovery of x from y is ill-posed or ill-conditioned, resulting in significant uncertainty about x. For example, in many imaging problems y = Ax + w, for some operator A that is rank-deficient, and additive noise w.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 2 / 47

slide-4
SLIDE 4

The Bayesian framework

We use priors to reduce uncertainty and deliver accurate results. Given the prior p(x), the posterior distribution of x given y p(x∣y) = p(y∣x)p(x)/p(y) models our knowledge about x after observing y. In this talk we consider that p(x∣y) is log-concave; i.e., p(x∣y) = exp{−φ(x)}/Z , where φ(x) is a convex function and Z = ∫ exp{−φ(x)}dx.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 3 / 47

slide-5
SLIDE 5

Maximum-a-posteriori (MAP) estimation

The predominant Bayesian approach in imaging is MAP estimation ˆ xMAP = argmax

x∈Rd

p(x∣y), = argmin

x∈Rd

φ(x), (1) computed efficiently, even in very high dimensions, by (proximal) convex

  • ptimisation (Chambolle and Pock, 2016).
  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 4 / 47

slide-6
SLIDE 6

Illustrative example: astronomical image reconstruction

Recover x ∈ Rd from low-dimensional degraded observation y = MFx + w, where F is the continuous Fourier transform, M ∈ Cm×d is a measurement

  • perator and w is Gaussian noise. We use the model

p(x∣y) ∝ exp(−∥y − MFx∥2/2σ2 − θ∥Ψx∥1)1Rn

+(x).

(2)

y ˆ xMAP Figure : Radio-interferometric image reconstruction of the W28 supernova.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 5 / 47

slide-7
SLIDE 7

MAP estimation by proximal optimisation

To compute ˆ xMAP we use a proximal splitting algorithm. Let f (x) = ∥y − MFx∥2/2σ2 , and g(x) = θ∥Ψx∥1 + −log 1Rn

+(x),

where f and g are l.s.c. convex on Rd, and f is Lf -Lipschitz differentiable. For example, we could use a proximal gradient iteration xm+1 = prox

L−1

f

g {xm + L−1 f ∇f (xm)},

converges to ˆ xMAP at rate O(1/m), with poss. acceleration to O(1/m2). Definition For λ > 0, the λ-proximal operator of a convex l.s.c. function g is defined as (Moreau, 1962) proxλ

g(x) ≜ argmin u∈RN

g(u) + 1 2λ∣∣u − x∣∣2.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 6 / 47

slide-8
SLIDE 8

MAP estimation by proximal optimisation

The alternating direction method of multipliers (ADMM) algorithm xm+1 = proxλ

f {zm − um},

zm+1 = proxλ

g{xm+1 + um},

um+1 = um + xm+1 − zm+1, also converges to ˆ xMAP very quickly, and does not require f to be smooth. However, MAP estimation has some limitations, e.g.,

1 it provides little information about p(x∣y), 2 it struggles with unknown/partially unknown models, 3 it is not theoretically well understood (yet).

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 7 / 47

slide-9
SLIDE 9

Outline

1

Bayesian inference in imaging inverse problems

2

Proximal Markov chain Monte Carlo

3

Uncertainty quantification in astronomical and medical imaging

4

Image model selection and model calibration

5

Conclusion

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 8 / 47

slide-10
SLIDE 10

Inference by Markov chain Monte Carlo integration

Monte Carlo integration Given a set of samples X1,...,XM distributed according to p(x∣y), we approximate posterior expectations and probabilities 1 M

M

m=1

h(Xm) → E{h(x)∣y}, as M → ∞ Markov chain Monte Carlo: Construct a Markov kernel Xm+1∣Xm ∼ K(⋅∣Xm) such that the Markov chain X1,...,XM has p(x∣y) as stationary distribution. MCMC simulation in high-dimensional spaces is very challenging.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 9 / 47

slide-11
SLIDE 11

Unadjusted Langevin algorithm

Suppose for now that p(x∣y) ∈ C1. Then, we can generate samples by mimicking a Langevin diffusion process that converges to p(x∣y) as t → ∞, X ∶ dXt = 1 2∇log p (Xt∣y)dt + dWt, 0 ≤ t ≤ T, X(0) = x0. where W is the n-dimensional Brownian motion. Because solving Xt exactly is generally not possible, we use an Euler Maruyama approximation and obtain the “unadjusted Langevin algorithm” ULA ∶ Xm+1 = Xm + δ∇log p(Xm∣y) + √ 2δZm+1, Zm+1 ∼ N(0,In) ULA is remarkably efficient when p(x∣y) is sufficiently regular.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 10 / 47

slide-12
SLIDE 12

Metropolis-adjusted Langevin algorithm

ULA does not exactly target p(x∣y) because of the time-discrete

  • approximation. In many problems this estimation bias is acceptable.

This error can be removed by using a so-called Metropolis-Hastings

  • correction. Given Xm at iteration m, we perform

1 A ULA step:

X ∗ = Xm + δ∇log p(Xm∣y) + √ 2δZm+1, Zm+1 ∼ N(0,In),

2 With probability ρ(X ∗,Xm) we set Xm+1 = X ∗, else set Xm+1 = Xm,

ρ(X ∗,Xm) = min[1, p(X ∗∣y) p(Xm∣y) p(Xm∣X ∗) p(X ∗∣Xm)] .

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 11 / 47

slide-13
SLIDE 13

Metropolis-adjusted Langevin algorithm

Some observations: This correction removes the bias at the expense of additional variance. The efficiency of the method depends strongly on δ. The optimal efficiency is achieved for E(ρ) ≈ 0.6 as dimension d → ∞.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 12 / 47

slide-14
SLIDE 14

Metropolis-adjusted Langevin algorithm

Some observations: This correction removes the bias at the expense of additional variance. The efficiency of the method depends strongly on δ. The optimal efficiency is achieved for E(ρ) ≈ 0.6 as dimension d → ∞.

1 A ULA step:

X ∗ = Xm + δm+1∇log p(Xm∣y) + √ 2δm+1Zm+1, Zm+1 ∼ N(0,In),

2 With probability ρ(X ∗,Xm) we set Xm+1 = X ∗, else set Xm+1 = Xm,

ρ(X ∗,Xm) = min[1, p(X ∗∣y) p(Xm∣y) p(Xm∣X ∗) p(X ∗∣Xm)] .

3 Update δm+2 = δm+1 + αm+1(ρ(X ∗,Xm) − 0.6), for some {αm}∞

m=1.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 13 / 47

slide-15
SLIDE 15

Non-smooth models

Suppose that p(x∣y) ∝ exp{−f (x) − g(x)} (3) where f (x) and g(x) are l.s.c. convex functions from Rd → (−∞,+∞], f is Lf -Lipschitz differentiable, and g ∉ C1. For example, f (x) =

1 2σ2 ∥y − Ax∥2 2,

g(x) = α∥Bx∥† + 1S(x), for some linear operators A, B, norm ∥ ⋅ ∥†, and convex set S. Unfortunately, such non-models are beyond the scope of ULA. Idea: Regularise p(x∣y) to enable efficiently Langevin sampling.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 14 / 47

slide-16
SLIDE 16

Approximation of p(x∣y)

Moreau-Yoshida approximation of p(x∣y) (Pereyra, 2015): Let λ > 0. We propose to approximate p(x∣y) with the density pλ(x∣y) = exp[−f (x) − gλ(x)] ∫Rd exp[−f (x) − gλ(x)]dx , where gλ is the Moreau-Yoshida envelope of g given by gλ(x) = inf

u∈Rd{g(u) + (2λ)−1∥u − x∥2 2},

and where λ controls the approximation error involved.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 15 / 47

slide-17
SLIDE 17

Moreau-Yoshida approximations

Key properties (Pereyra, 2015; Durmus et al., 2018):

1 ∀λ > 0, pλ defines a proper density of a probability measure on Rd. 2 Convexity and differentiability:

pλ is log-concave on Rd. pλ ∈ C1 even if p not differentiable, with ∇log pλ(x∣y) = −∇f (x) + {proxλ

g (x) − x}/λ,

and proxλ

g (x) = argminu∈RN g(u) + 1 2λ∣∣u − x∣∣2.

∇log pλ is Lipchitz continuous with constant L ≤ Lf + λ−1.

3 Approximation error between pλ(x∣y) and p(x∣y):

limλ→0 ∥pλ − p∥TV = 0. If g is Lg-Lipchitz, then ∥pλ − p∥TV ≤ λL2

g.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 16 / 47

slide-18
SLIDE 18

Illustration

Examples of Moreau-Yoshida approximations:

p(x) ∝ exp(−∣x∣) p(x) ∝ exp(−x4) p(x) ∝ 1[−0.5,0.5](x) Figure : True densities (solid blue) and approximations (dashed red).

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 17 / 47

slide-19
SLIDE 19

Proximal ULA

We approximate X with the “regularised” auxiliary Langevin diffusion Xλ ∶ dXλ

t = 1

2∇log pλ (Xλ

t ∣y)dt + dWt,

0 ≤ t ≤ T, Xλ(0) = x0, which targets pλ(x∣y). Remark: we can make Xλ arbitrarily close to X. Finally, an Euler Maruyama discretisation of Xλ leads to the (Moreau-Yoshida regularised) proximal ULA MYULA ∶ Xm+1 = (1 − δ

λ)Xm − δ∇f {Xm} + δ λ proxλ g{Xm} +

√ 2δZm+1, where we used that ∇gλ(x) = {x − proxλ

g(x)}/λ.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 18 / 47

slide-20
SLIDE 20

Convergence results

Non-asymptotic estimation error bound

Theorem 2.1 (Durmus et al. (2018))

Let δmax

λ

= (L1 + 1/λ)−1. Assume that g is Lipchitz continuous. Then, there exist δǫ ∈ (0,δmax

λ

] and Mǫ ∈ N such that ∀δ < δǫ and ∀M ≥ Mǫ ∥δx0QM

δ − p∥TV < ǫ + λL2 g,

where QM

δ

is the kernel assoc. with M iterations of MYULA with step δ. Note: δǫ and Mǫ are explicit and tractable. If f + g is strongly convex

  • utside some ball, then Mǫ scales with order O(d log(d)). See Durmus

et al. (2018) for other convergence results.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 19 / 47

slide-21
SLIDE 21

Illustration

Illustrative examples:

p(x) ∝ exp(−∣x∣) p(x) ∝ exp(−x4) p(x) ∝ 1[−0.5,0.5](x) Figure : True densities (blue) and MC approximations (red histogram).

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 20 / 47

slide-22
SLIDE 22

Modern Bayesian computation

Recent surveys on Bayesian computation...

25th anniversary special issue on Bayesian computation

  • P. Green, K. Latuszynski, M. Pereyra, C. P. Robert, ”Bayesian computation: a perspective on

the current state, and sampling backwards and forwards”, Statistics and Computing, vol. 25,

  • no. 4, pp 835-862, Jul. 2015.

Special issue on “Stochastic simulation and optimisation in signal processing”

  • M. Pereyra, P. Schniter, E. Chouzenoux, J.-C. Pesquet, J.-Y. Tourneret, A. Hero, and S.

McLaughlin, “A Survey of Stochastic Simulation and Optimization Methods in Signal Pro- cessing” IEEE Sel. Topics in Signal Processing, vol. 10, no. 2, pp 224 - 241, Mar. 2016.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 21 / 47

slide-23
SLIDE 23

Outline

1

Bayesian inference in imaging inverse problems

2

Proximal Markov chain Monte Carlo

3

Uncertainty quantification in astronomical and medical imaging

4

Image model selection and model calibration

5

Conclusion

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 22 / 47

slide-24
SLIDE 24

Uncertainty quantification in radio-interferometric imaging

Where does the posterior probability mass of x lie? A set Cα is a posterior credible region of confidence level (1 − α)% if P[x ∈ Cα∣y] = 1 − α. The highest posterior density (HPD) region is decision-theoretically

  • ptimal (Robert, 2001)

C ∗

α = {x ∶ φ(x) ≤ γα}

with γα ∈ R chosen such that ∫C ∗

α p(x∣y)dx = 1 − α holds.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 23 / 47

slide-25
SLIDE 25

Visualising uncertainty in radio-interferometric imaging

Astro-imaging experiment with redundant wavelet frame (Cai et al., 2017).

ˆ xpenMLE (y) ˆ xMAP (by optimisation) credible intervals (scale 10 × 10) ˆ xpenMLE (y) ˆ xMAP (by optimisation) credible intervals (scale 10 × 10)

3C2888 and M31 radio galaxies (size 256 × 256 pixels). Estimation error w.r.t. MH implementation 3%.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 24 / 47

slide-26
SLIDE 26

Visualising uncertainty in radio-interferometric imaging

Astro-imaging experiment with redundant wavelet frame (Cai et al., 2017).

ˆ xpenMLE (y) ˆ xMMSE = E(x∣y) ˆ xMMSE = E(x∣y) (Px-MALA) ˆ xpenMLE (y) ˆ xMMSE = E(x∣y) ˆ xMMSE = E(x∣y) (Px-MALA)

3C2888 and M31 radio galaxies (size 256 × 256 pixels).

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 25 / 47

slide-27
SLIDE 27

Visualising uncertainty in radio-interferometric imaging

Astro-imaging experiment with redundant wavelet frame (Cai et al., 2017).

ˆ xpenMLE (y) ˆ xMMSE = E(x∣y) ˆ xMAP (by optimisation) ˆ xpenMLE (y) ˆ xMMSE = E(x∣y) ˆ xMAP (by optimisation)

3C2888 and M31 radio galaxies. Visual comparison with MAP estimation.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 26 / 47

slide-28
SLIDE 28

Outline

1

Bayesian inference in imaging inverse problems

2

Proximal Markov chain Monte Carlo

3

Uncertainty quantification in astronomical and medical imaging

4

Image model selection and model calibration

5

Conclusion

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 27 / 47

slide-29
SLIDE 29

Bayesian Model Selection

The Bayesian framework provides theory for comparing models objectively. Given K alternative models {Mj}K

j=1 with posterior densities

Mj ∶ pj(x∣y) = pj(y∣x)pj(x))/pj(y), we compute the (marginal) posterior probability of each model, i.e., p(Mj∣y) ∝ p(y∣Mj)p(Mj) (4) where p(y∣Mj) ≜ pj(y) = ∫ pj(y∣x)pj(x)dx measures model-fit-to-data. We then select for our inferences the “best” model, i.e., M∗ = argmax

j∈{1,...,K}

p(Mj∣y).

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 28 / 47

slide-30
SLIDE 30

Experiment setup

We degrade the Boat image of size 256 × 256 pixels with a 5 × 5 uniform blur operator A∗ and Gaussian noise w ∼ N(0,σ2IN) with σ = 0.5. y = A∗x + w We consider four alternative models to estimate x, given by Mj ∶ pj(x∣y) ∝ exp[−(∥y − Ajx∥2/2σ2) − βjφj(x)] (5) with fixed hyper-parameters σ and β, and where: M1: A1 is the correct blur operator and φj(x) = TV (x). M2: A2 is a mildly misspecified blur operator and φj(x) = TV (x). M3: A3 is the correct blur operator and φj(x) = ∥Ψx∥1. M4: A4 is a mildly misspecified blur operator and φj(x) = ∥Ψx∥1. where Ψ is a wavelet frame and TV (x) = ∥∇dx∥1−2 is the total-variation pseudo-norm. The βj are adjusted automatically (see model calibration).

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 29 / 47

slide-31
SLIDE 31

Monte Carlo strategy

To perform model selection we use MYULA to approximate the posterior probabilities p(Mj∣y) for j = 1,2,3,4 by Monte Carlo integration. For each model we generate n = 105 samples {X j

k}n k=1 ∼ p(x∣y,Mj) and

use the truncated harmonic mean estimator p(y∣Mj) ≈ (

n

k=1

1S⋆(X M

k )

p(X M

k ,y∣Mj)) −1

vol(S⋆), j = {1,2,3,4} (6) where S⋆ is a union of highest posterior density sets of p(x∣y,Mj), also estimated from {X j

k}n k=1.

Computing time approx. 30 minutes per model.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 30 / 47

slide-32
SLIDE 32

Numerical results

We obtain that p(M1∣y) ≈ 0.68 and p(M3∣y) ≈ 0.27 with the correct blur are the best models, p(M2∣y) < 0.05 and p(M4∣y) < 0.01 perform poorly.

y M1 ˆ xMAP (PSNR 34.1dB) p(M1∣y) ≈ 0.68 M3 ˆ xMAP (PSNR 32.9dB) p(M3∣y) ≈ 0.27 Figure : MAP estimation results for the Boat image deblurring experiment. (Note: error w.r.t. “exact” probabilities from Px-MALA approx. 0.5%.)

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 31 / 47

slide-33
SLIDE 33

Numerical results

MYULA and Px-MALA efficiency comparison:

(a) (b) Figure : (a) Convergence of the chains to the typical set of x∣y under model M1, (b) chain autocorrelation function (ACF).)

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 32 / 47

slide-34
SLIDE 34

Empirical Bayesian model calibration

For illustration, consider the class of Bayesian models p(x∣y,θ) = p(y∣x)p(x∣θ) p(y∣θ) , parametrised by a regularisation parameter θ ∈ Θ. For example, p(x∣θ) = 1 C(θ) exp{−θϕ(x)}, p(y∣x) ∝ exp{−fy(x)}, with fy and ϕ convex l.s.c. functions, and fy L-Lipschitz differentiable. We assume that p(x∣θ) is proper, i.e., C(θ) = ∫Rd exp{−θϕ(x)}dx < ∞, with C(θ) unknown and generally intractable.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 33 / 47

slide-35
SLIDE 35

Maximum-a-posteriori estimation

If θ is fixed, the posterior p(x∣y,θ) is log-concave and ˆ xMAP = argmin

x∈Rd

fy(x) + θϕ(x) is a convex optimisation problem that can be often solved efficiently. For example, the proximal gradient algorithm xm+1 = proxL−1

ϕ {xm + L−1∇fy(xm)},

converges to ˆ xMAP as m → ∞. However, when θ is unknown this significantly complicates the problem.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 34 / 47

slide-36
SLIDE 36

Regularisation parameter MLE

We adopt an empirical Bayes approach and calibrate the model maximising the evidence or marginal likelihood, i.e., ˆ θ = argmax

θ∈Θ

p(y∣θ), = argmax

θ∈Θ

∫Rd p(y,x∣θ)dx , which we solve efficiently by using a stochastic gradient algorithm driven by two proximal MCMC kernels (see Fernandez-Vidal and Pereyra (2018)). Given ˆ θ, we then straightforwardly compute ˆ xMAP = argmin

x∈Rd

fy(x) + ˆ θϕ(x). (7)

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 35 / 47

slide-37
SLIDE 37

Projected gradient algorithm

Assume that Θ is convex, and that ˆ θ is the only root of ∇θ log p(y∣θ) in Θ. Then ˆ θ is also the unique solution of the fixed-point equation θ = PΘ [θ + δ∇θ log p(y∣θ)] . where PΘ is the projection operator on Θ and δ > 0. If ∇log p(y∣θ) was tractable, we could compute ˆ θ iteratively by using θ(t+1) = PΘ [θ(t) + δt∇θ log p(y∣θ(t))], with sequence δt = αt−β, α > 0, β ∈ [1/2,1]. However, ∇log p(y∣θ) is “doubly” intractable...

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 36 / 47

slide-38
SLIDE 38

Stochastic projected gradient algorithm

To circumvent the intractability of ∇θ log p(y∣θ) we use Fisher’s identity ∇θ log p(y∣θ) = Ex∣y,θ{∇θ log p(x,y∣θ)}, = −Ex∣y,θ{ϕ + ∇θ log C(θ)}, together with the identity ∇θ log C(θ) = −Ex∣θ{ϕ(x)}, to obtain ∇θ log p(y∣θ) = Ex∣θ{ϕ(x)} − Ex∣y,θ{ϕ(x)}. This leads to the equivalent fixed-point equation θ = PΘ (θ + δEx∣θ{ϕ(x)} − δEx∣y,θ{ϕ(x)}), (8) which we solve by using a stochastic approximation algorithm.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 37 / 47

slide-39
SLIDE 39

Stochastic Approximation algorithm to compute ˆ θ

We use the following MCMC-driven stochastic gradient algorithm: Initialisation x(0),u(0) ∈ Rd, θ(0) ∈ Θ, δt = δ0t−0.8. for t = 0 to n

  • 1. MCMC update x(t+1) ∼ Mx∣y,θ(t)(⋅∣x(t)) targeting p(x∣y,θ(t))
  • 2. MCMC update u(t+1) ∼ Kx∣θ(t)(⋅∣u(t)) targeting p(x∣θ(t))
  • 3. Stoch. grad. update

θ(t+1) = PΘ [θ(t) + δtϕ(u(t+1)) − δtϕ(x(t+1))]. end for Output The iterates θ(t) → ˆ θ as n → ∞.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 38 / 47

slide-40
SLIDE 40

SAPG algorithm driven MCMC kernels

Initialisation x(0),u(0) ∈ Rd, θ(0) ∈ Θ, δt = δ0t−0.8, λ = 1/L, γ = 1/4L. for t = 0 to n

  • 1. Coupled Proximal MCMC updates: generate z(t+1) ∼ N(0,Id)

x(t+1) = (1 − γ λ)x(t) − γ∇fy (x(t)) + γ λproxθλ

ϕ (x(t)) +

√ 2γz(t+1) , u(t+1) = (1 − γ λ)u(t) + γ λproxθλ

ϕ (u(t)) +

√ 2γz(t+1) ,

  • 2. Stochastic gradient update

θ(t+1) = PΘ [θ(t) + δtϕ(u(t+1)) − δtϕ(x(t+1))]. end for Output Averaged estimator ¯ θ = n−1 ∑n

t=1 θ(t+1) converges approx. to ˆ

θ.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 39 / 47

slide-41
SLIDE 41

Illustrative example - Image deblurring with ℓ1 prior

We consider again the live-cell microscopy setup p(x∣y,θ) ∝ exp(−∥y − Ax∥2/2σ2 − θ∥x∥1), and compute ˆ θ = argmaxθ∈R+ p(y∣θ).

y ˆ xMAP

  • Reg. param θ

Figure : Molecules image deconvolution experiment, computing time 0.75 secs.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 40 / 47

slide-42
SLIDE 42

Illustrative example - Image deblurring with TV-ℓ2 prior

Similarly, for the Bayesian image deblurring model p(x∣y,θ) ∝ exp(−∥y − Ax∥2/2σ2 − α∥x∥2 − θ∥∇dx∥1−2), we compute ˆ θ = argmaxθ∈R+ p(y∣θ).

y

  • Reg. param θ

Estimation error for ˆ xMAP Figure : Boat image deconvolution experiment.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 41 / 47

slide-43
SLIDE 43

Image deblurring with TV-ℓ2 prior

Comparison with the (non-Bayesian) SUGAR method (Deledalle et al., 2014), and an oracle that knows the optimal value of θ. Average values over 6 test images of size 512 × 512 pixels. (a) Original (b) Degraded (c) Emp. Bayes (d) SUGAR Method SNR=20 dB SNR=30 dB SNR=40 dB

  • Avg. MSE
  • Avg. Time
  • Avg. MSE
  • Avg. Time
  • Avg. MSE
  • Avg. Time

θ∗(Oracle) 22.95 ± 3.10 – 21.05 ± 3.19 – 18.76 ± 3.19 – Empirical Bayes 23.24 ± 3.23 43.01 21.16 ± 3.24 41.50 18.90 ± 3.39 42.85 SUGAR 24.14 ±3.19 15.74 23.96 ± 3.26 20.87 23.94± 3.27 20.59

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 42 / 47

slide-44
SLIDE 44

Outline

1

Bayesian inference in imaging inverse problems

2

Proximal Markov chain Monte Carlo

3

Uncertainty quantification in astronomical and medical imaging

4

Image model selection and model calibration

5

Conclusion

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 43 / 47

slide-45
SLIDE 45

Conclusion

The challenges facing modern imaging sciences require a methodological paradigm shift to go beyond point estimation. The Bayesian framework can support this paradigm shift, but this requires significantly accelerating computation methods. We explored improving efficiency by integrating modern stochastic and variational approaches.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 44 / 47

slide-46
SLIDE 46

Conclusion

In the next lecture... We will explore ways of accelerating Bayesian inference even further by combining variational approaches with high-dimensional probability theory, bypassing Markov chain Monte Carlo methods.

Thank you!

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 45 / 47

slide-47
SLIDE 47

Bibliography:

Cai, X., Pereyra, M., and McEwen, J. D. (2017). Uncertainty quantification for radio interferometric imaging II: MAP estimation. ArXiv e-prints. Chambolle, A. and Pock, T. (2016). An introduction to continuous optimization for

  • imaging. Acta Numerica, 25:161–319.

Deledalle, C.-A., Vaiter, S., Fadili, J., and Peyr´ e, G. (2014). Stein unbiased gradient estimator of the risk (sugar) for multiple parameter selection. SIAM Journal on Imaging Sciences, 7(4):2448–2487. Durmus, A., Moulines, E., and Pereyra, M. (2018). Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau. SIAM J. Imaging Sci., 11(1):473–506. Fernandez-Vidal, A. and Pereyra, M. (2018). Maximum likelihood estimation of regularisation parameters. In Proc. IEEE ICIP 2018. Green, P. J., Latuszy´ nski, K., Pereyra, M., and Robert, C. P. (2015). Bayesian computation: a summary of the current state, and samples backwards and forwards. Statistics and Computing, 25(4):835–862. Moreau, J.-J. (1962). Fonctions convexes duales et points proximaux dans un espace

  • Hilbertien. C. R. Acad. Sci. Paris S´
  • er. A Math., 255:2897–2899.

Pereyra, M. (2015). Proximal Markov chain Monte Carlo algorithms. Statistics and

  • Computing. open access paper, http://dx.doi.org/10.1007/s11222-015-9567-4.
  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 46 / 47

slide-48
SLIDE 48

Pereyra, M., Bioucas-Dias, J., and Figueiredo, M. (2015). Maximum-a-posteriori estimation with unknown regularisation parameters. In Proc. Europ. Signal Process.

  • Conf. (EUSIPCO) 2015.

Robert, C. P. (2001). The Bayesian Choice (second edition). Springer Verlag, New-York. Zhu, L., Zhang, W., Elnatan, D., and Huang, B. (2012). Faster STORM using compressed sensing. Nat. Meth., 9(7):721–723.

  • M. Pereyra (MI — HWU)

Bayesian mathematical imaging 47 / 47