SLIDE 1

Bayesian inference and convex geometry: theory, methods, and algorithms.

  • Dr. Marcelo Pereyra

http://www.macs.hw.ac.uk/~mp71/

Maxwell Institute for Mathematical Sciences, Heriot-Watt University

March 2019, Paris.

  • M. Pereyra (MI — HWU)

Bayesian inference & convex geometry 0 / 47

SLIDE 2

Outline

1. Bayesian inference in imaging inverse problems
2. MAP estimation with Bayesian confidence regions
3. A decision-theoretic derivation of MAP estimation
4. Empirical Bayes MAP estimation with unknown regularisation parameters
5. Conclusion

SLIDE 3

Imaging inverse problems

We are interested in an unknown image \(x \in \mathbb{R}^d\). We measure \(y\), related to \(x\) by a statistical model \(p(y|x)\). The recovery of \(x\) from \(y\) is ill-posed or ill-conditioned, resulting in significant uncertainty about \(x\). For example, in many imaging problems \(y = Ax + w\), for some operator \(A\) that is rank-deficient, and additive noise \(w\).

SLIDE 4

The Bayesian framework

We use priors to reduce uncertainty and deliver accurate results. Given the prior \(p(x)\), the posterior distribution of \(x\) given \(y\),
\[ p(x|y) = p(y|x)\,p(x)/p(y), \]
models our knowledge about \(x\) after observing \(y\). In this talk we consider that \(p(x|y)\) is log-concave; i.e.,
\[ p(x|y) = \exp\{-\phi(x)\}/Z, \]
where \(\phi(x)\) is a convex function and \(Z = \int \exp\{-\phi(x)\}\,dx\).

SLIDE 5

Maximum-a-posteriori (MAP) estimation

The predominant Bayesian approach in imaging is MAP estimation,
\[ \hat{x}_{\mathrm{MAP}} = \operatorname*{argmax}_{x \in \mathbb{R}^d} p(x|y) = \operatorname*{argmin}_{x \in \mathbb{R}^d} \phi(x), \tag{1} \]
efficiently computed by convex optimisation (Chambolle and Pock, 2016). However, MAP estimation has some limitations, e.g.,

1. it provides little information about \(p(x|y)\),
2. it is not theoretically well understood (yet),
3. it struggles with unknown/partially unknown models.

SLIDE 6

Illustrative example: astronomical image reconstruction

Recover \(x \in \mathbb{R}^d\) from a low-dimensional degraded observation \(y = MFx + w\), where \(F\) is the continuous Fourier transform, \(M \in \mathbb{C}^{m \times d}\) is a measurement operator and \(w\) is Gaussian noise. We use the model
\[ p(x|y) \propto \exp\{-\|y - MFx\|^2/2\sigma^2 - \theta\|\Psi x\|_1\}\, 1_{\mathbb{R}^n_+}(x). \tag{2} \]
Figure: Radio-interferometric image reconstruction of the W28 supernova (\(y\) and \(\hat{x}_{\mathrm{MAP}}\)).

SLIDE 7

Outline

1. Bayesian inference in imaging inverse problems
2. MAP estimation with Bayesian confidence regions
3. A decision-theoretic derivation of MAP estimation
4. Empirical Bayes MAP estimation with unknown regularisation parameters
5. Conclusion

SLIDE 8

Posterior credible regions

Where does the posterior probability mass of \(x\) lie? A set \(C_\alpha\) is a posterior credible region with confidence level \((1-\alpha)\%\) if
\[ P[x \in C_\alpha \,|\, y] = 1 - \alpha. \]
The highest posterior density (HPD) region is decision-theoretically optimal (Robert, 2001):
\[ C^*_\alpha = \{x : \phi(x) \le \gamma_\alpha\}, \]
with \(\gamma_\alpha \in \mathbb{R}\) chosen such that \(\int_{C^*_\alpha} p(x|y)\,dx = 1-\alpha\) holds.

We could estimate \(C^*_\alpha\) by numerical integration (e.g., MCMC sampling), but in high-dimensional log-concave settings this is not necessary because something beautiful happens...

SLIDE 9

A concentration phenomenon...

Figure: Convergence to the “typical” set \(\{x : \log p(x|y) \approx \mathrm{E}[\log p(x|y)]\}\).

SLIDE 10

Proposed approximation of \(C^*_\alpha\)

Theorem 2.1 (Pereyra (2016))

Suppose that the posterior \(p(x|y) = \exp\{-\phi(x)\}/Z\) is log-concave on \(\mathbb{R}^d\). Then, for any \(\alpha \in (4\exp(-d/3), 1)\), the HPD region \(C^*_\alpha\) is contained in
\[ \tilde{C}_\alpha = \{x : \phi(x) \le \phi(\hat{x}_{\mathrm{MAP}}) + \sqrt{d}\,\tau_\alpha + d\}, \]
with universal positive constant \(\tau_\alpha = \sqrt{16\log(3/\alpha)}\) independent of \(p(x|y)\).

Remark 1: \(\tilde{C}_\alpha\) is a conservative approximation of \(C^*_\alpha\), i.e., \(x \notin \tilde{C}_\alpha \implies x \notin C^*_\alpha\).

Remark 2: \(\tilde{C}_\alpha\) is available as a by-product in any convex inverse problem that is solved by MAP estimation!
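Once \(\hat{x}_{\mathrm{MAP}}\) is computed, building \(\tilde{C}_\alpha\) costs essentially nothing. A minimal numpy sketch of the threshold and the conservative membership test; the quadratic \(\phi\) and the MAP point below are illustrative stand-ins, not the imaging model from the talk:

```python
import numpy as np

def hpd_threshold(phi_map, d, alpha):
    """Threshold of the approximate HPD region
    C~_alpha = {x : phi(x) <= phi(x_MAP) + sqrt(d)*tau_alpha + d},
    with tau_alpha = sqrt(16*log(3/alpha)) (Pereyra, 2016)."""
    tau_alpha = np.sqrt(16.0 * np.log(3.0 / alpha))
    return phi_map + np.sqrt(d) * tau_alpha + d

def in_credible_region(phi, x, phi_map, alpha):
    """Conservative test: x outside C~_alpha implies x outside C*_alpha."""
    return phi(x) <= hpd_threshold(phi_map, x.size, alpha)

# Toy log-concave posterior: phi(x) = ||x||^2 / 2, MAP at the origin.
phi = lambda x: 0.5 * float(x @ x)
d = 100
x_map = np.zeros(d)
print(in_credible_region(phi, x_map, phi(x_map), 0.05))  # the MAP point is always inside
```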

SLIDE 11

Approximation error bounds

Is \(\tilde{C}_\alpha\) a reliable approximation of \(C^*_\alpha\)?

Theorem 2.2 (Finite-dimensional error bound (Pereyra, 2016))

Let \(\tilde{\gamma}_\alpha = \phi(\hat{x}_{\mathrm{MAP}}) + \sqrt{d}\,\tau_\alpha + d\). If \(p(x|y)\) is log-concave on \(\mathbb{R}^d\), then
\[ 0 \le \frac{\tilde{\gamma}_\alpha - \gamma_\alpha}{d} \le 1 + \eta_\alpha d^{-1/2}, \]
with universal positive constant \(\eta_\alpha = \sqrt{16\log(3/\alpha)} + \sqrt{1/\alpha}\).

Remark 3: \(\tilde{C}_\alpha\) is stable (as \(d\) becomes large, the error \((\tilde{\gamma}_\alpha - \gamma_\alpha)/d\) remains of order 1).

Remark 4: the lower and upper bounds are asymptotically tight w.r.t. \(d\).

SLIDE 12

Uncertainty visualisation in radio-interferometric imaging

Astro-imaging experiment with a redundant wavelet frame (Cai et al., 2017). Local credible intervals at scale 10 × 10 pixels.

Figure: dirty image, \(\hat{x}_{\mathrm{penML}}(y)\), \(\hat{x}_{\mathrm{MAP}}\), approximate credible intervals, and “exact” intervals (MCMC), for the 3C2888 radio galaxy and the M31 radio galaxy (size 256 × 256 pixels, computing time 1.8 secs each).

SLIDE 13

Hypothesis testing

Bayesian hypothesis test for specific image structures (e.g., lesions):

H0: the structure of interest is ABSENT in the true image.
H1: the structure of interest is PRESENT in the true image.

The null hypothesis \(H_0\) is rejected with significance \(\alpha\) if \(P(H_0|y) \le \alpha\).

Theorem (Repetti et al., 2018)

Let \(S\) denote the region of \(\mathbb{R}^d\) associated with \(H_0\), containing all images without the structure of interest. Then
\[ S \cap \tilde{C}_\alpha = \emptyset \implies P(H_0|y) \le \alpha. \]
If in addition \(S\) is convex, then checking \(S \cap \tilde{C}_\alpha = \emptyset\) is a convex problem:
\[ \min_{\bar{x},\,x \in \mathbb{R}^d} \|\bar{x} - x\|_2^2 \quad \text{s.t.} \quad \bar{x} \in \tilde{C}_\alpha,\; x \in S. \]
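This convex feasibility check can be run with simple alternating projections when both projections are available. A toy sketch, assuming (purely for illustration) a ball-shaped \(\tilde{C}_\alpha\), as arises for a quadratic \(\phi\), and an affine set \(S\):

```python
import numpy as np

# Toy instance of  min ||xbar - x||^2  s.t.  xbar in C, x in S,  solved by
# alternating projections. C stands in for C~_alpha (a Euclidean ball) and
# S = {x : x[0] = 0} stands in for the "structure absent" set.
center, radius = np.array([2.0, 0.0]), 1.0

def proj_C(x):                       # projection onto the ball C
    v = x - center
    n = np.linalg.norm(v)
    return x if n <= radius else center + radius * v / n

def proj_S(x):                       # projection onto the affine set S
    y = x.copy()
    y[0] = 0.0
    return y

x = np.zeros(2)
for _ in range(100):
    xbar = proj_C(x)
    x = proj_S(xbar)

gap = np.linalg.norm(xbar - x)       # gap > 0  =>  S and C~_alpha are disjoint
print(gap)                           # here the sets are 1 apart, so H0 is rejected
```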

SLIDE 14

Uncertainty quantification in MRI imaging

Figure: \(\hat{x}_{\mathrm{MAP}}\), \(\bar{x} \in \tilde{C}_{0.01}\), \(x \in S\) (full images and zooms).

MRI experiment: the test produces \(\bar{x} = x\), hence we fail to reject \(H_0\) and conclude that there is little evidence to support the observed structure.

SLIDE 15

Uncertainty quantification in MRI imaging

Figure: \(\hat{x}_{\mathrm{MAP}}\), \(\bar{x} \in \tilde{C}_{0.01}\), \(x \in S_0\) (full images and zooms).

MRI experiment: the test produces \(\bar{x} \ne x\), hence we reject \(H_0\) and conclude that there is significant evidence in favour of the observed structure.

SLIDE 16

Outline

1. Bayesian inference in imaging inverse problems
2. MAP estimation with Bayesian confidence regions
3. A decision-theoretic derivation of MAP estimation
4. Empirical Bayes MAP estimation with unknown regularisation parameters
5. Conclusion

SLIDE 17

Bayesian point estimators

Bayesian point estimators arise from the decision “what point \(\hat{x} \in \mathbb{R}^d\) summarises \(x|y\) best?”. The optimal decision under uncertainty is
\[ \hat{x}_L = \operatorname*{argmin}_{u \in \mathbb{R}^d} \mathrm{E}\{L(u,x)\,|\,y\} = \operatorname*{argmin}_{u \in \mathbb{R}^d} \int L(u,x)\,p(x|y)\,dx, \]
where the loss \(L(u,x)\) measures the “dissimilarity” between \(u\) and \(x\).

Example: in the Euclidean setting \(L(u,x) = \|u - x\|^2\) and \(\hat{x}_L = \hat{x}_{\mathrm{MMSE}} = \mathrm{E}\{x|y\}\).

General desiderata:
1. \(L(u,x) \ge 0\), \(\forall u,x \in \mathbb{R}^d\),
2. \(L(u,x) = 0 \iff u = x\),
3. \(L\) strictly convex w.r.t. its first argument (for estimator uniqueness).
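The Euclidean example can be checked directly by Monte Carlo: under the squared loss, the expected loss is minimised by the posterior mean. A small sketch with stand-in posterior samples (the Gaussian draws are illustrative, not the talk's posterior):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in posterior draws x ~ p(x|y); any sample set would do here.
samples = rng.normal(loc=1.5, scale=0.5, size=100_000)

def expected_loss(u, x):
    """Monte Carlo estimate of E{ ||u - x||^2 | y }."""
    return np.mean((u - x) ** 2)

u_star = samples.mean()   # candidate minimiser: the (empirical) MMSE estimate
# Any perturbation away from the sample mean strictly increases the loss.
print(expected_loss(u_star, samples) < expected_loss(u_star + 0.1, samples))  # True
```

This is an exact identity for the empirical loss, not just an approximation: \(\frac{1}{n}\sum_i (u - x_i)^2\) is a quadratic in \(u\) minimised at the sample mean.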

SLIDE 18

Bayesian point estimators

Does the convex geometry of \(p(x|y)\) define an interesting loss \(L(u,x)\)? We use differential geometry to relate the convexity of \(p(x|y)\), the geometry of the parameter space, and the loss \(L\) used to perform estimation.

SLIDE 19

Differential geometry

A Riemannian manifold \(M = (\mathbb{R}^d, g)\), with metric \(g : \mathbb{R}^d \to S^d_{++}\) and global coordinate system \(x\), is a vector space that is locally Euclidean. For any point \(x \in \mathbb{R}^d\) we have a Euclidean tangent space \(T_x\mathbb{R}^d\) with inner product \(\langle u, x\rangle = u^\top g(x)\, x\) and norm \(\|x\| = \sqrt{x^\top g(x)\, x}\). This geometry is local and may vary smoothly from \(T_x\mathbb{R}^d\) to \(T_{x'}\mathbb{R}^d\) following the affine connection \(\Gamma \in \mathbb{R}^{d\times d\times d}\), given by \(\Gamma_{ij,k}(x) = \partial_k g_{i,j}(x)\).

SLIDE 20

Divergence functions

Similarly to Euclidean spaces, the manifold \((\mathbb{R}^d, g)\) supports divergences:

Definition 1 (Divergence functions)

A function \(D : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) is a divergence function on \(\mathbb{R}^d\) if the following conditions hold for any \(u, x \in \mathbb{R}^d\): \(D(u,x) \ge 0\); \(D(u,x) = 0 \iff x = u\); and \(D(u,x)\) is strongly convex w.r.t. \(u\), and \(C^2\) w.r.t. \(u\) and \(x\).

SLIDE 21

Canonical divergence

We focus on the canonical divergence on \((\mathbb{R}^d, g)\), a generalisation of the Euclidean squared distance to this kind of manifold:

Definition 2 (Canonical divergence (Ay and Amari, 2015))

For any \((u,x) \in \mathbb{R}^d \times \mathbb{R}^d\), the canonical divergence on \((\mathbb{R}^d, g)\) is given by
\[ D(u,x) = \int_0^1 t\, \dot{\gamma}_t^\top g(\gamma_t)\, \dot{\gamma}_t \, dt, \tag{3} \]
where \(\gamma_t\) is the \(\Gamma\)-geodesic from \(u\) to \(x\) and \(\dot{\gamma}_t = \frac{d}{dt}\gamma_t\).

1. \(D\) fully specifies \((\mathbb{R}^d, g)\) and vice-versa.
2. \(D(x + dx, x) = \|dx\|^2/2 + o(\|dx\|^2)\), where \(\|\cdot\|\) is the norm on \(T_x\mathbb{R}^d\).
3. For a Euclidean space with \(\langle u,x\rangle = u^\top g\, x\), \(D(u,x) = \frac{1}{2}(u-x)^\top g\,(u-x)\).

SLIDE 22

Differential-geometric derivation of \(\hat{x}_{\mathrm{MAP}}\) and \(\hat{x}_{\mathrm{MMSE}}\)

Theorem 3 (Canonical Bayesian estimators - Part 1 (Pereyra, 2016))

Suppose that \(\phi(x) = -\log p(x|y)\) is strongly convex, continuous, and \(C^3\) on \(\mathbb{R}^d\). Let \((\mathbb{R}^d, g)\) be the manifold induced by \(\phi\), i.e., \(g_{i,j}(x) = \partial_i\partial_j \phi(x)\). Then, the canonical divergence on \((\mathbb{R}^d, g)\) is the \(\phi\)-Bregman divergence
\[ D_\phi(u,x) = \phi(u) - \phi(x) - \nabla\phi(x)^\top(u - x). \]
Remark: because \(\phi\) is strongly convex, \(\phi(u) > \phi(x) + \nabla\phi(x)^\top(u - x)\) for any \(u \ne x\). The divergence \(D_\phi(u,x)\) quantifies this gap, related to the length of the affine geodesic from \(u\) to \(x\) on the space induced by \(p(x|y)\).
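The \(\phi\)-Bregman divergence is straightforward to evaluate given \(\phi\) and its gradient. A short sketch; the quadratic \(\phi\) below is only a sanity check, for which the divergence reduces to half the squared Euclidean distance:

```python
import numpy as np

def bregman(phi, grad_phi, u, x):
    """phi-Bregman divergence D_phi(u, x) = phi(u) - phi(x) - <grad phi(x), u - x>."""
    return phi(u) - phi(x) - grad_phi(x) @ (u - x)

# Sanity check with phi(x) = ||x||^2 / 2 (so the induced metric g is the
# identity): the divergence is exactly ||u - x||^2 / 2.
phi = lambda x: 0.5 * float(x @ x)
grad_phi = lambda x: x
u, x = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(bregman(phi, grad_phi, u, x))   # equals ||u - x||^2 / 2
```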

SLIDE 23

Differential-geometric derivation of \(\hat{x}_{\mathrm{MAP}}\) and \(\hat{x}_{\mathrm{MMSE}}\)

Theorem 4 (Canonical Bayesian estimators - Part 2 (Pereyra, 2016))

The Bayesian estimator associated with \(D_\phi(u,x)\) is unique and is given by
\[ \hat{x}_{D_\phi} \triangleq \operatorname*{argmin}_{u \in \mathbb{R}^d} \mathrm{E}_{x|y}[D_\phi(u,x)] = \operatorname*{argmin}_{x \in \mathbb{R}^d} \phi(x) = \hat{x}_{\mathrm{MAP}}. \]
Remark 2: \(\hat{x}_{\mathrm{MAP}}\) stems from Bayesian decision theory, and hence it stands on the same theoretical footing as the core Bayesian methodologies.

Remark 3: the definition of the MAP estimator as the maximiser \(\hat{x}_{\mathrm{MAP}} = \operatorname*{argmax}_{x\in\mathbb{R}^d} p(x|y)\) is mainly algorithmic for these models.

SLIDE 24

Differential-geometric derivation of \(\hat{x}_{\mathrm{MAP}}\) and \(\hat{x}_{\mathrm{MMSE}}\)

Theorem 5 (Canonical Bayesian estimators - Part 3 (Pereyra, 2016))

Moreover, the Bayesian estimator associated with the dual canonical divergence \(D^*_\phi(u,x) = D_\phi(x,u)\) is also unique and is given by
\[ \hat{x}_{D^*_\phi} \triangleq \operatorname*{argmin}_{u \in \mathbb{R}^d} \mathrm{E}_{x|y}[D^*_\phi(u,x)] = \int_{\mathbb{R}^d} x\, p(x|y)\,dx = \hat{x}_{\mathrm{MMSE}}. \]
Remark 4: \(\hat{x}_{\mathrm{MAP}}\) and \(\hat{x}_{\mathrm{MMSE}}\) exhibit a surprising duality, arising from the asymmetry of the canonical divergence that \(p(x|y)\) induces on \(\mathbb{R}^d\).

Remark 5: these results carry over partially to models that are not strongly convex, not smooth, or that involve constraints on the parameter space.

SLIDE 25

Expected estimation error bound

Are \(\hat{x}_{\mathrm{MAP}}\) and \(\hat{x}_{\mathrm{MMSE}}\) “good” estimators of \(x|y\)?

Proposition 3.1 (Expected canonical error bound)

Suppose that \(\phi(x) = -\log p(x|y)\) is convex on \(\mathbb{R}^d\) and \(C^1\). Then,
\[ \mathrm{E}_{x|y}[D^*_\phi(\hat{x}_{\mathrm{MMSE}}, x)/d] \le \mathrm{E}_{x|y}[D^*_\phi(\hat{x}_{\mathrm{MAP}}, x)/d] \le 1. \]

Proposition 3.2 (Expected error w.r.t. regularisation function)

Also assume that the regularisation \(h(x) = -\log p(x)\) is convex. Then,
\[ \mathrm{E}_{x|y}[D^*_h(\hat{x}_{\mathrm{MMSE}}, x)/d] \le \mathrm{E}_{x|y}[D^*_h(\hat{x}_{\mathrm{MAP}}, x)/d] \le 1. \]

Remark 6: these are high-dimensional stability results for \(\hat{x}_{\mathrm{MAP}}\) and \(\hat{x}_{\mathrm{MMSE}}\); the estimation error cannot grow faster than the number of pixels.

SLIDE 26

Example 1: denoising with wavelet shrinkage prior

Consider a linear problem of the form \(y = Ax + w\) and a shrinkage prior on the wavelet coefficients \(z = Wx\). We consider the smoothed Laplace prior
\[ p(z) \propto \exp\Big\{-\sum_{i=1}^d \lambda\sqrt{z_i^2 + \gamma^2}\Big\}, \]
where \(\lambda \in \mathbb{R}_+\) and \(\gamma \in \mathbb{R}_+\) are scale and shape parameters. The likelihood is \(p(y|z) \propto \exp\{-\frac{1}{2\sigma^2}\|y - AW^\top z\|_2^2\}\), and hence
\[ p(z|y) \propto \exp\Big\{-\frac{1}{2\sigma^2}\|y - AW^\top z\|_2^2 - \sum_{i=1}^d \lambda\sqrt{z_i^2 + \gamma^2}\Big\}. \]
This model is \(C^\infty\) and strongly log-concave, and hence the theory applies.

SLIDE 27

Example 1: denoising with wavelet shrinkage prior

To analyse the geometry induced by \(\phi(z) = -\log p(z|y)\) we suppose that \(A = I\) and \(W^\top W = I\), and obtain \(D_\phi(u,z) = \sum_{i=1}^d D_\psi(u_i, z_i)\) with
\[ D_\psi(u_i, z_i) = \frac{1}{2\sigma^2}(u_i - z_i)^2 + \lambda\, \frac{\sqrt{z_i^2+\gamma^2}\,\sqrt{u_i^2+\gamma^2} - z_i u_i - \gamma^2}{\sqrt{z_i^2+\gamma^2}}. \]
The non-quadratic term introduces additional shrinkage and leads to the differences between \(\hat{x}_{\mathrm{MMSE}}\) and \(\hat{x}_{\mathrm{MAP}}\).
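The closed form for \(D_\psi\) can be verified against the Bregman definition in the scalar case. A sketch using the slide's parameter values with \(A = I\), \(W = I\) and \(d = 1\); the data value and test points are arbitrary:

```python
import numpy as np

sigma, lam, gam = 0.08, 12.0, 0.01

def phi(z, y):    # scalar negative log-posterior (up to a constant)
    return (y - z) ** 2 / (2 * sigma**2) + lam * np.sqrt(z**2 + gam**2)

def dphi(z, y):   # its derivative in z
    return (z - y) / sigma**2 + lam * z / np.sqrt(z**2 + gam**2)

def D_psi(u, z):  # closed-form divergence from the slide
    s = np.sqrt(z**2 + gam**2)
    return (u - z) ** 2 / (2 * sigma**2) \
        + lam * (s * np.sqrt(u**2 + gam**2) - z * u - gam**2) / s

# Bregman definition D(u,z) = phi(u) - phi(z) - phi'(z)(u - z);
# the data value y cancels out of the quadratic term's Bregman part.
y, u, z = 0.3, 0.7, -0.2
bregman = phi(u, y) - phi(z, y) - dphi(z, y) * (u - z)
print(abs(bregman - D_psi(u, z)))   # ~ 0: the two expressions agree
```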

SLIDE 28

Example 1: denoising with wavelet shrinkage prior

Illustration with the Flintstones image (\(\sigma = 0.08\), \(\lambda = 12\) and \(\gamma = 0.01\)).

Figure: noisy image \(y\) (SNR 17.6 dB), \(\hat{x}_{\mathrm{MAP}}\) (SNR 19.8 dB), \(\hat{x}_{\mathrm{MMSE}}\) (SNR 17.7 dB), and the denoising functions for \(\hat{z}_{\mathrm{MAP}}\) and \(\hat{z}_{\mathrm{MMSE}}\).

An illustrative example of a model where the shrinkage prior acts predominantly via \(D_\psi\) (note: setting \(\gamma = 0\) leads to \(\hat{x}_{\mathrm{MAP}}\) with SNR 18.8 dB).

SLIDE 29

Illustrative example: astronomical image reconstruction

Generalisation warning: shrinkage priors can also act predominantly via the model (not \(D_\psi\)), producing similar \(\hat{x}_{\mathrm{MAP}}\) and \(\hat{x}_{\mathrm{MMSE}}\) results; e.g.,

Figure: dirty image, \(\hat{x}_{\mathrm{penML}}(y)\), \(\hat{x}_{\mathrm{MAP}}\), \(\hat{x}_{\mathrm{MMSE}}\). Radio-interferometric imaging of the Cygnus A galaxy (Cai et al., 2017).

SLIDE 30

Outline

1. Bayesian inference in imaging inverse problems
2. MAP estimation with Bayesian confidence regions
3. A decision-theoretic derivation of MAP estimation
4. Empirical Bayes MAP estimation with unknown regularisation parameters
5. Conclusion

SLIDE 31

Problem statement

Consider the class of Bayesian models
\[ p(x|y,\theta) = \frac{p(y|x)\,p(x|\theta)}{p(y|\theta)}, \]
parametrised by a regularisation parameter \(\theta \in \Theta\). For example,
\[ p(x|\theta) = \frac{1}{C(\theta)}\exp\{-\theta\varphi(x)\}, \qquad p(y|x) \propto \exp\{-f_y(x)\}, \]
with \(f_y\) and \(\varphi\) convex l.s.c. functions, and \(f_y\) \(L\)-Lipschitz differentiable. We assume that \(p(x|\theta)\) is proper, i.e., \(C(\theta) = \int_{\mathbb{R}^d} \exp\{-\theta\varphi(x)\}\,dx < \infty\), with \(C(\theta)\) unknown and generally intractable.

SLIDE 32

Maximum-a-posteriori estimation

When \(\theta\) is fixed, the posterior \(p(x|y,\theta)\) is log-concave and
\[ \hat{x}_{\mathrm{MAP}} = \operatorname*{argmin}_{x\in\mathbb{R}^d} f_y(x) + \theta\varphi(x) \]
is a convex optimisation problem that can often be solved efficiently. For example, the proximal gradient algorithm
\[ x_{m+1} = \operatorname{prox}^{L^{-1}}_{\theta\varphi}\{x_m - L^{-1}\nabla f_y(x_m)\} \]
converges to \(\hat{x}_{\mathrm{MAP}}\) at rate \(O(1/m)\), with possible acceleration to \(O(1/m^2)\)¹. However, \(\theta\) is generally unknown, significantly complicating the problem.

¹Recall that the proximal operator \(\operatorname{prox}^\lambda_\varphi(x) \triangleq \operatorname*{argmin}_{u\in\mathbb{R}^N} \varphi(u) + \frac{1}{2\lambda}\|u - x\|^2\).
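For \(\varphi = \|\cdot\|_1\) the proximal operator is soft-thresholding, and the proximal gradient iteration becomes ISTA. A self-contained sketch on a synthetic problem with \(f_y(x) = \frac{1}{2}\|y - Ax\|^2\); the operator \(A\), the data \(y\), and the value of \(\theta\) are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 50))                     # illustrative measurement operator
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [2.0, -1.5, 1.0]            # sparse ground truth
y = A @ x_true + 0.01 * rng.normal(size=30)
theta = 0.5                                       # regularisation parameter, fixed here

L = np.linalg.norm(A, 2) ** 2                     # Lipschitz constant of grad f_y
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)  # prox of t*||.||_1

def objective(x):
    return 0.5 * np.sum((y - A @ x) ** 2) + theta * np.sum(np.abs(x))

x = np.zeros(50)
for _ in range(2000):
    # x_{m+1} = prox_{theta*phi}^{1/L}( x_m - grad f_y(x_m) / L )
    x = soft(x - (A.T @ (A @ x - y)) / L, theta / L)

print(objective(x) < objective(np.zeros(50)))     # True: the MAP objective decreased
```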

SLIDE 33

Regularisation parameter MLE

In this talk we adopt an empirical Bayes approach and consider the MLE
\[ \hat{\theta} = \operatorname*{argmax}_{\theta\in\Theta} p(y|\theta) = \operatorname*{argmax}_{\theta\in\Theta} \int_{\mathbb{R}^d} p(y,x|\theta)\,dx, \]
which we solve efficiently by using a stochastic gradient algorithm driven by two proximal MCMC kernels (see Fernandez-Vidal and Pereyra (2018)). Given \(\hat{\theta}\), we then straightforwardly compute
\[ \hat{x}_{\mathrm{MAP}} = \operatorname*{argmin}_{x\in\mathbb{R}^d} f_y(x) + \hat{\theta}\varphi(x). \tag{4} \]

SLIDE 34

Projected gradient algorithm

Assume that \(\Theta\) is convex, and that \(\hat{\theta}\) is the only root of \(\nabla_\theta \log p(y|\theta)\) in \(\Theta\). Then \(\hat{\theta}\) is also the unique solution of the fixed-point equation
\[ \theta = P_\Theta[\theta + \delta\nabla_\theta \log p(y|\theta)], \]
where \(P_\Theta\) is the projection operator onto \(\Theta\) and \(\delta > 0\). If \(\nabla_\theta\log p(y|\theta)\) were tractable, we could compute \(\hat{\theta}\) iteratively by using
\[ \theta^{(t+1)} = P_\Theta[\theta^{(t)} + \delta_t \nabla_\theta \log p(y|\theta^{(t)})], \]
with step-size sequence \(\delta_t = \alpha t^{-\beta}\), \(\alpha > 0\), \(\beta \in (1/2, 1]\). However, \(\nabla_\theta\log p(y|\theta)\) is “doubly” intractable...

SLIDE 35

Stochastic projected gradient algorithm

To circumvent the intractability of \(\nabla_\theta \log p(y|\theta)\) we use Fisher's identity
\[ \nabla_\theta \log p(y|\theta) = \mathrm{E}_{x|y,\theta}\{\nabla_\theta \log p(x,y|\theta)\} = -\mathrm{E}_{x|y,\theta}\{\varphi(x)\} - \nabla_\theta \log C(\theta), \]
together with the identity \(\nabla_\theta \log C(\theta) = -\mathrm{E}_{x|\theta}\{\varphi(x)\}\), to obtain
\[ \nabla_\theta \log p(y|\theta) = \mathrm{E}_{x|\theta}\{\varphi(x)\} - \mathrm{E}_{x|y,\theta}\{\varphi(x)\}. \]
This leads to the equivalent fixed-point equation
\[ \theta = P_\Theta\big(\theta + \delta\,\mathrm{E}_{x|\theta}\{\varphi(x)\} - \delta\,\mathrm{E}_{x|y,\theta}\{\varphi(x)\}\big), \tag{5} \]
which we solve by using a stochastic approximation algorithm.
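Fisher's identity can be sanity-checked on a one-dimensional conjugate toy model where every term has a closed form: \(\varphi(x) = x^2/2\), so \(p(x|\theta) = N(0, 1/\theta)\), with \(y = x + w\), \(w \sim N(0, \sigma^2)\). This model is only an illustration, not the talk's imaging model; the identity's value is compared against a finite-difference gradient of \(\log p(y|\theta)\):

```python
import numpy as np

sigma2, y, theta = 1.0, 0.7, 2.0

# Closed forms for phi(x) = x^2/2, p(x|theta) = N(0, 1/theta), y = x + w:
E_prior = 1.0 / (2.0 * theta)                  # E_{x|theta} phi(x)
post_var = 1.0 / (theta + 1.0 / sigma2)        # posterior variance of x|y,theta
post_mean = (y / sigma2) * post_var            # posterior mean of x|y,theta
E_post = 0.5 * (post_var + post_mean**2)       # E_{x|y,theta} phi(x)
fisher_grad = E_prior - E_post                 # gradient via Fisher's identity

def log_marginal(th):
    # log p(y|theta): here y ~ N(0, 1/theta + sigma2) marginally.
    s = 1.0 / th + sigma2
    return -0.5 * np.log(2 * np.pi * s) - y**2 / (2 * s)

h = 1e-6
fd_grad = (log_marginal(theta + h) - log_marginal(theta - h)) / (2 * h)
print(fisher_grad, fd_grad)   # the two gradients agree
```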

SLIDE 36

SAPG algorithm driven by MCMC kernels

Initialisation: \(x^{(0)}, u^{(0)} \in \mathbb{R}^d\), \(\theta^{(0)} \in \Theta\), \(\delta_t = \delta_0 t^{-0.8}\).
for \(t = 0\) to \(n\):
  1. MCMC update \(x^{(t+1)} \sim M_{x|y,\theta^{(t)}}(\cdot\,|x^{(t)})\) targeting \(p(x|y,\theta^{(t)})\).
  2. MCMC update \(u^{(t+1)} \sim K_{x|\theta^{(t)}}(\cdot\,|u^{(t)})\) targeting \(p(x|\theta^{(t)})\).
  3. Stochastic gradient update \(\theta^{(t+1)} = P_\Theta[\theta^{(t)} + \delta_t\varphi(u^{(t+1)}) - \delta_t\varphi(x^{(t+1)})]\).
end for
Output: the iterates \(\theta^{(t)} \to \hat{\theta}\) as \(n \to \infty\).
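The scheme above can be sketched by replacing the two MCMC kernels with exact samplers on a one-dimensional conjugate toy model (\(\varphi(x) = x^2/2\), \(p(x|\theta) = N(0, 1/\theta)\), \(y = x + w\) with \(w \sim N(0, \sigma^2)\)), for which the marginal MLE is \(\hat{\theta} = 1/(y^2 - \sigma^2)\) in closed form. This is an illustration of the stochastic approximation loop, not the proximal MCMC machinery itself:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, y = 1.0, 1.5
theta_mle = 1.0 / (y**2 - sigma2)          # closed-form MLE for this toy model (= 0.8)

theta, lo, hi = 1.5, 0.01, 10.0            # theta^(0) and the projection interval Theta
phi = lambda x: 0.5 * x**2
trace = []
for t in range(1, 50_001):
    delta = 1.0 * t ** -0.8                               # delta_t = delta_0 * t^{-0.8}
    u = rng.normal(0.0, np.sqrt(1.0 / theta))             # exact draw from p(x|theta)
    pv = 1.0 / (theta + 1.0 / sigma2)                     # exact posterior draw x|y,theta
    x = rng.normal(y / sigma2 * pv, np.sqrt(pv))
    # projected stochastic gradient step (np.clip plays the role of P_Theta)
    theta = np.clip(theta + delta * (phi(u) - phi(x)), lo, hi)
    trace.append(theta)

theta_bar = np.mean(trace[len(trace) // 2:])              # averaged estimator
print(theta_bar)   # close to theta_mle = 0.8
```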

SLIDE 37

SAPG algorithm driven by MCMC kernels

Initialisation: \(x^{(0)}, u^{(0)} \in \mathbb{R}^d\), \(\theta^{(0)} \in \Theta\), \(\delta_t = \delta_0 t^{-0.8}\), \(\lambda = 1/L\), \(\gamma = 1/4L\).
for \(t = 0\) to \(n\):
  1. Coupled proximal MCMC² updates: generate \(z^{(t+1)} \sim N(0, I_d)\),
\[ x^{(t+1)} = (1 - \tfrac{\gamma}{\lambda})\,x^{(t)} - \gamma\nabla f_y(x^{(t)}) + \tfrac{\gamma}{\lambda}\operatorname{prox}^{\theta\lambda}_{\varphi}(x^{(t)}) + \sqrt{2\gamma}\,z^{(t+1)}, \]
\[ u^{(t+1)} = (1 - \tfrac{\gamma}{\lambda})\,u^{(t)} + \tfrac{\gamma}{\lambda}\operatorname{prox}^{\theta\lambda}_{\varphi}(u^{(t)}) + \sqrt{2\gamma}\,z^{(t+1)}, \]
  2. Stochastic gradient update \(\theta^{(t+1)} = P_\Theta[\theta^{(t)} + \delta_t\varphi(u^{(t+1)}) - \delta_t\varphi(x^{(t+1)})]\).
end for
Output: the averaged estimator \(\bar{\theta} = n^{-1}\sum_{t=1}^{n}\theta^{(t)}\) converges approximately to \(\hat{\theta}\).

²Langevin MCMC kernels are highly efficient and scale as \(O(d\log(d))\), see Pereyra (2015); Durmus et al. (2018).
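The \(x\)-chain update above is a MYULA step. A sketch of that kernel alone (the coupled \(u\)-chain is analogous), for a one-dimensional target with \(f_y(x) = (y-x)^2/2\sigma^2\), \(\varphi = |\cdot|\), and \(\theta\) held fixed; the chain's mean is checked against numerical integration of the posterior. The tolerance is loose because MYULA samples a Moreau-smoothed, discretised approximation of the target:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, y, theta = 1.0, 1.0, 1.0
L = 1.0 / sigma2                        # Lipschitz constant of grad f_y
lam, gamma = 1.0 / L, 1.0 / (4 * L)     # MYULA parameters, lambda = 1/L, gamma = 1/4L

grad_f = lambda x: (x - y) / sigma2
prox = lambda x, t: np.sign(x) * max(abs(x) - t, 0.0)   # prox of t*|.|

x, samples = 0.0, []
for t in range(200_000):
    z = rng.normal()
    x = (1 - gamma / lam) * x - gamma * grad_f(x) \
        + (gamma / lam) * prox(x, theta * lam) + np.sqrt(2 * gamma) * z
    if t >= 10_000:                     # discard burn-in
        samples.append(x)

# Reference posterior mean via numerical integration of exp(-f_y(x) - theta|x|).
grid = np.linspace(-10.0, 10.0, 20001)
w = np.exp(-((grid - y) ** 2) / (2 * sigma2) - theta * np.abs(grid))
ref_mean = np.sum(grid * w) / np.sum(w)
print(abs(np.mean(samples) - ref_mean))  # modest bias is expected at this step size
```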

SLIDE 38

Illustrative example - Image deblurring with ℓ1 prior

We consider the Bayesian image deblurring model
\[ p(x|y,\theta) \propto \exp\{-\|y - Ax\|^2/2\sigma^2 - \theta\|x\|_1\}, \]
and compute \(\hat{\theta} = \operatorname*{argmax}_{\theta\in\mathbb{R}_+} p(y|\theta)\).

Figure: \(y\), \(\hat{x}_{\mathrm{MAP}}\), and the regularisation parameter \(\theta\). Molecules image deconvolution experiment, computing time 0.75 secs.

SLIDE 39

Deblurring with Total-Variation Prior

MSE and computing time per method and noise level:

Method | MSE (SNR=20dB) | Time (min) | MSE (SNR=30dB) | Time (min) | MSE (SNR=40dB) | Time (min)
Best   | 23.29 | -     | 21.39 | -     | 19.06 | -
EB     | 23.52 | 10.33 | 21.47 | 10.13 | 19.21 | 9.49
HB     | 25.07 | 0.58  | 22.84 | 1.27  | 19.84 | 3.27
DP     | 23.73 | -     | 21.87 | -     | 19.78 | -
SUG    | 24.44 | 3.92  | 24.24 | 4.50  | 24.21 | 4.81

Figure: original and degraded images, and estimates \(\hat{x}\) for EB, HB, DP, and SUG at SNR = 20, 30, 40 dB; MSE(\(\theta\)) curves over \(\theta \in [10^{-4}, 1]\) comparing the minimum-MSE value with empirical Bayes, hierarchical Bayes, the discrepancy principle, and SUGAR. Image: Flintstones.

SLIDE 40

Denoising with Total Generalized Variation

We consider
\[ \mathrm{TGV}^2_\theta(u) = \inf_{v\in BD(\Omega)} \theta_1\int_\Omega |\nabla u - v| + \theta_2\int_\Omega |\varepsilon(v)|, \]
with order \(k = 2\).

Figure: Goldhill image (original, degraded, estimated MAP), SNR = 12 dB.

SLIDE 41

Denoising with Total Generalized Variation

SLIDE 42

Evolution of \(\theta\) through the iterations, starting from different initial values: \(\theta_{\mathrm{init}} = 10\), \(\theta_{\mathrm{init}} = 0.1\), and \(\theta_{\mathrm{init}} = 40\).

SLIDE 43

Hyperspectral Unmixing

Objective: to recover the fractional abundances \(x\) from the mixed noisy spectral signatures \(y\) measured at every pixel.
Regulariser: \(\varphi(x) = \theta_{\mathrm{TV}}\,\mathrm{TV}(x) + \theta_{L1}\|x\|_1\) s.t. \(x \ge 0\).
We estimate \(\theta_{\mathrm{TV}}\) and \(\theta_{L1}\) for synthetic test images.

SLIDE 44

Evolution of \(\theta_{\mathrm{TV}}\) and \(\theta_{L1}\) with iterations.

SLIDE 45

SRE plots for different SNR values

SLIDE 46

Outline

1. Bayesian inference in imaging inverse problems
2. MAP estimation with Bayesian confidence regions
3. A decision-theoretic derivation of MAP estimation
4. Empirical Bayes MAP estimation with unknown regularisation parameters
5. Conclusion

SLIDE 47

Conclusion

The challenges facing modern imaging sciences require a methodological paradigm shift that goes beyond point estimation. There is great potential for synergy between Bayesian and variational approaches at the algorithmic, methodological, and theoretical levels.

Thank you!

We are hiring: 4 permanent positions in probability and statistical data science!

SLIDE 48

Bibliography:

Ay, N. and Amari, S.-I. (2015). A novel approach to canonical divergences within information geometry. Entropy, 17(12):7866.

Cai, X., Pereyra, M., and McEwen, J. D. (2017). Uncertainty quantification for radio interferometric imaging II: MAP estimation. ArXiv e-prints.

Chambolle, A. and Pock, T. (2016). An introduction to continuous optimization for imaging. Acta Numerica, 25:161–319.

Durmus, A., Moulines, E., and Pereyra, M. (2018). Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau. SIAM J. Imaging Sci., 11(1):473–506.

Fernandez-Vidal, A. and Pereyra, M. (2018). Maximum likelihood estimation of regularisation parameters. In Proc. IEEE ICIP 2018.

Pereyra, M. (2015). Proximal Markov chain Monte Carlo algorithms. Statistics and Computing. Open-access paper, http://dx.doi.org/10.1007/s11222-015-9567-4.

Pereyra, M. (2016). Maximum-a-posteriori estimation with Bayesian confidence regions. SIAM J. Imaging Sci., 6(3):1665–1688.

Pereyra, M. (2016). Revisiting maximum-a-posteriori estimation in log-concave models: from differential geometry to decision theory. ArXiv e-prints.

Repetti, A., Pereyra, M., and Wiaux, Y. (2018). Scalable Bayesian uncertainty quantification in imaging inverse problems via convex optimisation. ArXiv e-prints.

Robert, C. P. (2001). The Bayesian Choice (second edition). Springer Verlag, New York.
