SLIDE 1

AXDA: efficient sampling through variable-splitting-inspired Bayesian hierarchical models

P. Chainais, with Maxime Vono & Nicolas Dobigeon
March 12th 2019, Centrale Lille - CRIStAL

slide-2
SLIDE 2

Flight schedule

1. Motivations
2. Split Gibbs sampling (SP)
3. Split & Augmented Gibbs sampling (SPA)
4. Asymptotically exact data augmentation: AXDA

SLIDE 3

Outline

1. Motivations
2. Split Gibbs sampling (SP)
3. Split & Augmented Gibbs sampling (SPA)
4. Asymptotically exact data augmentation: AXDA

SLIDE 4

Motivations: applications in image processing

Image deblurring

SLIDE 6

Motivations: applications in image processing

Image inpainting

SLIDE 8

Motivations: applications in image processing

Confidence intervals

[Figure: pixel-wise confidence intervals; colorbar 30–80]

SLIDE 10

Motivations

I have a dream...

◮ solve complex ill-posed inverse problems
◮ big data in large dimensions
◮ excellent performance
◮ fast inference algorithms
◮ credibility intervals

with maybe some additional options such as:

◮ parallel distributed computing
◮ privacy preserving

⇒ Bayesian approach + MCMC method!

SLIDE 12

Motivations

The optimization-based approach

Inverse problems & optimization ⇒ define a cost function $f(x) = f_1(x|y) + f_2(x)$, where $f_2$ is typically

◮ convex (or not)
◮ not differentiable ⇒ proximal operators
◮ a sum of various penalties

Solution: proximal operators and splitting techniques:

$$\arg\min_{x} \; f_1(x) + f_2(z) \quad \text{such that} \quad x = z,$$

maybe relaxed to (simplified version of ADMM)

$$\arg\min_{x} \; f_1(x) + f_2(z) + \frac{\alpha}{2}\|x - z\|_2^2 + u^T(x - z).$$

SLIDE 14

Motivations

The Bayesian approach

Inverse problems & Bayes: posterior ∝ likelihood ($f_1$) × prior ($f_2$) ⇒ define a posterior distribution $p(x|y) = p_1(x|y) \cdot p_2(x)$, where $p_2$ is typically

◮ log-concave (or not) ↔ $f_2$ convex
◮ conjugate ⇒ easy sampling/inference
◮ a combination of various priors

Solution: MCMC methods and Gibbs sampling, $x_i \sim p(x_i \mid x_{\setminus i})$ for all $1 \le i \le d$.

Can we adapt splitting and augmentation from optimization?

$$\pi_\rho(x, z, u) \propto \exp\left[-f_1(x) - f_2(z) - \frac{1}{2\rho^2}\|u - x + z\|_2^2 - \frac{1}{2\alpha^2}\|u\|_2^2\right]$$

SLIDE 15

Motivations

The Bayesian approach

Inverse problems & Bayes: posterior ∝ likelihood ($f_1$) × prior ($f_2$) ⇒ define a posterior distribution $p(x|y) = p_1(x|y) \cdot p_2(x)$.

Computational motivations: difficult sampling

◮ non-conjugate priors [conj. priors ⇒ easy inference]
◮ rich models: complicated prior distributions
◮ big datasets: expensive likelihood computation

Strategy: Divide-to-Conquer ⇒ splitting (SP) and augmentation (SPA)

SLIDE 16

Outline

1. Motivations
2. Split Gibbs sampling (SP)
3. Split & Augmented Gibbs sampling (SPA)
4. Asymptotically exact data augmentation: AXDA

SLIDE 18

Split Gibbs sampling (SP)

$$\pi(x) \propto \exp\left[-f_1(x) - f_2(x)\right] \quad \Downarrow \quad \pi_\rho(x, z) \propto \exp\left[-f_1(x) - f_2(z) - \phi_\rho(x, z)\right], \qquad \text{e.g. } \phi_\rho(x, z) = \frac{1}{2\rho^2}\|x - z\|_2^2$$

[Graphical models: original hierarchy ($\theta_x \to x \to y$, potentials $f_1$, $f_2$) vs. split hierarchy where $z$, coupled to $x$ through $\phi_\rho$ with scale $\rho$, carries $f_2$]

SLIDE 19

Split Gibbs sampling (SP): Theorem 1

Consider the marginal of $x$ under $\pi_\rho$:

$$p_\rho(x) = \int_{\mathbb{R}^d} \pi_\rho(x, z)\,dz \;\propto\; \int_{\mathbb{R}^d} \exp\left[-f_1(x) - f_2(z) - \phi_\rho(x, z)\right] dz.$$

Theorem

Assume that, in the limiting case $\rho \to 0$, $\phi_\rho$ is such that

$$\frac{\exp(-\phi_\rho(x, z))}{\int_{\mathbb{R}^d} \exp(-\phi_\rho(x, z))\,dx} \xrightarrow[\rho \to 0]{} \delta_x(z).$$

Then $p_\rho$ coincides with $\pi$ when $\rho \to 0$, that is, $\|p_\rho - \pi\|_{TV} \xrightarrow[\rho \to 0]{} 0$.
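For the quadratic coupling used throughout, this condition holds by construction: the normalized kernel is a Gaussian density in $z$ that concentrates as $\rho \to 0$,

$$\frac{\exp(-\phi_\rho(x, z))}{\int_{\mathbb{R}^d} \exp(-\phi_\rho(x, z))\,dx} = \frac{1}{(2\pi\rho^2)^{d/2}} \exp\left(-\frac{\|x - z\|_2^2}{2\rho^2}\right) \xrightarrow[\rho \to 0]{} \delta_x(z).$$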

SLIDE 21

Split Gibbs sampling (SP): conditional distributions

Full conditional distributions under the split distribution $\pi_\rho$:

$$\pi_\rho(x|z) \propto \exp\left[-f_1(x) - \frac{1}{2\rho^2}\|x - z\|_2^2\right], \qquad \pi_\rho(z|x) \propto \exp\left[-f_2(z) - \frac{1}{2\rho^2}\|x - z\|_2^2\right].$$

Note that $f_1$ and $f_2$ are now separated into two distinct distributions.

State-of-the-art sampling methods:

◮ P-MYULA = proximal MCMC (Pereyra 2016; Durmus et al. 2018)
◮ Fourier or Aux-V1 or E-PO for Gaussian variables
◮ ...
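To make the mechanics concrete, here is a minimal sketch of the split Gibbs sampler on a toy 1-D target where both conditionals are conjugate Gaussians. The toy target and all variable names are ours, not the experiments of the talk; it also illustrates the compromise discussed later, since a moderate ρ trades a small bias (the prior variance is inflated to τ² + ρ²) for faster mixing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D target: pi(x) ∝ exp(-f1(x) - f2(x)) with
#   f1(x) = (x - y)^2 / (2 sigma^2)   (Gaussian likelihood)
#   f2(x) = x^2 / (2 tau^2)           (Gaussian prior)
# so both split conditionals are Gaussian and can be sampled exactly.
y, sigma, tau, rho = 2.0, 1.0, 1.5, 0.3

x, z = 0.0, 0.0
samples = []
for t in range(20000):
    # x | z: combines f1 with the coupling (1 / 2 rho^2) (x - z)^2
    var_x = 1.0 / (1.0 / sigma**2 + 1.0 / rho**2)
    x = var_x * (y / sigma**2 + z / rho**2) + np.sqrt(var_x) * rng.standard_normal()
    # z | x: combines f2 with the coupling
    var_z = 1.0 / (1.0 / tau**2 + 1.0 / rho**2)
    z = var_z * (x / rho**2) + np.sqrt(var_z) * rng.standard_normal()
    if t >= 2000:  # discard burn-in
        samples.append(x)

# Exact target moments for comparison: pi is N(m, s2)
s2 = 1.0 / (1.0 / sigma**2 + 1.0 / tau**2)
m = s2 * y / sigma**2
print(f"SGS mean {np.mean(samples):.3f} vs exact {m:.3f}")
print(f"SGS var  {np.var(samples):.3f} vs exact {s2:.3f}")
```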

SLIDE 22

Split Gibbs sampling (SP): inverse problems

Linear Gaussian inverse problems

$$y = Px + n, \qquad P = \text{damaging operator}, \quad n \sim \mathcal{N}(0_d, \sigma^2 I_d) = \text{noise}.$$

$$f_1(x) = \frac{1}{2\sigma^2}\|y - Px\|_2^2, \qquad f_2(x) = \tau\,\psi(x), \ \tau > 0, \quad \forall x \in \mathbb{R}^d.$$

Then the SP conditional distributions are:

$$\pi_\rho(x|z) = \mathcal{N}\left(\mu_x, Q_x^{-1}\right), \qquad \pi_\rho(z|x) \propto \exp\left[-\tau\psi(z) - \frac{1}{2\rho^2}\|z - x\|_2^2\right].$$
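Completing the square in the $x$-conditional (a short derivation from the $f_1$ and coupling above) gives the Gaussian parameters explicitly:

$$Q_x = \frac{P^T P}{\sigma^2} + \frac{I_d}{\rho^2}, \qquad \mu_x = Q_x^{-1}\left(\frac{P^T y}{\sigma^2} + \frac{z}{\rho^2}\right).$$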

SLIDE 23

Split Gibbs sampling (SP): efficient sampling

Linear Gaussian inverse problems

$$\pi_\rho(x|z) = \mathcal{N}\left(\mu_x, Q_x^{-1}\right), \qquad \pi_\rho(z|x) \propto \exp\left[-\tau\psi(z) - \frac{1}{2\rho^2}\|z - x\|_2^2\right].$$

Examples:

◮ Convex non-smooth: $\psi(x)$ = TV, $\ell_1$ sparsity... ⇒ proximal MCMC
◮ Tikhonov regularization: $\psi(z) = \|Qz\|_2^2$ ⇒ Gaussian variables (e.g., $P$ or $Q$ diagonalizable in Fourier → E-PO)

SLIDE 24

Split Gibbs sampling (SP): TV deblurring

Linear Gaussian inverse problems

Posterior distribution:

$$p(x|y) \propto \exp\left[-\frac{1}{2\sigma^2}\|Px - y\|_2^2 - \beta\,\mathrm{TV}(x)\right],$$

where $P$ = damaging operator (blur, binary mask...) and

$$\mathrm{TV}(x) = \sum_{1 \le i,j \le N} \left\|(\nabla x)_{i,j}\right\|_2.$$

Direct sampling is challenging because of:

1. the generally high dimension of the image,
2. the non-conjugacy of the TV-based prior,
3. the non-differentiability of the TV term (which rules out Hamiltonian Monte Carlo algorithms).

SLIDE 25

Split Gibbs sampling (SP): TV deblurring

Linear Gaussian inverse problems

[Figures: TV deblurring results (images); colorbar scale 20–65]

SLIDE 28

Split Gibbs sampling (SP): TV deblurring

Linear Gaussian inverse problems

                      SALSA   FISTA   SGS     P-MYULA
time (s)              1       10      470     3600
time (× var. split.)  1       10      1       7.7
nb. iterations        22      214     ~10^4   10^5
SNR (dB)              17.87   17.86   18.36   17.97

Remark: ρ² = 9

SLIDE 29

Split Gibbs sampling (SP): TV deblurring

Linear Gaussian inverse problems

Short autocorrelation of the Markov chain. [Figure: ACF vs. lag (1 to 500) for SGS and P-MYULA]

SLIDE 30

Split Gibbs sampling (SP): TV deblurring

Linear Gaussian inverse problems

ρ sets a compromise between computation time and quality.

[Figure: f(x) + g(x) vs. iteration t for MYULA and the AXDA Gibbs sampler, for ρ between 10⁻¹ and 10¹]

SLIDE 31

Outline

1. Motivations
2. Split Gibbs sampling (SP)
3. Split & Augmented Gibbs sampling (SPA)
4. Asymptotically exact data augmentation: AXDA

SLIDE 32

Split & Augmented Gibbs sampling (SPA)

Motivation for augmentation: better mixing properties of the Markov chain.

$$\pi_{\rho,\alpha}(x, z, u) \triangleq p(x, z, u; \rho, \alpha) \propto \exp\left[-f(x) - g(z)\right] \times \exp\left[-\phi_1(x, z - u; \rho) - \phi_2(u; \alpha)\right]$$

Assumption 2

$\phi_2$ and $\phi_1$ are such that, for all $x, z \in \mathbb{R}^d$,

$$\int_{\mathbb{R}^d} \exp\left[-\phi_1(x, z - u; \rho) - \phi_2(u; \alpha)\right] du \;\propto\; \exp\left[-\phi_1(x, z; \eta(\rho, \alpha))\right]. \tag{1}$$
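With the Gaussian choices used throughout ($\phi_1$ quadratic with scale $\rho$, $\phi_2$ quadratic with scale $\alpha$), (1) is a standard Gaussian convolution identity with $\eta(\rho, \alpha) = \sqrt{\rho^2 + \alpha^2}$:

$$\int_{\mathbb{R}^d} \exp\left(-\frac{\|x - z + u\|_2^2}{2\rho^2} - \frac{\|u\|_2^2}{2\alpha^2}\right) du \;\propto\; \exp\left(-\frac{\|x - z\|_2^2}{2(\rho^2 + \alpha^2)}\right).$$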

SLIDE 33

Split & Augmented Gibbs sampling (SPA)

$$\pi_{\rho,\alpha}(x, z, u) \triangleq p(x, z, u; \rho, \alpha) \propto \exp\left[-f(x) - g(z)\right] \times \exp\left[-\phi_1(x, z - u; \rho) - \phi_2(u; \alpha)\right]$$

[Graphical model: variable of interest $x$ (parameter $\theta_x$, potential $f$), variable splitting $z$ (parameter $\theta_z$, scale $\rho$), data augmentation $u$ (parameter $\theta_u$, scale $\alpha$)]

SLIDE 35

Split & Augmented Gibbs sampling (SPA)

SPA Gibbs sampler

The conditional split-augmented distributions are:

$$p(x|z, u; \rho) \propto \exp\left[-f(x) - \frac{1}{2\rho^2}\|x - z + u\|_2^2\right]$$
$$p(z|x, u; \rho) \propto \exp\left[-g(z) - \frac{1}{2\rho^2}\|x - z + u\|_2^2\right]$$
$$p(u|x, z; \rho, \alpha) \propto \exp\left[-\frac{\|u\|_2^2}{2\alpha^2} - \frac{1}{2\rho^2}\|x - z + u\|_2^2\right].$$
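Note that the last conditional is quadratic in $u$, so the auxiliary variable can be refreshed exactly at each Gibbs iteration (a short derivation by completing the square in $u$):

$$p(u|x, z; \rho, \alpha) = \mathcal{N}\left(\mu_u, \sigma_u^2 I_d\right), \qquad \frac{1}{\sigma_u^2} = \frac{1}{\alpha^2} + \frac{1}{\rho^2}, \qquad \mu_u = \frac{\alpha^2}{\alpha^2 + \rho^2}\,(z - x).$$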

SLIDE 36

SPA & ADMM

By replacing each Gibbs sampling step by the corresponding optimization, ADMM appears:

Algorithm 1: ADMM (scaled version)
Input: functions f, g, penalty ρ², init. t ← 0, z(0), u(0)
1  while stopping criterion not satisfied do
2      x(t) ∈ arg min_x − log p(x | z(t−1), u(t−1); ρ)
3      z(t) ∈ arg min_z − log p(z | x(t), u(t−1); ρ)
4      u(t) = u(t−1) + x(t) − z(t)
5      t ← t + 1
6  end
Output: approximate solution x̂ of the optimization problem.
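As an illustration of this correspondence, here is a minimal runnable sketch of the scaled ADMM for the common instance f(x) = ½‖Px − y‖² and g = τ‖·‖₁, whose z-step is a soft threshold. The function names, defaults, and toy data are ours, a sketch rather than the talk's implementation.

```python
import numpy as np

def soft_threshold(v, lam):
    # Proximal operator of lam * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def admm_lasso(P, y, tau, rho2=1.0, n_iter=200):
    """Scaled ADMM for min_x 0.5 ||Px - y||^2 + tau ||x||_1:
    each step is the minimizer of the corresponding SPA conditional."""
    d = P.shape[1]
    z, u = np.zeros(d), np.zeros(d)
    A = P.T @ P + np.eye(d) / rho2      # x-step normal equations
    Pty = P.T @ y
    for _ in range(n_iter):
        x = np.linalg.solve(A, Pty + (z - u) / rho2)   # x-step (quadratic min)
        z = soft_threshold(x + u, tau * rho2)          # z-step (prox of g)
        u = u + x - z                                  # scaled dual update
    return z

# Tiny usage example with a random operator and a sparse ground truth
rng = np.random.default_rng(0)
P = rng.standard_normal((50, 100))
x_true = np.zeros(100); x_true[:5] = 3.0
y = P @ x_true + 0.1 * rng.standard_normal(50)
x_hat = admm_lasso(P, y, tau=1.0)
print("nonzeros recovered:", np.sum(np.abs(x_hat) > 1e-6))
```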

SLIDE 37

Split & Augmented Gibbs sampling (SPA): TV restoration

Linear Gaussian inverse problems

The conditional distributions associated with SPA are:

$$p(x|z, u) \propto \exp\left[-\frac{1}{2\sigma^2}\|Px - y\|_2^2\right] \times \exp\left[-\frac{1}{2\rho^2}\|x - (z - u)\|_2^2\right]$$
$$p(z|x, u) \propto \exp\left[-\beta\,\mathrm{TV}(z) - \frac{1}{2\rho^2}\|z - (x + u)\|_2^2\right]$$
$$p(u|x, z) \propto \exp\left[-\frac{1}{2\alpha^2}\|u\|_2^2 - \frac{1}{2\rho^2}\|u - (z - x)\|_2^2\right]$$

Remark: sampling from $p(z|x, u)$ ⇒ P-MYULA + Chambolle's algorithm.

SLIDE 38

Split & Augmented Gibbs sampling (SPA): TV inpainting

Linear Gaussian inverse problems

[Figures: TV inpainting results (images)]

SLIDE 41

Split & Augmented Gibbs sampling (SPA): confidence intervals

Linear Gaussian inverse problems

[Figure: pixel-wise confidence intervals; colorbar 30–80]

SLIDE 42

Split & Augmented Gibbs sampling (SPA): TV inpainting

Linear Gaussian inverse problems

            SALSA   P-MYULA   SP      SPA
Balloons    26.18   23.00     26.19   26.18
Baboon      14.37   13.35     14.60   14.59
Elaine      23.61   21.21     23.86   23.84
Clock       25.72   24.50     25.45   25.42
Donna       24.71   21.69     23.87   23.82
House       20.21   19.59     20.43   20.43
Peppers     20.35   19.20     20.22   20.20
Cameraman   19.48   18.76     19.34   19.34
Boat        20.81   19.80     20.74   20.71

SLIDE 43

Split & Augmented Gibbs sampling (SPA)

Applications

Many problems can be considered using SPA:

◮ Laplacian + ℓ2 regularizer for deconvolution
  M. Vono et al., "Split-and-augmented Gibbs sampler - Application to large-scale inference problems," IEEE Trans. Signal Processing, 2019
◮ Poisson noise + blur + non-negativity + ...
  M. Vono et al., "Bayesian image restoration under Poisson noise and log-concave prior," in Proc. ICASSP 2019
◮ Machine learning: logistic regression, ...
  M. Vono et al., "Sparse Bayesian binary logistic regression using the split-and-augmented Gibbs sampler," in Proc. IEEE MLSP 2018

SLIDE 44

Split & Augmented Gibbs sampling (SPA)

Poisson denoising + deblurring: sparse wavelet transform + non-negativity

◮ 6 times faster than P-MYULA,
◮ no approximation (e.g., the Anscombe transform in P-MYULA)
◮ ... but using SPA + P-MYULA!
◮ performance similar to PIDAL (Figueiredo & Bioucas-Dias 2010)
◮ + confidence intervals

SLIDE 47

Outline

1. Motivations
2. Split Gibbs sampling (SP)
3. Split & Augmented Gibbs sampling (SPA)
4. Asymptotically exact data augmentation: AXDA

SLIDE 48

Asymptotically exact data augmentation (AXDA)

Motivations

Let $\pi \in L^1$ be a target probability distribution with density with respect to (w.r.t.) the Lebesgue measure, $\pi(x) \propto \exp(-f(x))$, where $f : \mathsf{X} \subseteq \mathbb{R}^d \to (-\infty, +\infty]$ stands for a potential function. With a slight abuse of notation, $\pi$ shall refer to

◮ a prior $\pi(x)$,
◮ a likelihood $\pi(x) \triangleq \pi(y|x)$,
◮ a posterior $\pi(x) \triangleq \pi(x|y)$,

where $y$ are observations.

SLIDE 49

Asymptotically exact data augmentation (AXDA)

Motivations

Assumption 1

Inference from $\pi$ is difficult and possibly inefficient. Examples:

◮ non-trivial maximum likelihood estimation
◮ difficult posterior sampling with poorly mixing chains

SLIDE 50

Data augmentation (DA)

A standard surrogate is to introduce auxiliary variables $z \in \mathsf{Z} \subseteq \mathbb{R}^k$ such that

$$\int_{\mathsf{Z}} \pi(x, z)\,dz = \pi(x).$$

Numerous well-known advantages:

◮ augmented likelihood $\pi(x, z) \triangleq \pi(y, z|x)$ easier to work with
◮ joint posterior $\pi(x, z) \triangleq \pi(x, z|y)$ with simpler full conditionals
◮ improved inference (multimodal problems, mixing properties)

SLIDE 51

The art of exact data augmentation: XDA

Unfortunately, satisfying

$$\int_{\mathsf{Z}} \pi(x, z)\,dz = \pi(x) \tag{XDA}$$

is a matter of art (van Dyk and Meng 2001). Difficulties:

◮ finding $\pi(x, z)$ (Geman and Yang 1995)
◮ scaling in high-dimensional/big data settings (Neal 2003; Polson et al. 2013)

Goal: relax (XDA) while keeping XDA's advantages.

SLIDE 52

Asymptotically exact data augmentation (AXDA)

Consider an augmented density $p_\rho(x, z)$ and define

$$\pi_\rho(x) = \int_{\mathsf{Z}} p_\rho(x, z)\,dz, \qquad \rho > 0.$$

Assumption 2

For all $x \in \mathsf{X}$, $\lim_{\rho \to 0} \pi_\rho(x) = \pi(x)$.

Theorem 1 (Scheffé 1947)

Under Assumption 2, $\|\pi_\rho - \pi\|_{TV} \xrightarrow[\rho \to 0]{} 0$.

SLIDE 53

Choice of the augmented density

Take inspiration from variable splitting in optimization (Boyd et al. 2011)... This motivates the choice (Vono et al. 2019)

$$p_\rho(x, z) \propto \exp(-f(z) - \phi_\rho(x, z)),$$

which makes it possible to:

◮ simplify the inference (splitting complicated potentials) (Vono et al. 2019)
◮ distribute the inference (Rendell et al. 2018)
◮ accelerate the inference (Vono et al. 2019).

SLIDE 54

Properties based on convolution

(H1) $\pi \in L^1$ is log-concave.
(H2) $\phi_\rho(x, z) = \tilde{\phi}_\rho(x - z)$, so that $\pi_\rho(x) = \int_{\mathsf{Z}} p_\rho(x, z)\,dz = (\pi \ast K_\rho)(x)$, where the kernel $K_\rho \propto \exp(-\tilde{\phi}_\rho)$ is $C^\infty$ and log-concave, $\partial^k K_\rho$ is bounded for all $k \ge 0$, and $\lim_{\rho \to 0} K_\rho(u) = \delta(u)$ with $\mathbb{E}_{K_\rho}(U) = 0$.

Then:

i) $\pi_\rho \xrightarrow[\rho \to 0]{} \pi$
ii) $\pi_\rho$ is log-concave
iii) $\pi_\rho$ is infinitely differentiable on $\mathsf{X}$
iv) $\mathbb{E}_{\pi_\rho}(X) = \mathbb{E}_\pi(X)$ and $\mathrm{var}_{\pi_\rho}(X) = \mathrm{var}_\pi(X) + \mathrm{var}_{K_\rho}(X)$.

SLIDE 55

Non-asymptotic bound on the TV distance

(H3) $f$ is $L_f$-Lipschitz. (H4) $\phi_\rho(x, z) = \frac{1}{2\rho^2}\|x - z\|_2^2$.

Let $d = \dim(\mathsf{X})$. For all $\rho > 0$,

$$\|\pi_\rho - \pi\|_{TV} \le 1 - \Delta_d(\rho) = 1 - \frac{D_{-d}(L_f \rho)}{D_{-d}(-L_f \rho)}, \qquad 1 - \Delta_d(\rho) \underset{\rho \to 0}{\sim} \frac{2\sqrt{2}\,\Gamma\!\left(\frac{d+1}{2}\right)}{\Gamma\!\left(\frac{d}{2}\right)}\, L_f \rho.$$

The function $D_{-d}$ is the parabolic cylinder special function.
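The bound is directly computable; a small sketch using SciPy's parabolic cylinder function pbdv (the helper name tv_bound is ours):

```python
import numpy as np
from scipy.special import pbdv  # returns (D_v(x), derivative)

def tv_bound(d, Lf, rho):
    """Upper bound 1 - Delta_d(rho) on ||pi_rho - pi||_TV under (H3)-(H4)."""
    num, _ = pbdv(-d, Lf * rho)     # D_{-d}(Lf * rho)
    den, _ = pbdv(-d, -Lf * rho)    # D_{-d}(-Lf * rho)
    return 1.0 - num / den

# The bound vanishes linearly in rho and degrades with the dimension d.
for rho in (1e-3, 1e-2, 1e-1):
    print(f"d=10, Lf=1, rho={rho:g}: TV <= {tv_bound(10, 1.0, rho):.4f}")
```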

SLIDE 56

Behavior when ρ → 0 & illustration

[Figure: 1 − Δd(ρ) versus Lf ρ (log-log), for dimensions d ranging from 10⁰ to 10⁶]

SLIDE 57

Bounds on potentials

$$f_\rho(x) = \frac{d}{2}\log(2\pi\rho^2) - \log \int_{\mathsf{X}} \exp\left(-f(z) - \frac{1}{2\rho^2}\|z - x\|_2^2\right) dz$$

For all $\rho > 0$ and $x \in \mathsf{X}$, $L_\rho \le f_\rho(x) - f(x) \le U_\rho$, with

$$L_\rho = \log M_\rho - \log D_{-d}(-L_f \rho), \qquad U_\rho = \log M_\rho - \log D_{-d}(L_f \rho), \qquad M_\rho = \frac{2^{d/2 - 1}\,\Gamma(d/2)}{\Gamma(d)} \exp\left(L_f^2 \rho^2 / 4\right).$$
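These bounds are just as easy to evaluate numerically; a sketch under the same (H3)-(H4) assumptions (helper name ours, reusing SciPy's pbdv and gammaln):

```python
import numpy as np
from scipy.special import gammaln, pbdv

def potential_gap_bounds(d, Lf, rho):
    """Return (L_rho, U_rho) bracketing f_rho(x) - f(x) for all x."""
    log_M = ((d / 2 - 1) * np.log(2.0) + gammaln(d / 2) - gammaln(d)
             + Lf**2 * rho**2 / 4)
    L = log_M - np.log(pbdv(-d, -Lf * rho)[0])
    U = log_M - np.log(pbdv(-d, Lf * rho)[0])
    return L, U

print(potential_gap_bounds(d=10, Lf=1.0, rho=0.01))
```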

SLIDE 58

Bounds on credibility intervals

Illustration

$$(1 - \alpha)\,\frac{M_\rho}{D_{-d}(-L_f \rho)} \;\le\; \int_{\mathcal{C}_\alpha^\rho} \pi(x)\,dx \;\le\; \min\left\{1,\ (1 - \alpha)\,\frac{M_\rho}{D_{-d}(L_f \rho)}\right\}$$

[Figure: three panels comparing π and πρ on x1 ∈ [−3, 3] for increasing ρ]

ρ      Cα              Cαρ             ∫_{Cαρ} π(x1) dx1   bounds
10⁻²   [-0.47, 1.24]   [-0.47, 1.24]   0.95                [0.948, 0.952]
10⁻¹   idem            [-0.47, 1.24]   0.95                [0.88, 1]
1      idem            [-0.47, 1.37]   0.96                [0.34, 1]

SLIDE 59

Distributed sampling and data privacy

Regularized logistic regression

$$\forall i \in \{1, \dots, n\}, \quad y_i \sim \mathrm{Bernoulli}\left(\sigma(a_i^T x)\right), \qquad \pi(x|y) \propto \exp\left[-f(x) - \sum_{j=1}^{b} g^{(j)}(x)\right]$$

where

◮ $g^{(j)}(x) = \sum_{i \in \mathcal{D}_j} \log\left(1 + \exp\left(-y_i a_i^T x\right)\right)$,
◮ $\mathcal{D}_j$ = indices associated with the $j$th block of data,
◮ $f$ = prior on the regressor $x$.

Issues:

◮ the full data set is distributed over $b$ nodes, $b \in \{1, \dots, n\}$
◮ data privacy.

SLIDE 60

Distributed sampling and data privacy

Applying AXDA $b$ times:

$$p_\rho(x, z_{1:b}) \propto \exp\left[-f(x) - \sum_{j=1}^{b}\left(\frac{1}{2\rho^2}\|x - z_j\|^2 + \sum_{i \in \mathcal{D}_j} \log\left(1 + \exp\left(-y_i a_i^T z_j\right)\right)\right)\right]$$

Benefits of AXDA:

◮ inference via a Gibbs sampler distributed over b nodes
◮ the master node never sees the data set: privacy
◮ theoretical guarantees on the approximation
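A minimal serial sketch of the resulting master/worker Gibbs sweep, assuming a N(0, τ²I) prior for f and a random-walk Metropolis step on each node. All names and the toy data are ours; a real deployment would run the workers in parallel and exchange only x and z_j over the network.

```python
import numpy as np

rng = np.random.default_rng(1)

def worker_step(z_j, x, A_j, y_j, rho, n_mh=5, step=0.05):
    """One update of z_j on node j via random-walk Metropolis targeting
    p(z_j | x) ∝ exp(-sum_i log(1 + exp(-y_i a_i^T z_j)) - ||x - z_j||^2 / (2 rho^2)).
    Only z_j is exchanged with the master; the data (A_j, y_j) stay local."""
    def neg_log_p(z):
        return (np.logaddexp(0.0, -y_j * (A_j @ z)).sum()
                + np.sum((x - z) ** 2) / (2 * rho**2))
    for _ in range(n_mh):
        prop = z_j + step * rng.standard_normal(z_j.shape)
        if np.log(rng.uniform()) < neg_log_p(z_j) - neg_log_p(prop):
            z_j = prop
    return z_j

def master_step(zs, rho, tau):
    """Draw x | z_{1:b} exactly, assuming a N(0, tau^2 I) prior f."""
    b, d = zs.shape
    var = 1.0 / (1.0 / tau**2 + b / rho**2)
    mean = var * zs.sum(axis=0) / rho**2
    return mean + np.sqrt(var) * rng.standard_normal(d)

# Gibbs sweeps over b nodes (serial stand-in for the distributed loop)
b, d, n_per = 4, 3, 50
A = [rng.standard_normal((n_per, d)) for _ in range(b)]
yb = [np.where(rng.uniform(size=n_per) < 0.5, -1.0, 1.0) for _ in range(b)]
x, zs = np.zeros(d), np.zeros((b, d))
for _ in range(100):
    zs = np.array([worker_step(zs[j], x, A[j], yb[j], rho=1.0) for j in range(b)])
    x = master_step(zs, rho=1.0, tau=10.0)
print("posterior draw x:", x)
```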

SLIDE 61

Conclusion

◮ SP & SPA: a split-and-augment strategy
    Bayesian inference for complex models
    large-scale problems (big & tall)
    confidence intervals
◮ Efficient algorithms for inference
    acceleration of state-of-the-art sampling algorithms
    distributed inference (simulation, optimization, variational approx.)
◮ AXDA: a unifying statistical framework
    asymptotically exact: control parameter ρ
    non-asymptotic theoretical guarantees on the approximation under mild assumptions

SLIDE 62

Interested in AXDA for your statistical problems?

Theory and methods

◮ M. Vono et al. (2019), "Asymptotically exact data augmentation: models, properties and algorithms". Technical report. https://arxiv.org/abs/1902.05754/
◮ M. Vono et al. (2019), "Split-and-augmented Gibbs sampler - Application to large-scale inference problems," IEEE Transactions on Signal Processing.
◮ L. J. Rendell et al. (2018), "Global consensus Monte Carlo". Technical report. https://arxiv.org/abs/1807.09288/

Applications

◮ M. Vono et al. (2019), "Bayesian image restoration under Poisson noise and log-concave prior," in Proc. ICASSP.
◮ M. Vono et al. (2018), "Sparse Bayesian binary logistic regression using the split-and-augmented Gibbs sampler," in Proc. MLSP.

Code

◮ https://github.com/mvono

SLIDE 63

Beaumont, M. A., Zhang, W., and Balding, D. J. (2002), "Approximate Bayesian Computation in Population Genetics," Genetics, 162, 2025–2035.
Besag, J. and Green, P. J. (1993), "Spatial Statistics and Bayesian Computation," Journal of the Royal Statistical Society, Series B, 55, 25–37.
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011), "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers," Foundations and Trends in Machine Learning, 3, 1–122.
Damien, P., Wakefield, J., and Walker, S. (1999), "Gibbs sampling for Bayesian non-conjugate and hierarchical models by using auxiliary variables," Journal of the Royal Statistical Society, Series B, 61, 331–344.
Del Moral, P., Doucet, A., and Jasra, A. (2012), "An adaptive sequential Monte Carlo method for approximate Bayesian computation," Statistics and Computing, 22, 1009–1020.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, Series B, 39, 1–38.
Doucet, A., Godsill, S. J., and Robert, C. P. (2002), "Marginal maximum a posteriori estimation using Markov chain Monte Carlo," Statistics and Computing, 12, 77–84.
Durmus, A., Moulines, E., and Pereyra, M. (2018), "Efficient Bayesian Computation by Proximal Markov chain Monte Carlo: When Langevin Meets Moreau," SIAM Journal on Imaging Sciences, 11, 473–506.
Filstroff, L., Lumbreras, A., and Févotte, C. (2018), "Closed-form Marginal Likelihood in Gamma-Poisson Matrix Factorization," in Proceedings of the 35th International Conference on Machine Learning (ICML), vol. 80, pp. 1506–1514.
Geman, D. and Yang, C. (1995), "Nonlinear image recovery with half-quadratic regularization," IEEE Transactions on Image Processing, 4, 932–946.
Ising, E. (1925), "Beitrag zur Theorie des Ferromagnetismus," Zeitschrift für Physik, 31, 253–258.
Neal, R. M. (2003), "Slice sampling," The Annals of Statistics, 31, 705–767.
Peel, D. and McLachlan, G. J. (2000), "Robust mixture modelling using the t distribution," Statistics and Computing, 10, 339–348.
Pereyra, M. (2016), "Proximal Markov chain Monte Carlo algorithms," Statistics and Computing, 26, 745–760.

SLIDE 64

Polson, N. G., Scott, J. G., and Windle, J. (2013), "Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables," Journal of the American Statistical Association, 108, 1339–1349.
Potts, R. B. (1952), "Some generalized order-disorder transformations," Mathematical Proceedings of the Cambridge Philosophical Society, 48, 106–109.
Ratmann, O., Andrieu, C., Wiuf, C., and Richardson, S. (2009), "Model criticism based on likelihood-free inference, with an application to protein network evolution," Proceedings of the National Academy of Sciences, 106, 10576–10581.
Rendell, L. J., Johansen, A. M., Lee, A., and Whiteley, N. (2018), "Global consensus Monte Carlo," [online]. Technical report. Available at https://arxiv.org/abs/1807.09288/.
Scheffé, H. (1947), "A useful convergence theorem for probability distributions," The Annals of Mathematical Statistics, 18, 434–438.
Scott, S. L., Blocker, A. W., Bonassi, F. V., Chipman, H. A., George, E. I., and McCulloch, R. E. (2016), "Bayes and Big Data: The Consensus Monte Carlo Algorithm," International Journal of Management Science and Engineering Management, 11, 78–88.
Sisson, S., Fan, Y., and Beaumont, M. (eds.) (2018), Handbook of Approximate Bayesian Computation, Chapman and Hall/CRC Press, 1st ed.
Swendsen, R. H. and Wang, J.-S. (1987), "Nonuniversal critical dynamics in Monte Carlo simulations," Physical Review Letters, 58, 86–88.
van Dyk, D. A. and Meng, X.-L. (2001), "The Art of Data Augmentation," Journal of Computational and Graphical Statistics, 10, 1–50.
Vono, M., Dobigeon, N., and Chainais, P. (2018), "Sparse Bayesian binary logistic regression using the split-and-augmented Gibbs sampler," in IEEE International Workshop on Machine Learning for Signal Processing, Aalborg, Denmark.
— (2019), "Split-and-augmented Gibbs sampler - Application to large-scale inference problems," IEEE Transactions on Signal Processing, 67, 1648–1661.
Wang, C. and Blei, D. M. (2018), "A General Method for Robust Bayesian Modeling," Bayesian Analysis, 13, 1163–1191.