Mirrored Langevin Dynamics

Ya-Ping Hsieh, https://lions.epfl.ch
Laboratory for Information and Inference Systems (LIONS)
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
NeurIPS Spotlight [Dec 6th, 2018]
Joint work with Ali Kavis, Paul Rolland, Volkan Cevher @ LIONS


Introduction

  • Task: given a target distribution $d\mu = e^{-V(x)}\,dx$, generate samples from $\mu$.

⊲ Fundamental in machine learning, statistics, computer science, etc.

  • A scalable framework: first-order sampling (assuming access to $\nabla V$).

Step 1. Langevin Dynamics: $dX_t = -\nabla V(X_t)\,dt + \sqrt{2}\,dB_t \;\Rightarrow\; X_\infty \sim e^{-V}$.
Step 2. Discretize: $x_{k+1} = x_k - \beta_k \nabla V(x_k) + \sqrt{2\beta_k}\,\xi_k$.

⊲ $\beta_k$: step size, $\xi_k$: standard normal noise
⊲ strong analogy to the gradient descent method
Mirrored Langevin Dynamics | Ya-Ping Hsieh, https://lions.epfl.ch Slide 2/ 8
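The two steps above can be sketched in a few lines. A minimal illustration (not the paper's code; the target, step size, and iteration counts are chosen here purely for the example): sample a one-dimensional standard Gaussian, i.e. $V(x) = x^2/2$ so $\nabla V(x) = x$.

```python
import numpy as np

def ula_sample(grad_V, x0, n_steps, step_size, rng):
    """Unadjusted Langevin algorithm (Step 2 above):
    x_{k+1} = x_k - beta * grad_V(x_k) + sqrt(2 * beta) * xi_k."""
    x = x0
    for _ in range(n_steps):
        xi = rng.standard_normal()  # standard normal noise xi_k
        x = x - step_size * grad_V(x) + np.sqrt(2.0 * step_size) * xi
    return x

# Target e^{-V} with V(x) = x^2 / 2, the standard Gaussian; grad_V(x) = x.
rng = np.random.default_rng(0)
samples = np.array([ula_sample(lambda x: x, 0.0, 500, 0.01, rng)
                    for _ in range(2000)])
print(samples.mean(), samples.std())  # empirically close to 0 and 1
```

The same loop works coordinate-wise for vector-valued $x$; the small constant step size keeps the discretization bias low at the cost of slower mixing.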

Recent progress: Unconstrained distributions are easy

  • State of the art: when $\mathrm{dom}(V) = \mathbb{R}^d$,

⊲ $LI \succeq \nabla^2 V \succeq mI$: $\tilde{O}(\epsilon^{-2}d)$ in $W_2$, $\tilde{O}(\epsilon^{-2}d)$ in $d_{\mathrm{TV}}$, $\tilde{O}(\epsilon^{-1}d)$ in KL [Cheng and Bartlett, 2017; Dalalyan and Karagulyan, 2017; Durmus et al., 2018]
⊲ $LI \succeq \nabla^2 V \succeq 0$: $\tilde{O}(\epsilon^{-4}d)$ in $d_{\mathrm{TV}}$, $\tilde{O}(\epsilon^{-2}d)$ in KL [Durmus et al., 2018]

Note: $W_2(\mu_1, \mu_2) := \inf_{X \sim \mu_1,\, Y \sim \mu_2} \left(\mathbb{E}\|X - Y\|^2\right)^{1/2}$, $\quad d_{\mathrm{TV}}(\mu_1, \mu_2) := \sup_{A\ \mathrm{Borel}} |\mu_1(A) - \mu_2(A)|$.

  • What about constrained distributions?

⊲ These include many important applications, such as Latent Dirichlet Allocation (LDA).


A challenge: Constrained distributions are hard

  • When $\mathrm{dom}(V)$ is compact, convergence rates deteriorate significantly.

⊲ $LI \succeq \nabla^2 V \succeq mI$: unknown in $W_2$ or KL; $\tilde{O}(\epsilon^{-6}d^5)$ in $d_{\mathrm{TV}}$ [Brosse et al., 2017]
⊲ $LI \succeq \nabla^2 V \succeq 0$: unknown in $W_2$ or KL; $\tilde{O}(\epsilon^{-6}d^5)$ in $d_{\mathrm{TV}}$ [Brosse et al., 2017]

⊲ cf. when $V$ is unconstrained: $\tilde{O}(\epsilon^{-4}d)$ convergence in $d_{\mathrm{TV}}$.
⊲ Projection is not a solution: slow rates [Bubeck et al., 2015] and boundary issues.


Unconstrained optimization of constrained problems

  • Entropic Mirror Descent: unconstrained optimization within the simplex,

$\min_{x \in \Delta_d} V(x)$.

⊲ Choose $h$ to be the entropic mirror map and $h^\star$ its dual.
⊲ Mirror vs. primal image: $y = \nabla h(x) \Leftrightarrow x = \nabla h^\star(y)$.
⊲ Update $y_{k+1} = y_k - \beta_k \nabla V(x_k)$ $\Rightarrow$ no projection needed, since $\mathrm{dom}(h^\star) = \mathbb{R}^d$.

  • A "mirror descent theory" for Langevin Dynamics?

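With the entropic mirror map $h(x) = \sum_i x_i \log x_i$, the dual update followed by the map $\nabla h^\star$ reduces to the familiar exponentiated-gradient rule. A small sketch (the quadratic objective and step size are illustrative assumptions, not from the talk):

```python
import numpy as np

def entropic_mirror_descent(grad_V, x0, n_steps, step_size):
    """Mirror descent with the entropic mirror map h(x) = sum_i x_i log x_i.
    The dual step y_{k+1} = y_k - beta * grad_V(x_k), mapped back through
    grad h*, is exactly the exponentiated-gradient update below; the iterate
    stays in the simplex with no projection."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x * np.exp(-step_size * grad_V(x))
        x = x / x.sum()  # normalization performed by grad h* on the simplex
    return x

# Toy objective on the simplex: V(x) = 0.5 * ||x - t||^2 with minimizer t.
t = np.array([0.6, 0.3, 0.1])
x = entropic_mirror_descent(lambda x: x - t, np.full(3, 1.0 / 3.0), 500, 0.5)
print(x)  # approaches t = [0.6, 0.3, 0.1]
```

Note that every iterate has strictly positive entries summing to one, which is precisely why the method needs no projection step.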

Mirrored Langevin Dynamics (MLD)

  • Given $e^{-V}$ and a mirror map $h$, compute the pushforward $e^{-W} := \nabla h \,\#\, e^{-V}$.

MLD: $\quad dY_t = -(\nabla W \circ \nabla h)(X_t)\,dt + \sqrt{2}\,dB_t, \quad X_t = \nabla h^\star(Y_t) \;\Rightarrow\; X_\infty \sim e^{-V}$.

  • Discretize: $\quad y_{k+1} = y_k - \beta_k \nabla W(y_k) + \sqrt{2\beta_k}\,\xi_k, \quad x_{k+1} = \nabla h^\star(y_{k+1})$.

  • The dual distribution $e^{-W}$ can be unconstrained even if $e^{-V}$ is constrained.

⊲ Convergence rates for $e^{-W}$ are easy.

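A toy instance of the scheme above (the target and mirror map are my own choices for illustration, not from the talk): take the constrained target $\mathrm{Exp}(1)$ on $(0, \infty)$, i.e. $V(x) = x$, with mirror map $h(x) = x \log x - x$, so $\nabla h(x) = \log x$ and $\nabla h^\star(y) = e^y$. The pushforward potential works out to $W(y) = e^y - y$, which is convex and unconstrained on $\mathbb{R}$, with $\nabla W(y) = e^y - 1$.

```python
import numpy as np

def mld_sample(grad_W, nabla_h_star, y0, n_steps, step_size, rng):
    """Mirrored Langevin: run the discretized Langevin update on the dual
    potential W, then map the dual iterate back through grad h*."""
    y = y0
    for _ in range(n_steps):
        xi = rng.standard_normal()
        y = y - step_size * grad_W(y) + np.sqrt(2.0 * step_size) * xi
    return nabla_h_star(y)

# Constrained target Exp(1) on (0, inf): V(x) = x, h(x) = x log x - x,
# dual potential W(y) = e^y - y  =>  grad W(y) = e^y - 1.
rng = np.random.default_rng(1)
xs = np.array([mld_sample(lambda y: np.exp(y) - 1.0, np.exp, 0.0, 300, 0.01, rng)
               for _ in range(2000)])
print(xs.mean())  # Exp(1) has mean 1
```

Every sample $x = e^y$ is automatically positive, so the constraint is enforced by the mirror map rather than by projection.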

Benefits of MLD

  • Improved rates for constrained sampling.
  • Can turn non-convex problems into convex ones!

⊲ We provide the first $\tilde{O}(1/\sqrt{T})$ rate for Latent Dirichlet Allocation.

  • Works well in practice.


For more details...

Welcome to our poster #43!!


References

Brosse, N., Durmus, A., Moulines, É., and Pereyra, M. (2017). Sampling from a log-concave distribution with compact support with proximal Langevin Monte Carlo. arXiv preprint arXiv:1705.08964.
Bubeck, S., Eldan, R., and Lehec, J. (2015). Sampling from a log-concave distribution with projected Langevin Monte Carlo. arXiv preprint arXiv:1507.02564.
Cheng, X. and Bartlett, P. (2017). Convergence of Langevin MCMC in KL-divergence. arXiv preprint arXiv:1705.09048.
Dalalyan, A. S. and Karagulyan, A. G. (2017). User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. arXiv preprint arXiv:1710.00095.
Durmus, A., Majewski, S., and Miasojedow, B. (2018). Analysis of Langevin Monte Carlo via convex optimization. arXiv preprint arXiv:1802.09188.
