SLIDE 1 Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond
Xuechen Li¹,² Denny Wu¹,² Lester Mackey³ Murat A. Erdogdu¹,²
¹University of Toronto ²Vector Institute ³Microsoft Research
SLIDE 2 The Problem and Our Work
Given a smooth potential f : ℝ^d → ℝ, sample from the density p(x) ∝ exp(−f(x)).
- We study both strongly convex and non-convex potentials.
- Many papers analyze individual algorithms [1, 2, 3, 4, 5], but a unifying theoretical framework has been lacking.
- We provide a theorem that gives the convergence rate of sampling algorithms obtained by discretizing an exponentially contracting diffusion, in terms of local properties of the numerical method.
- As a direct application, we obtain faster-converging algorithms from the class of stochastic Runge-Kutta (SRK) methods.
SLIDE 3
Exponential W2-Contraction of Diffusions
A diffusion $X_t$ has exponential W2-contraction if two instances $X_{t,x}$ and $X_{t,y}$, started from x and y respectively, satisfy
$W_2(X_{t,x}, X_{t,y}) \le e^{-\alpha t}\,\|x - y\|_2$ for all $x, y \in \mathbb{R}^d$ and $t \ge 0$.
Informally: the marginals of the continuous-time diffusion approach each other exponentially fast, regardless of the initial state.
Example: when f is strongly convex, the Langevin diffusion, given by the SDE $dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t$, has exponential W2-contraction.
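A minimal sketch of the contraction property, assuming the 1-D quadratic potential f(x) = αx²/2 (strongly convex with parameter α, i.e., an Ornstein-Uhlenbeck process). It runs Euler-Maruyama on the Langevin SDE from two starting points and feeds both chains the same Gaussian increments (a synchronous coupling, chosen here for illustration). The shared noise cancels in the difference, so the gap shrinks by exactly (1 − αh) per step, the discrete counterpart of e^{−αt}. The function names are hypothetical.

```python
import math
import random


def grad_f(x, alpha):
    """Gradient of the assumed quadratic potential f(x) = alpha * x**2 / 2."""
    return alpha * x


def coupled_em_gap(x0, y0, alpha, h, n_steps, seed=0):
    """Run Euler-Maruyama for dX = -f'(X) dt + sqrt(2) dB from x0 and from y0,
    feeding both chains the same Gaussian increments (synchronous coupling).
    Returns the gaps |X_k - Y_k| for k = 0..n_steps."""
    rng = random.Random(seed)
    x, y = x0, y0
    gaps = [abs(x - y)]
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)          # shared standard normal draw
        noise = math.sqrt(2.0 * h) * xi   # sqrt(2) * (B_{t+h} - B_t)
        x = x - h * grad_f(x, alpha) + noise
        y = y - h * grad_f(y, alpha) + noise
        gaps.append(abs(x - y))
    return gaps


alpha, h, n = 1.0, 0.01, 500
gaps = coupled_em_gap(3.0, -1.0, alpha, h, n)
# Compare with the discrete factor (1 - alpha h)^n and the continuous e^{-alpha t}.
print(gaps[-1], 4.0 * (1 - alpha * h) ** n, 4.0 * math.exp(-alpha * h * n))
```

For nonquadratic strongly convex f the gap no longer follows a closed form, but the same coupling still contracts when h is small relative to the smoothness constant.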
SLIDE 4
Local Deviation
Let $\{\tilde X_k\}_{k \in \mathbb{N}}$ be a discretization of $\{X_t\}_{t \ge 0}$, and let $\{X^{(k)}_s\}_{s \ge 0}$ be another instance of the diffusion starting from $\tilde X_{k-1}$ at $s = 0$. The local deviation at iteration k is defined as $D^{(k)}_h = X^{(k)}_h - \tilde X_k$.
[Figure: local deviation at iteration 3, with labels $\tilde X_2$, $\tilde X_3$, $X^{(3)}_h$, and $D^{(3)}_h$.]
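The definition can be probed numerically. The sketch below is a hedged illustration, not the paper's procedure: it approximates the diffusion $X^{(k)}$ over one step of length h by Euler-Maruyama on a fine grid of `n_fine` substeps, and drives the single coarse Euler-Maruyama step with the sum of those same Brownian increments. Both processes start at $\tilde X_{k-1}$ and share one Brownian path, so their difference approximates $D^{(k)}_h$. The fine-grid reference, the 1-D setting, the example potential, and all names are assumptions.

```python
import math
import random


def local_deviation_em(grad_f, x_prev, h, n_fine=1000, rng=None):
    """Approximate one sample of D_h = X_h - X_tilde for a single Euler-Maruyama
    step of dX = -grad_f(X) dt + sqrt(2) dB started at x_prev (1-D sketch).

    The exact diffusion X_h is replaced by a fine-grid Euler-Maruyama path
    (an assumption); the coarse step reuses the summed Brownian increments so
    that both are driven by the same Brownian motion."""
    rng = rng or random.Random(0)
    dt = h / n_fine
    x_fine = x_prev
    b_h = 0.0  # accumulates B_h from the fine increments
    for _ in range(n_fine):
        db = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x_fine = x_fine - dt * grad_f(x_fine) + math.sqrt(2.0) * db
        b_h += db
    x_coarse = x_prev - h * grad_f(x_prev) + math.sqrt(2.0) * b_h
    return x_fine - x_coarse


# Hypothetical example: potential f(x) = x**4 / 4, so grad_f(x) = x**3.
rng = random.Random(1)
samples = [local_deviation_em(lambda x: x ** 3, 0.5, 0.1, rng=rng) for _ in range(200)]
```

With a zero gradient both updates reduce to $x + \sqrt{2} B_h$, so the estimated deviation vanishes up to rounding, which is a quick sanity check of the coupling.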
SLIDE 5 Uniform Orders of Local Deviation
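Recall the local deviation $D^{(k)}_h = X^{(k)}_h - \tilde X_k$. A numerical scheme has uniform mean-square and mean orders (p1, p2) if, for all $k \in \mathbb{N}$,
$$E^{(1)}_k = \mathbb{E}\Big[\mathbb{E}\big[\|D^{(k)}_h\|_2^2 \,\big|\, \mathcal{F}_{t_{k-1}}\big]\Big] \le \lambda_1 h^{2p_1}, \qquad (1)$$
$$E^{(2)}_k = \mathbb{E}\Big[\big\|\mathbb{E}\big[D^{(k)}_h \,\big|\, \mathcal{F}_{t_{k-1}}\big]\big\|_2^2\Big] \le \lambda_2 h^{2p_2}, \qquad (2)$$
for constants λ1 and λ2 independent of h.
Remark: Bounds like (1) have appeared explicitly in previous work (see, e.g., [1]). To the best of our knowledge, (2) has not appeared explicitly before.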
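For the 1-D Ornstein-Uhlenbeck case f(x) = αx²/2 (an assumed example), the conditional quantities inside (1) and (2) have closed forms for one Euler-Maruyama step from a fixed previous iterate x. The exact solution $X_h = e^{-\alpha h}x + \sqrt{2}\,I$ with $I = \int_0^h e^{-\alpha(h-s)}\,dB_s$ is jointly Gaussian with $B_h$, so $D_h = (e^{-\alpha h} - 1 + \alpha h)\,x + \sqrt{2}\,(I - B_h)$. The sketch computes its conditional mean and second moment and reads off empirical orders by halving h; the slopes should come out near 1.5 and 2, matching the (1.5, 2.0) entry listed for EM under 1st- and 2nd-order smoothness. Function names are hypothetical.

```python
import math


def em_local_deviation_moments(x, alpha, h):
    """Exact conditional mean and second moment of D_h = X_h - X_tilde for one
    Euler-Maruyama step on the assumed OU process dX = -alpha X dt + sqrt(2) dB.

    D_h = (exp(-alpha h) - 1 + alpha h) x + sqrt(2) (I - B_h), where
    I = int_0^h exp(-alpha (h - s)) dB_s is jointly Gaussian with B_h."""
    mean = (math.expm1(-alpha * h) + alpha * h) * x
    var_i = -math.expm1(-2.0 * alpha * h) / (2.0 * alpha)  # Var(I)
    cov_ib = -math.expm1(-alpha * h) / alpha               # Cov(I, B_h)
    var_diff = var_i + h - 2.0 * cov_ib                    # Var(I - B_h)
    second_moment = mean ** 2 + 2.0 * var_diff             # E[D_h^2 | x]
    return mean, second_moment


def empirical_orders(x, alpha, h):
    """Estimate (p1, p2) from the slopes of sqrt(E[D^2]) and |E[D]| when h is halved."""
    m1, s1 = em_local_deviation_moments(x, alpha, h)
    m2, s2 = em_local_deviation_moments(x, alpha, h / 2.0)
    p1 = math.log2(math.sqrt(s1) / math.sqrt(s2))
    p2 = math.log2(abs(m1) / abs(m2))
    return p1, p2


print(empirical_orders(x=1.0, alpha=1.0, h=0.01))  # roughly (1.5, 2.0)
```

`math.expm1` is used instead of `exp(...) - 1` to limit cancellation error, since both moments are tiny differences of O(h) terms.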
SLIDE 6 A General Theorem
Theorem (Informal). Suppose the diffusion has stationary distribution p(x) ∝ exp(−f(x)) and exhibits exponential W2-contraction. Then a discretization of this diffusion with uniform mean-square and mean orders (p1, p2), where $p_2 \ge p_1 + 1/2$, has rate $\tilde{O}(\epsilon^{-1/(p_1 - 1/2)})$ in W2.
Remark 1: This connects the numerical SDE and sampling literatures: for any classical SDE discretization method with known local orders, the theorem immediately gives its convergence rate when used for sampling.
Remark 2: The theorem also applies to discretizations of the underdamped Langevin diffusion; see our examples in the paper.
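The theorem's rate is simple arithmetic on the orders. This sketch, with a hypothetical function name, checks the condition $p_2 \ge p_1 + 1/2$ and returns the exponent r in $\tilde{O}(\epsilon^{-r})$, using exact fractions; the extra guard $p_1 > 1/2$ is our addition so that the exponent is finite.

```python
from fractions import Fraction


def iteration_exponent(p1, p2):
    """Exponent r in the informal theorem's rate O~(eps^-r), r = 1 / (p1 - 1/2).
    Requires p2 >= p1 + 1/2 (theorem) and p1 > 1/2 (guard added here)."""
    p1, p2 = Fraction(p1), Fraction(p2)
    if p2 < p1 + Fraction(1, 2):
        raise ValueError("theorem requires p2 >= p1 + 1/2")
    if p1 <= Fraction(1, 2):
        raise ValueError("need p1 > 1/2 for a finite rate")
    return 1 / (p1 - Fraction(1, 2))


for name, orders in [("EM, 1st-order smoothness", ("1", "3/2")),
                     ("EM, 1st & 2nd", ("3/2", "2")),
                     ("SRK-LD", ("2", "5/2"))]:
    print(name, iteration_exponent(*orders))
```

Running it reproduces the ε-exponents 2, 1, and 2/3 that appear in the convergence-rate table.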
SLIDE 7 Convergence Rates for EM and SRK
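Result              | Diffusion | Smoothness | Orders (p1, p2) | Rate
EM (Durmus et al.)  | Langevin  | 1st        | (1.0, 1.5)      | $\tilde{O}(d\epsilon^{-2})$
EM (Ex. 1)          | Langevin  | 1st & 2nd  | (1.5, 2.0)      | $\tilde{O}(d\epsilon^{-1})$
SRK-LD (This work)  | Langevin  | 1st-3rd    | (2.0, 2.5)      | $\tilde{O}(d\epsilon^{-2/3})$
EM (Ex. 2)          | General   | 1st        | (1.0, 1.5)      | $\tilde{O}(d\epsilon^{-2})$
SRK-ID (This work)  | General   | 1st        | (1.5, 2.0)      | $\tilde{O}(d^{3/4} m^2 \epsilon^{-1})$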
Table: Convergence rates in W2, i.e., the number of iterations required to reach ε accuracy to the target in W2. The top three rows are for strongly convex f; the bottom two are for non-convex f that admits a uniformly dissipative diffusion.
EM = Euler-Maruyama; SRK = stochastic Runge-Kutta.
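To get a feel for the gap between the Langevin rows, one can plug numbers into the rates. This is only an illustration under a strong assumption: the constants and log factors hidden by Õ are dropped (set to 1), so only ratios between rows carry meaning, and d = 100, ε = 0.01 are arbitrary choices.

```python
def iterations(d, eps, exponent):
    """Illustrative count d * eps**(-exponent); constants and log factors hidden
    by O~ are dropped, so only ratios between rows are meaningful."""
    return d * eps ** (-exponent)


d, eps = 100, 1e-2
for name, exponent in [("EM (Durmus et al.)", 2.0),
                       ("EM (Ex. 1)", 1.0),
                       ("SRK-LD", 2.0 / 3.0)]:
    print(f"{name:20s} ~{iterations(d, eps, exponent):,.0f}")
```

At this accuracy the ε-dependence alone separates the first and third rows by a factor of roughly 460.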
SLIDE 8
Thanks to you and my coauthors: Denny Wu, Lester Mackey, and Murat A. Erdogdu.
SLIDE 9
Our poster: East Exhibition Hall B + C #162
[1] Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, and Michael I. Jordan. Underdamped Langevin MCMC: A non-asymptotic analysis.
[2] Arnak S. Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities.
[3] Alain Durmus, Eric Moulines, et al. Nonasymptotic convergence analysis for the unadjusted Langevin algorithm.
[4] Yin Tat Lee, Zhao Song, and Santosh S. Vempala. Algorithmic theory of ODEs and sampling from well-conditioned logconcave densities.
[5] Santosh S. Vempala and Andre Wibisono. Rapid convergence of the unadjusted Langevin algorithm: Log-Sobolev suffices.