Stein's method for diffusion approximation
Thomas Bonis, DataShape team, Inria
SLIDE 1
SLIDE 2
K-nearest-neighbor graph
We draw $n$ points in $\mathbb{R}^d$: $X_1, \dots, X_n \sim \nu$, where $d\nu = f \, d\lambda$.
SLIDE 3
K-nearest-neighbor graph
We draw $n$ points in $\mathbb{R}^d$: $X_1, \dots, X_n \sim \nu$, where $d\nu = f \, d\lambda$. We add an edge between each point and its $K$ nearest neighbors.
SLIDE 4
K-nearest-neighbor graph
We draw $n$ points in $\mathbb{R}^d$: $X_1, \dots, X_n \sim \nu$, where $d\nu = f \, d\lambda$. We add an edge between each point and its $K$ nearest neighbors. When $K, n \to \infty$ with $K/n \to 0$ (plus a further condition on $K$), can we recover $f$ using only the graph structure?
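A minimal sketch of the construction (illustrative only; the sampling density, n, d and K below are assumptions for the example, and SciPy's KD-tree is just one way to find nearest neighbors):

import numpy as np
from scipy.spatial import cKDTree

# Draw n points from an (arbitrary, illustrative) density f on R^d and
# connect each point to its K nearest neighbors.
rng = np.random.default_rng(0)
n, d, K = 2000, 2, 20
centers = np.array([[0.0, 0.0], [3.0, 0.0]])
X = centers[rng.integers(0, 2, size=n)] + rng.normal(size=(n, d))

_, idx = cKDTree(X).query(X, k=K + 1)   # K+1 because each point is its own nearest neighbor
neighbors = idx[:, 1:]                  # neighbors[i] = indices of the K nearest neighbors of X[i]
edges = [(i, j) for i in range(n) for j in neighbors[i]]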
SLIDE 5
Random walk on K-nearest-neighbor graph
A random walk on the graph captures information on ν.
SLIDE 6
Random walk on K-nearest-neighbor graph
A random walk on the graph captures information on ν. The random walk approximates the diffusion process with generator $f^{-2/d}\left(\nabla(\log f)\cdot\nabla + \tfrac{1}{2}\Delta\right)$.
SLIDE 7
Random walk on K-nearest-neighbor graph
A random walk on the graph captures information on ν. The random walk approximates the diffusion process with generator $f^{-2/d}\left(\nabla(\log f)\cdot\nabla + \tfrac{1}{2}\Delta\right)$. The diffusion has invariant measure µ with density proportional to $f^{2+2/d}$.
SLIDE 8
Random walk on K-nearest-neighbor graph
A random walk on the graph captures information on ν. The random walk approximates the diffusion process with generator $f^{-2/d}\left(\nabla(\log f)\cdot\nabla + \tfrac{1}{2}\Delta\right)$. The diffusion has invariant measure µ with density proportional to $f^{2+2/d}$. Does π, the invariant measure of the random walk, converge to µ?
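A rough numerical illustration of this question (a sketch, not part of the slides; the sampling density and parameters are arbitrary, and the walk is assumed irreducible so that power iteration converges):

import numpy as np
from scipy.spatial import cKDTree

# Random walk on the directed K-NN graph: from node i, jump uniformly to
# one of its K nearest neighbors. Power iteration approximates its
# stationary distribution pi, to be compared with f(X_i)^(2 + 2/d).
rng = np.random.default_rng(1)
n, d, K = 2000, 2, 20
X = rng.normal(size=(n, d)) * np.array([1.0, 2.0])   # anisotropic Gaussian sample, illustrative

_, idx = cKDTree(X).query(X, k=K + 1)
neighbors = idx[:, 1:]

pi = np.full(n, 1.0 / n)
for _ in range(500):
    new_pi = np.zeros(n)
    # node i sends mass pi[i]/K along each of its K outgoing edges
    np.add.at(new_pi, neighbors.ravel(), np.repeat(pi / K, K))
    pi = new_pi
# pi can now be compared (up to normalization) with f(X_i)**(2 + 2/d).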
SLIDE 9
Random walk on ε-graph
Edge between $X_i$ and $X_j$ if $\|X_i - X_j\| \le \epsilon$. The random walk approximates the diffusion process with generator $\nabla(\log f)\cdot\nabla + \tfrac{1}{2}\Delta$. The diffusion has invariant measure µ with density proportional to $f^2$. Here $\pi(X_i)$ is proportional to the degree of $X_i$, which is the ball density estimator and converges to $f$. Since we have more points where $f$ is large, π converges to a measure with density proportional to $f^2$.
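A short sketch of the degree/ball density estimator mentioned above (illustrative parameters; the normalization uses the volume of the one-dimensional ball):

import numpy as np
from scipy.spatial import cKDTree

# On the epsilon-graph, deg(X_i) counts sample points within distance eps
# of X_i, so deg(X_i) / (n * vol(B_eps)) estimates f(X_i).
rng = np.random.default_rng(2)
n, eps = 5000, 0.1
X = rng.normal(size=(n, 1))              # illustrative sample: f = standard normal density

tree = cKDTree(X)
degree = np.array([len(tree.query_ball_point(x, eps)) - 1 for x in X])  # exclude the point itself
f_hat = degree / (n * 2 * eps)           # a 1-d ball of radius eps has volume 2*eps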
SLIDE 10
Stein discrepancy
Let γ be the standard Gaussian measure and let Z be drawn from γ. Then, for all φ, $\mathbb{E}[-Z\cdot\nabla\varphi(Z) + \Delta\varphi(Z)] = 0$.
SLIDE 11
Stein discrepancy
Let γ be the standard Gaussian measure and let Z be drawn from γ. Then, for all φ, $\mathbb{E}[-Z\cdot\nabla\varphi(Z) + \Delta\varphi(Z)] = 0$. Let X be drawn from ν. We say that ν admits a Stein kernel $\tau_\nu$ with respect to γ if there exists $\tau_\nu$ such that, for all φ, $\mathbb{E}\big[-X\cdot\nabla\varphi(X) + \langle \tau_\nu(X), \mathrm{Hess}\,\varphi(X)\rangle_{HS}\big] = 0$.
SLIDE 12
Stein discrepancy
Let γ be the standard Gaussian measure and let Z be drawn from γ. Then, for all φ, $\mathbb{E}[-Z\cdot\nabla\varphi(Z) + \Delta\varphi(Z)] = 0$. Let X be drawn from ν. We say that ν admits a Stein kernel $\tau_\nu$ with respect to γ if there exists $\tau_\nu$ such that, for all φ, $\mathbb{E}\big[-X\cdot\nabla\varphi(X) + \langle \tau_\nu(X), \mathrm{Hess}\,\varphi(X)\rangle_{HS}\big] = 0$. Intuitively, if $\tau_\nu$ is close to the identity then ν is close to γ. The distance between $\tau_\nu$ and $\mathrm{Id}$ is quantified by the Stein discrepancy: $S(\nu)^2 = \mathbb{E}\big[\|\tau_\nu(X) - \mathrm{Id}\|_{HS}^2\big]$.
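As a concrete illustration (a standard one-dimensional fact, not taken from the slides): if ν is a centered measure on $\mathbb{R}$ with density ρ, a Stein kernel is given by
\[
\tau_\nu(x) \;=\; \frac{1}{\rho(x)} \int_x^{+\infty} y\, \rho(y)\, dy .
\]
For the standard Gaussian, $\int_x^{+\infty} y\, e^{-y^2/2}\, dy = e^{-x^2/2}$, so $\tau_\gamma \equiv 1$ and $S(\gamma) = 0$, matching the intuition above.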
SLIDE 13
Bounding the Wasserstein distance with S
SLIDE 14
Bounding the Wasserstein distance with S
Theorem [Ledoux, Nourdin, Peccati 2015] Let ν be a measure admitting a Stein kernel τν and let S(ν) be the associated Stein discrepancy. We have: W2(ν, γ) ≤ S(ν)
SLIDE 15
Bounding the Wasserstein distance with S
Theorem [Ledoux, Nourdin, Peccati 2015]: Let ν be a measure admitting a Stein kernel $\tau_\nu$ and let S(ν) be the associated Stein discrepancy. Then $W_2(\nu, \gamma) \le S(\nu)$. Problem: in general, discrete measures do not admit a Stein kernel. Example: if the Rademacher measure admitted a Stein kernel, there would exist τ such that, for any smooth function φ, $\varphi'(-1) - \varphi'(1) + \tau(1)\varphi''(1) + \tau(-1)\varphi''(-1) = 0$. This can be dealt with using a smoothing procedure (relying on the zero-bias distribution), but that is not practical in high dimensions.
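To spell out the contradiction (a quick verification, not written on the slide): choose a smooth φ whose derivative equals 1 near $-1$ and 0 near $+1$, so that $\varphi''(\pm 1) = 0$; the identity above then reads $1 - 0 + 0 + 0 = 0$, which is impossible, so no such τ can exist.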
SLIDE 16
Generalizing the Stein kernel
X ∼ ν. Take an operator L such that, for all φ, $\mathbb{E}[L\varphi(X)] = 0$, and compare L with $-x\cdot\nabla + \Delta$.
SLIDE 17
Generalizing the Stein kernel
X ∼ ν. Take an operator L such that, for all φ, $\mathbb{E}[L\varphi(X)] = 0$, and compare L with $-x\cdot\nabla + \Delta$. Let X and X′ be drawn from ν. Then, for all φ, $\mathbb{E}[\varphi(X') - \varphi(X)] = 0$, and by a Taylor expansion,
\[
\mathbb{E}\Big[\sum_{k \ge 1} \frac{\mathbb{E}\big[(X' - X)^k \mid X\big]}{k!}\, \varphi^{(k)}(X)\Big] = 0 .
\]
SLIDE 18
Another bound on W2
Theorem [Dimension 1]. Let ν be a measure on $\mathbb{R}$ and let X and $(X_t)_{t\ge 0}$ be drawn from ν. Let $Y_t = X_t - X$. Then, for any $h > 0$,
\[
W_2(\nu, \gamma) \le \int_0^\infty e^{-t}\, \mathbb{E}\Big[\Big(\tfrac{1}{h}\mathbb{E}[Y_t \mid X] + X\Big)^2\Big]^{1/2} dt
+ \int_0^\infty \frac{e^{-2t}}{\sqrt{1 - e^{-2t}}}\, \mathbb{E}\Big[\Big(\tfrac{\mathbb{E}[Y_t^2 \mid X]}{2h} - 1\Big)^2\Big]^{1/2} dt
+ \sum_{k > 2} \int_0^\infty \frac{e^{-kt}}{h\sqrt{k!}\,(1 - e^{-2t})^{(k-1)/2}}\, \mathbb{E}\big[\mathbb{E}[Y_t^k \mid X]^2\big]^{1/2} dt .
\]
SLIDE 19
Another bound on W2
(Theorem above restated.) Rescaling.
SLIDE 20
Another bound on W2
(Theorem above restated.) First moment close to −X. Rescaling.
SLIDE 21
Another bound on W2
(Theorem above restated.) First moment close to −X. Second moment close to 1. Rescaling.
SLIDE 22
Another bound on W2
(Theorem above restated.) First moment close to −X. Second moment close to 1. Higher moments small: they start at 0 and grow as t increases; roughly, $Y_t$ has to be bounded by $\sqrt{t}$. Rescaling.
SLIDE 23
Another bound on W2
(Theorem above restated.) A similar result holds (in dimension 1 only!) for $W_p$, $p \ge 1$.
SLIDE 24
Another bound on W2
Let µ be the invariant measure of an operator $L = b\cdot\nabla + \langle a, \nabla^2\rangle$. Under technical conditions on L (a curvature-dimension inequality), a similar result holds.
SLIDE 25
Another bound on W2
Let µ be the invariant measure of an operator $L = b\cdot\nabla + \langle a, \nabla^2\rangle$. A similar result holds under technical conditions on L (for example under a curvature-dimension inequality). If
- $\mathbb{E}[Y_t \mid X]$ is close to $b(X)$,
- $\mathbb{E}[Y_t^2 \mid X]$ is close to $a(X)$,
- $\mathbb{E}[Y_t^k \mid X]$ is small for $k > 2$,
then $W_2(\nu, \mu)$ is small.
SLIDE 26
Convergence rates in the Central Limit Theorem
Consider i.i.d. random variables $X_1, \dots, X_n$ with law ν, $\mathbb{E}[X_1] = 0$ and $\mathbb{E}[X_1^2] = 1$. The Central Limit Theorem gives
\[
S_n = n^{-1/2} \sum_{i=1}^{n} X_i \;\longrightarrow\; \mathcal{N}(0, 1).
\]
How fast does it converge?
SLIDE 27
Convergence rates in the Central Limit Theorem
Consider i.i.d. random variables $X_1, \dots, X_n$ with law ν, $\mathbb{E}[X_1] = 0$ and $\mathbb{E}[X_1^2] = 1$. The Central Limit Theorem gives
\[
S_n = n^{-1/2} \sum_{i=1}^{n} X_i \;\longrightarrow\; \mathcal{N}(0, 1).
\]
How fast does it converge? Let $\tilde{X}_1, \dots, \tilde{X}_n$ be i.i.d. copies of $X_1, \dots, X_n$ and let I be a uniform random variable on $\{1, \dots, n\}$. We set $(S_n)_t = S_n + n^{-1/2}(\tilde{X}_I - X_I)$.
SLIDE 28
Convergence rates in the Central Limit Theorem
Consider i.i.d. random variables $X_1, \dots, X_n$ with law ν, $\mathbb{E}[X_1] = 0$ and $\mathbb{E}[X_1^2] = 1$. The Central Limit Theorem gives
\[
S_n = n^{-1/2} \sum_{i=1}^{n} X_i \;\longrightarrow\; \mathcal{N}(0, 1).
\]
How fast does it converge? Let $\tilde{X}_1, \dots, \tilde{X}_n$ be i.i.d. copies of $X_1, \dots, X_n$ and let I be a uniform random variable on $\{1, \dots, n\}$. We set $(S_n)_t = S_n + n^{-1/2}(\tilde{X}_I - X_I)\,\mathbf{1}_{X_I, \tilde{X}_I \in [-\sqrt{tn}, \sqrt{tn}]}$.
SLIDE 29
Convergence rates in the Central Limit Theorem
Theorem. Consider $X_1, \dots, X_n$ i.i.d. random variables in $\mathbb{R}^d$ and let $\nu_n$ be the law of $n^{-1/2}\sum_{i=1}^n X_i$. If $X_1$ admits a moment of order $2 + m$ (that is, $\mathbb{E}[\|X_1\|_2^{2+m}] < \infty$) for some $m \in [0, 2]$, then
\[
W_2(\nu_n, \gamma) = O\big(n^{-1/2 + (2-m)/4}\big).
\]
Moreover, if $d = 1$, then for any $p \ge 2$, if $X_1$ admits a finite moment of order $p + m$ for some $m \in [0, 2]$, then
\[
W_p(\nu_n, \gamma) = O\big(n^{-1/2 + (2-m)/(2p)}\big).
\]
SLIDE 30
Convergence rates in the Central Limit Theorem
(Theorem above restated.) Earlier results:
- p > 2, m = 0: proved by Sakhanenko (1985).
- p ∈ [1, 2], m = 2: proved by Rio (2009).
- p = 2, m = 2: proved by Bobkov (2013) by other means; can be extended to higher dimensions at the cost of stronger assumptions.
- p > 2: also proved by Bobkov (2016).
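A rough empirical check of the rate (a sketch, not from the slides; in dimension 1 the Wasserstein distance between an empirical sample of $S_n$ and γ can be estimated by matching order statistics with Gaussian quantiles, and the finite number of copies adds its own sampling error):

import numpy as np
from scipy.stats import norm

def w2_to_gaussian(samples):
    # Quantile coupling: i-th order statistic vs Phi^{-1}((i - 0.5) / N).
    samples = np.sort(samples)
    N = len(samples)
    q = norm.ppf((np.arange(1, N + 1) - 0.5) / N)
    return np.sqrt(np.mean((samples - q) ** 2))

rng = np.random.default_rng(0)
n_copies = 50_000                      # independent copies of S_n
for n in [4, 16, 64]:
    steps = rng.choice([-1.0, 1.0], size=(n_copies, n))   # Rademacher X_i: mean 0, variance 1
    S_n = steps.sum(axis=1) / np.sqrt(n)
    print(n, w2_to_gaussian(S_n))      # should decay roughly like n**(-1/2)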
SLIDE 31
Convergence of Markov chains towards diffusion processes
Consider a Markov chain $(X_n)$ with invariant measure π and transition kernel K approximating a diffusion process with generator $L = b\cdot\nabla + \langle a, \nabla^2\rangle$ and invariant measure µ.
SLIDE 32
Convergence of Markov chains towards diffusion processes
Consider a Markov chain $(X_n)$ with invariant measure π and transition kernel K approximating a diffusion process with generator $L = b\cdot\nabla + \langle a, \nabla^2\rangle$ and invariant measure µ. X is drawn from π, ξ is a jump from X, and $X_t = X + \mathbf{1}_{t \ge T}\,\xi$.
SLIDE 33
Convergence of Markov chains towards diffusion processes
Consider a Markov chain $(X_n)$ with invariant measure π and transition kernel K approximating a diffusion process with generator $L = b\cdot\nabla + \langle a, \nabla^2\rangle$ and invariant measure µ. X is drawn from π, ξ is a jump from X, $X_t = X + \mathbf{1}_{t \ge T}\,\xi$, and h is the time step of the Markov chain. If
- the conditions on L hold,
- $\mathbb{E}\Big[\Big(\frac{1}{h}\int (y - X)\, K(X, dy) - b(X)\Big)^2\Big]$ is small,
- $\mathbb{E}\Big[\Big(\frac{1}{2h}\int (y - X)^2\, K(X, dy) - a(X)\Big)^2\Big]$ is small,
- the higher moments of the jumps are small,
then $W_2(\pi, \mu)$ is small.
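As a sanity check of these conditions (a sketch for an assumed Gaussian-increment Euler scheme in dimension 1, not taken from the slides), take the kernel $K(x, \cdot)$ given by $y = x + h\,b(x) + \sqrt{h}\,\xi$ with $\xi \sim \mathcal{N}(0,1)$. Then
\[
\frac{1}{h}\int (y - x)\, K(x, dy) = b(x), \qquad
\frac{1}{2h}\int (y - x)^2\, K(x, dy) = \frac{1}{2} + \frac{h\, b(x)^2}{2},
\]
and $\int (y - x)^k\, K(x, dy) = O(h^2)$ for $k \ge 3$, so the first two moment conditions hold with $a = \tfrac{1}{2}$ and the higher-order terms vanish as $h \to 0$.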
SLIDE 34
Convergence of Markov chains towards diffusion processes
Theorem [Stroock, Varadhan]. Consider a family of discrete Markov chains $M^h$ defined on $S_h \subset \mathbb{R}^d$ with transition kernels $K_h$. If
- $\lim_{h \to 0} \sup_{x \in S_h} \Big| \frac{1}{h}\int_{S_h} (y - x)\, K_h(x, dy) - b(x) \Big| = 0$,
- $\lim_{h \to 0} \sup_{x \in S_h} \Big| \frac{1}{2h}\int_{S_h} (y - x)^2\, K_h(x, dy) - a(x) \Big| = 0$,
- for all $r > 0$, $\lim_{h \to 0} \sup_{x \in S_h} \int_{\{y \in S_h,\, |y - x| > r\}} K_h(x, dy) = 0$,
then, for any $T > 0$, $M^h$ converges weakly on $[0, T]$ toward the diffusion process with infinitesimal generator L.
SLIDE 35
Langevin Monte-Carlo algorithm
How to draw points from a smooth log concave measure µ with density f?
SLIDE 36
Langevin Monte-Carlo algorithm
How to draw points from a smooth log-concave measure µ with density f? µ is the stationary distribution of the diffusion process with generator $L_f = \nabla(\log f)\cdot\nabla + \Delta$, so we can draw from µ by simulating this diffusion process.
SLIDE 37
Langevin Monte-Carlo algorithm
How to draw points from a smooth log-concave measure µ with density f? µ is the stationary distribution of the diffusion process with generator $L_f = \nabla(\log f)\cdot\nabla + \Delta$, so we can draw from µ by simulating this diffusion process. Sample from a discretization of the form $X_{t+h} = X_t + h\,b(X_t) + \sqrt{h}\,\xi$.
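A minimal sketch of such a discretization (unadjusted Langevin), for an illustrative one-dimensional target; the target density, drift convention and step size below are assumptions for the example, not taken from the slides:

import numpy as np

# Langevin-type discretization X_{t+h} = X_t + h*b(X_t) + sqrt(h)*xi.
# Illustrative target: the standard normal, f(x) ∝ exp(-x^2/2).
# With unit-variance noise, taking b = (1/2) * (log f)' makes f the
# invariant density of the limiting diffusion, so for small h the chain
# samples approximately from f.

def b(x):
    return -0.5 * x                         # (1/2) * grad log f for the standard normal

rng = np.random.default_rng(0)
h, n_steps, burn_in = 0.01, 200_000, 10_000
x = 0.0
samples = np.empty(n_steps)
for k in range(n_steps):
    xi = rng.standard_normal()              # xi ~ N(0, 1); more general unit-variance noises
    x = x + h * b(x) + np.sqrt(h) * xi      # are what the Stein approach handles
    samples[k] = x
samples = samples[burn_in:]
print(samples.mean(), samples.var())        # should be close to 0 and 1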
SLIDE 38
Langevin Monte-Carlo algorithm
How to draw points from a smooth log-concave measure µ with density f? µ is the stationary distribution of the diffusion process with generator $L_f = \nabla(\log f)\cdot\nabla + \Delta$, so we can draw from µ by simulating this diffusion process. Sample from a discretization of the form $X_{t+h} = X_t + h\,b(X_t) + \sqrt{h}\,\xi$. The complexity to achieve an accuracy of ε between the sample and the target in $W_2$ is
- $O^*(d\,\epsilon^{-1})$ if ξ = N(0, 1) (direct approach);
- $O^*(d\,\epsilon^{-2})$ for more general ξ (Stein's method).
SLIDE 39
Conclusion
Rates in Wasserstein distance for the multidimensional CLT. Quantitative convergence result for the invariant measure of an approximation scheme of a diffusion process.
SLIDE 40
Conclusion
Rates in Wasserstein distance for the multidimensional CLT. Quantitative convergence result for the invariant measure of an approximation scheme of a diffusion process. Density estimation for K-NN graphs. Rates for the Langevin Monte-Carlo algorithm.
SLIDE 41