

SLIDE 1

Stein’s method for diffusion approximation

Thomas Bonis DataShape team, Inria

SLIDE 2

K-nearest-neighbor graph

We draw n points in R^d: X1, . . . , Xn ∼ ν, with dν = f dλ

SLIDE 3

K-nearest-neighbor graph

We draw n points in R^d: X1, . . . , Xn ∼ ν, with dν = f dλ. We add an edge between each point and its K nearest neighbors.

SLIDE 4

K-nearest-neighbor graph

We draw n points in R^d: X1, . . . , Xn ∼ ν, with dν = f dλ. We add an edge between each point and its K nearest neighbors. When K, n → ∞ with K/n → 0 (plus another condition on K), can we recover f using only the graph structure?
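As an illustration, the construction can be sketched in a few lines of Python. This is a brute-force toy, not the talk's code; the helper name `knn_graph` is mine, and a standard Gaussian sample in R^2 stands in for dν = f dλ.

```python
import math
import random

random.seed(0)

def knn_graph(points, k):
    """Brute-force directed K-NN graph: edges[i] lists the indices of the
    k nearest neighbors of point i (O(n^2 log n), fine for a demo)."""
    n = len(points)
    edges = []
    for i in range(n):
        by_dist = sorted((math.dist(points[i], points[j]), j)
                         for j in range(n) if j != i)
        edges.append([j for _, j in by_dist[:k]])
    return edges

# Stand-in for dν = f dλ: a standard Gaussian sample in R^2.
n, K = 200, 5
pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
graph = knn_graph(pts, K)
```

The question on the slide is then whether f can be recovered from `graph` alone, forgetting the coordinates in `pts`.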

SLIDE 5

Random walk on K-nearest-neighbor graph

A random walk on the graph captures information on ν.

SLIDE 6

Random walk on K-nearest-neighbor graph

A random walk on the graph captures information on ν. The random walk approximates the diffusion process with generator f^(−2/d)(∇(log f)·∇ + (1/2)∆).

SLIDE 7

Random walk on K-nearest-neighbor graph

A random walk on the graph captures information on ν. The random walk approximates the diffusion process with generator f^(−2/d)(∇(log f)·∇ + (1/2)∆). The diffusion has invariant measure µ with density proportional to f^(2+2/d).

SLIDE 8

Random walk on K-nearest-neighbor graph

A random walk on the graph captures information on ν. The random walk approximates the diffusion process with generator f^(−2/d)(∇(log f)·∇ + (1/2)∆). The diffusion has invariant measure µ with density proportional to f^(2+2/d). Does π, the invariant measure of the random walk, converge to µ?
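A toy simulation illustrates the phenomenon. The assumptions here are mine: a 1-D Gaussian sample, a symmetrized K-NN graph, and uniform jumps to neighbors; the walk should then spend more time where f (and hence f^(2+2/d)) is large.

```python
import random

random.seed(1)

# Undirected K-NN graph on a 1-D Gaussian sample (toy stand-in for dν = f dλ).
n, K = 300, 8
pts = sorted(random.gauss(0, 1) for _ in range(n))
adj = [set() for _ in range(n)]
for i in range(n):
    # The closest point to pts[i] is itself, so skip index 0 of the ranking.
    nn = sorted(range(n), key=lambda j: abs(pts[j] - pts[i]))[1:K + 1]
    for j in nn:
        adj[i].add(j)
        adj[j].add(i)
adj = [sorted(s) for s in adj]

# Simple random walk: jump to a uniformly chosen neighbor.
visits = [0] * n
state = n // 2
for _ in range(200_000):
    state = random.choice(adj[state])
    visits[state] += 1

# The walk concentrates where f is large: time spent near 0 vs in the tails.
near = sum(v for p, v in zip(pts, visits) if abs(p) < 0.5)
far = sum(v for p, v in zip(pts, visits) if abs(p) > 1.5)
```

With a density estimate on top of `visits`, one could compare the empirical occupation measure with f^(2+2/d), which is the question the slide raises.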

SLIDE 9

Random walk on ε-graph

Edge between Xi and Xj if ‖Xi − Xj‖ ≤ ε. The random walk approximates the diffusion process with generator ∇(log f)·∇ + (1/2)∆. The diffusion has invariant measure µ with density proportional to f². π(Xi) is proportional to the degree of Xi (the ball density estimator, which converges to f). Since we have more points where f is large, π converges to a measure with density proportional to f².
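The "degree → f" step can be checked numerically. A minimal 1-D sketch (the helper `degree` is mine; deg(Xi)/(2εn) is the ball density estimator of f(Xi)):

```python
import bisect
import random

random.seed(2)

n, eps = 5000, 0.1
pts = sorted(random.gauss(0, 1) for _ in range(n))

def degree(i):
    """Degree of Xi in the eps-graph: sample points within eps, excluding Xi."""
    lo = bisect.bisect_left(pts, pts[i] - eps)
    hi = bisect.bisect_right(pts, pts[i] + eps)
    return hi - lo - 1

# Average degree where f is large (near 0) vs where f is small (near 2);
# for the standard Gaussian the ratio of densities is e^2 ≈ 7.4.
deg0 = [degree(i) for i, p in enumerate(pts) if abs(p) < 0.1]
deg2 = [degree(i) for i, p in enumerate(pts) if abs(p - 2.0) < 0.1]
avg0 = sum(deg0) / len(deg0)
avg2 = sum(deg2) / len(deg2)
```

The same estimator drives the claim on the slide: a random walk reweighted by degree has invariant measure with density proportional to f².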

SLIDE 10

Stein discrepancy

Let γ be the Gaussian measure and let Z be drawn from γ. Then, ∀φ, E[−Z·∇φ(Z) + ∆φ(Z)] = 0.

SLIDE 11

Stein discrepancy

Let γ be the Gaussian measure and let Z be drawn from γ. Then, ∀φ, E[−Z·∇φ(Z) + ∆φ(Z)] = 0. Let X be drawn from ν. We say that a measure ν admits a Stein kernel τν with respect to γ if there exists τν such that, ∀φ, E[−X·∇φ(X) + ⟨τν(X), Hess φ(X)⟩_HS] = 0.

SLIDE 12

Stein discrepancy

Let γ be the Gaussian measure and let Z be drawn from γ. Then, ∀φ, E[−Z·∇φ(Z) + ∆φ(Z)] = 0. Let X be drawn from ν. We say that a measure ν admits a Stein kernel τν with respect to γ if there exists τν such that, ∀φ, E[−X·∇φ(X) + ⟨τν(X), Hess φ(X)⟩_HS] = 0. Intuitively, if τν is close to Id then ν is close to γ. The distance between τν and Id is quantified by the Stein discrepancy: S(ν)² = E[‖τν(X) − Id‖²].
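The Gaussian identity on this slide is easy to check by Monte Carlo. A sketch in one dimension with the (arbitrary, my choice) test function φ(z) = exp(z), so that φ′ = φ″ = exp and both terms individually have mean e^(1/2):

```python
import math
import random

random.seed(3)

# Monte Carlo check of E[−Zφ'(Z) + φ''(Z)] = 0 for Z ~ N(0,1), φ(z) = exp(z).
N = 100_000
acc = 0.0
for _ in range(N):
    z = random.gauss(0, 1)
    acc += -z * math.exp(z) + math.exp(z)
mean = acc / N  # should be close to 0
```

Replacing the Laplacian term by ⟨τν(X), Hess φ(X)⟩_HS and X ∼ ν gives the defining equation of the Stein kernel above.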

SLIDE 13

Bounding the Wasserstein distance with S

SLIDE 14

Bounding the Wasserstein distance with S

Theorem [Ledoux, Nourdin, Peccati 2015] Let ν be a measure admitting a Stein kernel τν and let S(ν) be the associated Stein discrepancy. We have: W2(ν, γ) ≤ S(ν)

SLIDE 15

Bounding the Wasserstein distance with S

Theorem [Ledoux, Nourdin, Peccati 2015]. Let ν be a measure admitting a Stein kernel τν and let S(ν) be the associated Stein discrepancy. We have W2(ν, γ) ≤ S(ν). Problem: in the general case, discrete measures do not admit a Stein kernel. Example: if the Rademacher measure admitted a Stein kernel, there would exist τ such that for any smooth function φ, φ′(−1) − φ′(1) + τ(1)φ″(1) + τ(−1)φ″(−1) = 0; taking φ with φ″(±1) = 0 but φ′(−1) ≠ φ′(1) gives a contradiction. This can be dealt with using a smoothing procedure (relying on the zero-bias distribution), but that is not practical in high dimensions.

SLIDE 16

Generalizing the Stein kernel

X ∼ ν. There exists an operator L such that ∀φ, E[Lφ(X)] = 0; compare L with −x·∇ + ∆.

SLIDE 17

Generalizing the Stein kernel

X ∼ ν. There exists an operator L such that ∀φ, E[Lφ(X)] = 0; compare L with −x·∇ + ∆. Let X and X′ be drawn from ν. Then, ∀φ, E[φ(X′) − φ(X)] = 0, and by a Taylor expansion,

E[Σ_{k≥1} ((X′ − X)^k / k!) φ^(k)(X)] = 0.

SLIDE 18

Another bound on W2

Theorem [Dimension 1] Let ν be a measure on R and let random variables X, (Xt)t≥0 be drawn from ν. Let Yt = Xt − X. For any h > 0,

W2(ν, γ) ≤ ∫₀^∞ e^(−t) E[((1/h)E[Yt|X] + X)²]^(1/2) dt
  + ∫₀^∞ (e^(−2t)/√(1 − e^(−2t))) E[((1/(2h))E[Yt²|X] − 1)²]^(1/2) dt
  + Σ_{k>2} ∫₀^∞ (e^(−kt)/(h√(k!)(1 − e^(−2t))^((k−1)/2))) E[E[Yt^k|X]²]^(1/2) dt.

SLIDE 19

Another bound on W2

Theorem [Dimension 1] (same bound as above).

Rescaling

SLIDE 20

Another bound on W2

Theorem [Dimension 1] (same bound as above).

First moment close to −X. Rescaling

SLIDE 21

Another bound on W2

Theorem [Dimension 1] (same bound as above).

First moment close to −X. Second moment close to 1. Rescaling

SLIDE 22

Another bound on W2

Theorem [Dimension 1] (same bound as above).

First moment close to −X. Second moment close to 1. Higher moments small: they start at 0 and grow as t increases; roughly, Yt has to be bounded by √t. Rescaling

SLIDE 23

Another bound on W2

Theorem [Dimension 1] (same bound as above).

Similar result (in dimension 1 only!) for Wp, p ≥ 1.

SLIDE 24

Another bound on W2

Let µ be the invariant measure of an operator L = b·∇ + ⟨a, ∇²⟩. Under technical conditions on L (a curvature-dimension inequality), a similar result holds.

SLIDE 25

Another bound on W2

Let µ be the invariant measure of an operator L = b·∇ + ⟨a, ∇²⟩. A similar result holds under technical conditions on L (for example under a curvature-dimension inequality). If

  • E[Yt|X] is close to b(X),
  • E[Yt²|X] is close to a(X),
  • E[Yt^k|X] are small for k > 2,

then W2(ν, µ) is small.

SLIDE 26

Convergence rates in the Central Limit Theorem

Consider i.i.d. random variables X1, . . . , Xn with measure ν and E[X1] = 0, E[X1²] = 1. The Central Limit Theorem gives

Sn = n^(−1/2) Σ_{i=1}^n Xi → N(0, 1).

How fast does it converge?

SLIDE 27

Convergence rates in the Central Limit Theorem

Consider i.i.d. random variables X1, . . . , Xn with measure ν and E[X1] = 0, E[X1²] = 1. The Central Limit Theorem gives

Sn = n^(−1/2) Σ_{i=1}^n Xi → N(0, 1).

How fast does it converge? Let X̃1, . . . , X̃n be i.i.d. copies of X1, . . . , Xn and I a uniform random variable on {1, . . . , n}. We set (Sn)t = Sn + n^(−1/2)(X̃I − XI).

SLIDE 28

Convergence rates in the Central Limit Theorem

Consider i.i.d. random variables X1, . . . , Xn with measure ν and E[X1] = 0, E[X1²] = 1. The Central Limit Theorem gives

Sn = n^(−1/2) Σ_{i=1}^n Xi → N(0, 1).

How fast does it converge? Let X̃1, . . . , X̃n be i.i.d. copies of X1, . . . , Xn and I a uniform random variable on {1, . . . , n}. We set (Sn)t = Sn + n^(−1/2)(X̃I − XI) 1_{XI, X̃I ∈ [−√(tn), √(tn)]}.
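The first conditional moment of this exchange mechanism can be checked numerically. The sketch below drops the truncation indicator for simplicity (my simplification, with hypothetical variable names) and verifies that E[(Sn)t − Sn | X1, . . . , Xn] = −Sn/n, i.e. the first moment is close to −X once rescaled by h = 1/n:

```python
import random

random.seed(4)

n = 50
xs = [random.choice([-1.0, 1.0]) for _ in range(n)]  # Rademacher: mean 0, var 1
sn = sum(xs) / n ** 0.5

# Y = n^{-1/2} (X~_I − X_I): replace a uniformly chosen coordinate by a fresh copy.
M = 200_000
acc = 0.0
for _ in range(M):
    i = random.randrange(n)
    x_new = random.choice([-1.0, 1.0])
    acc += (x_new - xs[i]) / n ** 0.5
emp = acc / M

# Analytically, E[Y | X_1..X_n] = −S_n / n.
target = -sn / n
```

This is exactly the "first moment close to −X" condition of the abstract bound, with time step h = 1/n.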

SLIDE 29

Convergence rates in the Central Limit Theorem

Theorem Consider X1, . . . , Xn i.i.d. random variables in R^d and let νn be the measure of n^(−1/2) Σ_{i=1}^n Xi. If X1 admits a moment of order 2 + m (that is, E[‖X1‖₂^(2+m)] < ∞) for some m ∈ [0, 2], then

W2(νn, γ) = O(n^(−1/2 + (2−m)/4)).

Moreover, if d = 1, then for any p ≥ 2, if X1 admits a finite moment of order p + m for some m ∈ [0, 2], then

Wp(νn, γ) = O(n^(−1/2 + (2−m)/(2p))).

SLIDE 30

Convergence rates in the Central Limit Theorem

Theorem (same statement as above).

  • p > 2, m = 0 proved by Sakhanenko (1985).
  • p ∈ [1, 2], m = 2 proved by Rio (2009).
  • p = 2, m = 2 proved by Bobkov (2013) by other means; can be extended to higher dimensions at the cost of stronger assumptions.
  • p > 2 also proved by Bobkov (2016).
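The n^(−1/2) decay is visible in a small experiment. The sketch below (helpers `w2_to_gaussian` and `sample_sn` are mine) estimates W2(νn, γ) in one dimension via the quantile representation of W2, for Rademacher summands at two values of n:

```python
import random
from statistics import NormalDist

random.seed(7)

def w2_to_gaussian(sample):
    """Empirical W2 between a 1-D sample and N(0,1), via
    W2^2 = integral of (F^{-1}(u) - Phi^{-1}(u))^2 du on a quantile grid."""
    xs = sorted(sample)
    m = len(xs)
    nd = NormalDist()
    s = sum((x - nd.inv_cdf((i + 0.5) / m)) ** 2 for i, x in enumerate(xs))
    return (s / m) ** 0.5

def sample_sn(n, m):
    """m draws of S_n = n^{-1/2} * (sum of n Rademacher variables)."""
    return [sum(random.choice([-1.0, 1.0]) for _ in range(n)) / n ** 0.5
            for _ in range(m)]

m = 20_000
w4 = w2_to_gaussian(sample_sn(4, m))
w64 = w2_to_gaussian(sample_sn(64, m))
# Rademacher summands have all moments (m = 2 case), so W2 decays like n^{-1/2}.
```

Here `w64` should be several times smaller than `w4`, consistent with the m = 2 rate.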
SLIDE 31

Convergence of Markov chains towards diffusion processes

Consider a Markov chain (Xn) with invariant measure π and transition kernel K approximating a diffusion process with operator L = b·∇ + ⟨a, ∇²⟩ and invariant measure µ.

SLIDE 32

Convergence of Markov chains towards diffusion processes

Consider a Markov chain (Xn) with invariant measure π and transition kernel K approximating a diffusion process with operator L = b·∇ + ⟨a, ∇²⟩ and invariant measure µ. X is drawn from π, ξ is a jump from X, and Xt = 1_{t≥T} ξ + X.

SLIDE 33

Convergence of Markov chains towards diffusion processes

Consider a Markov chain (Xn) with invariant measure π and transition kernel K approximating a diffusion process with operator L = b·∇ + ⟨a, ∇²⟩ and invariant measure µ. X is drawn from π, ξ is a jump from X, Xt = 1_{t≥T} ξ + X, and h is the time step of the Markov chain. If

  • conditions on L hold,
  • E[(∫ ((y − X)/h) K(X, dy) − b(X))²] is small,
  • E[(∫ ((y − X)²/(2h)) K(X, dy) − a(X))²] is small,
  • higher moments of the jumps are small,

then W2(π, µ) is small.
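The two moment conditions can be checked by Monte Carlo for the Euler kernel y = x + h b(x) + √h ξ with ξ ~ N(0, 1), for which E[(y − x)/h] = b(x) and E[(y − x)²/(2h)] → 1/2 as h → 0 (i.e. a = 1/2 for unit-variance noise). The drift b(x) = −x below is my toy choice (Ornstein-Uhlenbeck):

```python
import random

random.seed(5)

def b(x):
    return -x  # toy drift (Ornstein-Uhlenbeck)

h, x, M = 0.01, 0.7, 200_000
m1 = 0.0
m2 = 0.0
for _ in range(M):
    # One step of the Euler kernel from the fixed point x.
    y = x + h * b(x) + h ** 0.5 * random.gauss(0, 1)
    m1 += (y - x) / h          # first jump moment / h  -> b(x)
    m2 += (y - x) ** 2 / (2 * h)  # second jump moment / 2h -> 1/2
m1 /= M
m2 /= M
```

The higher jump moments of this kernel are O(h^(k/2 − 1)) after the same normalization, matching the "higher moments small" condition.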

SLIDE 34

Convergence of Markov chains towards diffusion processes

Theorem [Stroock, Varadhan] Consider a family of discrete Markov chains M^h defined on S_h ⊂ R^d with transition kernels K_h. Then, if

  • (1/h) ∫_{y∈S_h} (y − x) K_h(x, dy) → b(x) uniformly over x ∈ S_h as h → 0,
  • (1/h) ∫_{y∈S_h} ((y − x)²/2) K_h(x, dy) → a(x) uniformly over x ∈ S_h as h → 0,
  • ∀r > 0, lim_{h→0} sup_{x∈S_h} ∫_{y∈S_h, |y−x|>r} K_h(x, dy) = 0,

then, for any T > 0, M^h converges weakly on [0, T] toward the diffusion process with infinitesimal generator L.

SLIDE 35

Langevin Monte-Carlo algorithm

How to draw points from a smooth log concave measure µ with density f?

SLIDE 36

Langevin Monte-Carlo algorithm

How to draw points from a smooth log-concave measure µ with density f? µ is the stationary distribution of the diffusion process with generator L_f = ∇(log f)·∇ + ∆, thus we can draw from f by simulating the diffusion process.

SLIDE 37

Langevin Monte-Carlo algorithm

How to draw points from a smooth log-concave measure µ with density f? µ is the stationary distribution of the diffusion process with generator L_f = ∇(log f)·∇ + ∆, thus we can draw from f by simulating the diffusion process. Sample from a discretization of the form X_{t+h} = X_t + h b(X_t) + √h ξ.
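A minimal sketch of this discretization, under assumptions of mine: target f is the standard Gaussian (so ∇log f(x) = −x), and the Gaussian-noise step is written √(2h)·N(0, 1) to match the generator ∇(log f)·∇ + ∆ (i.e. ξ with variance 2); the helper `grad_log_f` is hypothetical.

```python
import random

random.seed(6)

# Unadjusted Langevin sketch: X_{t+h} = X_t + h * grad_log_f(X_t) + sqrt(2h) * N(0,1).
def grad_log_f(x):
    return -x  # standard Gaussian target

h = 0.05
x = 0.0
burn, steps = 1_000, 200_000
samples = []
for t in range(burn + steps):
    x = x + h * grad_log_f(x) + (2 * h) ** 0.5 * random.gauss(0, 1)
    if t >= burn:
        samples.append(x)

# The chain's invariant measure should be close to the target N(0, 1)
# (with an O(h) bias, here stationary variance 1/(1 - h/2) for this target).
mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples) - mean ** 2
```

The invariant measure π of this chain differs from µ by a discretization bias controlled in h, which is exactly what the W2(π, µ) bounds above quantify.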

SLIDE 38

Langevin Monte-Carlo algorithm

How to draw points from a smooth log-concave measure µ with density f? µ is the stationary distribution of the diffusion process with generator L_f = ∇(log f)·∇ + ∆, thus we can draw from f by simulating the diffusion process. Sample from a discretization of the form X_{t+h} = X_t + h b(X_t) + √h ξ. The complexity to achieve an accuracy of ε between the sample and the target in W2 is

  • O*(dε^(−1)) if ξ = N(0, 1) (direct approach),
  • O*(dε^(−2)) for more general ξ (Stein's method).
SLIDE 39

Conclusion

Rates in Wasserstein distance for the multidimensional CLT. Quantitative convergence result for the invariant measure of an approximation scheme of a diffusion process.

SLIDE 40

Conclusion

Rates in Wasserstein distance for the multidimensional CLT. Quantitative convergence result for the invariant measure of an approximation scheme of a diffusion process. Density estimation for K-NN graphs. Rates for the Langevin Monte-Carlo algorithm.

SLIDE 41

Conclusion

Rates in Wasserstein distance for the multidimensional CLT. Quantitative convergence result for the invariant measure of an approximation scheme of a diffusion process. Density estimation for K-NN graphs. Rates for the Langevin Monte-Carlo algorithm. The assumptions on K as n → ∞ are still too strong; weakening them requires new stochastic homogenization results.