Stability of the Shannon-Stam Inequality (Dan Mikulincer)




slide-1
SLIDE 1

Stability of the Shannon-Stam Inequality

Dan Mikulincer

Students Probability Day, 2019

Weizmann Institute of Science

Joint work with Ronen Eldan

slide-2
SLIDE 2

Relative Entropy

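The central quantity we will deal with is relative entropy.

Definition (Relative Entropy): Let $X \sim \mu$, $Y \sim \nu$ be random vectors in $\mathbb{R}^d$. Define the entropy of $X$ relative to $Y$ as

$$\mathrm{Ent}(X\|Y) = \mathrm{Ent}(\mu\|\nu) := \begin{cases} \displaystyle\int_{\mathbb{R}^d} \ln\left(\frac{d\mu}{d\nu}\right) d\mu & \text{if } \mu \ll \nu, \\ \infty & \text{otherwise.} \end{cases}$$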
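As a rough numerical illustration of the definition, the sketch below approximates the integral on a grid for two one-dimensional Gaussians and compares it with the well-known closed form for Gaussian relative entropy. The function names, the grid, and the chosen mean and variance are illustrative assumptions, not part of the talk.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated on the grid x."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def relative_entropy_grid(p, q, dx):
    """Riemann-sum approximation of Ent(mu||nu) = int ln(p/q) p dx.

    Returns infinity if p puts mass where q vanishes (mu not << nu on the grid).
    """
    mask = p > 0
    if np.any(q[mask] == 0):
        return float("inf")
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx)

def gaussian_relative_entropy(mean, var):
    """Closed form for Ent(N(mean, var) || N(0, 1)) in one dimension."""
    return 0.5 * (var + mean ** 2 - 1.0 - np.log(var))

# Hypothetical example: X ~ N(0.5, 0.7) against the standard Gaussian G.
x = np.linspace(-12.0, 12.0, 240001)
dx = x[1] - x[0]
numeric = relative_entropy_grid(gaussian_pdf(x, 0.5, 0.7), gaussian_pdf(x, 0.0, 1.0), dx)
exact = gaussian_relative_entropy(0.5, 0.7)
print(numeric, exact)
```

The two printed values should agree to several decimal places; the grid version is only meant to make the integral in the definition concrete.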

slide-3
SLIDE 3

The Shannon-Stam Inequality

In 1948 Shannon noted the following inequality, which was later proved by Stam in 1956.

Theorem (Shannon-Stam Inequality): Let $X, Y$ be independent random vectors in $\mathbb{R}^d$ and let $G \sim N(0, I)$ be a standard Gaussian random vector. Then, for any $\lambda \in [0, 1]$,

$$\mathrm{Ent}\left(\sqrt{\lambda}X + \sqrt{1-\lambda}\,Y \,\|\, G\right) \leq \lambda\,\mathrm{Ent}(X\|G) + (1-\lambda)\,\mathrm{Ent}(Y\|G).$$

Moreover, equality holds if and only if $X$ and $Y$ are Gaussians with identical covariances.

Remark: Shannon and Stam actually proved an equivalent form of the inequality, called the entropy power inequality. The equivalence was observed by Lieb in 1978.

slide-5
SLIDE 5

Stability

Define the deficit

$$\delta_\lambda(X, Y) = \lambda\,\mathrm{Ent}(X\|G) + (1-\lambda)\,\mathrm{Ent}(Y\|G) - \mathrm{Ent}\left(\sqrt{\lambda}X + \sqrt{1-\lambda}\,Y \,\|\, G\right).$$

The question of stability deals with approximate equality cases.

Question: Suppose that $\delta_\lambda(X, Y)$ is small. Must $X$ and $Y$ be 'close' to Gaussian vectors, which are themselves 'close' to each other?

We will now show that the deficit can be bounded in terms of a stochastic process and that in certain cases this gives a positive answer to the above question.
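To get a feel for the deficit, here is a small sketch for the simplest case of independent centered one-dimensional Gaussians, where every term has a closed form. The helper names are hypothetical. The sketch uses the standard formula $\mathrm{Ent}(N(0,\sigma^2)\|G) = \frac{1}{2}(\sigma^2 - 1 - \ln \sigma^2)$ and the fact that $\sqrt{\lambda}X + \sqrt{1-\lambda}\,Y$ is again a centered Gaussian.

```python
import math

def gaussian_entropy(var):
    """Ent(N(0, var) || N(0, 1)) in one dimension."""
    return 0.5 * (var - 1.0 - math.log(var))

def gaussian_deficit(lam, var_x, var_y):
    """Deficit delta_lambda(X, Y) for independent X ~ N(0, var_x), Y ~ N(0, var_y)."""
    # sqrt(lam) X + sqrt(1 - lam) Y is centered Gaussian with this variance.
    var_mix = lam * var_x + (1.0 - lam) * var_y
    return (lam * gaussian_entropy(var_x)
            + (1.0 - lam) * gaussian_entropy(var_y)
            - gaussian_entropy(var_mix))

# Equal variances: an equality case, so the deficit vanishes.
print(gaussian_deficit(0.5, 1.0, 1.0))
# Different variances: the deficit is strictly positive.
print(gaussian_deficit(0.5, 0.5, 1.0))
```

In this Gaussian case the deficit reduces to $\frac{1}{2}\left(\ln(\lambda a + (1-\lambda) b) - \lambda \ln a - (1-\lambda)\ln b\right)$ for variances $a, b$, which is nonnegative by concavity of the logarithm and zero exactly when $a = b$.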

slide-8
SLIDE 8

Föllmer Martingales

We focus on the one-dimensional case and $\lambda = \frac{1}{2}$.

Let $X$ be a centered random variable, and let $B_t$ denote a standard Brownian motion. Föllmer (1984) and then Lehec (2011) have shown that there exists a process $\Gamma^X_t$ such that

  • $\int_0^1 \Gamma^X_t \, dB_t$ has the law of $X$.

  • $\mathrm{Ent}(X\|G) = \frac{1}{2} \int_0^1 \frac{\mathbb{E}\left[(1 - \Gamma^X_t)^2\right]}{1-t} \, dt$.

  • If $H^X_t$ is another process such that $\int_0^1 H^X_t \, dB_t$ has the law of $X$, then

$$\int_0^1 \frac{\mathbb{E}\left[(1 - H^X_t)^2\right]}{1-t} \, dt \geq \int_0^1 \frac{\mathbb{E}\left[(1 - \Gamma^X_t)^2\right]}{1-t} \, dt.$$
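The first two properties can be checked numerically in the Gaussian case. The sketch below assumes, for $X \sim N(0, \sigma^2)$, the deterministic candidate $\Gamma_t = \sigma^2 / (1 + t(\sigma^2 - 1))$; this formula is an assumption of the example, not something stated on the slides. Since the candidate is deterministic, $\int_0^1 \Gamma_t \, dB_t$ is a centered Gaussian with variance $\int_0^1 \Gamma_t^2 \, dt$, so it suffices to compare that integral with $\sigma^2$ and the entropy integral with the closed form $\frac{1}{2}(\sigma^2 - 1 - \ln \sigma^2)$.

```python
import numpy as np

def gamma_gaussian(t, var):
    """Assumed candidate Gamma_t for X ~ N(0, var); not taken from the slides."""
    return var / (1.0 + t * (var - 1.0))

def check_representation(var, n=200_000):
    """Return (int_0^1 Gamma^2 dt, (1/2) int_0^1 (1 - Gamma)^2 / (1 - t) dt).

    Both integrals use the midpoint rule, which never evaluates at t = 1.
    """
    t = (np.arange(n) + 0.5) / n
    gamma = gamma_gaussian(t, var)
    variance = np.mean(gamma ** 2)  # Ito isometry: Var(int Gamma dB)
    entropy = 0.5 * np.mean((1.0 - gamma) ** 2 / (1.0 - t))
    return float(variance), float(entropy)

var = 0.5
variance, entropy = check_representation(var)
print(variance, var)                              # should match
print(entropy, 0.5 * (var - 1.0 - np.log(var)))   # should match
```

Both pairs of printed numbers should agree closely, which is consistent with the representation and the entropy formula for this candidate process.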

slide-12
SLIDE 12

Bounding the Deficit

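Now, for independent random variables $X, Y$, take two independent Brownian motions $B^X_t, B^Y_t$ and processes $\Gamma^X_t, \Gamma^Y_t$ as above. Note that if $G_1$ and $G_2$ are independent standard Gaussians, then for any $a, b \in \mathbb{R}$,

$$aG_1 + bG_2 \stackrel{\text{law}}{=} \sqrt{a^2 + b^2}\, G,$$

where $G$ is another standard Gaussian. This implies

$$\frac{X + Y}{\sqrt{2}} = \frac{1}{\sqrt{2}} \left( \int_0^1 \Gamma^X_t \, dB^X_t + \int_0^1 \Gamma^Y_t \, dB^Y_t \right) \stackrel{\text{law}}{=} \int_0^1 \sqrt{\frac{(\Gamma^X_t)^2 + (\Gamma^Y_t)^2}{2}} \, dB_t$$

for some Brownian motion $B_t$.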
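One standard way to justify the last identity is Lévy's characterization of Brownian motion. The following is a sketch under the added assumption that $(\Gamma^X_t)^2 + (\Gamma^Y_t)^2 > 0$ for all $t$; the slides do not spell out this step.

```latex
\begin{align*}
B_t &:= \int_0^t \frac{\Gamma^X_s \, dB^X_s + \Gamma^Y_s \, dB^Y_s}{\sqrt{(\Gamma^X_s)^2 + (\Gamma^Y_s)^2}},
\\
[B]_t &= \int_0^t \frac{(\Gamma^X_s)^2 + (\Gamma^Y_s)^2}{(\Gamma^X_s)^2 + (\Gamma^Y_s)^2} \, ds = t,
\\
\frac{1}{\sqrt{2}} \left( \Gamma^X_t \, dB^X_t + \Gamma^Y_t \, dB^Y_t \right)
  &= \sqrt{\frac{(\Gamma^X_t)^2 + (\Gamma^Y_t)^2}{2}} \, dB_t .
\end{align*}
```

Here $B_t$ is a continuous local martingale whose quadratic variation is $t$ (the cross term vanishes because $B^X$ and $B^Y$ are independent), so it is a Brownian motion, and the third line is just the definition of $B_t$ rearranged.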

slide-15
SLIDE 15

Bounding the Deficit

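If $H_t = \sqrt{\frac{(\Gamma^X_t)^2 + (\Gamma^Y_t)^2}{2}}$, then

$$\mathrm{Ent}\left(\frac{X+Y}{\sqrt{2}} \,\Big\|\, G\right) \leq \frac{1}{2} \int_0^1 \frac{\mathbb{E}\left[(1 - H_t)^2\right]}{1-t} \, dt.$$

Consequently,

$$2\delta_{\frac{1}{2}}(X, Y) \geq \int_0^1 \left( \frac{\mathbb{E}\left[(1 - \Gamma^Y_t)^2\right]}{2(1-t)} + \frac{\mathbb{E}\left[(1 - \Gamma^X_t)^2\right]}{2(1-t)} - \frac{\mathbb{E}\left[(1 - H_t)^2\right]}{1-t} \right) dt = \int_0^1 \frac{2\mathbb{E}[H_t] - \mathbb{E}[\Gamma^X_t] - \mathbb{E}[\Gamma^Y_t]}{1-t} \, dt.$$

Using concavity of the square root then shows

$$\delta_{\frac{1}{2}}(X, Y) \gtrsim \int_0^1 \mathbb{E}\left[ \frac{(\Gamma^X_t - \Gamma^Y_t)^2}{(1-t)(\Gamma^X_t + \Gamma^Y_t)} \right] dt.$$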
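The two algebraic steps behind these displays can be written out as follows. This is a sketch that takes $\Gamma^X_t, \Gamma^Y_t \geq 0$ for granted, and the explicit constant is just one admissible choice, not necessarily the one used in the talk.

```latex
\begin{align*}
\tfrac{1}{2}(1 - \Gamma^X_t)^2 + \tfrac{1}{2}(1 - \Gamma^Y_t)^2 - (1 - H_t)^2
  &= 2H_t - \Gamma^X_t - \Gamma^Y_t
     + \tfrac{1}{2}\left( (\Gamma^X_t)^2 + (\Gamma^Y_t)^2 \right) - H_t^2 \\
  &= 2H_t - \Gamma^X_t - \Gamma^Y_t ,
\\[4pt]
\sqrt{2(a^2 + b^2)} - (a + b)
  &= \frac{(a - b)^2}{\sqrt{2(a^2 + b^2)} + a + b}
  \;\geq\; \frac{(a - b)^2}{(1 + \sqrt{2})(a + b)}
  \qquad (a, b \geq 0,\ a + b > 0).
\end{align*}
```

The first identity uses $H_t^2 = \frac{1}{2}\left((\Gamma^X_t)^2 + (\Gamma^Y_t)^2\right)$. The second, applied with $a = \Gamma^X_t$, $b = \Gamma^Y_t$ and $2H_t = \sqrt{2(a^2 + b^2)}$, uses $\sqrt{2(a^2 + b^2)} \leq \sqrt{2}(a + b)$, and gives the final bound with constant $\frac{1}{2(1 + \sqrt{2})}$.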

slide-19
SLIDE 19

Log-Concave Measures

We say that $X$ is strongly log-concave if it has a density $f$ such that $-(\ln f)'' \geq 1$.

Fact: if $X$ is strongly log-concave then $\Gamma^X_t \leq 1$ almost surely.

So, if both $X$ and $Y$ are strongly log-concave,

$$\delta_{\frac{1}{2}}(X, Y) \gtrsim \int_0^1 \frac{\mathbb{E}\left[(\Gamma^X_t - \Gamma^Y_t)^2\right]}{1-t} \, dt.$$

We use this to derive a quantitative stability bound.
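A quick sanity check of this chain of bounds is possible for centered Gaussians with variance at most 1, which are strongly log-concave since $-(\ln f)'' = 1/\sigma^2 \geq 1$. The sketch again assumes the deterministic candidate $\Gamma_t = \sigma^2 / (1 + t(\sigma^2 - 1))$, which is not given on the slides, and uses $\frac{1}{4(1+\sqrt{2})}$ as one explicit constant for the $\gtrsim$ sign. All function names are hypothetical.

```python
import numpy as np

def gamma_gaussian(t, var):
    """Assumed candidate Gamma_t for X ~ N(0, var); not taken from the slides."""
    return var / (1.0 + t * (var - 1.0))

def midpoint_grid(n=200_000):
    """Midpoints of n equal cells in [0, 1]; avoids evaluating at t = 1."""
    return (np.arange(n) + 0.5) / n

def process_lower_bound(var_x, var_y):
    """(1/2) int_0^1 (2 H_t - Gamma^X_t - Gamma^Y_t) / (1 - t) dt."""
    t = midpoint_grid()
    gx, gy = gamma_gaussian(t, var_x), gamma_gaussian(t, var_y)
    h = np.sqrt((gx ** 2 + gy ** 2) / 2.0)
    return float(0.5 * np.mean((2.0 * h - gx - gy) / (1.0 - t)))

def quadratic_bound(var_x, var_y):
    """int_0^1 (Gamma^X_t - Gamma^Y_t)^2 / (1 - t) dt, scaled by 1 / (4 (1 + sqrt 2))."""
    t = midpoint_grid()
    gx, gy = gamma_gaussian(t, var_x), gamma_gaussian(t, var_y)
    return float(np.mean((gx - gy) ** 2 / (1.0 - t)) / (4.0 * (1.0 + np.sqrt(2.0))))

def true_deficit(var_x, var_y):
    """Closed-form delta_{1/2} for independent centered Gaussians."""
    return 0.5 * np.log((var_x + var_y) / 2.0) - 0.25 * (np.log(var_x) + np.log(var_y))

var_x, var_y = 0.5, 1.0
print(float(np.max(gamma_gaussian(midpoint_grid(), var_x))))  # stays below 1
print(quadratic_bound(var_x, var_y), process_lower_bound(var_x, var_y), true_deficit(var_x, var_y))
```

The three numbers on the last line should be increasing from left to right, matching the order of the inequalities in the derivation.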

slide-23
SLIDE 23

Log-Concave Measures

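$$\int_0^1 \frac{\mathbb{E}\left[(\Gamma^X_t - \Gamma^Y_t)^2\right]}{1-t} \, dt \geq \int_0^1 \mathrm{Var}(\Gamma^X_t) \, dt + \int_0^1 \mathrm{Var}(\Gamma^Y_t) \, dt + \int_0^1 \left( \mathbb{E}[\Gamma^X_t] - \mathbb{E}[\Gamma^Y_t] \right)^2 dt$$

$$\geq W_2^2(X, G_1) + W_2^2(Y, G_2) + W_2^2(G_1, G_2).$$

Here, $W_2$ denotes the Wasserstein distance and

$$G_1 = \int_0^1 \mathbb{E}[\Gamma^X_t] \, dB^X_t, \qquad G_2 = \int_0^1 \mathbb{E}[\Gamma^Y_t] \, dB^Y_t$$

are Gaussians.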
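A sketch of why each inequality holds, assuming $\Gamma^X_t$ and $\Gamma^Y_t$ are independent (they are driven by independent Brownian motions) and using $\frac{1}{1-t} \geq 1$ on $[0, 1)$:

```latex
\begin{align*}
\mathbb{E}\left[(\Gamma^X_t - \Gamma^Y_t)^2\right]
  &= \mathrm{Var}(\Gamma^X_t) + \mathrm{Var}(\Gamma^Y_t)
     + \left( \mathbb{E}[\Gamma^X_t] - \mathbb{E}[\Gamma^Y_t] \right)^2
  && \text{(independence)} \\
W_2^2(X, G_1)
  &\leq \mathbb{E}\left[ \left( \int_0^1 \left( \Gamma^X_t - \mathbb{E}[\Gamma^X_t] \right) dB^X_t \right)^2 \right]
   = \int_0^1 \mathrm{Var}(\Gamma^X_t) \, dt
  && \text{(coupling, It\^o isometry)} \\
W_2^2(G_1, G_2)
  &\leq \int_0^1 \left( \mathbb{E}[\Gamma^X_t] - \mathbb{E}[\Gamma^Y_t] \right)^2 dt
  && \text{(same Brownian motion for both)}
\end{align*}
```

The second line couples $X$ and $G_1$ through the same Brownian motion $B^X_t$, and the bound for $W_2^2(Y, G_2)$ is identical with $Y$ in place of $X$. The third line uses that $W_2$ depends only on the laws of $G_1$ and $G_2$, so both may be realized as integrals against a single Brownian motion.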

slide-25
SLIDE 25

Thank You
