

SLIDE 1

Hypercontractivity and Information Theory

Chandra Nair

The Chinese University of Hong Kong
August 25, 2016


SLIDE 4

Hypercontractive inequalities: an introduction

Disclaimer, if you are a mathematician: hypercontractivity is usually discussed in the language of Markov semigroups. In this talk I will instead use conditional expectations (a snapshot rather than a time-indexed family of operators).

Elementary result: conditional expectation (a Markov operator) is contractive,
    ‖E(X|Y)‖_p ≤ ‖X‖_p  ∀ p ≥ 1,  where ‖X‖_p = E(|X|^p)^{1/p}.

Hypercontractivity: (X, Y) ∼ µ_XY satisfies (p, q)-hypercontractivity (1 ≤ q ≤ p) if
    ‖E(g(Y)|X)‖_p ≤ ‖g(Y)‖_q  ∀ g ≥ 0.

chandra@ie.cuhk.edu.hk · IT & HC · 25-Aug-2016
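The contractivity claim above is easy to spot-check numerically. The sketch below (Python, with a hypothetical 3×4 joint pmf drawn at random, purely illustrative and not from the talk) verifies ‖E(g(Y)|X)‖_p ≤ ‖g(Y)‖_p for random nonnegative g and several p ≥ 1, which is conditional Jensen in action.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative random joint pmf mu_{XY} on a 3 x 4 alphabet (not from the talk).
mu = rng.random((3, 4))
mu /= mu.sum()
mu_x = mu.sum(axis=1)          # marginal of X
mu_y = mu.sum(axis=0)          # marginal of Y
cond = mu / mu_x[:, None]      # mu_{Y|X}(y|x)

def norm(vals, weights, p):
    """L^p norm of a discrete random variable: (sum_i w_i |v_i|^p)^(1/p)."""
    return float((weights @ np.abs(vals) ** p) ** (1.0 / p))

for p in (1.0, 1.5, 2.0, 4.0):
    for _ in range(200):
        g = rng.random(4)                  # nonnegative test function g(Y)
        Eg_given_x = cond @ g              # E(g(Y) | X = x), one value per x
        assert norm(Eg_given_x, mu_x, p) <= norm(g, mu_y, p) + 1e-12
print("contractivity holds in all tests")
```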


SLIDE 6

Background

Hypercontractive inequalities have been used in:
- quantum field theory
- establishing best constants in classical inequalities
- bounds on semigroup kernels
- Boolean function analysis (the KKL theorem on influences)

This talk: the relation to (network) information theory
- equivalent characterizations
- why information theorists should care
- why this relationship may interest mathematicians

SLIDE 7

Part I: Equivalent characterizations of hypercontractive inequalities using information measures


SLIDE 10

Elementary exercises

Definition: (X, Y) ∼ µ_XY is (p, q)-hypercontractive for 1 ≤ q ≤ p if
    ‖E(g(Y)|X)‖_p ≤ ‖g(Y)‖_q  ∀ g ≥ 0.

An equivalent condition: (X, Y) ∼ µ_XY is (p, q)-hypercontractive for 1 ≤ q ≤ p if and only if
    E(f(X) g(Y)) ≤ ‖f(X)‖_{p′} ‖g(Y)‖_q  ∀ f, g ≥ 0,
where p′ = p/(p − 1) is the Hölder conjugate.

Proof: an application of Hölder's inequality.

Tensorization property: let (X1, Y1) ∼ µ¹_XY be independent of (X2, Y2) ∼ µ²_XY, and let (X1, Y1) and (X2, Y2) each be (p, q)-hypercontractive. Then ((X1, X2), (Y1, Y2)) is also (p, q)-hypercontractive.
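One direction of the equivalence is pure Hölder: by the tower property, E(f(X) g(Y)) = E(f(X) · E(g(Y)|X)) ≤ ‖f(X)‖_{p′} · ‖E(g(Y)|X)‖_p, and hypercontractivity then bounds the last factor by ‖g(Y)‖_q. The sketch below checks the Hölder step on an arbitrary randomly generated pmf (an illustrative assumption, not a distribution from the talk).

```python
import numpy as np

rng = np.random.default_rng(1)

mu = rng.random((3, 4))
mu /= mu.sum()                     # illustrative joint pmf mu_{XY}
mu_x = mu.sum(axis=1)
cond = mu / mu_x[:, None]          # mu_{Y|X}

def norm(vals, weights, r):
    return float((weights @ np.abs(vals) ** r) ** (1.0 / r))

p = 3.0
p_conj = p / (p - 1)               # Hölder conjugate p'
for _ in range(500):
    f, g = rng.random(3), rng.random(4)
    lhs = float(mu_x @ (f * (cond @ g)))   # E(f(X) g(Y)) via the tower property
    rhs = norm(f, mu_x, p_conj) * norm(cond @ g, mu_x, p)
    assert lhs <= rhs + 1e-12
print("Hölder step verified")
```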


SLIDE 12

Elementary exercises, continued

Define: r_p(X; Y) = (1/p) · inf{ q : (X, Y) is (p, q)-hypercontractive }.

1. r_p(X; Y) is decreasing in p.
2. The p → ∞ limit of r_p(X; Y) is given by
       r_∞(X; Y) = inf{ r : E(e^{E(log g(Y)|X)}) ≤ ‖g(Y)‖_r  ∀ g > 0 }.

A (slightly) non-trivial inequality: if (X, Y) is (p, q)-hypercontractive, then
    (q − 1)/(p − 1) ≥ ρ_m²(X; Y),
where ρ_m(X; Y) is the maximal correlation.

Maximal correlation: ρ_m(X; Y) = sup_{f,g} E(f(X) g(Y)), where f, g satisfy E(f(X)) = 0 = E(g(Y)) and E(f²(X)) = 1 = E(g²(Y)). A proof follows by perturbing constant functions along the directions induced by the optimizers for maximal correlation.
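For finite alphabets, the maximal correlation defined above has a classical linear-algebra form: it is the second-largest singular value of the matrix B(x, y) = µ_XY(x, y) / √(µ_X(x) µ_Y(y)); the largest singular value is always 1, attained by constant functions. A sketch, using an assumed doubly symmetric binary pair with correlation ρ = 0.6 (an illustrative choice):

```python
import numpy as np

def maximal_correlation(mu):
    """rho_m(X;Y) = second singular value of D_X^{-1/2} mu_{XY} D_Y^{-1/2}."""
    mx, my = mu.sum(axis=1), mu.sum(axis=0)
    B = mu / np.sqrt(np.outer(mx, my))
    s = np.linalg.svd(B, compute_uv=False)
    return float(s[1])                     # s[0] == 1, from constant functions

rho = 0.6
# X uniform binary, Y = X through a symmetric channel with correlation rho.
mu = 0.25 * np.array([[1 + rho, 1 - rho],
                      [1 - rho, 1 + rho]])
assert abs(maximal_correlation(mu) - rho) < 1e-12
print(maximal_correlation(mu))             # ≈ 0.6
```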


SLIDE 14

Equivalent characterizations

Ahlswede-Gács '76:
    r_∞(X; Y) = sup_{ν_X ≪ µ_X} D(ν_Y ‖ µ_Y) / D(ν_X ‖ µ_X),
where ν_Y is the (output) distribution induced by operating the same channel µ_{Y|X} on the input distribution ν_X.

Remark: Gács (independently) observed and used the hypercontractivity of the Markov operator to study:
- images of a set via a channel, or equivalently
- the region where measure concentrates when a noise operator is applied to a set.

Anantharam-Gohari-Kamath-Nair '13:
    r_∞(X; Y) = sup_{ν_X ≪ µ_X} D(ν_Y ‖ µ_Y) / D(ν_X ‖ µ_X)
              = sup_{U : U−X−Y} I(U; Y) / I(U; X)
              = inf{ λ : K_X[H(Y) − λH(X)](µ) = H_µ(Y) − λH_µ(X) },
where K_X[·] denotes the upper concave envelope with respect to the input distribution, evaluated at µ.

Remark: our interest was motivated by the tensorization property (this will become clear later).
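For a concrete feel for the Ahlswede-Gács characterization, the sketch below sweeps input distributions ν_X for a doubly symmetric binary source (a BSC with crossover ε = 0.25, an illustrative choice) and maximizes the KL ratio on a grid. For this symmetric case the supremum is the classical value (1 − 2ε)², approached by small perturbations of the uniform input.

```python
import numpy as np

def kl(a, b):
    """KL divergence (nats) between finite pmfs a and b, with a << b."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    m = a > 0
    return float(np.sum(a[m] * np.log(a[m] / b[m])))

eps = 0.25                          # crossover probability (illustrative)
rho = 1 - 2 * eps
W = np.array([[1 - eps, eps],
              [eps, 1 - eps]])      # channel mu_{Y|X}
mu_x = np.array([0.5, 0.5])
mu_y = mu_x @ W

best = 0.0
for t in np.linspace(1e-3, 1 - 1e-3, 2000):
    if abs(t - 0.5) < 1e-4:         # skip nu_X = mu_X, where the ratio is 0/0
        continue
    nu_x = np.array([t, 1 - t])
    best = max(best, kl(nu_x @ W, mu_y) / kl(nu_x, mu_x))

assert best <= rho**2 + 1e-9        # the ratio never exceeds (1 - 2*eps)^2 here
assert rho**2 - best < 1e-3         # and the grid maximum approaches it
print(best)                         # ≈ 0.25
```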


SLIDE 17

Entire regime, p ≥ 1

The following conditions are equivalent:

1. ‖E(g(Y)|X)‖_p ≤ ‖g(Y)‖_q  ∀ g ≥ 0.

2. E(f(X) g(Y)) ≤ ‖f(X)‖_{p′} ‖g(Y)‖_q  ∀ f, g ≥ 0.

3. Using relative entropies (Carlen and Cordero-Erausquin '09, Nair '14, Friedgut '15):
       (1/p′) D(ν_X ‖ µ_X) + (1/q) D(ν_Y ‖ µ_Y) ≤ D(ν_XY ‖ µ_XY)  ∀ ν_XY ≪ µ_XY.

4. Using mutual information and auxiliary variables (Nair '14):
       (1/p′) I(U; X) + (1/q) I(U; Y) ≤ I(U; X, Y)  ∀ µ_{U|XY}.

5. Using convex envelopes (Nair '14):
       K_XY[(1/p′) H(X) + (1/q) H(Y) − H(X, Y)](µ_XY) = (1/p′) H_µ(X) + (1/q) H_µ(Y) − H_µ(X, Y).
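Condition 3 can be spot-checked numerically against condition 1 in a case where hypercontractivity is known to hold: a doubly symmetric binary pair with correlation ρ at the boundary exponents q − 1 = ρ²(p − 1) (the Bonami-Beckner pair; ρ = 0.6 and p = 2 are illustrative choices). The sketch draws random joint pmfs ν_XY and confirms the relative-entropy inequality.

```python
import numpy as np

rng = np.random.default_rng(3)

def kl(a, b):
    """KL divergence (nats) between flat pmfs a and b."""
    m = a > 0
    return float(np.sum(a[m] * np.log(a[m] / b[m])))

rho, p = 0.6, 2.0
q = 1 + rho**2 * (p - 1)            # boundary exponent: q - 1 = rho^2 (p - 1)
p_conj = p / (p - 1)
mu = 0.25 * np.array([[1 + rho, 1 - rho],
                      [1 - rho, 1 + rho]])

for _ in range(5000):
    nu = rng.random((2, 2))
    nu /= nu.sum()                  # a random joint pmf nu_{XY} << mu_{XY}
    lhs = kl(nu.sum(1), mu.sum(1)) / p_conj + kl(nu.sum(0), mu.sum(0)) / q
    assert lhs <= kl(nu.ravel(), mu.ravel()) + 1e-12
print("relative entropy characterization verified on random nu")
```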


SLIDE 21

Some remarks on the equivalence proof

Functional form ⟹ mutual information condition: use the tensorization property with
    f(X^n) = 1_A, where A = { x^n : (u_0^n, x^n) is jointly typical },
    g(Y^n) = 1_B, where B = { y^n : (u_0^n, y^n) is jointly typical }.

Mutual information condition ⟹ relative entropy condition: a (natural) perturbation argument.

Relative entropy condition ⟹ functional form: let ‖f(X)‖_{p′} = ‖g(Y)‖_q = 1 and define ν_XY = (1/Z) µ_XY f(X) g(Y). Then
    D(ν_XY ‖ µ_XY) − (1/p′) D(ν_X ‖ µ_X) − (1/q) D(ν_Y ‖ µ_Y)
        = log(1/Z) + (1/p′) E_ν[ log( µ_X f(X)^{p′} / ν_X ) ] + (1/q) E_ν[ log( µ_Y g(Y)^q / ν_Y ) ]
        ≤ log(1/Z).


SLIDE 23

Brascamp-Lieb-type inequalities

(X1, ..., Xm) ∼ µ_{X1,...,Xm} is said to satisfy a Brascamp-Lieb-type inequality with parameters (λ1, λ2, ..., λm, C), λi ≥ 0, if
    E( ∏_{i=1}^m fi(Xi) ) ≤ 2^C ∏_{i=1}^m ‖fi(Xi)‖_{λi}  ∀ {fi}.

- Hypercontractivity is the special case with C = 0 and m = 2.
- These parameters satisfy a tensorization property.
- Such inequalities strengthen Hölder's inequality.
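The simplest instance of the definition above is Hölder's inequality itself: take m = 2, C = 0, X1 = X2 = X, and conjugate exponents 1/λ1 + 1/λ2 = 1. A sketch with an arbitrary illustrative pmf:

```python
import numpy as np

rng = np.random.default_rng(5)

w = rng.random(6)
w /= w.sum()                        # illustrative pmf of a single variable X
l1 = 3.0
l2 = l1 / (l1 - 1)                  # conjugate exponent: 1/l1 + 1/l2 = 1

def norm(vals, r):
    return float((w @ vals**r) ** (1.0 / r))

# Brascamp-Lieb with X1 = X2 = X and C = 0 is exactly Hölder's inequality.
for _ in range(1000):
    f1, f2 = rng.random(6), rng.random(6)
    assert float(w @ (f1 * f2)) <= norm(f1, l1) * norm(f2, l2) + 1e-12
print("Hölder as a Brascamp-Lieb special case: verified")
```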


SLIDE 25

Equivalent characterizations: Brascamp-Lieb-type inequalities

Let (X1, ..., Xm) ∼ µ_{X1,...,Xm}. The following conditions are equivalent:

1. E( ∏_{i=1}^m fi(Xi) ) ≤ 2^C ∏_{i=1}^m ‖fi(Xi)‖_{λi}  ∀ fi ≥ 0.

2. ‖E( ∏_{i=2}^m fi(Xi) | X1 )‖_{λ′1} ≤ 2^C ∏_{i=2}^m ‖fi(Xi)‖_{λi}  ∀ fi ≥ 0, where 1/λ′1 = 1 − 1/λ1.

3. Using relative entropies (Carlen and Cordero-Erausquin '09):
       ∑_{i=1}^m (1/λi) D(ν_{Xi} ‖ µ_{Xi}) ≤ C + D(ν_{X1,...,Xm} ‖ µ_{X1,...,Xm})  ∀ ν_{X1,...,Xm} ≪ µ_{X1,...,Xm}.

4. When C = 0, these are also equivalent to (the earlier proof extends immediately):
       ∑_{i=1}^m (1/λi) I(U; Xi) ≤ I(U; X1, ..., Xm)  ∀ µ_{U|X1,...,Xm}.


SLIDE 28

Ahlswede-Gács-type limit (special case)

An interesting limit for information theorists: let λ′1 → ∞ and λi → ∞ while holding ri = λi/λ′1 fixed, i = 2, ..., m.

The functional (Brascamp-Lieb) characterization reduces to
    E( e^{E( ∑_{i=2}^m log fi(Xi) | X1 )} ) ≤ 2^C ∏_{i=2}^m ‖fi(Xi)‖_{ri}  ∀ fi > 0.

The equivalent characterization of Carlen and Cordero-Erausquin '09 reduces to
    ∑_{i=2}^m (1/ri) D(ν_{Xi} ‖ µ_{Xi}) ≤ C + D(ν_{X1} ‖ µ_{X1})  ∀ ν_{X1} ≪ µ_{X1}.
Here ν_{Xi} = ν_{X1} ⊙ µ_{Xi|X1}, i.e., the channels from X1 to Xi are preserved.

Remark: Liu et al. '16 derive the above equivalence directly, by extending the technique of Carlen and Cordero-Erausquin '09 rather than taking a limit.


SLIDE 31

Definitions: reverse inequalities

Reverse hypercontractivity: (X, Y) ∼ µ_XY is (λ1, λ2)-reverse-hypercontractive if
    E(f(X) g(Y)) ≥ ‖f(X)‖_{λ1} ‖g(Y)‖_{λ2}  ∀ f, g ≥ 0.
We are interested in λ1, λ2 ≤ 1 with 1/λ1 + 1/λ2 ≤ 1. (Notation: ‖Z‖_λ = E(|Z|^λ)^{1/λ}.)

Reverse Brascamp-Lieb-type inequalities: (X1, ..., Xm) ∼ µ_{X1,...,Xm} is said to satisfy a reverse Brascamp-Lieb-type inequality with parameters (λ1, λ2, ..., λm, C) if
    E( ∏_{i=1}^m fi(Xi) ) ≥ 2^C ∏_{i=1}^m ‖fi(Xi)‖_{λi}  ∀ {fi}.

- Reverse hypercontractivity is a special case of reverse Brascamp-Lieb.
- These parameters satisfy a tensorization property.
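The degenerate case X = Y already illustrates the definition: reverse Hölder's inequality gives E(fg) ≥ ‖f‖_{λ1} ‖g‖_{λ2} for strictly positive f, g when λ1 ∈ (0, 1) and λ2 = λ1/(λ1 − 1) < 0, so that 1/λ1 + 1/λ2 = 1. A numeric sketch with an illustrative pmf:

```python
import numpy as np

rng = np.random.default_rng(6)

w = rng.random(6)
w /= w.sum()                        # illustrative pmf of X (take Y = X)
l1 = 0.5
l2 = l1 / (l1 - 1)                  # = -1.0, the "conjugate" exponent, now negative

def norm(vals, r):
    return float((w @ vals**r) ** (1.0 / r))

for _ in range(1000):
    f = rng.random(6) + 0.1         # strictly positive test functions
    g = rng.random(6) + 0.1
    assert float(w @ (f * g)) >= norm(f, l1) * norm(g, l2) - 1e-12
print("reverse Hölder verified")
```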

SLIDE 32

Reverse Brascamp-Lieb-type inequalities

Beigi-Nair '16: let X1, ..., Xm be finite-valued random variables and let µ denote their joint probability mass function, with marginals µi, 1 ≤ i ≤ m. Let λ1, ..., λm be non-zero real numbers, and let S₊ = { i : λi > 0 } be the set of indices of the positive λi's. Then for any C ∈ ℝ the following are equivalent:

(i) For all positive functions f1, ..., fm,
        E( ∏_{i=1}^m fi(Xi) ) ≥ 2^C ∏_{i=1}^m ‖fi(Xi)‖_{λi}.

(ii) For all probability mass functions νi, i ∈ S₊, there exists a probability mass function ν, consistent with the given marginals νi, i ∈ S₊, such that
        ∑_{i=1}^m (1/λi) D(νi ‖ µi) ≥ C + D(ν ‖ µ).
     For i ∉ S₊, νi is the marginal induced by the p.m.f. ν.


SLIDE 34

Recap: we saw that hypercontractive inequalities can be equivalently characterized using information measures.

Part II: Why should some information theorists care?


SLIDE 36

(Degraded) broadcast channel

[Figure 1: Discrete memoryless broadcast channel — (M1, M2) → Encoder → X^n → W(y, z|x), with Y^n to Decoder 1 (output M̂1) and Z^n to Decoder 2 (output M̂2)]

Degraded: a broadcast channel is degraded if W(z|x) = ∑_y W′(z|y) W(y|x).

Particular sub-setting: Y = X.

Key question: what is the capacity region (the union of achievable rate pairs)?


SLIDE 39

Capacity region characterization

(Cover '72, Gallager '74) The capacity region C is the union of rate pairs (R1, R2) satisfying
    R2 ≤ I(U; Z)
    R1 ≤ H(X|U)
for some U such that U − X − Z is Markov.

Gallager's converse proof:
- a single-letterization argument
- explicit identification of the auxiliary U in terms of other variables induced by a given code.

Remark: there are some important settings where single-letter achievable regions (in terms of auxiliaries) lack a converse, and where there is evidence suggesting that the achievable regions are optimal.

Question: can we give an alternate proof of the capacity region (the single-letter expression) that does not involve explicit identification of auxiliaries?


SLIDE 42

Alternate converse

Alternate characterization of the capacity region:
    max_{(R1,R2) ∈ C} R1 + λR2 = max_{µ_X} λ I_µ(X; Z) + C_X[H(X) − λ I(X; Z)](µ).

Remarks:
- This is a supporting-hyperplane characterization of a convex region.
- We are interested in λ ≥ 1.
- Key: sub-additivity of C_X[H(X) − λ I(X; Z)](µ) implies optimality (the converse).

Lemma: sub-additivity of C_X[H(X) − λ I(X; Z)](µ) is equivalent to the tensorization property of r_∞(X; Z).

Proof: follows from an equivalent characterization of r_∞(X; Z).
- The tensorization property of the hypercontractivity region is a simple exercise.
- No identification of auxiliary variables is needed.
- Our original interest in hypercontractivity came from its tensorization property.


SLIDE 45

Remarks

Beigi-Gohari '15: the tensorization property of the entire hypercontractive region implies optimality for the Gray-Wyner source coding problem.

Recall: there are some important settings where single-letter achievable regions (in terms of auxiliaries) lack a converse, and where there is evidence suggesting that the achievable regions are optimal:
- the two-receiver discrete memoryless broadcast channel
- the Gaussian interference channel
- some sub-classes of broadcast channels with three or more receivers
- the sum-capacity of interference channels with very weak interference.
Optimality in these settings would be implied by showing sub-additivity of certain functionals.

Questions:
1. Are these sub-additivity questions equivalent to showing that certain functional inequalities satisfy a tensorization property?
2. Do the corresponding functional inequalities have an operational link with the corresponding coding questions?


SLIDE 47

Recap: we saw that the equivalent characterizations and the tensorization property together imply optimality of single-letter regions in some settings. The proposal is that this link is worth exploring, both to solve open problems and to understand existing results in a different light.

Part III: Why may some mathematicians care?


SLIDE 49

Background

Consider binary-valued random variables X, Y distributed as follows: X is uniform and W(y|x) ∼ BSC((1+ρ)/2).

Theorem (Bonami '70, Beckner '75): (X, Y) is (p, q)-hypercontractive if and only if
    (q − 1)/(p − 1) ≥ ρ².
This shows tightness of the correlation lower bound.

A similar statement also holds for jointly Gaussian random variables (Gross '75).

Remarks:
- An exact characterization of optimal (or near-optimal) hypercontractivity parameters has been obtained only in a few settings.
- The arguments are typically non-trivial.
- Idea: use the equivalent characterizations to obtain new results.
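The Bonami-Beckner boundary can be spot-checked through the two-function form of hypercontractivity: with ρ = 0.6 and p = 2 (illustrative choices) and q = 1 + ρ²(p − 1), every pair of nonnegative f, g should satisfy E(fg) ≤ ‖f‖_{p′}‖g‖_q.

```python
import numpy as np

rng = np.random.default_rng(2)

rho, p = 0.6, 2.0
q = 1 + rho**2 * (p - 1)           # the Bonami-Beckner boundary
p_conj = p / (p - 1)
mu = 0.25 * np.array([[1 + rho, 1 - rho],
                      [1 - rho, 1 + rho]])
mx, my = mu.sum(axis=1), mu.sum(axis=0)

def norm(vals, weights, r):
    return float((weights @ vals**r) ** (1.0 / r))

for _ in range(10000):
    f, g = rng.random(2), rng.random(2)
    lhs = float(f @ mu @ g)        # E(f(X) g(Y))
    assert lhs <= norm(f, mx, p_conj) * norm(g, my, q) + 1e-12
print("two-function form holds at q - 1 = rho^2 (p - 1)")
```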


SLIDE 51

Results on r_∞(X; Y), the strong data processing constant

Anantharam-Gohari-Kamath-Nair '13: consider binary-valued random variables X, Y distributed as follows.

1. P(X = 0) = (1+s)/2 and W(y|x) ∼ BSC((1+ρ)/2). Then
       r_∞(X; Y) = J((1+sρ)/2) / J((1+s)/2),  where J(x) = log((1−x)/x).

2. P(X = 1) = x and W(y|x) ∼ Z(z), i.e., W_{Y|X}(0|1) = z. Then
       r_∞(X; Y) = log(1 − x(1−z)) / log(1 − x).

Remark: both of these follow immediately from the convex-envelope equivalent characterization.
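The Z-channel formula above is easy to test numerically via the Ahlswede-Gács KL-ratio characterization: a grid sweep over input distributions ν_X should approach, and not exceed, the closed form, with the supremum approached as ν_X concentrates on the symbol 0. The parameters x = 0.3, z = 0.4 are illustrative.

```python
import numpy as np

def kl(a, b):
    """KL divergence (nats) between finite pmfs a and b."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    m = a > 0
    return float(np.sum(a[m] * np.log(a[m] / b[m])))

x, z = 0.3, 0.4                     # P(X=1) = x; Z-channel: W(0|1) = z, W(0|0) = 1
W = np.array([[1.0, 0.0],
              [z, 1.0 - z]])        # rows x = 0, 1; columns y = 0, 1
mu_x = np.array([1 - x, x])
mu_y = mu_x @ W

best = 0.0
for t in np.linspace(1e-4, 1 - 1e-4, 4001):   # t = nu_X(1)
    nu_x = np.array([1 - t, t])
    best = max(best, kl(nu_x @ W, mu_y) / kl(nu_x, mu_x))

closed_form = np.log(1 - x * (1 - z)) / np.log(1 - x)
assert abs(closed_form - best) < 1e-2
print(closed_form)                  # ≈ 0.556
```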


SLIDE 54

Results on r_∞(X; Y), continued

Kamath-Nair '15: let X1, ..., Xn be a sequence of i.i.d. random variables and let S_m = ∑_{i=1}^m Xi for m ≤ n. Then
    r_∞(S_n; S_m) ≤ m/n  when m ≤ n.
A finite second moment, for instance, implies equality above.

Remark: this strengthens a result of Dembo et al. '01, which establishes a similar statement for correlation.

Proof: suppose U − S_n − S_m is Markov; w.l.o.g. we may assume U − S_n − (X1, ..., Xn) is Markov. Let Φ(m) = I(U; S_m). Since I(U; S_n) = I(U; S_n, S_m, S_n − S_m, X_1^m), we have
    0 = I(U; X_1^m | S_m, S_n − S_m) ≥ I(U; X_1^m | S_m) ≥ 0.
Hence Φ(m) = I(U; X_1^m) for all m ≤ n. Moreover,
    Φ(m+1) − Φ(m) = I(U; X_{m+1} | X_1^m) ≥ I(U; X_{m+1} | X_2^m) = Φ(m) − Φ(m−1).
This convexity implies Φ(m)/m ≤ Φ(n)/n, or equivalently Φ(m)/Φ(n) ≤ m/n.
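The bound r_∞(S_n; S_m) ≤ m/n can be spot-checked through the I(U; S_m)/I(U; S_n) characterization: any particular test channel for U given S_n must produce a ratio of at most m/n. The sketch below uses i.i.d. Bernoulli(1/2) steps with n = 4, m = 2, and an arbitrary illustrative noisy channel for U (all assumptions, not choices from the talk).

```python
import numpy as np
from math import comb

n, m = 4, 2
# Joint pmf of (S_m, S_n) for i.i.d. Bernoulli(1/2) steps:
# S_m ~ Bin(m, 1/2) and S_n - S_m ~ Bin(n - m, 1/2), independent.
p = np.zeros((m + 1, n + 1))
for a in range(m + 1):
    for b in range(n - m + 1):
        p[a, a + b] = comb(m, a) * comb(n - m, b) / 2**n

# An illustrative test channel U | S_n (so U - S_n - S_m is Markov):
# keep S_n with probability 0.7, otherwise emit a uniform symbol.
delta = 0.3
K = (1 - delta) * np.eye(n + 1) + delta / (n + 1)

def mi(joint):
    """Mutual information (nats) of a 2-D joint pmf."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

p_sn = p.sum(axis=0)                        # pmf of S_n
ratio = mi(p @ K) / mi(K * p_sn[:, None])   # I(U; S_m) / I(U; S_n)
assert ratio <= m / n + 1e-9
print(ratio)
```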


SLIDE 57

Results on (p, q)-hypercontractivity

Consider random variables X, Y distributed as follows: X is uniform and binary, W(y|x) ∼ BEC(ε).

Theorem (Nair-Wang '16): for the BEC the correlation bound is tight, i.e., (X, Y) is (p, q)-hypercontractive for (q−1)/(p−1) = 1 − ε, if and only if
    ε − 1/2 ≤ (3/2)(q − 1).

Remarks:
- The condition always holds when ε ≤ 1/2.
- It holds for all ε if q ≥ 4/3.

Proof: uses the relative entropy characterization. Approach: study the stationary points (unique in the above region). The technique also yields another proof of Bonami's inequality for the BSC.
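A numeric spot-check of the theorem in the regime where the condition holds (ε = 0.3 ≤ 1/2 and p = 2, illustrative choices): at the correlation boundary q − 1 = (1 − ε)(p − 1), random nonnegative f, g should satisfy the two-function form of hypercontractivity.

```python
import numpy as np

rng = np.random.default_rng(4)

eps, p = 0.3, 2.0
q = 1 + (1 - eps) * (p - 1)         # correlation boundary for the BEC
p_conj = p / (p - 1)
# mu_{XY}: X uniform binary, Y in {0, erasure, 1}; rows x, columns y.
mu = np.array([[(1 - eps) / 2, eps / 2, 0.0],
               [0.0, eps / 2, (1 - eps) / 2]])
mx, my = mu.sum(axis=1), mu.sum(axis=0)

def norm(vals, weights, r):
    return float((weights @ vals**r) ** (1.0 / r))

for _ in range(10000):
    f, g = rng.random(2), rng.random(3)
    assert float(f @ mu @ g) <= norm(f, mx, p_conj) * norm(g, my, q) + 1e-12
print("hypercontractive at the BEC correlation boundary")
```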


SLIDE 59

Recap: we saw that the equivalent characterizations help to
- compute hypercontractivity parameters in several new settings
- obtain new proofs of old results.

Thank You