SLIDE 1

6.975 Graduate Seminar in Area I A Relationship Between Information Inequalities and Group Theory

Desmond Lun 23 October 2002

SLIDE 2

The result

Let $N = \{1, 2, \ldots, n\}$ and let $\Omega$ be the set of all non-empty subsets of $N$. Let $\{b_\alpha\}_{\alpha \in \Omega}$ be a set of real numbers. If the information inequality

$$\sum_{\alpha \in \Omega} b_\alpha H((X_i)_{i \in \alpha}) \ge 0$$

holds for all discrete random variables $X_1, X_2, \ldots, X_n$, then the group inequality

$$\sum_{\alpha \in \Omega} b_\alpha \log \frac{|G|}{|\cap_{i \in \alpha} G_i|} \ge 0$$

holds for all finite groups $G$ and subgroups $G_1, G_2, \ldots, G_n$ of $G$, and vice versa.

SLIDE 3

An example

We know that, for all discrete random variables $X_1$ and $X_2$,

$$H(X_1) + H(X_2) - H(X_1, X_2) \ge 0$$

(since $I(X_1; X_2) \ge 0$). So for all finite groups $G$ and subgroups $G_1$ and $G_2$ of $G$,

$$\log \frac{|G|}{|G_1|} + \log \frac{|G|}{|G_2|} \ge \log \frac{|G|}{|G_1 \cap G_2|}.$$

SLIDE 4

An example (cont.)

We can confirm directly that

$$\log \frac{|G|}{|G_1|} + \log \frac{|G|}{|G_2|} \ge \log \frac{|G|}{|G_1 \cap G_2|}$$

for all subgroups $G_1$ and $G_2$ of a finite group $G$. Let "$*$" be the operation on the group. Consider the subset of $G$

$$G_1 * G_2 = \{a * b \mid a \in G_1 \text{ and } b \in G_2\}.$$

Let us calculate $|G_1 * G_2|$. Fix $(a_1, a_2) \in G_1 \times G_2$. We wish to know how many pairs $(b_1, b_2) \in G_1 \times G_2$ satisfy $b_1 * b_2 = a_1 * a_2$.

SLIDE 5

An example (cont.)

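Since $b_1 * b_2 = a_1 * a_2$, we have $b_2 * a_2^{-1} = b_1^{-1} * a_1$. Let $k = b_2 * a_2^{-1} = b_1^{-1} * a_1$, so $k \in G_1 \cap G_2$. Conversely, each $k \in G_1 \cap G_2$ gives rise to a unique pair $(b_1, b_2) = (a_1 * k^{-1}, k * a_2)$ such that $b_1 * b_2 = a_1 * a_2$. So every element of $G_1 * G_2$ is produced by exactly $|G_1 \cap G_2|$ pairs. Therefore,

$$|G_1 * G_2| = \frac{|G_1||G_2|}{|G_1 \cap G_2|}.$$

Since $G_1 * G_2 \subset G$,

$$|G| \ge \frac{|G_1||G_2|}{|G_1 \cap G_2|}.$$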

SLIDE 6

An example (cont.)

Upon rearrangement of

$$|G| \ge \frac{|G_1||G_2|}{|G_1 \cap G_2|},$$

we obtain

$$\log \frac{|G|}{|G_1|} + \log \frac{|G|}{|G_2|} \ge \log \frac{|G|}{|G_1 \cap G_2|}.$$
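The counting argument can be checked numerically on a small case. The sketch below is an illustration only: the choice of the symmetric group on three points and of the two order-2 subgroups is an assumption made for the example, not something fixed by the slides.

```python
from itertools import permutations
from math import log

def compose(p, q):
    """Return the permutation p*q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

# Assumed example: G is the symmetric group on {0, 1, 2}, as tuples of images.
G = set(permutations(range(3)))
identity = (0, 1, 2)
G1 = {identity, (1, 0, 2)}   # subgroup generated by swapping 0 and 1
G2 = {identity, (0, 2, 1)}   # subgroup generated by swapping 1 and 2

product_set = {compose(a, b) for a in G1 for b in G2}
intersection = G1 & G2

# |G1 * G2| should equal |G1||G2| / |G1 ∩ G2|.
predicted_size = len(G1) * len(G2) // len(intersection)

lhs = log(len(G) / len(G1)) + log(len(G) / len(G2))
rhs = log(len(G) / len(intersection))
print(len(product_set), predicted_size, lhs >= rhs)
```

Here the product set has four elements, which is not a subgroup of $G$ but is still a subset, and that is all the inequality needs.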

SLIDE 7

Entropy functions

Suppose we have discrete random variables $X_1, X_2, \ldots, X_n$ with sample spaces $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_n$. Denote by $X_\alpha$, $\alpha \in \Omega$, the joint random variable $(X_i)_{i \in \alpha}$ with sample space $\mathcal{X}_\alpha = \prod_{i \in \alpha} \mathcal{X}_i$.

Definition Let $g$ be a vector in $\mathbb{R}^{|\Omega|}$ with components $g_\alpha$ indexed by $\alpha \in \Omega$. Then $g$ is an entropy function if there exists a set of random variables $X_1, X_2, \ldots, X_n$ such that $g_\alpha = H(X_\alpha)$ for all $\alpha$.

Let $\Gamma^*_n$ be the set of all entropy functions associated with $n$ random variables; that is,

$$\Gamma^*_n = \{g \in \mathbb{R}^{|\Omega|} \mid g \text{ is an entropy function}\}.$$

SLIDE 8

Entropy functions: An example

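Suppose $X_1, X_2$ take values in $\{1, 2, 3\}$ and have the joint PMF $p_{X_1,X_2}(x_1, x_2)$ given by a $3 \times 3$ table in which each row ($x_2 = 1, 2, 3$) contains two entries equal to $1/6$ and one zero entry, and likewise each column ($x_1 = 1, 2, 3$).

Therefore $H(X_1) = \log 3$, $H(X_2) = \log 3$, and $H(X_1, X_2) = \log 6$. So $(\log 3, \log 3, \log 6) \in \Gamma^*_n$ (with $n = 2$).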
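As a quick numerical check of this example, the sketch below computes the three entropies for one PMF of the stated shape. The placement of the zero cells on the diagonal $x_1 = x_2$ is an assumption made for illustration; any placement with one zero per row and per column gives the same entropies.

```python
from math import log, isclose

# Assumed placement of the zero cells: the diagonal x1 == x2.
pmf = {(x1, x2): 1 / 6
       for x1 in (1, 2, 3) for x2 in (1, 2, 3) if x1 != x2}

def entropy(dist):
    """Shannon entropy (natural log) of a dict mapping outcomes to probabilities."""
    return -sum(p * log(p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginal distribution of coordinate `axis` of a joint PMF."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

h1 = entropy(marginal(pmf, 0))    # H(X1)
h2 = entropy(marginal(pmf, 1))    # H(X2)
h12 = entropy(pmf)                # H(X1, X2)
print(isclose(h1, log(3)), isclose(h2, log(3)), isclose(h12, log(6)))
```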

SLIDE 9

Group-characterizable functions

Let $G$ be a finite group and $G_1, G_2, \ldots, G_n$ be subgroups of $G$. We use the notation $G_\alpha = \cap_{i \in \alpha} G_i$, where $\alpha \in \Omega$.

Definition Let $h$ be a vector in $\mathbb{R}^{|\Omega|}$ with components $h_\alpha$ indexed by $\alpha \in \Omega$. Then $h$ is called group-characterizable if there exist subgroups $G_1, G_2, \ldots, G_n$ of a group $G$ such that $h_\alpha = \log(|G|/|G_\alpha|)$ for all $\alpha$.

Let $\Upsilon_n$ be the set of all group-characterizable functions associated with $n$ groups; that is,

$$\Upsilon_n = \{h \in \mathbb{R}^{|\Omega|} \mid h \text{ is group-characterizable}\}.$$

SLIDE 10

Group-characterizable functions: An example

Let $A$ be the $2 \times 3$ matrix

$$A = \begin{pmatrix} a & a & b \\ c & d & d \end{pmatrix}$$

where $a, b, c, d$ are distinct elements of a field $K$. Let $G$ be the group of permutations of 3 elements, so $|G| = 6$. For $i = 1, 2$, let $G_i$ be the subgroup of $G$ that keeps the $i$th row of $A$ unchanged. So $|G_1| = 2$, $|G_2| = 2$, and $|G_1 \cap G_2| = 1$. So $(\log 3, \log 3, \log 6) \in \Upsilon_n$.
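The subgroup sizes in this example can be confirmed by brute force. The following sketch assumes that $G$ acts by permuting the three columns of $A$ and represents the distinct field elements $a, b, c, d$ simply as characters; both are modelling choices for the illustration.

```python
from itertools import permutations
from math import log

rows = ["aab", "cdd"]   # rows of A, with a, b, c, d as distinct symbols

def permute_columns(row, sigma):
    """Apply the column permutation sigma to a single row."""
    return "".join(row[sigma[j]] for j in range(len(row)))

G = list(permutations(range(3)))

# G_i = permutations that leave row i unchanged.
G1, G2 = ({s for s in G if permute_columns(row, s) == row} for row in rows)
G12 = G1 & G2

sizes = (len(G), len(G1), len(G2), len(G12))
h = (log(len(G) / len(G1)), log(len(G) / len(G2)), log(len(G) / len(G12)))
print(sizes, h)
```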

SLIDE 11

Information inequalities

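From the definition of $\Gamma^*_n$, an information inequality

$$\sum_{\alpha \in \Omega} b_\alpha H(X_\alpha) \ge 0$$

is valid if and only if

$$\Gamma^*_n \subset \{h \in \mathbb{R}^{|\Omega|} \mid b^\top h \ge 0\},$$

where $b$ is a column vector with components $b_\alpha$.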

SLIDE 12

Group inequalities

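Likewise, a group inequality

$$\sum_{\alpha \in \Omega} b_\alpha \log \frac{|G|}{|G_\alpha|} \ge 0$$

is valid if and only if

$$\Upsilon_n \subset \{h \in \mathbb{R}^{|\Omega|} \mid b^\top h \ge 0\}.$$

We are interested in relating $\Gamma^*_n$ with $\Upsilon_n$. Specifically, we want

$$\Gamma^*_n \subset \{h \in \mathbb{R}^{|\Omega|} \mid b^\top h \ge 0\} \iff \Upsilon_n \subset \{h \in \mathbb{R}^{|\Omega|} \mid b^\top h \ge 0\}.$$

Note: $\Upsilon_n$ is much sparser than $\Gamma^*_n$ because it has at most countably many points.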

SLIDE 13

Main theorem

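Theorem $\overline{\mathrm{conv}}(\Upsilon_n) = \overline{\Gamma}^*_n$ for all natural numbers $n$, where the overline denotes closure.

Outline of proof:

  • Show that $\Upsilon_n \subset \Gamma^*_n$.
  • Show that $\overline{\Gamma}^*_n$ is a convex cone.
  • So $\overline{\mathrm{conv}}(\Upsilon_n) \subset \overline{\mathrm{conv}}(\Gamma^*_n) \subset \overline{\mathrm{conv}}(\overline{\Gamma}^*_n) = \overline{\Gamma}^*_n$.
  • Show that $\Gamma^*_n \subset \overline{\mathrm{conv}}(\Upsilon_n)$.
  • So $\overline{\Gamma}^*_n \subset \overline{\mathrm{conv}}(\Upsilon_n)$.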

SLIDE 14

Showing $\Upsilon_n \subset \Gamma^*_n$

Lemma If $h$ is group-characterizable, then it is an entropy function, i.e. $h \in \Gamma^*_n$.

Proof.

  • Want: Find random variables $X_1, X_2, \ldots, X_n$ such that $H((X_i)_{i \in \alpha}) = \log(|G|/|G_\alpha|)$.
  • Let $\Lambda$ be uniformly distributed over the sample space $G$.
  • For $i \in N$, $X_i = aG_i$ if $\Lambda = a$ ($aG_i$ is a left coset of $G_i$).
  • For $\alpha \in \Omega$,

$$P((X_i = a_i G_i)_{i \in \alpha}) = P(\Lambda \in \cap_{i \in \alpha} a_i G_i) = \frac{|\cap_{i \in \alpha} a_i G_i|}{|G|}.$$

SLIDE 15

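Showing $\Upsilon_n \subset \Gamma^*_n$ (cont.)

  • Consider $\cap_{i \in \alpha} a_i G_i$. If it is non-empty, let $b \in \cap_{i \in \alpha} a_i G_i$, so

$$\cap_{i \in \alpha} a_i G_i = \cap_{i \in \alpha} b G_i = b \cap_{i \in \alpha} G_i = b G_\alpha.$$

  • The set $\cap_{i \in \alpha} a_i G_i$ is therefore either empty or of size $|G_\alpha|$.
  • Hence

$$P((X_i = a_i G_i)_{i \in \alpha}) = \begin{cases} \dfrac{|G_\alpha|}{|G|} & \text{if } \cap_{i \in \alpha} a_i G_i \ne \emptyset, \\ 0 & \text{otherwise.} \end{cases}$$

  • So $(X_i)_{i \in \alpha}$ is uniformly distributed over $|G|/|G_\alpha|$ values, and $H((X_i)_{i \in \alpha}) = \log(|G|/|G_\alpha|)$, as desired.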
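The coset construction in this proof can be run on a small case. The sketch below is only an illustration: the group (permutations of three points) and the two order-2 subgroups are assumed for the example. It draws $\Lambda$ uniformly from $G$ by enumerating every element once, maps it to its tuple of left cosets, and compares the resulting entropy with $\log(|G|/|G_\alpha|)$.

```python
from itertools import permutations
from collections import Counter
from math import log, isclose

def compose(p, q):
    """Return the permutation p*q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

# Assumed example group and subgroups.
G = list(permutations(range(3)))
identity = (0, 1, 2)
subgroups = {1: {identity, (1, 0, 2)}, 2: {identity, (0, 2, 1)}}

def left_coset(a, H):
    return frozenset(compose(a, h) for h in H)

def coset_entropy(alpha):
    """Entropy of (X_i)_{i in alpha}, where X_i = a G_i when Lambda = a."""
    counts = Counter(tuple(left_coset(a, subgroups[i]) for i in alpha) for a in G)
    return -sum((c / len(G)) * log(c / len(G)) for c in counts.values())

def group_value(alpha):
    """log(|G| / |G_alpha|) with G_alpha the intersection of the G_i, i in alpha."""
    G_alpha = set.intersection(*(subgroups[i] for i in alpha))
    return log(len(G) / len(G_alpha))

for alpha in [(1,), (2,), (1, 2)]:
    print(alpha, isclose(coset_entropy(alpha), group_value(alpha)))
```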
SLIDE 16

Showing $\overline{\Gamma}^*_n$ is a convex cone

Lemma $\overline{\Gamma}^*_n$ is a convex cone.

Proof.

  • First show that $\overline{\Gamma}^*_n$ is convex.
  • Want: For any $0 < b < 1$ and $u, v \in \Gamma^*_n$, $bu + (1 - b)v \in \overline{\Gamma}^*_n$. (It is then straightforward to extend to points in the closure of $\Gamma^*_n$.)
  • Let $u$ be the entropy function for $Y_1, Y_2, \ldots, Y_n$.
  • Let $v$ be the entropy function for $Z_1, Z_2, \ldots, Z_n$.

SLIDE 17

Showing $\overline{\Gamma}^*_n$ is a convex cone (cont.)

  • Let $(Y_1^{(i)}, Y_2^{(i)}, \ldots, Y_n^{(i)})$ for $1 \le i \le k$ be $k$ independent vectors each distributed identically to $(Y_1, Y_2, \ldots, Y_n)$.
  • Let $(Z_1^{(i)}, Z_2^{(i)}, \ldots, Z_n^{(i)})$ for $1 \le i \le k$ be $k$ independent vectors each distributed identically to $(Z_1, Z_2, \ldots, Z_n)$.
  • Let $U$ be a random variable having the distribution

$$p_U(u) = \begin{cases} 1 - \delta - \mu & \text{if } u = 0, \\ \delta & \text{if } u = 1, \\ \mu & \text{if } u = 2. \end{cases}$$

So $H(U) \to 0$ as $\delta, \mu \to 0$.

SLIDE 18

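Showing $\overline{\Gamma}^*_n$ is a convex cone (cont.)

  • Now construct $X_1, X_2, \ldots, X_n$ by

$$X_i = \begin{cases} 0 & \text{if } U = 0, \\ (Y_i^{(1)}, Y_i^{(2)}, \ldots, Y_i^{(k)}) & \text{if } U = 1, \\ (Z_i^{(1)}, Z_i^{(2)}, \ldots, Z_i^{(k)}) & \text{if } U = 2. \end{cases}$$

  • So for any $\alpha \in \Omega$, $H(X_\alpha \mid U) = \delta k H(Y_\alpha) + \mu k H(Z_\alpha)$.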

SLIDE 19

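Showing $\overline{\Gamma}^*_n$ is a convex cone (cont.)

  • We have

$$0 \le H(X_\alpha) - H(X_\alpha \mid U) \le H(U) \;\Rightarrow\; 0 \le H(X_\alpha) - (\delta k H(Y_\alpha) + \mu k H(Z_\alpha)) \le H(U).$$

  • Take $\delta = b/k$, $\mu = (1 - b)/k$, so

$$0 \le H(X_\alpha) - (b H(Y_\alpha) + (1 - b) H(Z_\alpha)) \le H(U).$$

  • Taking $k = 1, 2, \ldots$ gives us a sequence of points in $\Gamma^*_n$ whose limit point is $bu + (1 - b)v$, since $H(U) \to 0$ as $k \to \infty$. So $bu + (1 - b)v \in \overline{\Gamma}^*_n$.
  • $\overline{\Gamma}^*_n$ is convex.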
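The limiting step relies on $H(U)$ shrinking as $k$ grows when $\delta = b/k$ and $\mu = (1 - b)/k$. A minimal sketch of that decay is given below; the value $b = 0.3$ and the particular values of $k$ are arbitrary choices for illustration.

```python
from math import log

def entropy_of_U(b, k):
    """H(U) in nats for p_U = (1 - delta - mu, delta, mu), delta = b/k, mu = (1-b)/k."""
    delta, mu = b / k, (1 - b) / k
    probs = [1 - delta - mu, delta, mu]
    return -sum(p * log(p) for p in probs if p > 0)

# Upper bound on the gap H(X_alpha) - (b H(Y_alpha) + (1-b) H(Z_alpha)) for growing k.
gaps = [entropy_of_U(0.3, k) for k in (2, 20, 200, 2000)]
print(gaps)
```

The printed values decrease towards zero, which is what squeezes $H(X_\alpha)$ onto the convex combination.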

SLIDE 20

Showing $\overline{\Gamma}^*_n$ is a convex cone (cont.)

  • It remains only to show that $\overline{\Gamma}^*_n$ is a cone.
  • If $v \in \Gamma^*_n$, consider $k$ independent copies of its associated random variables, and we see that $kv \in \Gamma^*_n$.
  • Straightforwardly extend to closure: if $v \in \overline{\Gamma}^*_n$, then for any positive integer $k$, $kv \in \overline{\Gamma}^*_n$.
  • By letting $X_1, X_2, \ldots, X_n$ take constant values with probability 1, we see that $0 \in \Gamma^*_n$.

SLIDE 21

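Showing $\overline{\Gamma}^*_n$ is a convex cone (cont.)

  • Consider a non-negative combination $\sum_i \alpha_i v_i$, with $\alpha_i \ge 0$ and $v_i \in \overline{\Gamma}^*_n$.
  • Let $\alpha = \sum_i \alpha_i$; then by convexity

$$\overline{\Gamma}^*_n \ni \sum_i \frac{\alpha_i}{\lceil \alpha \rceil} v_i + \left(1 - \frac{\alpha}{\lceil \alpha \rceil}\right) 0 = \sum_i \frac{\alpha_i}{\lceil \alpha \rceil} v_i.$$

  • So $\overline{\Gamma}^*_n \ni \lceil \alpha \rceil \sum_i \frac{\alpha_i}{\lceil \alpha \rceil} v_i = \sum_i \alpha_i v_i$.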
SLIDE 22

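Showing $\Gamma^*_n \subset \overline{\mathrm{conv}}(\Upsilon_n)$

Lemma For any $h \in \Gamma^*_n$, there exists a sequence $\{f^{(r)}\}$ in $\Upsilon_n$ such that $\lim_{r \to \infty} (f^{(r)}/r) = h$.

Proof.

  • First consider the special case where $|\mathcal{X}_i| < \infty$ for all $i \in N$ and the joint distribution of $X_1, X_2, \ldots, X_n$ is rational.
  • For any $\alpha \in \Omega$, let $Q_\alpha$ be the marginal distribution of $X_\alpha$.
  • Assume w.l.o.g. that for any $\alpha \in \Omega$ and $x \in \mathcal{X}_\alpha$, $Q_\alpha(x)$ is rational with denominator $q$.
  • Want: Construct a sequence $\{f^{(r)}\}$ in $\Upsilon_n$ such that $\lim_{r \to \infty} (f^{(r)}/r) = h$ (where $h_\alpha = H(X_\alpha)$).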

SLIDE 23

Showing $\Gamma^*_n \subset \overline{\mathrm{conv}}(\Upsilon_n)$ (cont.)

  • For $r = q, 2q, 3q, \ldots$, fix an $n \times r$ matrix $A$,

$$A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,r} \\ \vdots & \ddots & \vdots \\ a_{n,1} & \cdots & a_{n,r} \end{pmatrix}$$

such that for all $x \in \mathcal{X}_N$, the number of columns in $A$ equal to $x$ is $r Q_N(x)$.

  • Denote by $A_\alpha$ the submatrix of $A$ obtained by extracting the rows of $A$ indexed by $\alpha$.
  • For all $x \in \mathcal{X}_\alpha$, the number of columns of $A_\alpha$ equal to $x$ is $r Q_\alpha(x)$ ($Q_\alpha$ is the marginal distribution of $X_i$, $i \in \alpha$).

SLIDE 24

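Showing $\Gamma^*_n \subset \overline{\mathrm{conv}}(\Upsilon_n)$ (cont.)

  • Let $G$ be the group of permutations on $\{1, \ldots, r\}$.
  • For any $i \in N$, define

$$G_i = \{\sigma \in G \mid \sigma[A_i] = A_i\}$$

(permutations that keep the $i$th row fixed).

  • For $\alpha \in \Omega$, a simple derivation shows

$$G_\alpha = \{\sigma \in G \mid \sigma[A_\alpha] = A_\alpha\}.$$

  • Since for all $x \in \mathcal{X}_\alpha$ the number of columns of $A_\alpha$ equal to $x$ is $r Q_\alpha(x)$,

$$|G_\alpha| = \prod_{x \in \mathcal{X}_\alpha} (r Q_\alpha(x))!.$$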

SLIDE 25

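Showing $\Gamma^*_n \subset \overline{\mathrm{conv}}(\Upsilon_n)$ (cont.)

  • Hence

$$\frac{|G|}{|G_\alpha|} = \frac{r!}{\prod_{x \in \mathcal{X}_\alpha} (r Q_\alpha(x))!}.$$

  • Can then show (using a theorem regarding the method of types)

$$\lim_{r \to \infty} \frac{1}{r} \log \frac{|G|}{|G_\alpha|} = H(X_\alpha) = h_\alpha.$$

  • Define $f^{(r)} \in \Upsilon_n$ by $f^{(r)}_\alpha = \log(|G|/|G_\alpha|)$.
  • We have

$$\lim_{r \to \infty} \frac{1}{r} f^{(r)} = h.$$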
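The convergence of $\frac{1}{r} \log(|G|/|G_\alpha|)$ to the entropy can be observed numerically. The sketch below is an illustration under assumptions: the marginal $Q = (1/2, 1/4, 1/4)$ with $q = 4$ is a made-up example, and `math.lgamma` is used to evaluate the log-factorials without overflow.

```python
from fractions import Fraction
from math import lgamma, log

def log_factorial(m):
    """log(m!) computed via the log-gamma function."""
    return lgamma(m + 1)

def group_rate(Q, r):
    """(1/r) * log(r! / prod_x (r*Q(x))!) for a rational PMF Q.

    Requires every r*Q(x) to be an integer, i.e. r is a multiple of q.
    """
    counts = [Q_x * r for Q_x in Q]
    assert all(c.denominator == 1 for c in counts)
    return (log_factorial(r) - sum(log_factorial(int(c)) for c in counts)) / r

# Hypothetical marginal distribution with common denominator q = 4.
Q = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
H = -sum(float(p) * log(float(p)) for p in Q)

rates = [group_rate(Q, r) for r in (4, 40, 400, 4000)]
print(H, rates)
```

The rates approach $H$ from below as $r$ runs through multiples of $q$, consistent with the limit stated in the proof.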

SLIDE 26

Showing $\Gamma^*_n \subset \overline{\mathrm{conv}}(\Upsilon_n)$ (cont.)

  • Proved the lemma for the special case of a collection of random variables with finite sample space and rational joint distribution.
  • Argue that such collections of random variables are 'dense' in general ones.
  • By noting that $0 \in \Upsilon_n$ (set $G_1 = G_2 = \cdots = G_n = G$), the lemma shows that $\Gamma^*_n \subset \overline{\mathrm{conv}}(\Upsilon_n)$.

SLIDE 27

Correspondence of inequalities examples

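Let $G_1, G_2, G_3$ be subgroups of a group $G$. We have

$$|G_3| \ge |G_{13} * G_{23}| = \frac{|G_{13}||G_{23}|}{|G_{123}|}.$$

So

$$\frac{|G|^2}{|G_{13}||G_{23}|} \ge \frac{|G|^2}{|G_3||G_{123}|} \;\Rightarrow\; \log \frac{|G|}{|G_{13}|} + \log \frac{|G|}{|G_{23}|} \ge \log \frac{|G|}{|G_3|} + \log \frac{|G|}{|G_{123}|},$$

which corresponds to

$$H(X_1, X_3) + H(X_2, X_3) \ge H(X_3) + H(X_1, X_2, X_3).$$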

SLIDE 28

Correspondence of inequalities examples (cont.)

We have H(X1, X3) + H(X2, X3) ≥ H(X3) + H(X1, X2, X3), which is equivalent to I(X1; X2|X3) ≥ 0, a valid information inequality for all random variables X1, X2, X3.

SLIDE 29

Correspondence of inequalities examples (cont.)

Zhang and Yeung (1998) showed that

$$H(X_1) + H(X_2) + 2H(X_1, X_2) + 4H(X_3) + 4H(X_4) + 5H(X_1, X_3, X_4) + 5H(X_2, X_3, X_4) \le 6H(X_3, X_4) + 4H(X_1, X_3) + 4H(X_1, X_4) + 4H(X_2, X_3) + 4H(X_2, X_4),$$

which, after rearrangement, corresponds to

$$|G_{34}|^6 |G_{13}|^4 |G_{14}|^4 |G_{23}|^4 |G_{24}|^4 \le |G_1| |G_2| |G_3|^4 |G_4|^4 |G_{12}|^2 |G_{134}|^5 |G_{234}|^5.$$

The significance of this inequality is unclear.

SLIDE 30

Conclusion

  • We have a one-to-one correspondence between information inequalities and group inequalities.
  • Demonstration comes down to showing that $\Upsilon_n$, whilst sparser than $\Gamma^*_n$, has $\overline{\mathrm{conv}}(\Upsilon_n) = \overline{\Gamma}^*_n$.
  • Meaning or utility of result is unclear.
