SLIDE 1
6.975 Graduate Seminar in Area I
A Relationship Between Information Inequalities and Group Theory
Desmond Lun 23 October 2002
SLIDE 2 The result
Let N = {1, 2, ..., n} and let Ω be the set of all non-empty subsets of N. Let {bα}α∈Ω be a set of real numbers. If the information inequality
Σα∈Ω bα H((Xi)i∈α) ≥ 0
holds for all discrete random variables X1, X2, ..., Xn, then the group inequality
Σα∈Ω bα log(|G| / |∩i∈α Gi|) ≥ 0
holds for all finite groups G and subgroups Gi of G, and vice versa.
SLIDE 3
An example
We know that, for all discrete random variables X1 and X2,
H(X1) + H(X2) − H(X1, X2) ≥ 0
(since I(X1; X2) ≥ 0). So for all finite groups G and subgroups G1 and G2 of G,
log(|G|/|G1|) + log(|G|/|G2|) ≥ log(|G|/|G1 ∩ G2|).
SLIDE 4 An example (cont.)
We can confirm that
log(|G|/|G1|) + log(|G|/|G2|) ≥ log(|G|/|G1 ∩ G2|)
for all subgroups G1 and G2 of a finite group G. Let "∗" be the operation on the group. Consider the subset of G
G1 ∗ G2 = {a ∗ b | a ∈ G1 and b ∈ G2}.
Let us calculate |G1 ∗ G2|. Fix (a1, a2) ∈ G1 × G2. We wish to know how many pairs (b1, b2) ∈ G1 × G2 satisfy b1 ∗ b2 = a1 ∗ a2.
SLIDE 5
An example (cont.)
Since b1 ∗ b2 = a1 ∗ a2, we have b2 ∗ a2⁻¹ = b1⁻¹ ∗ a1.
Let k = b2 ∗ a2⁻¹ = b1⁻¹ ∗ a1, so k ∈ G1 ∩ G2. Each k ∈ G1 ∩ G2 gives rise to a unique pair (b1, b2) = (a1 ∗ k⁻¹, k ∗ a2) such that b1 ∗ b2 = a1 ∗ a2, so every element of G1 ∗ G2 arises from exactly |G1 ∩ G2| pairs. Therefore,
|G1 ∗ G2| = |G1||G2| / |G1 ∩ G2|.
Since G1 ∗ G2 ⊂ G,
|G| ≥ |G1||G2| / |G1 ∩ G2|.
SLIDE 6
An example (cont.)
Upon rearrangement of
|G| ≥ |G1||G2| / |G1 ∩ G2|,
we obtain
log(|G|/|G1|) + log(|G|/|G2|) ≥ log(|G|/|G1 ∩ G2|).
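The counting argument can be checked numerically on small permutation groups. The following is a minimal sketch, not part of the original slides: the choice of S4 and of the two subgroup pairs is an assumption made for illustration, and the helper names (compose, generate, product_set) are hypothetical.

```python
def compose(p, q):
    """Composition of permutations stored as tuples: (p * q)[i] = p[q[i]]."""
    return tuple(p[i] for i in q)

def generate(generators, n):
    """Subgroup of the permutations of {0, ..., n-1} generated by `generators`."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def product_set(g1, g2):
    """The subset G1 * G2 = {a * b | a in G1, b in G2}."""
    return {compose(a, b) for a in g1 for b in g2}

n = 4
# Assumed example: G is the full symmetric group on 4 points.
G = generate([(1, 0, 2, 3), (1, 2, 3, 0)], n)
klein_like = generate([(1, 0, 2, 3), (0, 1, 3, 2)], n)   # order 4
pairs = [
    (generate([(1, 2, 0, 3)], n), klein_like),            # trivial intersection
    (klein_like, generate([(1, 0, 2, 3), (1, 2, 0, 3)], n)),  # intersection of order 2
]

checks = []
for G1, G2 in pairs:
    product_size = len(product_set(G1, G2))
    formula = len(G1) * len(G2) // len(G1 & G2)
    checks.append((product_size, formula, len(G)))
    print(product_size, formula, len(G))
```

In both assumed cases the size of G1 ∗ G2 agrees with |G1||G2|/|G1 ∩ G2| and does not exceed |G|.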
SLIDE 7
Entropy functions
Suppose we have discrete random variables X1, X2, ..., Xn with sample spaces 𝒳1, 𝒳2, ..., 𝒳n. Denote by Xα, α ∈ Ω, the joint random variable (Xi)i∈α with sample space 𝒳α = ∏i∈α 𝒳i.
Definition Let g be a vector in ℝ^|Ω| with components gα indexed by α ∈ Ω. Then g is an entropy function if there exists a set of random variables X1, X2, ..., Xn such that gα = H(Xα) for all α.
Let Γ∗n be the set of all entropy functions associated with n random variables; that is,
Γ∗n = {g ∈ ℝ^|Ω| | g is an entropy function}.
SLIDE 8
Entropy functions: An example
Suppose X1 and X2 each take values in {1, 2, 3} and have the joint PMF pX1,X2(x1, x2) given by a 3 × 3 table (columns x1 = 1, 2, 3; rows x2 = 1, 2, 3) in which every row and every column has two entries equal to 1/6 and one entry equal to 0.
Therefore H(X1) = log 3, H(X2) = log 3, and H(X1, X2) = log 6. So (log 3, log 3, log 6) ∈ Γ∗n for n = 2.
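The three entropies can be recomputed directly. The sketch below is illustrative only: it assumes the zero entries sit on the diagonal (x1 = x2), which is one placement consistent with the stated entropies and not necessarily the one on the original slide.

```python
from math import log

def entropy(pmf):
    """Shannon entropy (natural log) of a PMF given as {outcome: probability}."""
    return -sum(p * log(p) for p in pmf.values() if p > 0)

def marginal(joint, axis):
    """Marginal PMF of one coordinate of a joint PMF keyed by tuples."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

# Assumption: the six entries equal to 1/6 are the off-diagonal cells.
joint = {(x1, x2): 1 / 6
         for x1 in (1, 2, 3) for x2 in (1, 2, 3) if x1 != x2}

h1 = entropy(marginal(joint, 0))   # H(X1)
h2 = entropy(marginal(joint, 1))   # H(X2)
h12 = entropy(joint)               # H(X1, X2)
print(h1, h2, h12, log(3), log(6))
```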
SLIDE 9
Group-characterizable functions
Let G be a finite group and G1, G2, ..., Gn be subgroups of G. We use the notation Gα = ∩i∈α Gi, where α ∈ Ω.
Definition Let h be a vector in ℝ^|Ω| with components hα indexed by α ∈ Ω. Then h is called group-characterizable if there exist subgroups G1, G2, ..., Gn of a group G such that hα = log(|G|/|Gα|) for all α.
Let Υn be the set of all group-characterizable functions associated with n subgroups; that is,
Υn = {h ∈ ℝ^|Ω| | h is group-characterizable}.
SLIDE 10 Group-characterizable functions: An example
Let A be the 2 × 3 matrix
A = [ a a b ]
    [ c d d ]
where a, b, c, d are distinct elements of a field K. Let G be the group of permutations of 3 elements (acting on the columns of A), so |G| = 6. For i = 1, 2, let Gi be the subgroup of G that keeps the ith row of A unchanged. So |G1| = 2, |G2| = 2, and |G1 ∩ G2| = 1. So (log 3, log 3, log 6) ∈ Υn for n = 2.
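The subgroup sizes in this example can be enumerated by brute force. This is a minimal sketch under the assumption that G acts by permuting the columns of A; the strings "a", "b", "c", "d" stand in for the distinct field elements, and the helper names are hypothetical.

```python
from itertools import permutations
from math import log

# Rows of A, with symbols standing in for distinct field elements.
A = [("a", "a", "b"), ("c", "d", "d")]
G = list(permutations(range(3)))   # all 6 column permutations

def fixes_row(sigma, row):
    """True if permuting the columns of `row` by sigma leaves it unchanged."""
    return tuple(row[sigma[j]] for j in range(len(row))) == tuple(row)

def stabilizer(rows):
    """Permutations keeping every listed row (0-based index) unchanged."""
    return [s for s in G if all(fixes_row(s, A[i]) for i in rows)]

sizes = {alpha: len(stabilizer(alpha)) for alpha in [(0,), (1,), (0, 1)]}
h = {alpha: log(len(G) / size) for alpha, size in sizes.items()}
print(sizes)
print(h)
```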
SLIDE 11 Information inequalities
From the definition of Γ∗n, an information inequality
Σα∈Ω bα H(Xα) ≥ 0
is valid if and only if
Γ∗n ⊂ {h ∈ ℝ^|Ω| | b⊤h ≥ 0},
where b is a column vector with components bα.
SLIDE 12 Group inequalities
Likewise, a group inequality
Σα∈Ω bα log(|G|/|Gα|) ≥ 0
is valid if and only if
Υn ⊂ {h ∈ ℝ^|Ω| | b⊤h ≥ 0}.
We are interested in relating Γ∗n with Υn. Specifically, we want
Γ∗n ⊂ {h ∈ ℝ^|Ω| | b⊤h ≥ 0} ⇔ Υn ⊂ {h ∈ ℝ^|Ω| | b⊤h ≥ 0}.
Note: Υn is much sparser than Γ∗n because it has at most countably many points.
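As a small illustration of the half-space formulation, the sketch below evaluates b⊤h for the inequality H(X1) + H(X2) − H(X1, X2) ≥ 0 at the point (log 3, log 3, log 6) used in the earlier examples. Indexing Ω by frozensets is an assumption of this sketch, not a convention from the slides.

```python
from math import log

# Omega for n = 2: all non-empty subsets of {1, 2}.
omega = [frozenset({1}), frozenset({2}), frozenset({1, 2})]

# Coefficients of H(X1) + H(X2) - H(X1, X2) >= 0.
b = {frozenset({1}): 1.0, frozenset({2}): 1.0, frozenset({1, 2}): -1.0}

# The point (log 3, log 3, log 6) from the examples.
h = {frozenset({1}): log(3), frozenset({2}): log(3), frozenset({1, 2}): log(6)}

def inner(b, h):
    """The inner product b^T h over the index set Omega."""
    return sum(b[alpha] * h[alpha] for alpha in omega)

value = inner(b, h)
print(value)   # equals log(3) + log(3) - log(6) = log(1.5)
```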
SLIDE 13 Main theorem
Theorem conv(Υn) = Γ̄∗n for all natural numbers n, where Γ̄∗n denotes the closure of Γ∗n.
Outline of proof:
- Υn ⊂ Γ∗n.
- Γ̄∗n is a convex cone.
- Hence conv(Υn) ⊂ conv(Γ̄∗n) = Γ̄∗n.
- Γ∗n ⊂ conv(Υn).
- Hence Γ̄∗n ⊂ conv(Υn).
SLIDE 14 Showing Υn ⊂ Γ∗n
Lemma If h is group-characterizable, then it is an entropy function, i.e. h ∈ Γ∗n.
Proof.
- Want: Find random variables X1, X2, . . . , Xn such that
H((Xi)i∈α) = log(|G|/|Gα|).
- Let Λ be uniformly distributed over the sample space G.
- For i ∈ N, Xi = aGi if Λ = a (aGi is a left coset of Gi).
- For α ∈ Ω,
P((Xi = aiGi)i∈α) = P(Λ ∈ ∩i∈α aiGi) = |∩i∈α aiGi| / |G|.
SLIDE 15 Showing Υn ⊂ Γ∗n (cont.)
- Consider ∩i∈αaiGi. If non-empty, then let b ∈ ∩i∈αaiGi, so
∩i∈αaiGi = ∩i∈αbGi = b ∩i∈α Gi = bGα.
- The set ∩i∈αaiGi is either empty or of size |Gα|.
- Hence
P((Xi = aiGi)i∈α) = |Gα| / |G| if ∩i∈α aiGi ≠ ∅, and 0 otherwise.
- So (Xi)i∈α is uniformly distributed over |G|/|Gα| values, and H((Xi)i∈α) = log(|G|/|Gα|), as desired.
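The coset construction in this proof can be simulated exactly on a small group. The following sketch assumes G is the group of permutations of 3 elements, with G1 and G2 the two order-2 subgroups from the earlier matrix example; the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import isclose, log

def compose(p, q):
    """Composition of permutations stored as tuples: (p * q)[i] = p[q[i]]."""
    return tuple(p[i] for i in q)

G = list(permutations(range(3)))
# Assumed subgroups: G1 swaps the first two points, G2 swaps the last two.
subgroups = {1: [(0, 1, 2), (1, 0, 2)], 2: [(0, 1, 2), (0, 2, 1)]}

def left_coset(a, H):
    """The left coset aH as a hashable set."""
    return frozenset(compose(a, g) for g in H)

def coset_entropy(alpha):
    """H((X_i), i in alpha) when X_i = Lambda G_i and Lambda is uniform on G."""
    counts = Counter(tuple(left_coset(a, subgroups[i]) for i in alpha) for a in G)
    return -sum((c / len(G)) * log(c / len(G)) for c in counts.values())

def group_value(alpha):
    """log(|G| / |G_alpha|) with G_alpha the intersection of the G_i."""
    inter = set(subgroups[alpha[0]])
    for i in alpha[1:]:
        inter &= set(subgroups[i])
    return log(len(G) / len(inter))

results = {alpha: (coset_entropy(alpha), group_value(alpha))
           for alpha in [(1,), (2,), (1, 2)]}
print(results)
```

For each α the entropy of the coset-valued variables matches log(|G|/|Gα|), as the lemma states.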
SLIDE 16 Showing Γ̄∗n is a convex cone
Lemma Γ̄∗n, the closure of Γ∗n, is a convex cone.
Proof.
- First show that Γ̄∗n is convex.
- Want: For any 0 < b < 1 and u, v ∈ Γ∗n, bu + (1 − b)v ∈ Γ̄∗n. (It is then straightforward to extend to points in the closure of Γ∗n.)
- Let u be the entropy function for Y1, Y2, ..., Yn.
- Let v be the entropy function for Z1, Z2, ..., Zn.
SLIDE 17 Showing Γ̄∗n is a convex cone (cont.)
- Let (Y1^(i), Y2^(i), ..., Yn^(i)) for 1 ≤ i ≤ k be k independent vectors each distributed identically to (Y1, Y2, ..., Yn).
- Let (Z1^(i), Z2^(i), ..., Zn^(i)) for 1 ≤ i ≤ k be k independent vectors each distributed identically to (Z1, Z2, ..., Zn).
- Let U be a random variable, independent of the vectors above, having the distribution
pU(u) = 1 − δ − µ if u = 0,
pU(u) = δ if u = 1,
pU(u) = µ if u = 2.
So H(U) → 0 as δ, µ → 0.
SLIDE 18 Showing Γ̄∗n is a convex cone (cont.)
- Now construct X1, X2, ..., Xn by
Xi = 0 if U = 0,
Xi = (Yi^(1), Yi^(2), ..., Yi^(k)) if U = 1,
Xi = (Zi^(1), Zi^(2), ..., Zi^(k)) if U = 2.
- So for any α ∈ Ω, H(Xα|U) = δkH(Yα) + µkH(Zα).
SLIDE 19 Showing Γ̄∗n is a convex cone (cont.)
- Since conditioning does not increase entropy and H(Xα) ≤ H(Xα, U) = H(U) + H(Xα|U),
0 ≤ H(Xα) − H(Xα|U) ≤ H(U),
⇒ 0 ≤ H(Xα) − (δkH(Yα) + µkH(Zα)) ≤ H(U).
- Take δ = b/k, µ = (1 − b)/k, so
0 ≤ H(Xα) − (bH(Yα) + (1 − b)H(Zα)) ≤ H(U).
- Since H(U) → 0 as k → ∞, taking k = 1, 2, ... gives us a sequence of points in Γ∗n whose limit point is bu + (1 − b)v. So bu + (1 − b)v ∈ Γ̄∗n.
- Hence Γ̄∗n is convex.
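The sandwich bound in this construction can be checked by exact enumeration for small k. The sketch below is an illustration under several assumptions: a single variable (n = 1), the particular PMFs chosen for Y and Z, and b = 0.25. None of these values come from the slides.

```python
from itertools import product
from math import log, prod

def entropy(probs):
    """Shannon entropy (natural log) of an iterable of probabilities."""
    return -sum(p * log(p) for p in probs if p > 0)

def h_u(b, k):
    """H(U) for delta = b/k and mu = (1 - b)/k."""
    delta, mu = b / k, (1 - b) / k
    return entropy([1 - delta - mu, delta, mu])

def mixture_entropy(p_y, p_z, b, k):
    """H(X) where X is 0, a block of k i.i.d. Y's, or a block of k i.i.d. Z's."""
    delta, mu = b / k, (1 - b) / k
    dist = {"zero": 1 - delta - mu}
    for weight, pmf in ((delta, p_y), (mu, p_z)):
        for block in product(pmf, repeat=k):
            mass = weight * prod(pmf[s] for s in block)
            dist[block] = dist.get(block, 0.0) + mass
    return entropy(dist.values())

# Assumed example distributions and mixing weight.
p_y = {0: 0.9, 1: 0.1}
p_z = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
b = 0.25
target = b * entropy(p_y.values()) + (1 - b) * entropy(p_z.values())

gaps = []
for k in range(1, 7):
    gap = mixture_entropy(p_y, p_z, b, k) - target
    gaps.append((k, gap, h_u(b, k)))
    print(k, gap, h_u(b, k))

# H(U) shrinks slowly, so its limit is illustrated without enumeration.
print(h_u(b, 10**6))
```

The gap H(X) − (bH(Y) + (1 − b)H(Z)) stays between 0 and H(U) for every k tried, and H(U) becomes small only for large k.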
SLIDE 20 Showing Γ̄∗n is a convex cone (cont.)
- It remains only to show that Γ̄∗n is a cone.
- For any v ∈ Γ∗n, consider k independent copies of its associated random variables, and we see that kv ∈ Γ∗n.
- Straightforwardly extend to the closure: if v ∈ Γ̄∗n, then for any positive integer k, kv ∈ Γ̄∗n.
- By letting X1, X2, ..., Xn take constant values with probability 1, we see that 0 ∈ Γ∗n.
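SLIDE 21 Showing Γ̄∗n is a convex cone (cont.)
- Consider a non-negative combination Σi αi vi, with αi ≥ 0 and vi ∈ Γ̄∗n.
- Let α = Σi αi. Then, by convexity (and since 0 ∈ Γ̄∗n),
Γ̄∗n ∋ Σi (αi/⌈α⌉) vi + ((⌈α⌉ − α)/⌈α⌉) · 0 = Σi (αi/⌈α⌉) vi.
- Since ⌈α⌉ is a positive integer, Γ̄∗n ∋ ⌈α⌉ Σi (αi/⌈α⌉) vi = Σi αi vi.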
SLIDE 22 Showing Γ∗n ⊂ conv(Υn)
Lemma For any h ∈ Γ∗n, there exists a sequence {f^(r)} in Υn such that lim r→∞ (f^(r)/r) = h.
Proof.
- First consider the special case where |𝒳i| < ∞ for all i ∈ N and the joint distribution of X1, X2, ..., Xn is rational.
- For any α ∈ Ω, let Qα be the marginal distribution of Xα.
- Assume w.l.o.g. that for any α ∈ Ω and x ∈ 𝒳α, Qα(x) is rational with denominator q.
- Want: Construct a sequence {f^(r)} in Υn such that lim r→∞ (f^(r)/r) = h (where hα = H(Xα)).
SLIDE 23 Showing Γ∗n ⊂ conv(Υn) (cont.)
- For r = q, 2q, 3q, ..., fix an n × r matrix A,
A = [ a1,1 ⋯ a1,r ]
    [  ⋮   ⋱   ⋮  ]
    [ an,1 ⋯ an,r ]
such that for all x ∈ 𝒳N, the number of columns in A equal to x is rQN(x).
- Denote by Aα the submatrix of A obtained by extracting the rows of A indexed by α.
- For all x ∈ 𝒳α, the number of columns of Aα equal to x is rQα(x) (Qα is the marginal distribution of Xi, i ∈ α).
SLIDE 24 Showing Γ∗n ⊂ conv(Υn) (cont.)
- Let G be the group of permutations on {1, ..., r}.
- For any i ∈ N, define
Gi = {σ ∈ G | σ[Ai] = Ai} (permutations that keep the ith row fixed).
- For α ∈ Ω, a simple derivation shows
Gα = {σ ∈ G | σ[Aα] = Aα}.
- Since, for all x ∈ 𝒳α, the number of columns of Aα equal to x is rQα(x),
|Gα| = ∏x∈𝒳α (rQα(x))!.
SLIDE 25 Showing Γ∗n ⊂ conv(Υn) (cont.)
- So
|G| / |Gα| = r! / ∏x∈𝒳α (rQα(x))!.
- Can then show (using a theorem regarding the method of types)
lim r→∞ (1/r) log(|G|/|Gα|) = H(Xα) = hα.
- Define f^(r) ∈ Υn by f^(r)α = log(|G|/|Gα|).
- Then
lim r→∞ (1/r) f^(r) = h.
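The limit can be observed numerically. The sketch below is illustrative only: it assumes a single marginal Q = (1/3, 2/3) with denominator q = 3, and it evaluates log(r! / ∏(rQ(x))!) via math.lgamma, since log(m!) = lgamma(m + 1).

```python
from fractions import Fraction
from math import lgamma, log

def log_index(q_dist, r):
    """log(|G| / |G_alpha|) = log(r! / prod_x (r Q(x))!), via lgamma."""
    counts = [r * p for p in q_dist]
    # r must be a multiple of the common denominator so counts are integers.
    assert all(c.denominator == 1 for c in counts)
    return lgamma(r + 1) - sum(lgamma(int(c) + 1) for c in counts)

# Assumed marginal distribution with denominator q = 3.
Q = [Fraction(1, 3), Fraction(2, 3)]
H = -sum(float(p) * log(float(p)) for p in Q)

rs = (3, 30, 300, 3000, 30000)
ratios = [log_index(Q, r) / r for r in rs]
for r, value in zip(rs, ratios):
    print(r, value, H)
```

The normalized values stay below H(Xα) and approach it as r grows, which is the behavior the method-of-types argument predicts.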
SLIDE 26 Showing Γ∗n ⊂ conv(Υn) (cont.)
- Proved the lemma for the special case of collections of random variables with finite sample spaces and rational joint distribution.
- Argue that such collections of random variables are 'dense' in general ones.
- By noting that 0 ∈ Υn (set G1 = G2 = ··· = Gn = G), the lemma shows that Γ∗n ⊂ conv(Υn).
SLIDE 27
Correspondence of inequalities examples
Let G1, G2, G3 be subgroups of a group G. We have
|G3| ≥ |G13 ∗ G23| = |G13||G23| / |G123|.
So
|G|^2 / (|G13||G23|) ≥ |G|^2 / (|G3||G123|),
⇒ log(|G|/|G13|) + log(|G|/|G23|) ≥ log(|G|/|G3|) + log(|G|/|G123|),
which corresponds to
H(X1, X3) + H(X2, X3) ≥ H(X3) + H(X1, X2, X3).
SLIDE 28
Correspondence of inequalities examples (cont.)
We have
H(X1, X3) + H(X2, X3) ≥ H(X3) + H(X1, X2, X3),
which is equivalent to I(X1; X2|X3) ≥ 0, a valid information inequality for all random variables X1, X2, X3.
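On the entropy side, the inequality can be spot-checked on an arbitrary joint distribution. The sketch below assumes three binary variables with a pseudo-randomly generated PMF (seeded for reproducibility); it is an illustration, not a proof.

```python
import random
from itertools import product
from math import log

def entropy_of(joint, alpha):
    """Entropy of the marginal on the coordinates listed in alpha (0-based)."""
    marg = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in alpha)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log(p) for p in marg.values() if p > 0)

# Assumed example: a random joint PMF over (X1, X2, X3) in {0, 1}^3.
rng = random.Random(0)
weights = {x: rng.random() for x in product((0, 1), repeat=3)}
total = sum(weights.values())
joint = {x: w / total for x, w in weights.items()}

lhs = entropy_of(joint, (0, 2)) + entropy_of(joint, (1, 2))   # H(X1,X3) + H(X2,X3)
rhs = entropy_of(joint, (2,)) + entropy_of(joint, (0, 1, 2))  # H(X3) + H(X1,X2,X3)
cond_mi = lhs - rhs                                           # I(X1; X2 | X3)
print(cond_mi)
```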
SLIDE 29
Correspondence of inequalities examples (cont.)
Zhang and Yeung (1998) showed that
H(X1) + H(X2) + 2H(X1, X2) + 4H(X3) + 4H(X4) + 5H(X1, X3, X4) + 5H(X2, X3, X4)
≤ 6H(X3, X4) + 4H(X1, X3) + 4H(X1, X4) + 4H(X2, X3) + 4H(X2, X4),
which, after rearrangement, corresponds to
|G34|^6 |G13|^4 |G14|^4 |G23|^4 |G24|^4 ≤ |G1| |G2| |G3|^4 |G4|^4 |G12|^2 |G134|^5 |G234|^5.
The significance of this inequality is unclear.
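The group form of this inequality can be tested exhaustively on a very small group. The sketch below assumes G is the group of permutations of 3 elements and checks every ordered quadruple of its subgroups using exact integer arithmetic; the helper names are hypothetical.

```python
from itertools import permutations, product

def compose(p, q):
    """Composition of permutations stored as tuples: (p * q)[i] = p[q[i]]."""
    return tuple(p[i] for i in q)

def generate(generators, n=3):
    """Subgroup of the permutations of {0, ..., n-1} generated by `generators`."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

S3 = list(permutations(range(3)))
# Every subgroup of S3 is generated by at most two elements.
subgroups = {frozenset(generate([a, b])) for a in S3 for b in S3}

def size(*groups):
    """Order of the intersection of the given subgroups."""
    return len(frozenset.intersection(*groups))

violations = 0
for G1, G2, G3, G4 in product(subgroups, repeat=4):
    lhs = (size(G3, G4) ** 6 * size(G1, G3) ** 4 * size(G1, G4) ** 4
           * size(G2, G3) ** 4 * size(G2, G4) ** 4)
    rhs = (size(G1) * size(G2) * size(G3) ** 4 * size(G4) ** 4
           * size(G1, G2) ** 2 * size(G1, G3, G4) ** 5 * size(G2, G3, G4) ** 5)
    if lhs > rhs:
        violations += 1

print(len(subgroups), violations)
```

No quadruple of subgroups of this assumed group violates the inequality, consistent with the correspondence.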
SLIDE 30 Conclusion
- We have a one-to-one correspondence between information inequalities and group inequalities.
- Demonstration comes down to showing that Υn, whilst sparser than Γ∗n, has conv(Υn) = Γ̄∗n, where Γ̄∗n is the closure of Γ∗n.
- Meaning or utility of result is unclear.