 
6.975 Graduate Seminar in Area I
A Relationship Between Information Inequalities and Group Theory
Desmond Lun
23 October 2002
The result
Let $N = \{1, 2, \ldots, n\}$ and let $\Omega$ be the set of all non-empty subsets of $N$. Let $\{b_\alpha\}_{\alpha \in \Omega}$ be a set of real numbers. If the information inequality
$$\sum_{\alpha \in \Omega} b_\alpha H\big((X_i)_{i \in \alpha}\big) \ge 0$$
holds for all discrete random variables $X_1, X_2, \ldots, X_n$, then the group inequality
$$\sum_{\alpha \in \Omega} b_\alpha \log \frac{|G|}{\left|\bigcap_{i \in \alpha} G_i\right|} \ge 0$$
holds for all finite groups $G$ and subgroups $G_1, G_2, \ldots, G_n$ of $G$, and vice versa.
An example
We know that, for all discrete random variables $X_1$ and $X_2$,
$$H(X_1) + H(X_2) - H(X_1, X_2) \ge 0$$
(since $I(X_1; X_2) \ge 0$). So for all finite groups $G$ and subgroups $G_1$ and $G_2$ of $G$,
$$\log \frac{|G|}{|G_1|} + \log \frac{|G|}{|G_2|} \ge \log \frac{|G|}{|G_1 \cap G_2|}.$$
An example (cont.)
We can confirm directly that
$$\log \frac{|G|}{|G_1|} + \log \frac{|G|}{|G_2|} \ge \log \frac{|G|}{|G_1 \cap G_2|}$$
for all subgroups $G_1$ and $G_2$ of a finite group $G$. Let $*$ be the group operation, and consider the subset of $G$
$$G_1 * G_2 = \{a * b \mid a \in G_1 \text{ and } b \in G_2\}.$$
Let us calculate $|G_1 * G_2|$. Fix $(a_1, a_2) \in G_1 \times G_2$. We wish to know how many pairs $(b_1, b_2) \in G_1 \times G_2$ satisfy $b_1 * b_2 = a_1 * a_2$.
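An example (cont.)
Since $b_1 * b_2 = a_1 * a_2$, we have $b_2 * a_2^{-1} = b_1^{-1} * a_1$. Let $k = b_2 * a_2^{-1} = b_1^{-1} * a_1$; the first expression lies in $G_2$ and the second in $G_1$, so $k \in G_1 \cap G_2$. Conversely, each $k \in G_1 \cap G_2$ gives rise to exactly one pair $(b_1, b_2) = (a_1 * k^{-1}, k * a_2) \in G_1 \times G_2$ with $b_1 * b_2 = a_1 * a_2$. Hence every element of $G_1 * G_2$ is produced by exactly $|G_1 \cap G_2|$ of the $|G_1||G_2|$ pairs, and therefore
$$|G_1 * G_2| = \frac{|G_1||G_2|}{|G_1 \cap G_2|}.$$
Since $G_1 * G_2 \subseteq G$,
$$|G| \ge \frac{|G_1||G_2|}{|G_1 \cap G_2|}.$$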
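The counting argument can be checked by brute force on a small group. The sketch below is a minimal Python check, assuming permutations of {0, 1, 2} are stored as tuples and composed as functions; the group $S_3$ and the helper names (`compose`, `subgroups`, `product_set`, `check_product_formula`) are illustrative choices, not part of the slides. It enumerates every subgroup of $S_3$ and confirms $|G_1 * G_2| \cdot |G_1 \cap G_2| = |G_1||G_2|$ for every pair.

```python
from itertools import combinations, permutations


def compose(p, q):
    """Composition p∘q of permutations given as tuples: i -> p[q[i]]."""
    return tuple(p[j] for j in q)


def subgroups(group):
    """Brute-force every subgroup of a small finite permutation group.

    A non-empty finite subset closed under the group operation is a subgroup,
    so it suffices to test closure of every subset containing the identity.
    """
    identity = tuple(range(len(group[0])))
    others = [g for g in group if g != identity]
    found = []
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            s = frozenset((identity,) + extra)
            if all(compose(a, b) in s for a in s for b in s):
                found.append(s)
    return found


def product_set(g1, g2):
    """G1 * G2 = {a * b : a in G1, b in G2}."""
    return {compose(a, b) for a in g1 for b in g2}


def check_product_formula(group):
    """Check |G1*G2| * |G1 ∩ G2| == |G1| * |G2| for every pair of subgroups.

    Returns the number of (G1, G2) pairs checked.
    """
    subs = subgroups(group)
    for g1 in subs:
        for g2 in subs:
            size = len(product_set(g1, g2))
            assert size * len(g1 & g2) == len(g1) * len(g2)
            assert size <= len(group)  # G1 * G2 is a subset of G
    return len(subs) ** 2


S3 = list(permutations(range(3)))  # symmetric group on {0, 1, 2}
print(check_product_formula(S3))   # prints 36 (6 subgroups, so 36 pairs)
```

Enumerating subsets only scales to very small groups, so this is a sanity check rather than a general subgroup algorithm.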
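An example (cont.)
Multiplying both sides of $|G| \ge \frac{|G_1||G_2|}{|G_1 \cap G_2|}$ by the positive quantity $\frac{|G|}{|G_1||G_2|}$ gives $\frac{|G|}{|G_1|} \cdot \frac{|G|}{|G_2|} \ge \frac{|G|}{|G_1 \cap G_2|}$, and taking logarithms we obtain
$$\log \frac{|G|}{|G_1|} + \log \frac{|G|}{|G_2|} \ge \log \frac{|G|}{|G_1 \cap G_2|}.$$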
Entropy functions
Suppose we have discrete random variables $X_1, X_2, \ldots, X_n$ with sample spaces $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_n$. For $\alpha \in \Omega$, denote by $X_\alpha$ the joint random variable $(X_i)_{i \in \alpha}$, with sample space $\mathcal{X}_\alpha = \prod_{i \in \alpha} \mathcal{X}_i$.
Definition. Let $\mathbf{g}$ be a vector in $\mathbb{R}^{|\Omega|}$ with components $g_\alpha$ indexed by $\alpha \in \Omega$. Then $\mathbf{g}$ is an entropy function if there exist random variables $X_1, X_2, \ldots, X_n$ such that $g_\alpha = H(X_\alpha)$ for all $\alpha \in \Omega$.
Let $\Gamma^*_n$ be the set of all entropy functions associated with $n$ random variables; that is,
$$\Gamma^*_n = \{\mathbf{g} \in \mathbb{R}^{|\Omega|} \mid \mathbf{g} \text{ is an entropy function}\}.$$
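Entropy functions: An example
Suppose $X_1, X_2$ have the joint PMF given in the following table.
p(x1, x2)   x1 = 1   x1 = 2   x1 = 3
x2 = 1       1/6      1/6      0
x2 = 2       0        1/6      1/6
x2 = 3       1/6      0        1/6
Each marginal is uniform on {1, 2, 3} (every row and every column sums to 1/3), and the six non-zero entries of the joint PMF each equal 1/6. Therefore $H(X_1) = \log 3$, $H(X_2) = \log 3$, and $H(X_1, X_2) = \log 6$. So, listing components in the order $\alpha = \{1\}, \{2\}, \{1, 2\}$, we have $(\log 3, \log 3, \log 6) \in \Gamma^*_2$.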
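The entropies above can be recomputed from the table. A minimal sketch, assuming natural logarithms (any fixed base works) and hypothetical helper names (`marginal`, `entropy`, `entropy_vector`): it forms every marginal $Q_\alpha$ of the joint PMF and returns the entropy vector indexed by the non-empty subsets $\alpha$.

```python
import math
from fractions import Fraction
from itertools import combinations

# Joint PMF from the table, keyed by (x1, x2).
PMF = {
    (1, 1): Fraction(1, 6), (2, 1): Fraction(1, 6), (3, 1): Fraction(0),
    (1, 2): Fraction(0), (2, 2): Fraction(1, 6), (3, 2): Fraction(1, 6),
    (1, 3): Fraction(1, 6), (2, 3): Fraction(0), (3, 3): Fraction(1, 6),
}


def marginal(pmf, alpha):
    """Marginal PMF of X_alpha; alpha is a tuple of 0-based coordinates."""
    out = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in alpha)
        out[key] = out.get(key, 0) + p
    return out


def entropy(pmf):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(float(p) * math.log(float(p)) for p in pmf.values() if p > 0)


def entropy_vector(pmf, n):
    """Return {alpha: H(X_alpha)} for every non-empty alpha, 1-based like the slides."""
    return {
        tuple(i + 1 for i in alpha): entropy(marginal(pmf, alpha))
        for size in range(1, n + 1)
        for alpha in combinations(range(n), size)
    }


h = entropy_vector(PMF, 2)
for alpha, value in h.items():
    print(alpha, round(value, 4))
```

For this table the vector is $(\log 3, \log 3, \log 6)$, roughly 1.0986, 1.0986 and 1.7918 nats.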
Group-characterizable functions
Let $G$ be a finite group and $G_1, G_2, \ldots, G_n$ be subgroups of $G$. We use the notation $G_\alpha = \bigcap_{i \in \alpha} G_i$ for $\alpha \in \Omega$.
Definition. Let $\mathbf{h}$ be a vector in $\mathbb{R}^{|\Omega|}$ with components $h_\alpha$ indexed by $\alpha \in \Omega$. Then $\mathbf{h}$ is called group-characterizable if there exist subgroups $G_1, G_2, \ldots, G_n$ of a finite group $G$ such that $h_\alpha = \log(|G|/|G_\alpha|)$ for all $\alpha \in \Omega$.
Let $\Upsilon_n$ be the set of all group-characterizable functions associated with $n$ subgroups; that is,
$$\Upsilon_n = \{\mathbf{h} \in \mathbb{R}^{|\Omega|} \mid \mathbf{h} \text{ is group-characterizable}\}.$$
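Group-characterizable functions: An example
Let $A$ be the $2 \times 3$ matrix
$$A = \begin{pmatrix} a & a & b \\ c & d & d \end{pmatrix},$$
where $a, b, c, d$ are distinct elements of a field $K$. Let $G$ be the group of permutations of the 3 columns of $A$, so $|G| = 6$. For $i = 1, 2$, let $G_i$ be the subgroup of $G$ that keeps the $i$th row of $A$ unchanged. Then $G_1$ consists of the identity and the swap of columns 1 and 2, $G_2$ consists of the identity and the swap of columns 2 and 3, and only the identity keeps both rows unchanged. So $|G_1| = 2$, $|G_2| = 2$, and $|G_1 \cap G_2| = 1$, and hence $(\log 3, \log 3, \log 6) \in \Upsilon_2$.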
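The stabilizer subgroups in this example can be found mechanically. The sketch below assumes the letters $a, b, c, d$ may be represented by distinct strings (only their distinctness matters) and that $G$ acts by rearranging the three columns; `keeps_row` and `group_vector` are hypothetical helpers, not from the slides.

```python
import math
from itertools import combinations, permutations

# Rows of A; the letters stand in for distinct field elements a, b, c, d.
A = [("a", "a", "b"),
     ("c", "d", "d")]

# G: all permutations of the column positions 0, 1, 2 (|G| = 6).
G = list(permutations(range(3)))


def keeps_row(perm, row):
    """True if the rearranged row (row[perm[0]], row[perm[1]], ...) equals row."""
    return all(row[perm[j]] == row[j] for j in range(len(row)))


# G_i: the subgroup of G that keeps row i of A unchanged.
SUBGROUPS = [frozenset(g for g in G if keeps_row(g, row)) for row in A]


def group_vector(G, subgroups):
    """Return {alpha: log(|G| / |G_alpha|)}, G_alpha = intersection over alpha."""
    n = len(subgroups)
    h = {}
    for size in range(1, n + 1):
        for alpha in combinations(range(n), size):
            G_alpha = frozenset.intersection(*(subgroups[i] for i in alpha))
            h[tuple(i + 1 for i in alpha)] = math.log(len(G) / len(G_alpha))
    return h


h = group_vector(G, SUBGROUPS)
print([len(s) for s in SUBGROUPS], h)
```

The resulting vector coincides with the entropy function of the previous example, which is no accident: it is an instance of the relationship the talk is about.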
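Information inequalities
From the definition of $\Gamma^*_n$, an information inequality
$$\sum_{\alpha \in \Omega} b_\alpha H(X_\alpha) \ge 0$$
is valid if and only if
$$\Gamma^*_n \subseteq \{\mathbf{h} \in \mathbb{R}^{|\Omega|} \mid \mathbf{b}^\top \mathbf{h} \ge 0\},$$
where $\mathbf{b}$ is the column vector with components $b_\alpha$.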
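Group inequalities
Likewise, a group inequality
$$\sum_{\alpha \in \Omega} b_\alpha \log \frac{|G|}{|G_\alpha|} \ge 0$$
is valid if and only if
$$\Upsilon_n \subseteq \{\mathbf{h} \in \mathbb{R}^{|\Omega|} \mid \mathbf{b}^\top \mathbf{h} \ge 0\}.$$
We are interested in relating $\Gamma^*_n$ with $\Upsilon_n$. Specifically, we want
$$\Gamma^*_n \subseteq \{\mathbf{h} \in \mathbb{R}^{|\Omega|} \mid \mathbf{b}^\top \mathbf{h} \ge 0\} \iff \Upsilon_n \subseteq \{\mathbf{h} \in \mathbb{R}^{|\Omega|} \mid \mathbf{b}^\top \mathbf{h} \ge 0\}.$$
Note: $\Upsilon_n$ is much sparser than $\Gamma^*_n$. It is countable (there are countably many finite groups up to isomorphism, each with finitely many subgroups), whereas $\Gamma^*_n$ is uncountable.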
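As a small illustration of the group-side condition, the sketch below takes $\mathbf{b} = (1, 1, -1)$ (the inequality from the earlier example) and checks $\mathbf{b}^\top \mathbf{h} \ge 0$ for every $\mathbf{h} \in \Upsilon_2$ arising from a pair of subgroups of $S_3$. This covers only the part of $\Upsilon_2$ generated by one small group, so it illustrates the condition rather than proving it. To avoid floating-point round-off at the boundary $\mathbf{b}^\top \mathbf{h} = 0$, it tests the equivalent exact condition $\exp(\mathbf{b}^\top \mathbf{h}) \ge 1$ with rational arithmetic, which works because $\mathbf{b}$ has integer entries. Helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations


def compose(p, q):
    """Composition p∘q of permutations given as tuples: i -> p[q[i]]."""
    return tuple(p[j] for j in q)


def subgroups(group):
    """All subgroups of a small permutation group, by closure-testing subsets."""
    identity = tuple(range(len(group[0])))
    others = [g for g in group if g != identity]
    found = []
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            s = frozenset((identity,) + extra)
            if all(compose(a, b) in s for a in s for b in s):
                found.append(s)
    return found


# b for H(X1) + H(X2) - H(X1, X2) >= 0, ordered as alpha = {1}, {2}, {1, 2}.
B = (1, 1, -1)


def exp_b_dot_h(G, G1, G2):
    """Return exp(b^T h) exactly, where h_alpha = log(|G| / |G_alpha|).

    With integer b, exp(b^T h) is a product of rational powers, so
    b^T h >= 0 is equivalent to the exact test exp(b^T h) >= 1.
    """
    ratios = (Fraction(len(G), len(G1)),
              Fraction(len(G), len(G2)),
              Fraction(len(G), len(G1 & G2)))
    value = Fraction(1)
    for b_alpha, ratio in zip(B, ratios):
        value *= ratio ** b_alpha
    return value


S3 = list(permutations(range(3)))
subs = subgroups(S3)
worst = min(exp_b_dot_h(S3, G1, G2) for G1 in subs for G2 in subs)
print(worst >= 1, worst)
```

The minimum is attained with equality, for example when $G_1 = G_2 = G$, matching the fact that mutual information can be zero.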
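Main theorem
Theorem. $\overline{\mathrm{conv}(\Upsilon_n)} = \overline{\Gamma}^*_n$ for all natural numbers $n$, where the bar denotes closure.
Outline of proof:
• Show that $\Upsilon_n \subseteq \Gamma^*_n$.
• Show that $\overline{\Gamma}^*_n$ is a convex cone.
• So $\mathrm{conv}(\Upsilon_n) \subseteq \mathrm{conv}(\Gamma^*_n) \subseteq \mathrm{conv}(\overline{\Gamma}^*_n) = \overline{\Gamma}^*_n$, and since $\overline{\Gamma}^*_n$ is closed, $\overline{\mathrm{conv}(\Upsilon_n)} \subseteq \overline{\Gamma}^*_n$.
• Show that $\Gamma^*_n \subseteq \overline{\mathrm{conv}(\Upsilon_n)}$.
• So, taking closures, $\overline{\Gamma}^*_n \subseteq \overline{\mathrm{conv}(\Upsilon_n)}$.
Because every half-space $\{\mathbf{h} \mid \mathbf{b}^\top \mathbf{h} \ge 0\}$ is closed and convex, a set lies in it exactly when the closure of its convex hull does; the theorem therefore gives the equivalence sought on the previous slide.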
Showing $\Upsilon_n \subseteq \Gamma^*_n$
Lemma. If $\mathbf{h}$ is group-characterizable, then it is an entropy function, i.e., $\mathbf{h} \in \Gamma^*_n$.
Proof.
• Want: random variables $X_1, X_2, \ldots, X_n$ such that $H((X_i)_{i \in \alpha}) = \log(|G|/|G_\alpha|)$ for all $\alpha \in \Omega$.
• Let $\Lambda$ be uniformly distributed over $G$.
• For $i \in N$, let $X_i = aG_i$ if $\Lambda = a$ (here $aG_i$ is a left coset of $G_i$).
• For $\alpha \in \Omega$,
$$P\big((X_i = a_i G_i)_{i \in \alpha}\big) = P\Big(\Lambda \in \bigcap_{i \in \alpha} a_i G_i\Big) = \frac{\left|\bigcap_{i \in \alpha} a_i G_i\right|}{|G|}.$$
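Showing $\Upsilon_n \subseteq \Gamma^*_n$ (cont.)
• Consider $\bigcap_{i \in \alpha} a_i G_i$. If it is non-empty, let $b \in \bigcap_{i \in \alpha} a_i G_i$; then $a_i G_i = b G_i$ for each $i \in \alpha$, so
$$\bigcap_{i \in \alpha} a_i G_i = \bigcap_{i \in \alpha} b G_i = b \bigcap_{i \in \alpha} G_i = b G_\alpha.$$
• So the set $\bigcap_{i \in \alpha} a_i G_i$ is either empty or of size $|G_\alpha|$.
• Hence
$$P\big((X_i = a_i G_i)_{i \in \alpha}\big) = \begin{cases} |G_\alpha|/|G| & \text{if } \bigcap_{i \in \alpha} a_i G_i \ne \emptyset, \\ 0 & \text{otherwise.} \end{cases}$$
• Since these probabilities sum to 1, exactly $|G|/|G_\alpha|$ values of $(X_i)_{i \in \alpha}$ have non-zero probability, and they are equally likely. So $H((X_i)_{i \in \alpha}) = \log(|G|/|G_\alpha|)$, as desired. ∎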
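The coset construction can be simulated directly. The following sketch assumes $S_3$ and the two illustrative subgroups $G_1 = \{e, (0\,1)\}$ and $G_2 = \{e, (1\,2)\}$ (any subgroups would do); it enumerates $\Lambda$ over $G$, records the tuple of left cosets $(\Lambda G_i)_{i \in \alpha}$, and compares the resulting entropy with $\log(|G|/|G_\alpha|)$. The helper names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, permutations


def compose(p, q):
    """Composition p∘q of permutations given as tuples: i -> p[q[i]]."""
    return tuple(p[j] for j in q)


def left_coset(a, H):
    """The left coset aH as a frozenset."""
    return frozenset(compose(a, h) for h in H)


def coset_entropy(G, subs, alpha):
    """H((X_i)_{i in alpha}) when Lambda is uniform on G and X_i = Lambda G_i.

    Each a in G has probability 1/|G|; count how often each coset tuple occurs.
    """
    counts = Counter(tuple(left_coset(a, subs[i]) for i in alpha) for a in G)
    total = len(G)
    return -sum(c / total * math.log(c / total) for c in counts.values())


def check_lemma(G, subs):
    """Assert H(X_alpha) == log(|G| / |G_alpha|) for every non-empty alpha."""
    for size in range(1, len(subs) + 1):
        for alpha in combinations(range(len(subs)), size):
            G_alpha = frozenset.intersection(*(subs[i] for i in alpha))
            target = math.log(len(G) / len(G_alpha))
            assert math.isclose(coset_entropy(G, subs, alpha), target, abs_tol=1e-12)


G = list(permutations(range(3)))
# Illustrative subgroups: G1 = {e, (0 1)} and G2 = {e, (1 2)}.
SUBS = [frozenset({(0, 1, 2), (1, 0, 2)}), frozenset({(0, 1, 2), (0, 2, 1)})]
check_lemma(G, SUBS)
```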
Showing $\overline{\Gamma}^*_n$ is a convex cone
Lemma. $\overline{\Gamma}^*_n$ is a convex cone.
Proof.
• First show that $\overline{\Gamma}^*_n$ is convex.
• Want: for any scalar $0 < b < 1$ and $\mathbf{u}, \mathbf{v} \in \Gamma^*_n$, $b\mathbf{u} + (1 - b)\mathbf{v} \in \overline{\Gamma}^*_n$. (It is then straightforward to extend to points in the closure of $\Gamma^*_n$.)
• $\mathbf{u}$ is the entropy function for $Y_1, Y_2, \ldots, Y_n$.
• $\mathbf{v}$ is the entropy function for $Z_1, Z_2, \ldots, Z_n$.
Showing $\overline{\Gamma}^*_n$ is a convex cone (cont.)
• Let $(Y^{(i)}_1, Y^{(i)}_2, \ldots, Y^{(i)}_n)$, $1 \le i \le k$, be $k$ independent vectors, each distributed identically to $(Y_1, Y_2, \ldots, Y_n)$.
• Let $(Z^{(i)}_1, Z^{(i)}_2, \ldots, Z^{(i)}_n)$, $1 \le i \le k$, be $k$ independent vectors, each distributed identically to $(Z_1, Z_2, \ldots, Z_n)$.
• Let $U$ be a random variable, independent of all the $Y^{(i)}$ and $Z^{(i)}$, with distribution
$$p_U(u) = \begin{cases} 1 - \delta - \mu & \text{if } u = 0, \\ \delta & \text{if } u = 1, \\ \mu & \text{if } u = 2. \end{cases}$$
So $H(U) \to 0$ as $\delta, \mu \to 0$.
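Showing $\overline{\Gamma}^*_n$ is a convex cone (cont.)
• Now construct $X_1, X_2, \ldots, X_n$ by
$$X_i = \begin{cases} 0 & \text{if } U = 0, \\ (Y^{(1)}_i, Y^{(2)}_i, \ldots, Y^{(k)}_i) & \text{if } U = 1, \\ (Z^{(1)}_i, Z^{(2)}_i, \ldots, Z^{(k)}_i) & \text{if } U = 2. \end{cases}$$
• Given $U = 1$, $X_\alpha$ consists of $k$ independent copies of $Y_\alpha$, and given $U = 2$, of $k$ independent copies of $Z_\alpha$. So for any $\alpha \in \Omega$,
$$H(X_\alpha \mid U) = \delta k H(Y_\alpha) + \mu k H(Z_\alpha).$$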
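Showing $\overline{\Gamma}^*_n$ is a convex cone (cont.)
• We have $0 \le H(X_\alpha) - H(X_\alpha \mid U) \le H(U)$ (the middle quantity is $I(X_\alpha; U)$), so
$$0 \le H(X_\alpha) - \big(\delta k H(Y_\alpha) + \mu k H(Z_\alpha)\big) \le H(U).$$
• Take $\delta = b/k$ and $\mu = (1 - b)/k$, so
$$0 \le H(X_\alpha) - \big(b H(Y_\alpha) + (1 - b) H(Z_\alpha)\big) \le H(U).$$
• As $k = 1, 2, \ldots$ grows, $\delta, \mu \to 0$ and hence $H(U) \to 0$, so the entropy functions of $X_1, \ldots, X_n$ form a sequence of points in $\Gamma^*_n$ whose limit is $b\mathbf{u} + (1 - b)\mathbf{v}$. So $b\mathbf{u} + (1 - b)\mathbf{v} \in \overline{\Gamma}^*_n$.
• Hence $\overline{\Gamma}^*_n$ is convex.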
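The sandwich bound can be checked numerically for one fixed $\alpha$. The sketch below uses hypothetical marginals for $Y_\alpha$ and $Z_\alpha$ and assumes the three branches of the construction (the constant 0, a $k$-tuple of $Y$ copies, a $k$-tuple of $Z$ copies) are encoded as distinct values. Under that encoding $U$ is a function of $X_\alpha$, so the gap $H(X_\alpha) - (bH(Y_\alpha) + (1-b)H(Z_\alpha))$ equals $H(U)$ exactly and sits at the upper end of the bound.

```python
import math
from itertools import product


def entropy(probs):
    """Shannon entropy in nats of a list of probabilities (zeros skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mixture_entropies(pY, pZ, k, b):
    """Return (H(X_alpha), H(U)) for delta = b/k and mu = (1 - b)/k.

    pY, pZ: PMFs (lists of probabilities) of Y_alpha and Z_alpha.
    Assumption: the constant 0, the k-tuples of Y copies and the k-tuples of
    Z copies are distinct values, so outcomes from different branches never merge.
    """
    delta, mu = b / k, (1 - b) / k
    p0 = (k - 1) / k                        # P(U = 0) = 1 - delta - mu
    probs = [p0]                            # branch U = 0: X_alpha = 0
    probs += [delta * math.prod(t) for t in product(pY, repeat=k)]  # U = 1
    probs += [mu * math.prod(t) for t in product(pZ, repeat=k)]     # U = 2
    return entropy(probs), entropy([p0, delta, mu])


def u_entropy(k, b):
    """H(U) alone, without enumerating the k-fold product distributions."""
    return entropy([(k - 1) / k, b / k, (1 - b) / k])


# Hypothetical marginals for one alpha: H(Y_alpha) = log 2, H(Z_alpha) = 1.5 log 2.
PY, PZ, B = [0.5, 0.5], [0.25, 0.25, 0.5], 0.3
TARGET = B * entropy(PY) + (1 - B) * entropy(PZ)   # b H(Y_alpha) + (1 - b) H(Z_alpha)

for k in range(1, 6):
    h_x, h_u = mixture_entropies(PY, PZ, k, B)
    gap = h_x - TARGET
    assert -1e-12 <= gap <= h_u + 1e-12      # 0 <= gap <= H(U), up to round-off
    print(k, round(gap, 6), round(h_u, 6))

print(u_entropy(1000, B))                    # H(U) -> 0 as k grows
```

$H(U)$ is not monotone for small $k$ (with $b = 0.3$ the $k = 2$ value exceeds the $k = 1$ value), but it tends to 0 as $k$ grows, which is all the limit argument needs.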
Showing $\overline{\Gamma}^*_n$ is a convex cone (cont.)
• It remains only to show that $\overline{\Gamma}^*_n$ is a cone.
• If $\mathbf{v} \in \Gamma^*_n$, consider $k$ independent copies of its associated random variables (replacing each $X_i$ by the $k$-tuple of its copies); we see that $k\mathbf{v} \in \Gamma^*_n$.
• This extends straightforwardly to the closure: if $\mathbf{v} \in \overline{\Gamma}^*_n$, then $k\mathbf{v} \in \overline{\Gamma}^*_n$ for any positive integer $k$.
• By letting $X_1, X_2, \ldots, X_n$ take constant values with probability 1, we see that $\mathbf{0} \in \Gamma^*_n$.
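The scaling step can be seen numerically: replacing each $X_i$ by the $k$-tuple of its independent copies multiplies every component of the entropy function by $k$. A minimal sketch with a hypothetical joint PMF of $(X_1, X_2)$ and $k = 3$; the helper names are illustrative.

```python
import math
from itertools import combinations, product


def entropy(pmf):
    """Shannon entropy in nats of a dict PMF (zeros skipped)."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)


def marginal(pmf, alpha):
    """Marginal PMF of X_alpha; alpha is a tuple of 0-based coordinates."""
    out = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in alpha)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy_vector(pmf, n):
    """{alpha: H(X_alpha)} over all non-empty alpha (0-based tuples)."""
    return {alpha: entropy(marginal(pmf, alpha))
            for size in range(1, n + 1)
            for alpha in combinations(range(n), size)}


def iid_copies(pmf, k):
    """Joint PMF of X_i' = (X_i^(1), ..., X_i^(k)) built from k independent copies."""
    out = {}
    for draws in product(pmf.items(), repeat=k):
        xs = [x for x, _ in draws]            # the k sampled outcome vectors
        n = len(xs[0])
        key = tuple(tuple(x[i] for x in xs) for i in range(n))
        out[key] = out.get(key, 0.0) + math.prod(p for _, p in draws)
    return out


# Hypothetical joint PMF of (X1, X2), used only for illustration.
PMF = {(0, 0): 0.5, (1, 1): 0.25, (1, 0): 0.25}
v = entropy_vector(PMF, 2)
v3 = entropy_vector(iid_copies(PMF, 3), 2)
for alpha in v:
    print(alpha, round(v3[alpha] / v[alpha], 6))   # ratio is k = 3 for every alpha
```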
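Showing $\overline{\Gamma}^*_n$ is a convex cone (cont.)
• Consider a non-negative combination $\sum_i \alpha_i \mathbf{v}_i$ with $\alpha_i \ge 0$ and $\mathbf{v}_i \in \overline{\Gamma}^*_n$.
• Let $\alpha = \sum_i \alpha_i$ (if $\alpha = 0$, the combination is $\mathbf{0}$, which is already in $\overline{\Gamma}^*_n$). The weights $\alpha_i/\lceil \alpha \rceil$ and $1 - \alpha/\lceil \alpha \rceil$ are non-negative and sum to 1, so by convexity
$$\overline{\Gamma}^*_n \ni \sum_i \frac{\alpha_i}{\lceil \alpha \rceil} \mathbf{v}_i + \left(1 - \frac{\alpha}{\lceil \alpha \rceil}\right)\mathbf{0} = \sum_i \frac{\alpha_i}{\lceil \alpha \rceil} \mathbf{v}_i.$$
• Scaling by the positive integer $\lceil \alpha \rceil$ gives $\overline{\Gamma}^*_n \ni \lceil \alpha \rceil \sum_i \frac{\alpha_i}{\lceil \alpha \rceil} \mathbf{v}_i = \sum_i \alpha_i \mathbf{v}_i$. ∎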
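The ceiling trick is plain arithmetic on the coefficients. The sketch below, with arbitrarily chosen rational coefficients, checks that the weights $\alpha_i/\lceil \alpha \rceil$ together with $1 - \alpha/\lceil \alpha \rceil$ form a convex combination and that multiplying by $\lceil \alpha \rceil$ recovers each $\alpha_i$; `ceiling_decomposition` is a hypothetical helper.

```python
import math
from fractions import Fraction


def ceiling_decomposition(alphas):
    """Write sum_i alpha_i v_i as m * (convex combination of the v_i and 0).

    Returns (m, weights, zero_weight) with m = ceil(sum alpha_i),
    weights[i] = alpha_i / m and zero_weight = 1 - sum(alpha_i) / m.
    """
    total = sum(alphas, Fraction(0))
    if total == 0:                       # the combination is just the zero vector
        return 0, [Fraction(0)] * len(alphas), Fraction(1)
    m = math.ceil(total)
    weights = [a / m for a in alphas]
    zero_weight = 1 - total / m
    return m, weights, zero_weight


ALPHAS = [Fraction(1, 2), Fraction(7, 4), Fraction(1, 3)]
m, weights, zero_weight = ceiling_decomposition(ALPHAS)
assert all(w >= 0 for w in weights) and 0 <= zero_weight < 1
assert sum(weights) + zero_weight == 1                    # a genuine convex combination
assert all(m * w == a for w, a in zip(weights, ALPHAS))   # scaling recovers alpha_i
print(m, weights, zero_weight)
```

Exact fractions are used so that the "weights sum to 1" check is an equality rather than a floating-point tolerance.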
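Showing $\Gamma^*_n \subseteq \overline{\mathrm{conv}(\Upsilon_n)}$
Lemma. For any $\mathbf{h} \in \Gamma^*_n$, there exists a sequence $\{\mathbf{f}^{(r)}\}$ in $\Upsilon_n$ such that $\lim_{r \to \infty} \mathbf{f}^{(r)}/r = \mathbf{h}$.
(Since $\mathbf{0} \in \Upsilon_n$, obtained by taking $G_i = G$ for all $i$, each $\mathbf{f}^{(r)}/r = \frac{1}{r}\mathbf{f}^{(r)} + \left(1 - \frac{1}{r}\right)\mathbf{0}$ lies in $\mathrm{conv}(\Upsilon_n)$ for $r \ge 1$, so the lemma gives $\mathbf{h} \in \overline{\mathrm{conv}(\Upsilon_n)}$.)
Proof.
• First consider the special case where $|\mathcal{X}_i| < \infty$ for all $i \in N$ and the joint distribution of $X_1, X_2, \ldots, X_n$ is rational (every probability is a rational number).
• For any $\alpha \in \Omega$, let $Q_\alpha$ be the marginal distribution of $X_\alpha$.
• Assume without loss of generality that, for every $\alpha \in \Omega$ and $x \in \mathcal{X}_\alpha$, $Q_\alpha(x)$ is a rational number with common denominator $q$.
• Want: construct a sequence $\{\mathbf{f}^{(r)}\}$ in $\Upsilon_n$ such that $\lim_{r \to \infty} \mathbf{f}^{(r)}/r = \mathbf{h}$, where $h_\alpha = H(X_\alpha)$.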
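Showing $\Gamma^*_n \subseteq \overline{\mathrm{conv}(\Upsilon_n)}$ (cont.)
• For $r = q, 2q, 3q, \ldots$, fix an $n \times r$ matrix
$$A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,r} \\ \vdots & \ddots & \vdots \\ a_{n,1} & \cdots & a_{n,r} \end{pmatrix}$$
such that, for all $x \in \mathcal{X}_N$, the number of columns of $A$ equal to $x$ is $rQ_N(x)$ (an integer, since $r$ is a multiple of $q$).
• Denote by $A_\alpha$ the submatrix of $A$ obtained by extracting the rows of $A$ indexed by $\alpha$.
• For all $x \in \mathcal{X}_\alpha$, the number of columns of $A_\alpha$ equal to $x$ is $rQ_\alpha(x)$ ($Q_\alpha$ is the marginal distribution of $(X_i)_{i \in \alpha}$).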
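The matrix $A$ can be built directly from a rational joint PMF. The sketch below uses a hypothetical $Q_N$ for $n = 2$ with $q = 4$ and checks that every submatrix $A_\alpha$ has exactly $rQ_\alpha(x)$ columns equal to $x$. The slides in this excerpt stop before the groups are chosen, so `f_over_r` rests on an assumption: by analogy with the $2 \times 3$ example, $G$ is taken to be the group of permutations of the $r$ columns and $G_i$ the subgroup keeping row $i$ unchanged, in which case $|G_\alpha| = \prod_x (rQ_\alpha(x))!$. Under that assumption it shows $\frac{1}{r}\log(|G|/|G_\alpha|)$ approaching $H(X_\alpha)$.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical rational joint PMF Q_N of (X1, X2); common denominator q = 4.
Q_N = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 4)}
q = 4


def marginal(Q, alpha):
    """Q_alpha: marginal of the joint PMF over the coordinates in alpha."""
    out = Counter()
    for x, p in Q.items():
        out[tuple(x[i] for i in alpha)] += p
    return out


def build_matrix(Q, r):
    """An n x r matrix (list of rows) in which column x appears r * Q(x) times."""
    columns = []
    for x, p in sorted(Q.items()):
        count = r * p
        assert count.denominator == 1, "r must be a multiple of q"
        columns += [x] * int(count)
    n = len(next(iter(Q)))
    return [[col[i] for col in columns] for i in range(n)]


def column_counts(A, alpha):
    """How often each column of the submatrix A_alpha occurs."""
    return dict(Counter(zip(*(A[i] for i in alpha))))


def f_over_r(Q, alpha, r):
    """(1/r) log(|G| / |G_alpha|), ASSUMING G permutes the r columns and G_i
    keeps row i unchanged, so that |G_alpha| = prod_x (r * Q_alpha(x))!."""
    log_G = math.lgamma(r + 1)                              # log r!
    log_G_alpha = sum(math.lgamma(int(r * p) + 1) for p in marginal(Q, alpha).values())
    return (log_G - log_G_alpha) / r


def entropy(Q):
    """Shannon entropy in nats of a PMF with rational values."""
    return -sum(float(p) * math.log(float(p)) for p in Q.values() if p > 0)


ALPHAS = [a for s in (1, 2) for a in combinations(range(2), s)]
for r in (q, 2 * q, 3 * q):
    A = build_matrix(Q_N, r)
    for alpha in ALPHAS:
        assert column_counts(A, alpha) == {x: r * p for x, p in marginal(Q_N, alpha).items()}

for alpha in ALPHAS:
    print(alpha, round(f_over_r(Q_N, alpha, 4000), 4), round(entropy(marginal(Q_N, alpha)), 4))
```

`math.lgamma(m + 1)` is used for $\log m!$ so that large $r$ does not overflow; the printed pairs should agree to within roughly $O(\log r / r)$.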