Chapter 11 Information Theory and Statistics — Peng-Hua Wang (PowerPoint presentation)
  1. Chapter 11 Information Theory and Statistics Peng-Hua Wang Graduate Inst. of Comm. Engineering National Taipei University

  2. Chapter Outline Chap. 11 Information Theory and Statistics: 11.1 Method of Types, 11.2 Law of Large Numbers, 11.3 Universal Source Coding, 11.4 Large Deviation Theory, 11.5 Examples of Sanov's Theorem, 11.6 Conditional Limit Theorem, 11.7 Hypothesis Testing, 11.8 Chernoff-Stein Lemma, 11.9 Chernoff Information, 11.10 Fisher Information and the Cramér-Rao Inequality. Peng-Hua Wang, May 21, 2012, Information Theory, Chap. 11.

  3. 11.1 Method of Types

  4. Definitions ■ Let X_1, X_2, ... be a sequence of n symbols from an alphabet X = {a_1, a_2, ..., a_M}, where M = |X| is the alphabet size. ■ x^n ≡ x denotes a sequence x_1, x_2, ..., x_n. ■ The type P_x (or empirical probability distribution) of a sequence x_1, x_2, ..., x_n is the relative frequency of each symbol of X: P_x(a) = N(a|x)/n for all a ∈ X, where N(a|x) is the number of times the symbol a occurs in the sequence x. Example. Let X = {a, b, c} and x = aabca. Then the type P_x = P_aabca is P_x(a) = 3/5, P_x(b) = 1/5, P_x(c) = 1/5, i.e., P_x = (3/5, 1/5, 1/5).
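The empirical distribution in this definition is just a normalized symbol count. As a quick Python sketch (the function name `type_of` is illustrative, not from the slides):

```python
from collections import Counter

def type_of(x, alphabet):
    """Empirical distribution (type) P_x of a sequence x over the given alphabet."""
    n = len(x)
    counts = Counter(x)  # N(a|x) for each symbol a
    return {a: counts[a] / n for a in alphabet}

P = type_of("aabca", alphabet="abc")
# Matches the slide's example: P_x = (3/5, 1/5, 1/5)
print(P)  # {'a': 0.6, 'b': 0.2, 'c': 0.2}
```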

  5. Definitions ■ The type class T(P) is the set of sequences of length n that have type P: T(P) = {x : P_x = P}. Example. Let X = {a, b, c} and x = aabca, so that P_x(a) = 3/5, P_x(b) = 1/5, P_x(c) = 1/5. The type class T(P_x) is the set of length-5 sequences with three a's, one b, and one c: T(P_x) = {aaabc, aabca, abcaa, bcaaa, ...}. The number of elements in T(P_x) is |T(P_x)| = 5!/(3! 1! 1!) = 20.
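The multinomial-coefficient count of the type class can be checked directly in Python (integer arithmetic, `type_class_size` is an illustrative name):

```python
from math import factorial
from collections import Counter

def type_class_size(x):
    """|T(P_x)| = n! / (N(a_1|x)! * N(a_2|x)! * ... ), the multinomial coefficient."""
    size = factorial(len(x))
    for c in Counter(x).values():
        size //= factorial(c)  # exact integer division, no rounding
    return size

print(type_class_size("aabca"))  # 5!/(3!1!1!) = 20, as on the slide
```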

  6. Definitions ■ Let P_n denote the set of types with denominator n. For example, if X = {a, b, c}, P_n = {(x_1/n, x_2/n, x_3/n) : x_1 + x_2 + x_3 = n, x_1 ≥ 0, x_2 ≥ 0, x_3 ≥ 0}, where P(a) = x_1/n, P(b) = x_2/n, P(c) = x_3/n. Theorem. |P_n| ≤ (n+1)^M. Proof. P_n = {(x_1/n, x_2/n, ..., x_M/n)} where 0 ≤ x_k ≤ n. Since there are at most n+1 choices for each x_k, the result follows. □
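The polynomial bound on the number of types can be verified by brute-force enumeration for small n and M (a sketch; `num_types` is an illustrative name):

```python
from itertools import product

def num_types(n, M):
    """Count types with denominator n over an alphabet of size M:
    M-tuples (x_1, ..., x_M) of nonnegative integers summing to n."""
    return sum(1 for xs in product(range(n + 1), repeat=M) if sum(xs) == n)

n, M = 5, 3
count = num_types(n, M)          # stars-and-bars gives C(n+M-1, M-1) = C(7, 2) = 21
print(count, (n + 1) ** M)       # 21 vs. the bound 216
assert count <= (n + 1) ** M
```

The exact count C(n+M-1, M-1) grows like n^(M-1), so the cruder (n+1)^M bound used on the slides is loose but still polynomial in n, which is all the method of types needs.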

  7. Observations ■ The number of sequences of length n is M^n (exponential in n). ■ The number of types of length n is at most (n+1)^M (polynomial in n). ■ Therefore, at least one type has exponentially many sequences in its type class. ■ In fact, the largest type class has essentially the same number of elements as the entire set of sequences, to first order in the exponent.

  8. Theorem Theorem. If X_1, X_2, ..., X_n are drawn i.i.d. according to Q(x), the probability of x depends only on its type and is given by Q^n(x) = 2^{-n(H(P_x) + D(P_x||Q))}, where Q^n(x) = Pr(x) = prod_{i=1}^n Q(x_i). Proof. Q^n(x) = prod_{i=1}^n Q(x_i) = prod_{a∈X} Q(a)^{N(a|x)} = prod_{a∈X} Q(a)^{nP_x(a)} = prod_{a∈X} 2^{nP_x(a) log Q(a)} = 2^{n sum_{a∈X} P_x(a) log Q(a)}.
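The identity Q^n(x) = 2^{-n(H(P_x) + D(P_x||Q))} can be checked numerically for a concrete sequence and distribution (a sketch with illustrative names; logs are base 2 throughout, matching the slides):

```python
from math import log2, prod
from collections import Counter

def entropy(P):
    """H(P) = -sum_a P(a) log2 P(a)."""
    return -sum(p * log2(p) for p in P.values() if p > 0)

def kl(P, Q):
    """D(P||Q) = sum_a P(a) log2(P(a)/Q(a))."""
    return sum(p * log2(p / Q[a]) for a, p in P.items() if p > 0)

x = "aabca"
Q = {"a": 0.5, "b": 0.25, "c": 0.25}
n = len(x)
Px = {a: c / n for a, c in Counter(x).items()}      # the type of x

lhs = prod(Q[a] for a in x)                          # Q^n(x), direct product
rhs = 2 ** (-n * (entropy(Px) + kl(Px, Q)))          # the theorem's formula
assert abs(lhs - rhs) < 1e-12
```

With these numbers H(P_x) + D(P_x||Q) = -sum_a P_x(a) log2 Q(a) = 1.4, so both sides equal 2^{-7} = 0.0078125.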

  9. Theorem Proof. (cont.) Since sum_{a∈X} P_x(a) log Q(a) = sum_{a∈X} (P_x(a) log Q(a) + P_x(a) log P_x(a) − P_x(a) log P_x(a)) = −H(P_x) − D(P_x||Q), we have Q^n(x) = 2^{-n(H(P_x) + D(P_x||Q))}. □ Corollary. If x is in the type class of Q, then Q^n(x) = 2^{-nH(Q)}. Proof. If x ∈ T(Q), then P_x = Q and D(P_x||Q) = 0. □

  10. Size of T(P) Next, we estimate the size of T(P). The exact size is |T(P)| = n! / ((nP(a_1))! (nP(a_2))! ··· (nP(a_M))!). This value is hard to manipulate, so we give a simple bound on |T(P)|. We need the following lemmas.

  11. Size of T(P) Lemma. m!/n! ≥ n^{m−n}. Proof. For m ≥ n, m!/n! = (1 × 2 × ··· × m)/(1 × 2 × ··· × n) = (n+1)(n+2)···m ≥ n × n × ··· × n (m−n times) = n^{m−n}. For m < n, m!/n! = 1/((m+1)(m+2)···n) ≥ 1/(n × n × ··· × n) (n−m times) = 1/n^{n−m} = n^{m−n}. □
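The lemma covers both m ≥ n and m < n; a small exhaustive check in Python, rearranged into integer arithmetic so no floating-point comparison is needed (`ratio_bound_holds` is an illustrative name):

```python
from math import factorial

def ratio_bound_holds(m, n):
    """Check m!/n! >= n**(m-n), cross-multiplied to stay in exact integers."""
    if m >= n:
        return factorial(m) >= factorial(n) * n ** (m - n)
    # m < n: m!/n! >= n**(m-n) is equivalent to m! * n**(n-m) >= n!
    return factorial(m) * n ** (n - m) >= factorial(n)

# Exhaustive check for small arguments, including the equality case m = n - 1.
assert all(ratio_bound_holds(m, n) for m in range(1, 10) for n in range(1, 10))
```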

  12. Size of T(P) Lemma. The type class T(P) has the highest probability among all type classes under the probability distribution P: P^n(T(P)) ≥ P^n(T(P̂)) for all P̂ ∈ P_n. Proof. P^n(T(P)) / P^n(T(P̂)) = [|T(P)| prod_{a∈X} P(a)^{nP(a)}] / [|T(P̂)| prod_{a∈X} P(a)^{nP̂(a)}], where |T(P)| = n!/((nP(a_1))! ··· (nP(a_M))!) and |T(P̂)| = n!/((nP̂(a_1))! ··· (nP̂(a_M))!). The n! factors cancel, giving P^n(T(P)) / P^n(T(P̂)) = prod_{a∈X} [(nP̂(a))! / (nP(a))!] P(a)^{n(P(a)−P̂(a))}.

  13. Size of T(P) Proof. (cont.) Applying the previous lemma with m = nP̂(a) and n = nP(a), prod_{a∈X} [(nP̂(a))! / (nP(a))!] P(a)^{n(P(a)−P̂(a))} ≥ prod_{a∈X} (nP(a))^{nP̂(a)−nP(a)} P(a)^{n(P(a)−P̂(a))} = prod_{a∈X} n^{nP̂(a)−nP(a)} = n^{n sum_{a∈X} P̂(a) − n sum_{a∈X} P(a)} = n^{n−n} = 1. □

  14. Size of T(P) Theorem. (1/(n+1)^M) 2^{nH(P)} ≤ |T(P)| ≤ 2^{nH(P)}. Note. The exact size is |T(P)| = n!/((nP(a_1))! ··· (nP(a_M))!), which is hard to manipulate. Proof. (upper bound) If X_1, X_2, ..., X_n are drawn i.i.d. from P, then 1 ≥ P^n(T(P)) = sum_{x∈T(P)} prod_{a∈X} P(a)^{nP(a)} = |T(P)| prod_{a∈X} 2^{nP(a) log P(a)} = |T(P)| 2^{n sum_{a∈X} P(a) log P(a)} = |T(P)| 2^{−nH(P)}. Thus |T(P)| ≤ 2^{nH(P)}.
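Both sides of the theorem's sandwich bound can be checked against the exact multinomial count for the running example (a sketch; names are illustrative, logs base 2):

```python
from math import factorial, log2

def multinomial(counts):
    """Exact |T(P)| = n! / prod_a (nP(a))!."""
    out = factorial(sum(counts))
    for c in counts:
        out //= factorial(c)
    return out

n, counts = 5, (3, 1, 1)          # type P = (3/5, 1/5, 1/5), M = 3 symbols
M = len(counts)
H = -sum((c / n) * log2(c / n) for c in counts)   # H(P) ~ 1.371 bits

size = multinomial(counts)        # exact value: 20
lower = 2 ** (n * H) / (n + 1) ** M
upper = 2 ** (n * H)              # ~ 115.9
assert lower <= size <= upper
```

Even at n = 5 the upper bound 2^{nH(P)} ≈ 116 is loose (the true size is 20), but the bounds match to first order in the exponent, which is what the later theorems use.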

  15. Size of T(P) Proof. (lower bound) 1 = sum_{Q∈P_n} P^n(T(Q)) ≤ sum_{Q∈P_n} max_{Q} P^n(T(Q)) = sum_{Q∈P_n} P^n(T(P)) ≤ (n+1)^M P^n(T(P)) = (n+1)^M |T(P)| 2^{−nH(P)}. □

  16. Probability of type class Theorem. For any P ∈ P_n and any distribution Q, the probability of the type class T(P) under Q^n satisfies (1/(n+1)^M) 2^{−nD(P||Q)} ≤ Q^n(T(P)) ≤ 2^{−nD(P||Q)}. Proof. Q^n(T(P)) = sum_{x∈T(P)} Q^n(x) = sum_{x∈T(P)} 2^{−n(H(P)+D(P||Q))} = |T(P)| 2^{−n(H(P)+D(P||Q))}. Since (1/(n+1)^M) 2^{nH(P)} ≤ |T(P)| ≤ 2^{nH(P)}, we have (1/(n+1)^M) 2^{−nD(P||Q)} ≤ Q^n(T(P)) ≤ 2^{−nD(P||Q)}. □
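This theorem can also be checked exactly: Q^n(T(P)) equals the multinomial count of T(P) times the common probability of each of its sequences (a sketch; names are illustrative, logs base 2):

```python
from math import factorial, log2, prod

def multinomial(counts):
    out = factorial(sum(counts))
    for c in counts:
        out //= factorial(c)
    return out

n = 5
P = {"a": 3 / 5, "b": 1 / 5, "c": 1 / 5}
Q = {"a": 0.5, "b": 0.25, "c": 0.25}
M = len(P)
D = sum(p * log2(p / Q[a]) for a, p in P.items())   # D(P||Q) ~ 0.029 bits

# Q^n(T(P)) = |T(P)| * prod_a Q(a)^{nP(a)}: every x in T(P) is equiprobable under Q.
qn_TP = multinomial([3, 1, 1]) * prod(Q[a] ** (n * P[a]) for a in P)
assert 2 ** (-n * D) / (n + 1) ** M <= qn_TP <= 2 ** (-n * D)
```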

  17. Summary ■ |P_n| ≤ (n+1)^M. ■ Q^n(x) = 2^{−n(H(P_x)+D(P_x||Q))}. ■ (1/n) log |T(P)| → H(P) as n → ∞. ■ −(1/n) log Q^n(T(P)) → D(P||Q) as n → ∞. ■ If X_i ~ Q, the probability of sequences with type P ≠ Q approaches 0 as n → ∞. ⇒ The typical sequences are those in T(Q).

  18. 11.2 Law of Large Numbers

  19. Typical Sequences ■ Given ε > 0, the typical set T_Q^ε for the distribution Q^n is defined as T_Q^ε = {x : D(P_x||Q) ≤ ε}. ■ The probability that x is nontypical is 1 − Q^n(T_Q^ε) = sum_{P: D(P||Q)>ε} Q^n(T(P)) ≤ sum_{P: D(P||Q)>ε} 2^{−nD(P||Q)} ≤ sum_{P: D(P||Q)>ε} 2^{−nε} ≤ sum_{P∈P_n} 2^{−nε} ≤ (n+1)^M 2^{−nε} = 2^{−n(ε − M log(n+1)/n)}.

  20. Theorem Theorem. Let X_1, X_2, ... be i.i.d. ~ P(x). Then Pr(D(P_x||P) > ε) ≤ 2^{−n(ε − M log(n+1)/n)}.
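For a small alphabet the nontypical probability can be computed exactly by summing Q^n(T(P)) over all types, and compared to the bound (a sketch; names are illustrative, logs base 2; n is chosen large enough that M log2(n+1)/n < ε, so the bound is nontrivial):

```python
from math import factorial, log2, prod

def multinomial(counts):
    out = factorial(sum(counts))
    for c in counts:
        out //= factorial(c)
    return out

def kl(P, Q):
    return sum(p * log2(p / q) for p, q in zip(P, Q) if p > 0)

n, eps = 100, 0.3
Q = (0.5, 0.25, 0.25)
M = len(Q)

# Exact Pr(D(P_x||Q) > eps): sum Q^n(T(P)) over every type P in P_n.
p_atypical = 0.0
for x1 in range(n + 1):
    for x2 in range(n + 1 - x1):
        counts = (x1, x2, n - x1 - x2)
        P = tuple(c / n for c in counts)
        if kl(P, Q) > eps:
            p_atypical += multinomial(counts) * prod(
                q ** c for q, c in zip(Q, counts))

bound = 2 ** (-n * (eps - M * log2(n + 1) / n))   # ~ 2^{-10}
assert p_atypical <= bound
```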

  21. 11.3 Universal Source Coding

  22. Introduction ■ An i.i.d. source with a known distribution p(x) can be compressed to its entropy H(X) by Huffman coding. ■ If the wrong code, designed for an incorrect distribution q(x), is used, a penalty of D(p||q) bits per symbol is incurred. ■ Is there a universal code of rate R that is sufficient to compress every i.i.d. source with entropy H(X) < R?

  23. Concept ■ There are about 2^{nH(P)} sequences of type P. ■ There are no more than (n+1)^{|X|} (polynomially many) types. ■ Hence there are no more than (n+1)^{|X|} 2^{nH(P)} sequences of type P or lower entropy to describe. ■ If H(P) < R, there are no more than (n+1)^{|X|} 2^{nR} sequences to describe, so nR bits suffice as n → ∞.
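The counting argument translates into a two-part description: an index of the type (at most |X| log2(n+1) bits) followed by an index within the type class (at most nH(P) ≤ nR bits). A sketch of the resulting per-symbol rate (`rate_per_symbol` is an illustrative name, not from the slides):

```python
from math import log2

def rate_per_symbol(n, M, R):
    """Bits per symbol of a type-based universal code: M*log2(n+1) bits
    for the type index plus at most n*R bits for the index within the class."""
    return (M * log2(n + 1) + n * R) / n

for n in (10, 100, 10_000, 1_000_000):
    print(n, rate_per_symbol(n, M=3, R=1.5))
```

The overhead M log2(n+1)/n vanishes as n grows, so the rate approaches R, which is the sense in which nR bits suffice asymptotically.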

  24. 11.4 Large Deviation Theory
