Chapter 3 Asymptotic Equipartition Property


  1. Chapter 3 Asymptotic Equipartition Property. Peng-Hua Wang, Graduate Inst. of Comm. Engineering, National Taipei University.

  2. Chapter Outline. Chap. 3 Asymptotic Equipartition Property: 3.1 Asymptotic Equipartition Property Theorem; 3.2 Consequences of the AEP: Data Compression; 3.3 High-Probability Sets and the Typical Set. Peng-Hua Wang, April 2, 2012. Information Theory, Chap. 3 - p. 2/17.

  3. 3.1 Asymptotic Equipartition Property Theorem

  4. Definition of convergence. Given a sequence of random variables X_1, X_2, ..., we say that the sequence X_1, X_2, ... converges to a random variable X
  ■ In probability if for every ε > 0, lim_{n→∞} Pr{|X_n − X| > ε} = 0 or, equivalently, lim_{n→∞} Pr{|X_n − X| < ε} = 1.

  5. Definition of convergence.
  ■ In mean square if lim_{n→∞} E[|X_n − X|^2] = 0.
  ■ With probability 1 (also called almost surely) if Pr{lim_{n→∞} X_n = X} = 1.

  6. Weak law of large numbers. For i.i.d. random variables X_1, X_2, ..., X_n with common mean m, we have (1/n) Σ_{i=1}^{n} X_i → m in probability. That is, for any ε > 0, lim_{n→∞} Pr{|(1/n) Σ_{i=1}^{n} X_i − m| > ε} = 0.
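The weak law above can be checked numerically. A minimal Python sketch (not from the slides; the distribution and parameter values are arbitrary illustrations): for i.i.d. Bernoulli(p) variables, the sample mean should concentrate around the common mean m = p.

```python
import random

# Minimal sketch of the weak law of large numbers (illustrative values):
# the sample mean of i.i.d. Bernoulli(p) variables concentrates around m = p.
random.seed(0)
p, n = 0.3, 100_000
xs = [1 if random.random() < p else 0 for _ in range(n)]
sample_mean = sum(xs) / n
print(abs(sample_mean - p))  # small deviation for large n
```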

  7. AEP. Theorem 3.1.1 (AEP) If X_1, X_2, ... are i.i.d. ∼ p(x), then −(1/n) log p(X_1, X_2, ..., X_n) → H(X) in probability.
  Proof. Let Z_i = −log p(X_i); the Z_i are i.i.d. random variables, with Z_i = −log p[X_i = x] when X_i = x. Hence E[Z_i] = −Σ_x p[X_i = x] log p[X_i = x] = H(X_i) = H(X). Now, by the weak law of large numbers, (1/n) Σ_i Z_i → H(X) in probability, i.e., −(1/n) Σ_i log p(X_i) → H(X) in probability. Since the X_i are independent, p(X_1, X_2, ..., X_n) = Π_i p(X_i), so −(1/n) log p(X_1, X_2, ..., X_n) → H(X) in probability. ∎
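Theorem 3.1.1 can also be illustrated by simulation. A sketch (not from the slides; p and n are arbitrary choices): for a long i.i.d. Bernoulli(p) sample, the per-symbol value −(1/n) log2 p(X_1, ..., X_n) should land close to H(X).

```python
import math
import random

# Sketch of the AEP for a Bernoulli(p) source (illustrative values):
# -(1/n) log2 p(X1,...,Xn) should concentrate around the entropy H(X).
random.seed(1)
p, n = 0.25, 200_000
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
xs = [1 if random.random() < p else 0 for _ in range(n)]
k = sum(xs)  # number of ones in the sample
# By independence, log2 p(X1,...,Xn) = k*log2(p) + (n-k)*log2(1-p).
per_symbol = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
print(per_symbol, H)  # the two values should be close
```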

  8. Interpretation of AEP.
  ■ When n is sufficiently large, p(X_1, X_2, ..., X_n) ≈ 2^{−nH(X)} with high probability.
  ■ For example, let X_i be a binary random variable with P[X_i = 1] = p and P[X_i = 0] = 1 − p = q. If X_1, X_2, ..., X_n are i.i.d., p(X_1, X_2, ..., X_n) = p^{Σ X_i} q^{n − Σ X_i}. When n → ∞, p(X_1, X_2, ..., X_n) → p^{np} q^{nq} = 2^{−nH}. That is, the number of 1's in the sequence is close to np, and all such sequences have roughly the same probability 2^{−nH}.
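The identity p^{np} q^{nq} = 2^{−nH} on this slide is exact whenever np is an integer, since p^{np} q^{nq} = 2^{np log2 p + nq log2 q} = 2^{−nH}. A small sketch (illustrative values, chosen so np is an integer):

```python
import math

# Sketch: a Bernoulli(p) sequence with exactly np ones has probability
# p^(np) * q^(nq), which equals 2^(-nH) exactly (illustrative values).
p, q, n = 0.25, 0.75, 40  # np = 10 ones, nq = 30 zeros
H = -p * math.log2(p) - q * math.log2(q)
prob = p ** (n * p) * q ** (n * q)  # probability of one such sequence
print(prob, 2 ** (-n * H))
```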

  9. Interpretation of AEP.
  ■ Thus for large n we can divide the sequences (X_1, X_2, ..., X_n) into two types: the typical set, consisting of sequences each with probability roughly 2^{−nH}, and the non-typical set, consisting of all other sequences.

  10. Typical set. Definition (Typical set) The typical set A_ε^{(n)} with respect to p(x) is the set of sequences (x_1, x_2, ..., x_n) ∈ X^n with the property 2^{−n(H(X)+ε)} ≤ p(x_1, x_2, ..., x_n) ≤ 2^{−n(H(X)−ε)}.
  Theorem 3.1.2 1. If (x_1, x_2, ..., x_n) ∈ A_ε^{(n)}, then H(X) − ε ≤ −(1/n) log p(x_1, x_2, ..., x_n) ≤ H(X) + ε.
  Proof. Immediate from the definition of the typical set. ∎
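For a small alphabet and small n the typical set can be enumerated by brute force directly from this definition. A sketch (not from the slides; p, n, and ε are arbitrary illustrative values):

```python
import math
from itertools import product

# Brute-force sketch of the typical-set definition for a small Bernoulli(p)
# source (illustrative values): keep exactly the sequences x^n satisfying
# 2^(-n(H+eps)) <= p(x^n) <= 2^(-n(H-eps)).
p, n, eps = 0.3, 10, 0.2
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
lo, hi = 2 ** (-n * (H + eps)), 2 ** (-n * (H - eps))

def prob(seq):
    k = sum(seq)  # membership depends only on the number of ones
    return p ** k * (1 - p) ** (n - k)

typical = [s for s in product([0, 1], repeat=n) if lo <= prob(s) <= hi]
print(len(typical))
```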

  11. Theorems. Theorem 3.1.2 2. Pr{A_ε^{(n)}} > 1 − ε for n sufficiently large.
  Proof. This property follows directly from Theorem 3.1.1: convergence in probability means that for any δ > 0, Pr{|−(1/n) log p(X_1, X_2, ..., X_n) − H(X)| < ε} > 1 − δ for n sufficiently large. Setting δ = ε, we obtain the desired result. ∎

  12. Theorems. Theorem 3.1.2 3. |A_ε^{(n)}| ≤ 2^{n(H(X)+ε)}, where |A| denotes the number of elements in the set A.
  Proof. 1 = Σ_{x ∈ X^n} p(x) ≥ Σ_{x ∈ A_ε^{(n)}} p(x) ≥ Σ_{x ∈ A_ε^{(n)}} 2^{−n[H(X)+ε]} = 2^{−n[H(X)+ε]} |A_ε^{(n)}|. ∎

  13. Theorems. Theorem 3.1.2 4. |A_ε^{(n)}| ≥ (1 − ε) 2^{n(H(X)−ε)} for n sufficiently large.
  Proof. For n sufficiently large, Pr{A_ε^{(n)}} > 1 − ε, so that 1 − ε < Pr{A_ε^{(n)}} ≤ Σ_{x ∈ A_ε^{(n)}} 2^{−n[H(X)−ε]} = 2^{−n[H(X)−ε]} |A_ε^{(n)}|. ∎
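Parts 2-4 of Theorem 3.1.2 can be verified exactly for a small Bernoulli(p) source, since every sequence with the same number of ones k shares the probability p^k (1−p)^{n−k}. A sketch grouping sequences by k (illustrative values, with n large enough that the "n sufficiently large" bounds hold):

```python
import math

# Sketch checking Theorem 3.1.2 parts 2-4 for Bernoulli(p) (illustrative
# values): group the 2^n sequences by their number of ones k; all sequences
# with the same k share the probability p^k * (1-p)^(n-k).
p, n, eps = 0.3, 20, 0.2
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
lo, hi = 2 ** (-n * (H + eps)), 2 ** (-n * (H - eps))
size, prob_mass = 0, 0.0
for k in range(n + 1):
    seq_prob = p ** k * (1 - p) ** (n - k)
    if lo <= seq_prob <= hi:  # every sequence with this k is typical
        size += math.comb(n, k)
        prob_mass += math.comb(n, k) * seq_prob
print(size, prob_mass)  # |A_eps| and Pr{A_eps}
```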

  14. 3.2 Consequences of the AEP: Data Compression

  15. Typical set and source coding.
  ■ There are |X|^n elements in the whole set X^n.
  ■ There are |A_ε^{(n)}| ≤ 2^{n(H+ε)} elements in the typical set. We need at most n(H+ε) + 1 bits to encode these elements, plus one additional bit to indicate that the sequence is typical.
  ■ There are |X|^n − |A_ε^{(n)}| elements in the non-typical set. We can use n log|X| + 1 bits to encode them, plus one additional bit to indicate that the sequence is non-typical.

  16. Average length of codeword.
  E[l(X^n)] = Σ_{x^n} p(x^n) l(x^n)
  = Σ_{x^n ∈ A_ε^{(n)}} p(x^n) l(x^n) + Σ_{x^n ∈ [A_ε^{(n)}]^c} p(x^n) l(x^n)
  ≤ Σ_{x^n ∈ A_ε^{(n)}} p(x^n) [n(H+ε) + 2] + Σ_{x^n ∈ [A_ε^{(n)}]^c} p(x^n) [n log|X| + 2]
  = Pr{A_ε^{(n)}} [n(H+ε) + 2] + Pr{[A_ε^{(n)}]^c} [n log|X| + 2]
  ≤ n(H+ε) + εn log|X| + 2 = n(H + ε′),
  where ε′ = ε + ε log|X| + 2/n.
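The expected-length bound above can be checked numerically for a binary source, where log2|X| = 1. A sketch (not from the slides; p, n, and ε are illustrative, with n large enough that Pr{[A_ε^{(n)}]^c} < ε holds):

```python
import math

# Sketch of the expected-length bound for the two-part code on a binary
# Bernoulli(p) source (illustrative values): typical sequences get
# n(H+eps)+2 bits, non-typical ones n*log2|X|+2 bits, and the slide's bound
# E[l] <= n(H+eps) + eps*n*log2|X| + 2 should hold.
p, n, eps = 0.3, 20, 0.2
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
lo, hi = 2 ** (-n * (H + eps)), 2 ** (-n * (H - eps))
# Pr{A_eps}: sum binomial probabilities over the typical counts of ones.
p_typical = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
                for k in range(n + 1)
                if lo <= p ** k * (1 - p) ** (n - k) <= hi)
log_alpha = math.log2(2)  # binary alphabet, log2|X| = 1
expected_len = (p_typical * (n * (H + eps) + 2)
                + (1 - p_typical) * (n * log_alpha + 2))
bound = n * (H + eps) + eps * n * log_alpha + 2
print(expected_len, bound)
```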

  17. Theorems. Theorem 3.2.1 Let X^n be i.i.d. ∼ p(x). Let ε > 0. Then there exists a code that maps sequences x^n of length n into binary strings such that the mapping is one-to-one (and therefore invertible) and E[(1/n) l(X^n)] ≤ H(X) + ε for n sufficiently large.
