 
Chapter 3 Asymptotic Equipartition Property
Peng-Hua Wang
Graduate Inst. of Comm. Engineering, National Taipei University
Chapter Outline
Chap. 3 Asymptotic Equipartition Property
3.1 Asymptotic Equipartition Property Theorem
3.2 Consequences of the AEP: Data Compression
3.3 High-Probability Sets and Typical Set
Peng-Hua Wang, April 2, 2012 Information Theory, Chap. 3
3.1 Asymptotic Equipartition Property Theorem
Definition of convergence
Given a sequence of random variables $X_1, X_2, \ldots$, we say that the sequence converges to a random variable $X$
■ In probability if, for every $\epsilon > 0$,
$$\lim_{n \to \infty} \Pr\{ |X_n - X| > \epsilon \} = 0$$
or, equivalently,
$$\lim_{n \to \infty} \Pr\{ |X_n - X| < \epsilon \} = 1$$
Definition of convergence
■ In mean square if
$$\lim_{n \to \infty} E\left[ |X_n - X|^2 \right] = 0$$
■ With probability 1 (also called almost surely) if
$$\Pr\left\{ \lim_{n \to \infty} X_n = X \right\} = 1$$
Weak law of large numbers
For i.i.d. random variables $X_1, X_2, \ldots, X_n$ with common mean $m$, we have
$$\frac{1}{n} \sum_{i=1}^{n} X_i \to m \quad \text{in probability}.$$
That is, for any $\epsilon > 0$,
$$\lim_{n \to \infty} \Pr\left\{ \left| \frac{1}{n} \sum_{i=1}^{n} X_i - m \right| > \epsilon \right\} = 0$$
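The weak law can be checked numerically without simulation when the $X_i$ are Bernoulli, since the number of ones is then binomial. The sketch below is only an illustration under that assumption; the names `binom_pmf` and `deviation_prob` and the values $p = 0.3$, $\epsilon = 0.05$ are chosen here and do not come from the slides.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """Pr{K = k} for K ~ Binomial(n, p), computed in the log domain for stability."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

def deviation_prob(n, p, eps):
    """Pr{ |(1/n) sum X_i - p| > eps } for i.i.d. Bernoulli(p) variables X_i."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if abs(k / n - p) > eps)

# The deviation probability shrinks toward 0 as n grows.
for n in (10, 100, 1000):
    print(n, deviation_prob(n, 0.3, 0.05))
```

The printed probabilities decrease with $n$, which is the behavior the weak law describes for this particular choice of $\epsilon$.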
AEP
Theorem 3.1.1 (AEP) If $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$, then
$$-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to H(X) \quad \text{in probability}.$$
Proof. Let $Z_i = -\log p(X_i)$; that is, $Z_i = -\log p[X_i = x]$ if $X_i = x$. The $Z_i$ are i.i.d. because the $X_i$ are, and
$$E[Z_i] = -\sum_{x} p[X_i = x] \log p[X_i = x] = H(X_i) = H(X).$$
Now, by the weak law of large numbers,
$$\begin{aligned}
& \frac{1}{n} \sum_{i} Z_i \to H(X) \quad \text{in probability} \\
\Rightarrow\; & -\frac{1}{n} \sum_{i} \log p(X_i) \to H(X) \quad \text{in probability} \\
\Rightarrow\; & -\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to H(X) \quad \text{in probability},
\end{aligned}$$
where the last step uses independence, $p(X_1, X_2, \ldots, X_n) = \prod_i p(X_i)$. $\square$
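The convergence in Theorem 3.1.1 can be observed empirically. The following is a minimal sketch, assuming a three-symbol source with probabilities $(0.5, 0.25, 0.25)$, so that $H(X) = 1.5$ bits; the source, the seed, and the helper name `sample_entropy` are illustrative choices, not part of the slides.

```python
import random
from math import log2

def sample_entropy(seq, pmf):
    """-(1/n) log2 p(x_1, ..., x_n) for an i.i.d. source with the given pmf."""
    return -sum(log2(pmf[x]) for x in seq) / len(seq)

# Hypothetical source used only for illustration.
pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
H = -sum(p * log2(p) for p in pmf.values())  # entropy in bits

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 1000, 100000):
    seq = rng.choices(list(pmf), weights=list(pmf.values()), k=n)
    print(n, sample_entropy(seq, pmf), H)
```

For long sequences the sample entropy should settle near $H(X)$, while short sequences can deviate noticeably.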
Interpretation of AEP
■ When $n$ is sufficiently large, $p(X_1, X_2, \ldots, X_n) \approx 2^{-nH(X)}$ with high probability.
■ For example, let $X_i$ be a binary random variable with $P[X_i = 1] = p$ and $P[X_i = 0] = 1 - p = q$. If $X_1, X_2, \ldots, X_n$ are i.i.d.,
$$p(X_1, X_2, \ldots, X_n) = p^{\sum X_i} \, q^{\,n - \sum X_i}.$$
When $n$ is large, $\sum X_i \approx np$ with high probability, so
$$p(X_1, X_2, \ldots, X_n) \approx p^{np} q^{nq} = 2^{n(p \log p + q \log q)} = 2^{-nH}.$$
It means that the number of 1's in the sequence is close to $np$, and all such sequences have roughly the same probability $2^{-nH}$.
Interpretation of AEP
■ Thus for large $n$ we can divide the sequences $X_1, X_2, \ldots, X_n$ into two types: the typical type, consisting of sequences each with probability roughly $2^{-nH}$, and another type, consisting of all other sequences.
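The binary example can be checked with a few lines of arithmetic. This sketch assumes $n = 100$ and $p = 0.25$ (values picked here for illustration) and works with $\log_2$ probabilities to avoid underflow; `binary_entropy` and `log2_prob` are hypothetical helper names.

```python
from math import log2

def binary_entropy(p):
    """H(p) in bits for a Bernoulli(p) source."""
    q = 1 - p
    return -p * log2(p) - q * log2(q)

def log2_prob(k, n, p):
    """log2 of the probability of one specific binary sequence with k ones."""
    return k * log2(p) + (n - k) * log2(1 - p)

n, p = 100, 0.25
H = binary_entropy(p)

# A sequence with exactly np ones has probability exactly 2^{-nH}.
exact = log2_prob(int(n * p), n, p)
print(exact, -n * H)

# A sequence with 30 ones (close to np = 25) has a per-symbol
# log-probability that is still close to H.
nearby = log2_prob(30, n, p)
print(-nearby / n, H)
```

Sequences whose count of ones is far from $np$ would show a larger gap, which is what separates the two types described above.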
Typical set
Definition (Typical set) The typical set $A_\epsilon^{(n)}$ with respect to $p(x)$ is the set of sequences $(x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ with the property
$$2^{-n(H(X) + \epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X) - \epsilon)}.$$
Theorem 3.1.2
1. If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then
$$H(X) - \epsilon \le -\frac{1}{n} \log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon.$$
Proof. By the definition of the typical set. $\square$
Theorems
Theorem 3.1.2
2. $\Pr\{A_\epsilon^{(n)}\} > 1 - \epsilon$ for $n$ sufficiently large.
Proof. This property follows directly from Theorem 3.1.1, since convergence in probability means that for any $\delta > 0$ and all sufficiently large $n$,
$$\Pr\left\{ \left| -\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) - H(X) \right| < \epsilon \right\} > 1 - \delta.$$
The event inside the braces implies $(X_1, X_2, \ldots, X_n) \in A_\epsilon^{(n)}$. Setting $\delta = \epsilon$, we obtain the desired result. $\square$
Theorems
Theorem 3.1.2
3. $|A_\epsilon^{(n)}| \le 2^{n(H(X) + \epsilon)}$, where $|A|$ denotes the number of elements in the set $A$.
Proof.
$$\begin{aligned}
1 = \sum_{x \in \mathcal{X}^n} p(x) &\ge \sum_{x \in A_\epsilon^{(n)}} p(x) \\
&\ge \sum_{x \in A_\epsilon^{(n)}} 2^{-n[H(X) + \epsilon]} = 2^{-n[H(X) + \epsilon]} \, |A_\epsilon^{(n)}|
\end{aligned}$$
$\square$
Theorems
Theorem 3.1.2
4. $|A_\epsilon^{(n)}| \ge (1 - \epsilon) 2^{n(H(X) - \epsilon)}$ for $n$ sufficiently large.
Proof. For $n$ sufficiently large, $\Pr\{A_\epsilon^{(n)}\} > 1 - \epsilon$, so that
$$1 - \epsilon < \Pr\{A_\epsilon^{(n)}\} \le \sum_{x \in A_\epsilon^{(n)}} 2^{-n[H(X) - \epsilon]} = 2^{-n[H(X) - \epsilon]} \, |A_\epsilon^{(n)}|$$
$\square$
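Properties 2 to 4 of Theorem 3.1.2 can be verified exactly for a Bernoulli source, because every binary sequence with $k$ ones has the same probability, so the typical set can be counted by grouping sequences by $k$. This is a sketch under that assumption; `typical_set_stats` and the parameters $n = 200$, $p = 0.3$, $\epsilon = 0.1$ are illustrative choices made here.

```python
from math import comb, log2

def typical_set_stats(n, p, eps):
    """Return (H, |A|, Pr{A}) for the typical set of an i.i.d. Bernoulli(p) source."""
    q = 1 - p
    H = -p * log2(p) - q * log2(q)
    size, prob = 0, 0.0
    for k in range(n + 1):
        # log2 probability shared by every sequence with exactly k ones
        log2_px = k * log2(p) + (n - k) * log2(q)
        if -n * (H + eps) <= log2_px <= -n * (H - eps):
            size += comb(n, k)                       # number of such sequences
            prob += comb(n, k) * p**k * q**(n - k)   # their total probability
    return H, size, prob

n, p, eps = 200, 0.3, 0.1
H, size, prob = typical_set_stats(n, p, eps)
print("Pr{A}        =", prob, "  (property 2 needs >", 1 - eps, ")")
print("log2 |A|     =", log2(size))
print("upper bound  =", n * (H + eps))
print("lower bound  =", log2(1 - eps) + n * (H - eps))
```

The sizes are compared on a $\log_2$ scale because $|A_\epsilon^{(n)}|$ itself is an integer with dozens of digits. For smaller $n$ property 2 may fail, which is why the theorem only claims it for $n$ sufficiently large.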
3.2 Consequences of the AEP: Data Compression
Typical set and source coding
■ There are $|\mathcal{X}|^n$ elements in the whole set.
■ There are $|A_\epsilon^{(n)}| \le 2^{n(H + \epsilon)}$ elements in the typical set. We need at most $n(H + \epsilon) + 1$ bits to index these elements (the extra bit covers rounding up to an integer), and one additional bit to indicate that they are typical sequences.
■ There are $|\mathcal{X}|^n - |A_\epsilon^{(n)}|$ elements in the nontypical set. We can use at most $n \log |\mathcal{X}| + 1$ bits to index them, and one additional bit to indicate that they are nontypical sequences.
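Average length of codeword
$$\begin{aligned}
E[l(X^n)] &= \sum_{x^n} p(x^n) \, l(x^n) \\
&= \sum_{x^n \in A_\epsilon^{(n)}} p(x^n) \, l(x^n) + \sum_{x^n \in [A_\epsilon^{(n)}]^c} p(x^n) \, l(x^n) \\
&\le \sum_{x^n \in A_\epsilon^{(n)}} p(x^n) [n(H + \epsilon) + 2] + \sum_{x^n \in [A_\epsilon^{(n)}]^c} p(x^n) [n \log |\mathcal{X}| + 2] \\
&= \Pr\{A_\epsilon^{(n)}\} [n(H + \epsilon) + 2] + \Pr\{[A_\epsilon^{(n)}]^c\} [n \log |\mathcal{X}| + 2] \\
&\le n(H + \epsilon) + \epsilon n \log |\mathcal{X}| + 2 \\
&= n(H + \epsilon')
\end{aligned}$$
where $\epsilon' = \epsilon + \epsilon \log |\mathcal{X}| + \frac{2}{n}$. The last inequality uses $\Pr\{A_\epsilon^{(n)}\} \le 1$ and, for $n$ sufficiently large, $\Pr\{[A_\epsilon^{(n)}]^c\} \le \epsilon$.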
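The bound above can be compared with the exact expected length of the two-part code for a concrete source. The sketch below assumes a binary alphabet (so $\log |\mathcal{X}| = 1$) and a Bernoulli source with $p = 0.1$, $n = 500$, $\epsilon = 0.1$; these values and the function name `two_part_code_length` are illustrative and not from the slides.

```python
from math import ceil, comb, log2

def two_part_code_length(n, p, eps):
    """Exact E[l(X^n)] of the flag-plus-index code and the bound n(H + eps')."""
    q = 1 - p
    H = -p * log2(p) - q * log2(q)
    size, prob = 0, 0.0
    for k in range(n + 1):
        sample_entropy = -(k * log2(p) + (n - k) * log2(q)) / n
        if H - eps <= sample_entropy <= H + eps:   # sequences with k ones are typical
            size += comb(n, k)
            prob += comb(n, k) * p**k * q**(n - k)
    typical_len = ceil(log2(size)) + 1   # index into A, plus one flag bit
    other_len = n + 1                    # raw n bits (binary alphabet), plus one flag bit
    avg = prob * typical_len + (1 - prob) * other_len
    bound = n * (H + eps) + eps * n * 1 + 2   # log2|X| = 1 for a binary alphabet
    return avg, bound, prob

avg, bound, prob = two_part_code_length(500, 0.1, 0.1)
print("Pr{A}      =", prob)
print("E[l(X^n)]  =", avg, "bits, i.e.", avg / 500, "bits per symbol")
print("bound      =", bound, "bits")
```

The bound is only guaranteed when $\Pr\{A_\epsilon^{(n)}\} > 1 - \epsilon$, so the function returns that probability as well and the caller can check it for the chosen $n$.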
Theorems
Theorem 3.2.1 Let $X^n$ be i.i.d. $\sim p(x)$. Let $\epsilon > 0$. Then there exists a code that maps sequences $x^n$ of length $n$ into binary strings such that the mapping is one-to-one (and therefore invertible) and
$$E\left[ \frac{1}{n} l(X^n) \right] \le H(X) + \epsilon$$
for $n$ sufficiently large.
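The code whose existence the theorem asserts can be built explicitly for a small block length by enumerating all sequences. The following is a toy sketch, assuming a binary Bernoulli source with $n = 10$, $p = 0.2$, $\epsilon = 0.3$; at such a small $n$ it only demonstrates that the mapping is one-to-one, not the asymptotic rate. The name `build_code` and the flag convention (`"0"` for typical, `"1"` for nontypical) are choices made here.

```python
from itertools import product
from math import log2

def build_code(n, p, eps):
    """Build encode/decode functions for the flag-plus-index typical-set code."""
    q = 1 - p
    H = -p * log2(p) - q * log2(q)

    def is_typical(x):
        k = sum(x)
        sample_entropy = -(k * log2(p) + (n - k) * log2(q)) / n
        return H - eps <= sample_entropy <= H + eps

    typical = [x for x in product((0, 1), repeat=n) if is_typical(x)]
    index = {x: i for i, x in enumerate(typical)}
    width = max(1, (len(typical) - 1).bit_length())  # bits needed for an index

    def encode(x):
        x = tuple(x)
        if x in index:                                # typical: flag 0 + fixed-width index
            return "0" + format(index[x], f"0{width}b")
        return "1" + "".join(str(b) for b in x)       # nontypical: flag 1 + raw bits

    def decode(bits):
        if bits[0] == "0":
            return typical[int(bits[1:], 2)]
        return tuple(int(b) for b in bits[1:])

    return encode, decode, typical, width

encode, decode, typical, width = build_code(10, 0.2, 0.3)
x = (0, 0, 1, 0, 0, 0, 0, 1, 0, 0)
code = encode(x)
print(len(typical), width, code, decode(code) == x)
```

Typical sequences get the shorter codewords (flag plus index), and every other sequence falls back to a flag plus its raw bits, so decoding is always unambiguous.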