
Chapter 3 Asymptotic Equipartition Property

Peng-Hua Wang

Graduate Inst. of Comm. Engineering National Taipei University

Peng-Hua Wang, April 2, 2012 Information Theory, Chap. 3 - p. 2/17

Chapter Outline

  • Chap. 3 Asymptotic Equipartition Property
    – 3.1 Asymptotic Equipartition Property Theorem
    – 3.2 Consequences of the AEP: Data Compression
    – 3.3 High-Probability Sets and the Typical Set


3.1 Asymptotic Equipartition Property Theorem


Definition of convergence

Given a sequence of random variables X1, X2, . . ., we say that the sequence converges to a random variable X

■ In probability if for every ε > 0,

lim_{n→∞} Pr{ |Xn − X| > ε } = 0

or, equivalently,

lim_{n→∞} Pr{ |Xn − X| < ε } = 1


Definition of convergence

■ In mean square if

lim_{n→∞} E[|Xn − X|^2] = 0

■ With probability 1 (also called almost surely) if

Pr{ lim_{n→∞} Xn = X } = 1

Weak law of large numbers

For i.i.d. random variables X1, X2, . . . , Xn with common mean m, we have

(1/n) Σ_{i=1}^n Xi → m

in probability. That is, for any ε > 0,

lim_{n→∞} Pr{ |(1/n) Σ_{i=1}^n Xi − m| > ε } = 0
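The weak law is easy to check numerically. The sketch below is an illustrative choice (Bernoulli(0.3) variables, helper names ours): it estimates how often the sample mean lands within ε of the true mean, a hit rate that should approach 1 as n grows.

```python
import random

random.seed(0)

def sample_mean(n, p=0.3):
    """Mean of n i.i.d. Bernoulli(p) draws."""
    return sum(random.random() < p for _ in range(n)) / n

def hit_rate(n, eps=0.05, trials=200, p=0.3):
    """Empirical Pr{ |(1/n) sum Xi - m| < eps } over repeated experiments."""
    hits = sum(abs(sample_mean(n, p) - p) < eps for _ in range(trials))
    return hits / trials
```

With these parameters hit_rate(10) sits well below 1, while hit_rate(5000) is essentially 1, matching the limit statement above.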

AEP

Theorem 3.1.1 (AEP) If X1, X2, . . . are i.i.d. ∼ p(x), then

−(1/n) log p(X1, X2, . . . , Xn) → H(X)

in probability.

Proof. Let Zi = − log p(Xi). The Zi are i.i.d. random variables with

E[Zi] = − Σ_x p(x) log p(x) = H(X).

By the weak law of large numbers,

(1/n) Σ_i Zi → H(X) in probability

⇒ −(1/n) Σ_i log p(Xi) → H(X) in probability

⇒ −(1/n) log p(X1, X2, . . . , Xn) → H(X) in probability,

where the last step uses independence: log p(X1, X2, . . . , Xn) = Σ_i log p(Xi). □
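A quick numerical illustration of the theorem (not from the slides; Bernoulli(0.2) is an arbitrary choice): for a long i.i.d. sample, −(1/n) log2 p(X1, . . . , Xn) should be close to H(X) ≈ 0.722 bits.

```python
import math
import random

random.seed(1)
p = 0.2
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # entropy of Bernoulli(0.2)

def per_symbol_log_prob(n):
    """-(1/n) log2 p(X1, ..., Xn) for one i.i.d. Bernoulli(p) sample path."""
    xs = [random.random() < p for _ in range(n)]
    log_p = sum(math.log2(p) if x else math.log2(1 - p) for x in xs)
    return -log_p / n
```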


Interpretation of AEP

■ When n is sufficiently large, p(X1, X2, . . . , Xn) ≈ 2^{−nH(X)} with high probability.

■ For example, let Xi be binary random variables with P[Xi = 1] = p and P[Xi = 0] = 1 − p = q. If X1, X2, . . . , Xn are i.i.d.,

p(X1, X2, . . . , Xn) = p^{Σ Xi} q^{n − Σ Xi}.

When n → ∞,

p(X1, X2, . . . , Xn) ≈ p^{np} q^{nq} = 2^{−nH},

since the number of 1's in the sequence is close to np with high probability, and all such sequences have roughly the same probability 2^{−nH}.


Interpretation of AEP

■ Thus for large n we can divide the sequences (x1, x2, . . . , xn) into two types: the typical sequences, each with probability roughly 2^{−nH}, and the non-typical sequences, consisting of all the others.


Typical set

Definition (Typical set) The typical set A_ε^{(n)} with respect to p(x) is the set of sequences (x1, x2, . . . , xn) ∈ X^n with the property

2^{−n(H(X)+ε)} ≤ p(x1, x2, . . . , xn) ≤ 2^{−n(H(X)−ε)}.

Theorem 3.1.2

1. If (x1, x2, . . . , xn) ∈ A_ε^{(n)}, then

H(X) − ε ≤ −(1/n) log p(x1, x2, . . . , xn) ≤ H(X) + ε.

Proof. Take logarithms in the defining inequalities of the typical set and divide by −n. □
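For a small alphabet and short blocks, the typical set can be enumerated directly. The sketch below (illustrative parameters p = 0.3, n = 10, ε = 0.2; not from the slides) builds A_ε^{(n)} straight from its definition; the bounds in parts 1, 3, and 4 of Theorem 3.1.2 can then be checked by inspection, though part 2 generally needs larger n.

```python
import math
from itertools import product

p, n, eps = 0.3, 10, 0.2                 # illustrative parameters
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def prob(seq):
    """p(x1, ..., xn) for an i.i.d. Bernoulli(p) source."""
    return math.prod(p if x else 1 - p for x in seq)

# A sequence is typical iff 2^{-n(H+eps)} <= p(x^n) <= 2^{-n(H-eps)}.
lo, hi = 2 ** (-n * (H + eps)), 2 ** (-n * (H - eps))
typical = [s for s in product((0, 1), repeat=n) if lo <= prob(s) <= hi]
```

Here membership depends only on the number of 1's in the sequence (each 1 contributes −log2 p and each 0 contributes −log2 q to −log2 p(x^n)), so the typical set is a union of whole type classes.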


Theorems

Theorem 3.1.2

2. Pr{A_ε^{(n)}} > 1 − ε for n sufficiently large.

Proof. This follows directly from Theorem 3.1.1: convergence in probability means that for any δ > 0 and n sufficiently large,

Pr{ |−(1/n) log p(X1, X2, . . . , Xn) − H(X)| < ε } > 1 − δ.

Setting δ = ε, we obtain the desired result. □


Theorems

Theorem 3.1.2

3. |A_ε^{(n)}| ≤ 2^{n(H(X)+ε)}, where |A| denotes the number of elements in the set A.

Proof.

1 = Σ_{x ∈ X^n} p(x) ≥ Σ_{x ∈ A_ε^{(n)}} p(x) ≥ Σ_{x ∈ A_ε^{(n)}} 2^{−n(H(X)+ε)} = 2^{−n(H(X)+ε)} |A_ε^{(n)}| □


Theorems

Theorem 3.1.2

4. |A_ε^{(n)}| ≥ (1 − ε) 2^{n(H(X)−ε)} for n sufficiently large.

Proof. For n sufficiently large, Pr{A_ε^{(n)}} > 1 − ε, so that

1 − ε < Pr{A_ε^{(n)}} ≤ Σ_{x ∈ A_ε^{(n)}} 2^{−n(H(X)−ε)} = 2^{−n(H(X)−ε)} |A_ε^{(n)}| □


3.2 Consequences of the AEP: Data Compression


Typical set and source coding

■ There are |X|^n elements in the whole set X^n.

■ There are |A_ε^{(n)}| ≈ 2^{n(H+ε)} elements in the typical set. We need at most n(H + ε) + 1 bits to index these elements (the +1 covers rounding up to an integer), and one additional bit to indicate that the sequence is typical.

■ There are |X|^n − |A_ε^{(n)}| elements in the non-typical set. We can use n log |X| + 1 bits to index them, and one additional bit to indicate that the sequence is non-typical.
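The resulting length assignment can be written down directly. A minimal sketch, assuming the two-part scheme above (one flag bit plus an index rounded up to whole bits; the function name is ours):

```python
import math

def codeword_length(is_typical, n, H, eps, alphabet_size):
    """Length in bits of the two-part code: 1 flag bit + an integer index."""
    if is_typical:
        return 1 + math.ceil(n * (H + eps))             # index into the typical set
    return 1 + math.ceil(n * math.log2(alphabet_size))  # raw index into X^n
```

The scheme only pays off when H + ε is well below log |X|; for a binary source with H + ε close to 1, indexing the typical set saves almost nothing over sending the sequence raw.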


Average length of codeword

E[l(X^n)] = Σ_{x^n} p(x^n) l(x^n)

= Σ_{x^n ∈ A_ε^{(n)}} p(x^n) l(x^n) + Σ_{x^n ∈ (A_ε^{(n)})^c} p(x^n) l(x^n)

≤ Σ_{x^n ∈ A_ε^{(n)}} p(x^n) [n(H + ε) + 2] + Σ_{x^n ∈ (A_ε^{(n)})^c} p(x^n) [n log |X| + 2]

= Pr{A_ε^{(n)}} [n(H + ε) + 2] + Pr{(A_ε^{(n)})^c} [n log |X| + 2]

≤ n(H + ε) + εn log |X| + 2 = n(H + ε′),

where the last inequality uses Pr{(A_ε^{(n)})^c} ≤ ε for n sufficiently large, and

ε′ = ε + ε log |X| + 2/n.
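The bound can be verified exactly for a small source by enumerating X^n. The sketch below uses an illustrative ternary distribution (0.7, 0.2, 0.1) with n = 10 and ε = 0.2 (all parameter choices are ours): it computes the true expected length of the two-part code and compares it with n(H + ε′).

```python
import math
from itertools import product

pr = (0.7, 0.2, 0.1)            # illustrative source distribution
n, eps = 10, 0.2
H = -sum(q * math.log2(q) for q in pr)
log_alpha = math.log2(len(pr))  # log |X|

len_typ = 1 + math.ceil(n * (H + eps))   # flag bit + typical-set index
len_non = 1 + math.ceil(n * log_alpha)   # flag bit + raw index

# Exact expectation: weight each sequence's length by its probability.
lo, hi = 2 ** (-n * (H + eps)), 2 ** (-n * (H - eps))
expected = 0.0
for s in product(range(3), repeat=n):
    ps = math.prod(pr[x] for x in s)
    expected += ps * (len_typ if lo <= ps <= hi else len_non)

eps_prime = eps + eps * log_alpha + 2 / n
```

With these numbers len_typ < len_non, and expected stays below n(H + ε′), consistent with the derivation above.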


Theorems

Theorem 3.2.1 Let X^n be i.i.d. ∼ p(x) and let ε > 0. Then there exists a code that maps sequences x^n of length n into binary strings such that the mapping is one-to-one (and therefore invertible) and

E[(1/n) l(X^n)] ≤ H(X) + ε

for n sufficiently large.