Chapter 11 Information Theory and Statistics

Peng-Hua Wang

Graduate Inst. of Comm. Engineering, National Taipei University


Chapter Outline

■ Chap. 11 Information Theory and Statistics
  ◆ 11.1 Method of Types
  ◆ 11.2 Law of Large Numbers
  ◆ 11.3 Universal Source Coding
  ◆ 11.4 Large Deviation Theory
  ◆ 11.5 Examples of Sanov's Theorem
  ◆ 11.6 Conditional Limit Theorem
  ◆ 11.7 Hypothesis Testing
  ◆ 11.8 Chernoff-Stein Lemma
  ◆ 11.9 Chernoff Information
  ◆ 11.10 Fisher Information and the Cramér-Rao Inequality


11.1 Method of Types


Definitions

■ Let $X_1, X_2, \ldots, X_n$ be a sequence of $n$ symbols from an alphabet $\mathcal{X} = \{a_1, a_2, \ldots, a_M\}$, where $M = |\mathcal{X}|$ is the size of the alphabet.

■ $x^n \equiv \mathbf{x}$ denotes a sequence $x_1, x_2, \ldots, x_n$.

■ The type $P_\mathbf{x}$ (or empirical probability distribution) of a sequence $x_1, x_2, \ldots, x_n$ is the relative frequency of each symbol of $\mathcal{X}$:
$$
P_\mathbf{x}(a) = \frac{N(a|\mathbf{x})}{n} \quad \text{for all } a \in \mathcal{X},
$$
where $N(a|\mathbf{x})$ is the number of times the symbol $a$ occurs in the sequence $\mathbf{x}$.

■ Example. Let $\mathcal{X} = \{a, b, c\}$ and $\mathbf{x} = aabca$. Then the type $P_\mathbf{x} = P_{aabca}$ is
$$
P_\mathbf{x}(a) = \frac{3}{5}, \quad P_\mathbf{x}(b) = \frac{1}{5}, \quad P_\mathbf{x}(c) = \frac{1}{5},
$$
or $P_\mathbf{x} = \left(\frac{3}{5}, \frac{1}{5}, \frac{1}{5}\right)$.
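As a quick sanity check (not part of the slides), the type is just a normalized histogram. A minimal Python sketch, with the helper name `type_of` my own:

```python
# Compute the type (empirical distribution) P_x of a sequence,
# following the definition P_x(a) = N(a|x) / n.
from collections import Counter

def type_of(x, alphabet):
    """Return the type P_x as a dict {symbol: relative frequency}."""
    counts = Counter(x)          # N(a|x) for each symbol a
    n = len(x)
    return {a: counts[a] / n for a in alphabet}

# The example from the slide: X = {a, b, c}, x = aabca.
print(type_of("aabca", "abc"))   # {'a': 0.6, 'b': 0.2, 'c': 0.2}
```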


Definitions

■ The type class $T(P)$ is the set of sequences that have the same type:
$$
T(P) = \{\mathbf{x} : P_\mathbf{x} = P\}.
$$

■ Example. Let $\mathcal{X} = \{a, b, c\}$ and $\mathbf{x} = aabca$. Then the type $P_\mathbf{x} = P_{aabca}$ is $P_\mathbf{x}(a) = \frac{3}{5}$, $P_\mathbf{x}(b) = \frac{1}{5}$, $P_\mathbf{x}(c) = \frac{1}{5}$. The type class $T(P_\mathbf{x})$ is the set of all length-5 sequences that have three $a$'s, one $b$, and one $c$:
$$
T(P_\mathbf{x}) = \{aaabc, aabca, abcaa, bcaaa, \ldots\}.
$$
The number of elements in $T(P_\mathbf{x})$ is
$$
|T(P_\mathbf{x})| = \binom{5}{3, 1, 1} = \frac{5!}{3!\,1!\,1!} = 20.
$$
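A companion sketch (again my own naming) that counts $|T(P_\mathbf{x})|$ directly from the multinomial coefficient:

```python
# |T(P_x)| = n! / prod_a N(a|x)!  (a multinomial coefficient)
from collections import Counter
from math import factorial

def type_class_size(x):
    """Number of sequences with the same type as x."""
    size = factorial(len(x))
    for count in Counter(x).values():
        size //= factorial(count)   # exact integer division at each step
    return size

print(type_class_size("aabca"))     # 5!/(3! 1! 1!) = 20
```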


Definitions

■ Let $\mathcal{P}_n$ denote the set of types with denominator $n$. For example, if $\mathcal{X} = \{a, b, c\}$,
$$
\mathcal{P}_n = \left\{\left(\frac{x_1}{n}, \frac{x_2}{n}, \frac{x_3}{n}\right) : x_1 + x_2 + x_3 = n,\ x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0\right\},
$$
where $x_1/n = P(a)$, $x_2/n = P(b)$, $x_3/n = P(c)$.

Theorem.
$$
|\mathcal{P}_n| \le (n + 1)^M
$$

Proof. Each type in $\mathcal{P}_n$ has the form
$$
\left(\frac{x_1}{n}, \frac{x_2}{n}, \ldots, \frac{x_M}{n}\right),
$$
where $0 \le x_k \le n$. Since there are $n + 1$ choices for each $x_k$, the result follows.
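A brute-force check of the bound for a small ternary alphabet; the recursive counter below is a hypothetical helper of mine, not from the slides:

```python
# Count |P_n| exactly: the number of M-tuples of nonnegative integers
# (x_1, ..., x_M) with x_1 + ... + x_M = n.
def num_types(n, M):
    if M == 1:
        return 1
    return sum(num_types(n - x, M - 1) for x in range(n + 1))

for n in (1, 5, 10):
    print(n, num_types(n, 3), (n + 1) ** 3)  # exact |P_n| vs. the (n+1)^M bound
```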


Observations

■ The number of sequences of length $n$ is $M^n$ (exponential in $n$).
■ The number of types with denominator $n$ is at most $(n + 1)^M$ (polynomial in $n$).
■ Therefore, at least one type class contains exponentially many sequences.
■ In fact, the largest type class has essentially the same number of elements as the entire set of sequences.


Theorem

■ Theorem. If $X_1, X_2, \ldots, X_n$ are drawn i.i.d. according to $Q(x)$, the probability of $\mathbf{x}$ depends only on its type and is given by
$$
Q^n(\mathbf{x}) = 2^{-n\left(H(P_\mathbf{x}) + D(P_\mathbf{x} \| Q)\right)},
$$
where
$$
Q^n(\mathbf{x}) = \Pr(\mathbf{x}) = \prod_{i=1}^{n} \Pr(x_i) = \prod_{i=1}^{n} Q(x_i).
$$

Proof.
$$
Q^n(\mathbf{x}) = \prod_{i=1}^{n} Q(x_i) = \prod_{a \in \mathcal{X}} Q(a)^{N(a|\mathbf{x})} = \prod_{a \in \mathcal{X}} Q(a)^{n P_\mathbf{x}(a)} = \prod_{a \in \mathcal{X}} 2^{n P_\mathbf{x}(a) \log Q(a)} = 2^{n \sum_{a \in \mathcal{X}} P_\mathbf{x}(a) \log Q(a)}.
$$


Theorem

■ Proof (cont.) Since
$$
\sum_{a \in \mathcal{X}} P_\mathbf{x}(a) \log Q(a) = \sum_{a \in \mathcal{X}} \left(P_\mathbf{x}(a) \log Q(a) + P_\mathbf{x}(a) \log P_\mathbf{x}(a) - P_\mathbf{x}(a) \log P_\mathbf{x}(a)\right) = -H(P_\mathbf{x}) - D(P_\mathbf{x} \| Q),
$$
we have
$$
Q^n(\mathbf{x}) = 2^{-n\left(H(P_\mathbf{x}) + D(P_\mathbf{x} \| Q)\right)}.
$$

■ Corollary. If $\mathbf{x}$ is in the type class of $Q$, then
$$
Q^n(\mathbf{x}) = 2^{-nH(Q)}.
$$
Proof. If $\mathbf{x} \in T(Q)$, then $P_\mathbf{x} = Q$ and $D(P_\mathbf{x} \| Q) = 0$.
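A numerical check of the theorem on a toy example. The distribution $Q$ below is an arbitrary full-support choice of mine, so that $D(P_\mathbf{x}\|Q)$ is finite:

```python
# Verify Q^n(x) = 2^{-n(H(P_x) + D(P_x||Q))} for a small sequence.
from collections import Counter
from math import log2, prod

x = "aabca"
Q = {"a": 0.5, "b": 0.25, "c": 0.25}      # assumed distribution (my choice)
n = len(x)
Px = {a: c / n for a, c in Counter(x).items()}

direct = prod(Q[xi] for xi in x)                     # Q^n(x) = prod_i Q(x_i)
H = -sum(p * log2(p) for p in Px.values())           # H(P_x)
D = sum(p * log2(p / Q[a]) for a, p in Px.items())   # D(P_x||Q)
print(direct, 2 ** (-n * (H + D)))                   # the two values agree
```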


Size of T(P)

Next, we estimate the size of $T(P)$. The exact size of $T(P)$ is
$$
|T(P)| = \binom{n}{nP(a_1), nP(a_2), \ldots, nP(a_M)}.
$$
This value is hard to manipulate, so we give a simple bound on $|T(P)|$. We need the following lemmas.


Size of T(P)

Lemma. For nonnegative integers $m$ and $n$,
$$
\frac{m!}{n!} \ge n^{m-n}.
$$

■ Proof. For $m \ge n$, we have
$$
\frac{m!}{n!} = \frac{1 \times 2 \times \cdots \times m}{1 \times 2 \times \cdots \times n} = (n+1)(n+2) \cdots m \ge \underbrace{n \times n \times \cdots \times n}_{m-n \text{ times}} = n^{m-n}.
$$
For $m < n$,
$$
\frac{m!}{n!} = \frac{1 \times 2 \times \cdots \times m}{1 \times 2 \times \cdots \times n} = \frac{1}{(m+1)(m+2) \cdots n} \ge \frac{1}{\underbrace{n \times n \times \cdots \times n}_{n-m \text{ times}}} = \frac{1}{n^{n-m}} = n^{m-n}.
$$


Size of T(P)

■ Lemma. The type class $T(P)$ has the highest probability among all type classes under the probability distribution $P$:
$$
P^n(T(P)) \ge P^n(T(\hat{P})) \quad \text{for all } \hat{P} \in \mathcal{P}_n.
$$

Proof.
$$
\begin{aligned}
\frac{P^n(T(P))}{P^n(T(\hat{P}))}
&= \frac{|T(P)| \prod_{a \in \mathcal{X}} P(a)^{nP(a)}}{|T(\hat{P})| \prod_{a \in \mathcal{X}} P(a)^{n\hat{P}(a)}}
= \frac{\binom{n}{nP(a_1),\, nP(a_2),\, \ldots,\, nP(a_M)}}{\binom{n}{n\hat{P}(a_1),\, n\hat{P}(a_2),\, \ldots,\, n\hat{P}(a_M)}} \prod_{a \in \mathcal{X}} P(a)^{n\left(P(a) - \hat{P}(a)\right)} \\
&= \prod_{a \in \mathcal{X}} \frac{(n\hat{P}(a))!}{(nP(a))!}\, P(a)^{n\left(P(a) - \hat{P}(a)\right)}
\end{aligned}
$$


Size of T(P)

■ Proof (cont.) Applying the previous lemma with $m = n\hat{P}(a)$ and $n = nP(a)$,
$$
\prod_{a \in \mathcal{X}} \frac{(n\hat{P}(a))!}{(nP(a))!}\, P(a)^{n\left(P(a) - \hat{P}(a)\right)} \ge \prod_{a \in \mathcal{X}} (nP(a))^{n\hat{P}(a) - nP(a)}\, P(a)^{n\left(P(a) - \hat{P}(a)\right)} = \prod_{a \in \mathcal{X}} n^{n\hat{P}(a) - nP(a)} = n^{n \sum_{a} \hat{P}(a) - n \sum_{a} P(a)} = n^{n - n} = 1.
$$


Size of T(P)

Theorem.
$$
\frac{1}{(n+1)^M}\, 2^{nH(P)} \le |T(P)| \le 2^{nH(P)}.
$$

■ Note. The exact size of $T(P)$ is the multinomial coefficient
$$
|T(P)| = \binom{n}{nP(a_1), nP(a_2), \ldots, nP(a_M)},
$$
which is hard to manipulate.

■ Proof (upper bound). If $X_1, X_2, \ldots, X_n$ are drawn i.i.d. from $P$, then
$$
1 \ge P^n(T(P)) = \sum_{\mathbf{x} \in T(P)} \prod_{a \in \mathcal{X}} P(a)^{nP(a)} = |T(P)| \prod_{a \in \mathcal{X}} 2^{nP(a) \log P(a)} = |T(P)|\, 2^{n \sum_{a \in \mathcal{X}} P(a) \log P(a)} = |T(P)|\, 2^{-nH(P)}.
$$
Thus $|T(P)| \le 2^{nH(P)}$.


Size of T(P)

■ Proof (lower bound).
$$
1 = \sum_{Q \in \mathcal{P}_n} P^n(T(Q)) \le \sum_{Q \in \mathcal{P}_n} \max_{Q} P^n(T(Q)) = \sum_{Q \in \mathcal{P}_n} P^n(T(P)) \le (n+1)^M P^n(T(P)) = (n+1)^M |T(P)|\, 2^{-nH(P)},
$$
where the middle equality uses the lemma that $T(P)$ is the most probable type class under $P$. Rearranging gives $|T(P)| \ge \frac{1}{(n+1)^M}\, 2^{nH(P)}$.
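A quick numerical look at how tight these bounds are for $P = (3/5, 1/5, 1/5)$; the `comb`-based multinomial helper and the choice of $n$ in multiples of 5 (so that every $nP(a)$ is an integer) are mine:

```python
# Check 2^{nH(P)}/(n+1)^M <= |T(P)| <= 2^{nH(P)} for P = (3/5, 1/5, 1/5).
from math import comb, log2

P = (3/5, 1/5, 1/5)
M = len(P)
H = -sum(p * log2(p) for p in P)
for n in (5, 25, 125):
    counts = [round(n * p) for p in P]
    size = comb(n, counts[0]) * comb(n - counts[0], counts[1])  # multinomial
    print(n, 2 ** (n * H) / (n + 1) ** M, size, 2 ** (n * H))
```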


Probability of type class

■ Theorem. For any $P \in \mathcal{P}_n$ and any distribution $Q$, the probability of the type class $T(P)$ under $Q^n$ satisfies
$$
\frac{1}{(n+1)^M}\, 2^{-nD(P\|Q)} \le Q^n(T(P)) \le 2^{-nD(P\|Q)}.
$$

Proof.
$$
Q^n(T(P)) = \sum_{\mathbf{x} \in T(P)} Q^n(\mathbf{x}) = \sum_{\mathbf{x} \in T(P)} 2^{-n\left(H(P_\mathbf{x}) + D(P_\mathbf{x}\|Q)\right)} = |T(P)|\, 2^{-n\left(H(P) + D(P\|Q)\right)},
$$
since every $\mathbf{x} \in T(P)$ has type $P_\mathbf{x} = P$. Since
$$
\frac{1}{(n+1)^M}\, 2^{nH(P)} \le |T(P)| \le 2^{nH(P)},
$$
we have
$$
\frac{1}{(n+1)^M}\, 2^{-nD(P\|Q)} \le Q^n(T(P)) \le 2^{-nD(P\|Q)}.
$$
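A sketch checking both bounds for a binary alphabet ($M = 2$), where the type class size is a single binomial coefficient; the particular $n$, $k$, and $Q$ are arbitrary choices of mine:

```python
# Check (n+1)^{-M} 2^{-nD(P||Q)} <= Q^n(T(P)) <= 2^{-nD(P||Q)}, binary case.
from math import comb, log2

n, k = 20, 15                    # type P puts mass k/n on the symbol 1
q = 1 / 3                        # Q(1) = 1/3
P1 = k / n
D = P1 * log2(P1 / q) + (1 - P1) * log2((1 - P1) / (1 - q))
QnTP = comb(n, k) * q**k * (1 - q)**(n - k)    # exact Q^n(T(P))
print(2**(-n * D) / (n + 1)**2, QnTP, 2**(-n * D))   # lower <= exact <= upper
```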


Summary

■ $|\mathcal{P}_n| \le (n + 1)^M$.
■ $Q^n(\mathbf{x}) = 2^{-n\left(H(P_\mathbf{x}) + D(P_\mathbf{x}\|Q)\right)}$.
■ $\frac{1}{n} \log |T(P)| \to H(P)$ as $n \to \infty$.
■ $-\frac{1}{n} \log Q^n(T(P)) \to D(P\|Q)$ as $n \to \infty$.
■ If $X_i \sim Q$, the probability of sequences with type $P \ne Q$ approaches 0 as $n \to \infty$; that is, the typical sequences are those in $T(Q)$.


11.2 Law of Large Numbers


Typical Sequences

■ Given $\epsilon > 0$, the typical set $T_Q^\epsilon$ for the distribution $Q^n$ is defined as
$$
T_Q^\epsilon = \{\mathbf{x} : D(P_\mathbf{x} \| Q) \le \epsilon\}.
$$

■ The probability that $\mathbf{x}$ is nontypical is
$$
1 - Q^n(T_Q^\epsilon) = \sum_{P : D(P\|Q) > \epsilon} Q^n(T(P)) \le \sum_{P : D(P\|Q) > \epsilon} 2^{-nD(P\|Q)} \le \sum_{P : D(P\|Q) > \epsilon} 2^{-n\epsilon} \le \sum_{P \in \mathcal{P}_n} 2^{-n\epsilon} \le (n+1)^M 2^{-n\epsilon} = 2^{-n\left(\epsilon - M \frac{\log(n+1)}{n}\right)}.
$$


Theorem

■ Theorem. Let $X_1, X_2, \ldots$ be i.i.d. $\sim P(x)$. Then
$$
\Pr\left(D(P_{x^n} \| P) > \epsilon\right) \le 2^{-n\left(\epsilon - M \frac{\log(n+1)}{n}\right)},
$$
which tends to 0 as $n \to \infty$. Hence $D(P_{x^n} \| P) \to 0$ with probability 1: a law of large numbers stated in terms of types.


11.3 Universal Source Coding


Introduction

■ An i.i.d. source with a known distribution $p(x)$ can be compressed to its entropy $H(X)$ by Huffman coding.

■ If the wrong code, designed for an incorrect distribution $q(x)$, is used, a penalty of $D(p\|q)$ bits is incurred.

■ Is there a universal code of rate $R$ that suffices to compress every i.i.d. source with entropy $H(X) < R$?


Concept

■ There are $2^{nH(P)}$ sequences of type $P$.
■ There are no more than $(n + 1)^{|\mathcal{X}|}$ types (polynomial in $n$).
■ So there are no more than $(n + 1)^{|\mathcal{X}|}\, 2^{nH(P)}$ sequences to describe.
■ If $H(P) < R$, there are no more than $(n + 1)^{|\mathcal{X}|}\, 2^{nR}$ sequences to describe, which needs about $nR$ bits as $n \to \infty$; a sketch of the resulting two-part code follows below.
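A sketch along these lines: first index the type (about $|\mathcal{X}| \log(n+1)$ bits), then index the sequence within its type class (at most $nH(P_\mathbf{x})$ bits). The function below is my own illustration of the idea, not a full coder:

```python
# Two-part "type code" length: type index + index within the type class.
from collections import Counter
from math import comb, log2

def type_code_length(x, alphabet):
    """Bits needed to describe x via (type, index within T(P_x))."""
    n, M = len(x), len(alphabet)
    counts = Counter(x)
    type_bits = M * log2(n + 1)      # enough to index any type in P_n
    size, rem = 1, n
    for a in alphabet:               # |T(P_x)| as a multinomial coefficient
        size *= comb(rem, counts[a])
        rem -= counts[a]
    return type_bits + log2(size)    # log|T(P_x)| <= n * H(P_x)

x = "aabca" * 200                    # n = 1000, type (3/5, 1/5, 1/5)
print(type_code_length(x, "abc") / len(x))  # bits/symbol, near H = 1.371
```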

11.4 Large Deviation Theory


Large Deviation Theory

■ If $X_i$ is i.i.d. Bernoulli with $P(X_i = 1) = \frac{1}{3}$, what is the probability that $\frac{1}{n}\sum_{i=1}^{n} X_i$ is near $\frac{1}{3}$? This is a small deviation.
  ◆ "Deviation" means deviation from the expected outcome.
  ◆ The probability is near 1.

■ What is the probability that $\frac{1}{n}\sum_{i=1}^{n} X_i$ is greater than $\frac{3}{4}$? This is a large deviation.
  ◆ The probability is exponentially small.
  ◆ We might estimate the exponent using the central limit theorem, but this is a poor approximation beyond a few standard deviations.
  ◆ Note that $\frac{1}{n}\sum_{i=1}^{n} X_i = \frac{3}{4}$ is equivalent to $P_\mathbf{x} = \left(\frac{1}{4}, \frac{3}{4}\right)$, listing the probabilities of 0 and 1. Thus the probability is approximately
$$
2^{-nD(P_\mathbf{x}\|Q)} = 2^{-nD\left(\left(\frac{1}{4}, \frac{3}{4}\right) \,\middle\|\, \left(\frac{2}{3}, \frac{1}{3}\right)\right)}.
$$
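To see how good this estimate is, we can compare it with the exact binomial tail; a short sketch with sample sizes of my choosing:

```python
# Method-of-types exponent D((1/4,3/4)||(2/3,1/3)) vs. the exact
# binomial tail Pr(sum X_i >= 3n/4) when Q(1) = 1/3.
from math import comb, log2

q = 1 / 3
D = 0.75 * log2(0.75 / q) + 0.25 * log2(0.25 / (1 - q))
for n in (40, 400):
    k0 = 3 * n // 4
    tail = sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(k0, n + 1))
    print(n, -log2(tail) / n, D)   # empirical exponent approaches D
```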


Definition

■ Let $E$ be a subset of the set of probability mass functions. We write (with a slight abuse of notation)
$$
Q^n(E) = Q^n(E \cap \mathcal{P}_n) = \sum_{\mathbf{x} : P_\mathbf{x} \in E \cap \mathcal{P}_n} Q^n(\mathbf{x})
$$
  ◆ Why is this a slight abuse of notation? $Q^n(\cdot)$ in its original meaning represents the probability of a set of sequences; here we borrow the notation to apply it to a set of probability mass functions.
  ◆ For example, let $|\mathcal{X}| = 2$ (say $\mathcal{X} = \{0, 1\}$) and let $E_1$ be the set of probability mass functions with mean $-1$. Then $E_1 = \emptyset$.
  ◆ For example, let $|\mathcal{X}| = 2$ and let $E_2$ be the set of probability mass functions with mean $\sqrt{2}/2$. Then $E_2 \cap \mathcal{P}_n = \emptyset$. (Why? Every type in $\mathcal{P}_n$ has rational probabilities, so its mean is rational, while $\sqrt{2}/2$ is irrational.)


Definition

■ If $E$ contains a relative entropy neighborhood of $Q$, then by the weak law of large numbers $Q^n(E) \to 1$. Specifically, if $\{P : D(P\|Q) \le \epsilon\} \subseteq E$ for some $\epsilon > 0$, then
$$
Q^n(E) \ge Q^n(T_Q^\epsilon) = \Pr\left(D(P_\mathbf{x}\|Q) \le \epsilon\right) \ge 1 - 2^{-n\left(\epsilon - |\mathcal{X}|\frac{\log(n+1)}{n}\right)} \to 1.
$$

■ Otherwise, $Q^n(E) \to 0$ exponentially fast. We will use the method of types to calculate the exponent (the rate function).


Example

Suppose we observe that the sample average of $g(X)$ is greater than or equal to $\alpha$. This event is equivalent to the event $P_\mathbf{x} \in E \cap \mathcal{P}_n$, where
$$
E = \left\{P : \sum_{a \in \mathcal{X}} g(a) P(a) \ge \alpha\right\}.
$$
Because
$$
\frac{1}{n} \sum_{i=1}^{n} g(x_i) \ge \alpha \iff \sum_{a \in \mathcal{X}} P_\mathbf{x}(a)\, g(a) \ge \alpha \iff P_\mathbf{x} \in E \cap \mathcal{P}_n,
$$
we have
$$
\Pr\left(\frac{1}{n} \sum_{i=1}^{n} g(X_i) \ge \alpha\right) = Q^n(E \cap \mathcal{P}_n) = Q^n(E).
$$

Sanov’s Theorem

■ Theorem. Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\sim Q(x)$, and let $E \subseteq \mathcal{P}$ be a set of probability distributions. Then
$$
Q^n(E) = Q^n(E \cap \mathcal{P}_n) \le (n+1)^{|\mathcal{X}|}\, 2^{-nD(P^*\|Q)},
$$
where
$$
P^* = \arg\min_{P \in E} D(P\|Q)
$$
is the distribution in $E$ that is closest to $Q$ in relative entropy. If, in addition, $E \cap \mathcal{P}_n \ne \emptyset$ for all $n \ge n_0$ for some $n_0$, then
$$
-\frac{1}{n} \log Q^n(E) \to D(P^*\|Q).
$$


Proof of Upper Bound

$$
\begin{aligned}
Q^n(E) &= \sum_{P \in E \cap \mathcal{P}_n} Q^n(T(P)) \le \sum_{P \in E \cap \mathcal{P}_n} 2^{-nD(P\|Q)} \le \sum_{P \in E \cap \mathcal{P}_n} \max_{P \in E \cap \mathcal{P}_n} 2^{-nD(P\|Q)} \\
&\le \sum_{P \in E \cap \mathcal{P}_n} \max_{P \in E} 2^{-nD(P\|Q)} = \sum_{P \in E \cap \mathcal{P}_n} 2^{-n \min_{P \in E} D(P\|Q)} \\
&= \sum_{P \in E \cap \mathcal{P}_n} 2^{-nD(P^*\|Q)} \le (n+1)^{|\mathcal{X}|}\, 2^{-nD(P^*\|Q)}.
\end{aligned}
$$


Proof of Lower Bound

Since $E \cap \mathcal{P}_n \ne \emptyset$ for all $n \ge n_0$, we can find a sequence of distributions $P_n \in E \cap \mathcal{P}_n$ such that $D(P_n\|Q) \to D(P^*\|Q)$, and
$$
Q^n(E) = \sum_{P \in E \cap \mathcal{P}_n} Q^n(T(P)) \ge Q^n(T(P_n)) \ge \frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{-nD(P_n\|Q)}.
$$

■ Accordingly, $D(P^*\|Q)$ is the large deviation rate function.

Example 1

Suppose we toss a fair die $n$ times. What is the probability that the average of the throws is greater than or equal to 4? From Sanov's theorem, the large deviation rate function is $D(P^*\|Q)$, where $P^*$ minimizes $D(P\|Q)$ over all distributions $P$ that satisfy
$$
\sum_{i=1}^{6} i\, p_i \ge 4, \qquad \sum_{i=1}^{6} p_i = 1.
$$
Using Lagrange multipliers, we construct the cost function
$$
J = D(P\|Q) + \lambda \sum_{i=1}^{6} i\, p_i + \mu \sum_{i=1}^{6} p_i = \sum_{i=1}^{6} p_i \ln \frac{p_i}{q_i} + \lambda \sum_{i=1}^{6} i\, p_i + \mu \sum_{i=1}^{6} p_i.
$$


Example 1

Setting
$$
\frac{\partial J}{\partial p_i} = 0 \;\Rightarrow\; \ln(6 p_i) + 1 + i\lambda + \mu = 0 \;\Rightarrow\; p_i = \frac{e^{-1-\mu}}{6}\, e^{-i\lambda}.
$$
Substituting into the constraints,
$$
\sum_{i=1}^{6} i\, p_i = \frac{e^{-1-\mu}}{6} \sum_{i=1}^{6} i\, e^{-i\lambda} = 4, \qquad \sum_{i=1}^{6} p_i = \frac{e^{-1-\mu}}{6} \sum_{i=1}^{6} e^{-i\lambda} = 1.
$$
Solving numerically gives $e^{-\lambda} = 1.190804264$, and
$$
P^* = (0.1031,\ 0.1227,\ 0.1461,\ 0.1740,\ 0.2072,\ 0.2468).
$$


Example 1

The probability that the average of 10,000 throws is greater than or equal to 4 is about
$$
2^{-nD(P^*\|Q)} \approx 2^{-624}.
$$
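A sketch reproducing this computation numerically; the bisection on $\lambda$ is my own way of solving the mean constraint, not from the slides:

```python
# Solve the tilted-distribution mean constraint for lambda by bisection,
# then compute P* and the exponent n * D(P*||Q) for n = 10000.
from math import exp, log2

def mean_of(lmbda):
    """Mean of the tilted distribution p_i ~ e^{-i*lambda}, i = 1..6."""
    w = [exp(-i * lmbda) for i in range(1, 7)]
    return sum(i * wi for i, wi in zip(range(1, 7), w)) / sum(w)

lo, hi = -2.0, 0.0            # mean_of(-2) > 4 > mean_of(0) = 3.5
for _ in range(60):           # bisection; mean_of decreases in lambda
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mean_of(mid) > 4 else (lo, mid)

w = [exp(-i * lo) for i in range(1, 7)]
P = [wi / sum(w) for wi in w]
D = sum(p * log2(6 * p) for p in P)   # D(P*||Q) with Q uniform
print([round(p, 4) for p in P])       # ~ (0.1031, 0.1227, ..., 0.2468)
print(exp(-lo), 10000 * D)            # e^{-lambda} ~ 1.1908, exponent ~ 624
```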