  1. Chapter 8 Differential Entropy. Peng-Hua Wang, Graduate Inst. of Comm. Engineering, National Taipei University.

  2. Chapter Outline. Chap. 8 Differential Entropy: 8.1 Definitions; 8.2 AEP for Continuous Random Variables; 8.3 Relation of Differential Entropy to Discrete Entropy; 8.4 Joint and Conditional Differential Entropy; 8.5 Relative Entropy and Mutual Information; 8.6 Properties of Differential Entropy and Related Quantities.

  3. 8.1 Definitions

  4. Definitions. Definition 1 (Differential entropy) The differential entropy h(X) of a continuous random variable X with pdf f(x) is defined as
        h(X) = -\int_S f(x) \log f(x) \, dx,
     where S is the support region of the random variable.
     Example. If X ~ U(0, a), then
        h(X) = -\int_0^a \frac{1}{a} \log \frac{1}{a} \, dx = \log a.
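The definition can be checked numerically. The short sketch below is not part of the original slides; it assumes Python with NumPy and SciPy, works in nats (natural logarithm), and simply integrates -f(x) log f(x) over the support of U(0, a) and compares the result with log a.

```python
# Numerical check of Definition 1 for a uniform density (entropies in nats).
import numpy as np
from scipy.integrate import quad

a = 3.0
f = lambda x: 1.0 / a                      # pdf of U(0, a) on its support [0, a]
h_numeric, _ = quad(lambda x: -f(x) * np.log(f(x)), 0.0, a)
print(h_numeric, np.log(a))                # both approximately 1.0986
```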

  5. Differential Entropy of Gaussian. Example. If X ~ N(0, σ²) with pdf φ(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-x^2/(2\sigma^2)}, then
        h_a(X) = -\int \phi(x) \log_a \phi(x) \, dx
               = -\int \phi(x) \left[ \log_a \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{x^2}{2\sigma^2} \log_a e \right] dx
               = \frac{1}{2} \log_a (2\pi\sigma^2) + \frac{\log_a e}{2\sigma^2} E_\phi[X^2]
               = \frac{1}{2} \log_a (2\pi e \sigma^2).
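A hedged numerical check of this result (not from the slides, parameters chosen arbitrarily): integrate -φ(x) log φ(x) for a zero-mean Gaussian and compare with the closed form 1/2 log(2πeσ²), using natural logarithms.

```python
# Compare a numerical integral of -phi(x) log phi(x) with (1/2) log(2*pi*e*sigma^2), in nats.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sigma = 2.0
phi = norm(loc=0.0, scale=sigma).pdf
h_numeric, _ = quad(lambda x: -phi(x) * np.log(phi(x)), -10 * sigma, 10 * sigma)
h_closed = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
print(h_numeric, h_closed)                 # both approximately 2.112
```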

  6. Differential Entropy of Gaussian. Remark. If a random variable with pdf f(x) has zero mean and variance σ², then
        -\int f(x) \log_a \phi(x) \, dx = -\int f(x) \left[ \log_a \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{x^2}{2\sigma^2} \log_a e \right] dx
                                        = \frac{1}{2} \log_a (2\pi\sigma^2) + \frac{\log_a e}{2\sigma^2} E_f[X^2]
                                        = \frac{1}{2} \log_a (2\pi e \sigma^2).

  7. Gaussian has Maximal Differential Entropy. Suppose that a random variable X with pdf f(x) has zero mean and variance σ². What is its maximal differential entropy? Let φ(x) be the pdf of N(0, σ²). Then
        h(X) + \int f(x) \log \phi(x) \, dx = \int f(x) \log \frac{\phi(x)}{f(x)} \, dx
                                            \le \log \int f(x) \frac{\phi(x)}{f(x)} \, dx   (Jensen's inequality; the logarithm is concave)
                                            = \log \int \phi(x) \, dx = 0.
     That is,
        h(X) \le -\int f(x) \log \phi(x) \, dx = \frac{1}{2} \log (2\pi e \sigma^2),
     where the last equality uses the remark on the previous slide, and equality holds if f(x) = φ(x).
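The bound can be illustrated with any non-Gaussian density of the same variance. The sketch below is an assumption-laden example, not from the slides: a zero-mean Laplace density with the same variance as N(0, σ²) should have strictly smaller differential entropy (in nats; SciPy's .entropy() uses the natural logarithm).

```python
# Laplace(0, b) has variance 2*b^2; compare its entropy with the Gaussian bound.
import numpy as np
from scipy.stats import laplace

b = 1.0                                    # Laplace scale parameter (arbitrary choice)
sigma2 = 2 * b**2                          # matching variance for the Gaussian bound
h_laplace = laplace(loc=0.0, scale=b).entropy()
h_gauss_bound = 0.5 * np.log(2 * np.pi * np.e * sigma2)
print(h_laplace, h_gauss_bound)            # ~1.693 < ~1.766, as the theorem predicts
```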

  8. 8.2 AEP for Continuous Random Variables

  9. AEP. Theorem 1 (AEP) Let X_1, X_2, ..., X_n be a sequence of i.i.d. random variables with common pdf f(x). Then
        -\frac{1}{n} \log f(X_1, X_2, \ldots, X_n) \to E[-\log f(X)] = h(X)
     in probability.
     Definition 2 (Typical set) For ε > 0, the typical set A_ε^(n) with respect to f(x) is defined as
        A_\epsilon^{(n)} = \left\{ (x_1, x_2, \ldots, x_n) \in S^n : \left| -\frac{1}{n} \log f(x_1, x_2, \ldots, x_n) - h(X) \right| \le \epsilon \right\}.
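A Monte Carlo sketch of Theorem 1 under an assumed i.i.d. N(0, 1) model (not part of the slides): for i.i.d. samples, -(1/n) Σ log f(X_i) should concentrate around h(X) = 1/2 log(2πe) as n grows. Entropies are in nats.

```python
# Empirical convergence of -(1/n) log f(X_1, ..., X_n) to h(X) for i.i.d. N(0, 1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
h_true = 0.5 * np.log(2 * np.pi * np.e)    # differential entropy of N(0, 1) in nats
for n in (10, 100, 10_000):
    x = rng.standard_normal(n)
    estimate = -np.mean(norm.logpdf(x))    # -(1/n) log of the joint density of the sample
    print(n, estimate, h_true)             # the estimate approaches ~1.419
```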

  10. AEP. Definition 3 (Volume) The volume Vol(A) of a set A ⊂ R^n is defined as
        \mathrm{Vol}(A) = \int_A dx_1 \, dx_2 \cdots dx_n.
     Theorem 2 (Properties of the typical set)
     1. Pr(A_ε^(n)) > 1 - ε for n sufficiently large.
     2. Vol(A_ε^(n)) ≤ 2^{n(h(X)+ε)} for all n.
     3. Vol(A_ε^(n)) ≥ (1 - ε) 2^{n(h(X)-ε)} for n sufficiently large.

  11. 8.4 Joint and Conditional Differential Entropy

  12. Definitions. Definition 4 (Joint differential entropy) The differential entropy of jointly distributed random variables X_1, X_2, ..., X_n is defined as
        h(X_1, X_2, \ldots, X_n) = -\int f(x^n) \log f(x^n) \, dx^n,
     where f(x^n) = f(x_1, x_2, ..., x_n) is the joint pdf.
     Definition 5 (Conditional differential entropy) The conditional differential entropy of jointly distributed random variables X, Y with joint pdf f(x, y) is defined as, if it exists,
        h(X|Y) = -\int f(x, y) \log f(x|y) \, dx \, dy = h(X, Y) - h(Y).
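A small consistency check of Definition 5 under an assumed bivariate normal model (a sketch, not from the slides): h(X|Y) = h(X,Y) - h(Y) should equal 1/2 log(2πeσ²(1 - ρ²)), the entropy of the conditional Gaussian X | Y = y ~ N(ρy, σ²(1 - ρ²)). Entropies are in nats.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

sigma, rho = 1.5, 0.6
K = np.array([[sigma**2, rho * sigma**2],
              [rho * sigma**2, sigma**2]])
h_xy = multivariate_normal(mean=[0.0, 0.0], cov=K).entropy()   # h(X, Y)
h_y = norm(loc=0.0, scale=sigma).entropy()                      # h(Y)
h_x_given_y = h_xy - h_y                                        # Definition 5
print(h_x_given_y, 0.5 * np.log(2 * np.pi * np.e * sigma**2 * (1 - rho**2)))
```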

  13. Multivariate Normal Distribution. Theorem 3 (Entropy of a multivariate normal) Let X_1, X_2, ..., X_n have a multivariate normal distribution with mean vector μ and covariance matrix K. Then
        h(X_1, X_2, \ldots, X_n) = \frac{1}{2} \log \left( (2\pi e)^n |K| \right).
     Proof. The joint pdf of a multivariate normal distribution is
        \phi(x) = \frac{1}{(2\pi)^{n/2} |K|^{1/2}} e^{-\frac{1}{2}(x-\mu)^t K^{-1} (x-\mu)}.

  14. Multivariate Normal Distribution. Therefore,
        h(X_1, X_2, \ldots, X_n) = -\int \phi(x) \log_a \phi(x) \, dx
          = \int \phi(x) \left[ \frac{1}{2} \log_a \left( (2\pi)^n |K| \right) + \frac{1}{2} (x-\mu)^t K^{-1} (x-\mu) \log_a e \right] dx
          = \frac{1}{2} \log_a \left( (2\pi)^n |K| \right) + \frac{1}{2} (\log_a e) \underbrace{E\left[ (X-\mu)^t K^{-1} (X-\mu) \right]}_{= n}
          = \frac{1}{2} \log_a \left( (2\pi)^n |K| \right) + \frac{n}{2} \log_a e
          = \frac{1}{2} \log_a \left( (2\pi e)^n |K| \right).
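Theorem 3 can be checked numerically for an arbitrary positive-definite covariance (the specific K below is an assumption, not from the slides): compare 1/2 log((2πe)^n |K|) with SciPy's built-in multivariate normal entropy, both in nats.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)                # a random positive-definite covariance matrix
h_closed = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))
h_scipy = multivariate_normal(mean=np.zeros(n), cov=K).entropy()
print(h_closed, h_scipy)                   # the two values agree
```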

  15. Multivariate Normal Distribution. Lemma. Let Y = (Y_1, Y_2, ..., Y_n)^t be a random vector. If K = E[YY^t], then E[Y^t K^{-1} Y] = n.
     Proof. Write the columns of K and the rows of K^{-1} as
        K = E[YY^t] = \begin{pmatrix} k_1 & k_2 & \cdots & k_n \end{pmatrix}, \qquad
        K^{-1} = \begin{pmatrix} a_1^t \\ a_2^t \\ \vdots \\ a_n^t \end{pmatrix}.
     We have k_i = E[Y_i Y] and a_j^t k_i = δ_ij (since K^{-1} K = I).

  16. Multivariate Normal Distribution. Now,
        Y^t K^{-1} Y = Y^t \begin{pmatrix} a_1^t Y \\ a_2^t Y \\ \vdots \\ a_n^t Y \end{pmatrix}
                     = Y_1 a_1^t Y + Y_2 a_2^t Y + \cdots + Y_n a_n^t Y,
     and
        E[Y^t K^{-1} Y] = a_1^t E[Y_1 Y] + a_2^t E[Y_2 Y] + \cdots + a_n^t E[Y_n Y]
                        = a_1^t k_1 + a_2^t k_2 + \cdots + a_n^t k_n = n.
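A Monte Carlo sketch of the lemma (the Gaussian sampling model and the specific K are assumptions; the lemma itself only needs E[YY^t] = K): for zero-mean vectors Y with covariance K, the sample average of Y^t K^{-1} Y should be close to n.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 200_000                          # dimension and number of samples
A = rng.standard_normal((n, n))
K = A @ A.T + np.eye(n)                    # covariance used to generate Y
Y = rng.multivariate_normal(np.zeros(n), K, size=m)    # samples, shape (m, n)
K_inv = np.linalg.inv(K)
quad_forms = np.einsum('ij,jk,ik->i', Y, K_inv, Y)     # Y^t K^{-1} Y for each sample
print(quad_forms.mean(), n)                # the sample mean is close to n = 3
```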

  17. 8.5 Relative Entropy and Mutual Information

  18. Definitions. Definition 6 (Relative entropy) The relative entropy (or Kullback-Leibler distance) D(f||g) between two densities f(x) and g(x) is defined as
        D(f \| g) = \int f(x) \log \frac{f(x)}{g(x)} \, dx.
     Definition 7 (Mutual information) The mutual information I(X; Y) between two random variables with joint density f(x, y) is defined as
        I(X; Y) = \int f(x, y) \log \frac{f(x, y)}{f(x) f(y)} \, dx \, dy.
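Definition 6 evaluated numerically for two univariate Gaussians (a sketch with arbitrarily chosen parameters, not from the slides), compared against the standard closed form for the Gaussian KL divergence, log(σ₂/σ₁) + (σ₁² + (μ₁-μ₂)²)/(2σ₂²) - 1/2, in nats.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu1, s1 = 0.0, 1.0
mu2, s2 = 1.0, 2.0
f, g = norm(mu1, s1).pdf, norm(mu2, s2).pdf
d_numeric, _ = quad(lambda x: f(x) * np.log(f(x) / g(x)), -10, 10)
d_closed = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5
print(d_numeric, d_closed)                 # both approximately 0.443, and nonnegative
```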

  19. Example. Let (X, Y) ~ N(0, K) where
        K = \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix}.
     Then h(X) = h(Y) = \frac{1}{2} \log (2\pi e \sigma^2) and
        h(X, Y) = \frac{1}{2} \log \left( (2\pi e)^2 |K| \right) = \frac{1}{2} \log \left( (2\pi e)^2 \sigma^4 (1 - \rho^2) \right).
     Therefore,
        I(X; Y) = h(X) + h(Y) - h(X, Y) = -\frac{1}{2} \log (1 - \rho^2).
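A numerical check of this example (a sketch with assumed σ and ρ): compute I(X; Y) = h(X) + h(Y) - h(X, Y) from the Gaussian entropies and compare with -1/2 log(1 - ρ²), in nats.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

sigma, rho = 1.0, 0.8
K = sigma**2 * np.array([[1.0, rho],
                         [rho, 1.0]])
h_x = norm(scale=sigma).entropy()          # h(X) = h(Y) by symmetry
h_xy = multivariate_normal(mean=[0.0, 0.0], cov=K).entropy()
mutual_info = 2 * h_x - h_xy
print(mutual_info, -0.5 * np.log(1 - rho**2))   # both approximately 0.511
```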

  20. 8.6 Properties of Differential Entropy and Related Quantities

  21. Properties. Theorem 4 (Relative entropy) D(f||g) ≥ 0 with equality iff f = g almost everywhere.
     Corollary 1
     1. I(X; Y) ≥ 0 with equality iff X and Y are independent.
     2. h(X|Y) ≤ h(X) with equality iff X and Y are independent.

  22. Properties. Theorem 5 (Chain rule for differential entropy)
        h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} h(X_i \mid X_1, X_2, \ldots, X_{i-1}).
     Corollary 2
        h(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} h(X_i).
     Corollary 3 (Hadamard's inequality) If K is the covariance matrix of a multivariate normal distribution, then
        |K| \le \prod_{i=1}^{n} K_{ii}.
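A quick numerical illustration of Corollary 3 (a sketch; the specific positive-definite K below is an assumption): the determinant of a covariance matrix never exceeds the product of its diagonal entries.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
K = A @ A.T + np.eye(5)                    # positive definite, so a valid covariance matrix
det_K = np.linalg.det(K)
prod_diag = np.prod(np.diag(K))
print(det_K, prod_diag, det_K <= prod_diag)    # prints True: |K| <= product of K_ii
```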

  23. Properties. Theorem 6
     1. h(X + c) = h(X).
     2. h(aX) = h(X) + log|a|.
     3. h(AX) = h(X) + log|det(A)|.
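Property 3 checked on Gaussian vectors (a sketch under assumed parameters, not from the slides): if X ~ N(0, K), then AX ~ N(0, A K A^t), so both sides of h(AX) = h(X) + log|det(A)| can be evaluated directly in nats.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(4)
n = 3
K = np.eye(n)                              # X ~ N(0, I)
A = rng.standard_normal((n, n))            # an invertible linear map (almost surely)
h_x = multivariate_normal(mean=np.zeros(n), cov=K).entropy()
h_ax = multivariate_normal(mean=np.zeros(n), cov=A @ K @ A.T).entropy()
print(h_ax, h_x + np.log(abs(np.linalg.det(A))))    # the two values agree
```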

  24. Gaussian has Maximal Entropy. Theorem 7 Let the random vector X ∈ R^n have zero mean and covariance K = E[XX^t]. Then
        h(X) \le \frac{1}{2} \log \left( (2\pi e)^n |K| \right),
     with equality iff X ~ N(0, K).
     Proof. Let g(x) be any density satisfying \int x_i x_j g(x) \, dx = K_{ij}. Let φ(x) be the density of N(0, K). Then
        0 \le D(g \| \phi) = \int g \log(g/\phi) = -h(g) - \int g \log \phi = -h(g) - \int \phi \log \phi = -h(g) + h(\phi),
     where \int g \log \phi = \int \phi \log \phi because \log \phi(x) is a quadratic form in x and g and φ have the same second moments. That is, h(g) ≤ h(φ), with equality iff g = φ.
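A final illustration of Theorem 7 (a sketch, not from the slides): a vector of independent zero-mean Laplace components has the diagonal covariance K = diag(2b_i²), and its entropy (the sum of the component entropies, by independence) stays below the Gaussian bound 1/2 log((2πe)^n |K|). The scale parameters below are arbitrary assumptions; entropies are in nats.

```python
import numpy as np
from scipy.stats import laplace

b = np.array([0.5, 1.0, 2.0])              # Laplace scale per component
K_diag = 2 * b**2                          # component variances; K is diagonal here
h_vector = sum(laplace(scale=bi).entropy() for bi in b)
h_gauss_bound = 0.5 * np.log((2 * np.pi * np.e) ** len(b) * np.prod(K_diag))
print(h_vector, h_gauss_bound)             # ~5.079 < ~5.297, as Theorem 7 predicts
```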
