Lecture 5: Continuous-Valued Sources and Channels


  1. Lecture 5: Continuous-Valued Sources and Channels. I-Hsiang Wang, Department of Electrical Engineering, National Taiwan University. ihwang@ntu.edu.tw. November 4, 2016.

  2. From Discrete to Continuous
     So far we have focused on discrete (and finite-alphabet) sources and channels:
     - Information measures for discrete r.v.'s.
     - Lossless/lossy source coding for discrete stationary/memoryless sources.
     - Channel coding over discrete memoryless channels.
     In this lecture, we extend the basic principles and fundamental theorems to continuous random sources and channels. In particular:
     - Mutual information and information divergence for continuous r.v.'s.
     - Channel coding over continuous memoryless channels.
     - Lossy source coding for continuous memoryless sources.

  3. Outline
     1 First we investigate the basic information measures (entropy, mutual information, and KL divergence) when the r.v.'s are continuous. We will see that both mutual information and KL divergence are well defined, while the entropy of a continuous r.v. is not.
     2 Then, we introduce differential entropy as the continuous counterpart of Shannon entropy and discuss its properties.

  4. Measures of Information for Continuous Random Variables: Entropy and Mutual Information
     1 Measures of Information for Continuous Random Variables: Entropy and Mutual Information; Differential Entropy
     2 Channel Coding over Continuous Memoryless Channels: Continuous Memoryless Channel; Gaussian Channel Capacity
     3 Lossy Source Coding for Continuous Memoryless Sources

  5. Entropy of a Continuous Random Variable
     Question: what is the entropy of a continuous real-valued random variable $X$?
     Suppose $X$ has probability density function (p.d.f.) $f_X(\cdot)$. Let us discretize $X$ to answer this question, as follows:
     - Partition $\mathbb{R}$ into length-$\Delta$ intervals: $\mathbb{R} = \bigcup_{k=-\infty}^{\infty} [k\Delta, (k+1)\Delta)$.
     - Suppose that $f_X(\cdot)$ is continuous (we drop the subscript $X$ below). Then by the mean-value theorem, $\forall k \in \mathbb{Z}$, $\exists\, x_k \in [k\Delta, (k+1)\Delta)$ such that $f(x_k) = \frac{1}{\Delta} \int_{k\Delta}^{(k+1)\Delta} f(x)\,dx$.
     - Set $[X]_\Delta \triangleq x_k$ if $X \in [k\Delta, (k+1)\Delta)$, with p.m.f. $P(x_k) = f(x_k)\,\Delta$.
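A minimal numerical sketch of this quantization (my own, not part of the slides; the helper name quantize_pmf and the use of the bin midpoint as a stand-in for the mean-value point $x_k$ are assumptions):

```python
# Sketch: discretize a continuous r.v. X with density f into [X]_Delta,
# following the construction above.  The bin representative x_k is approximated
# by the midpoint; its p.m.f. value is the bin's probability mass f(x_k)*Delta.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def quantize_pmf(pdf, Delta, k_range):
    """Return representatives x_k and p.m.f. values P(x_k) for each length-Delta bin."""
    xs, ps = [], []
    for k in k_range:
        a, b = k * Delta, (k + 1) * Delta
        mass, _ = quad(pdf, a, b)      # integral of f over [kΔ, (k+1)Δ)
        xs.append(0.5 * (a + b))       # midpoint stands in for the mean-value point x_k
        ps.append(mass)                # equals f(x_k) * Δ by the mean-value theorem
    return np.array(xs), np.array(ps)

xs, ps = quantize_pmf(norm.pdf, Delta=0.1, k_range=range(-60, 60))
print(ps.sum())  # ~1: the quantized p.m.f. is (essentially) a valid distribution
```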

  6. [Figure: the p.d.f. $f(x) \triangleq \frac{dF_X(x)}{dx}$, where $F_X(x) \triangleq P\{X \le x\}$ is the c.d.f.]

  7. [Figure: the density $f(x)$ with $\mathbb{R}$ partitioned into length-$\Delta$ intervals.]

  8. [Figure: the piecewise-constant quantization of $f(x)$, with the mean-value representatives $x_1, x_3, x_5, \ldots$ marked.]

  9. Observation: intuitively, $\lim_{\Delta \to 0} H([X]_\Delta) = H(X)$, while
     $H([X]_\Delta) = -\sum_{k=-\infty}^{\infty} \left( f(x_k)\Delta \right) \log\left( f(x_k)\Delta \right) = -\Delta \sum_{k=-\infty}^{\infty} f(x_k) \log f(x_k) - \log \Delta$
     $\to -\int_{-\infty}^{\infty} f(x) \log f(x)\,dx + \infty = \infty$ as $\Delta \to 0$.
     Hence $H(X) = \infty$ if $-\int_{-\infty}^{\infty} f(x) \log f(x)\,dx = \mathbb{E}\left[ \log \frac{1}{f(X)} \right]$ exists.
     It is quite intuitive that the entropy of a continuous random variable can be arbitrarily large, because it can take infinitely many possible values.
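To see the $-\log\Delta$ blow-up numerically, here is a small check (my own, not part of the slides; logs are natural, so entropies are in nats) for $X \sim \mathcal{N}(0,1)$, where $H([X]_\Delta) + \log\Delta$ should approach $h(X) = \frac{1}{2}\log(2\pi e) \approx 1.4189$:

```python
# Sketch: H([X]_Delta) grows like -log(Delta), while H([X]_Delta) + log(Delta)
# converges to the differential entropy h(X) of a standard Gaussian.
import numpy as np
from scipy.stats import norm

h_true = 0.5 * np.log(2 * np.pi * np.e)
for Delta in [1.0, 0.1, 0.01]:
    edges = np.arange(-8.0, 8.0 + Delta, Delta)
    p = np.diff(norm.cdf(edges))       # p.m.f. of [X]_Delta: bin probabilities f(x_k)*Delta
    p = p[p > 0]
    H = -np.sum(p * np.log(p))         # discrete entropy of the quantized r.v. (nats)
    print(Delta, H, H + np.log(Delta), h_true)
```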

  10. Mutual Information between Continuous Random Variables
      How about the mutual information between two continuous r.v.'s $X$ and $Y$, with joint p.d.f. $f_{X,Y}(x, y)$ and marginal p.d.f.'s $f_X(x)$ and $f_Y(y)$? Again, we use discretization:
      - Partition $\mathbb{R}^2$ into $\Delta \times \Delta$ squares: $\mathbb{R}^2 = \bigcup_{k,j=-\infty}^{\infty} \mathcal{I}^\Delta_k \times \mathcal{I}^\Delta_j$, where $\mathcal{I}^\Delta_k \triangleq [k\Delta, (k+1)\Delta)$.
      - Suppose that $f_{X,Y}(x, y)$ is continuous. Then by the mean-value theorem (MVT), $\forall k, j \in \mathbb{Z}$, $\exists\, (x_k, y_j) \in \mathcal{I}^\Delta_k \times \mathcal{I}^\Delta_j$ such that $f_{X,Y}(x_k, y_j) = \frac{1}{\Delta^2} \int_{\mathcal{I}^\Delta_k \times \mathcal{I}^\Delta_j} f_{X,Y}(x, y)\,dx\,dy$.
      - Set $([X]_\Delta, [Y]_\Delta) \triangleq (x_k, y_j)$ if $(X, Y) \in \mathcal{I}^\Delta_k \times \mathcal{I}^\Delta_j$, with p.m.f. $P_{X,Y}(x_k, y_j) = f_{X,Y}(x_k, y_j)\,\Delta^2$.

  11. By the MVT, $\forall k, j \in \mathbb{Z}$, $\exists\, \tilde{x}_k \in \mathcal{I}^\Delta_k$ and $\tilde{y}_j \in \mathcal{I}^\Delta_j$ such that
      $P_X(x_k) = \int_{\mathcal{I}^\Delta_k} f_X(x)\,dx = \Delta f_X(\tilde{x}_k), \quad P_Y(y_j) = \int_{\mathcal{I}^\Delta_j} f_Y(y)\,dy = \Delta f_Y(\tilde{y}_j)$.
      Observation: intuitively, $\lim_{\Delta \to 0} I([X]_\Delta; [Y]_\Delta) = I(X;Y)$, while
      $I([X]_\Delta; [Y]_\Delta) = \sum_{k,j=-\infty}^{\infty} P_{X,Y}(x_k, y_j) \log \frac{P_{X,Y}(x_k, y_j)}{P_X(x_k) P_Y(y_j)}$
      $= \sum_{k,j=-\infty}^{\infty} \left( f_{X,Y}(x_k, y_j)\,\Delta^2 \right) \log \frac{f_{X,Y}(x_k, y_j)\,\Delta^2}{\Delta^2\, f_X(\tilde{x}_k) f_Y(\tilde{y}_j)}$
      $= \Delta^2 \sum_{k,j=-\infty}^{\infty} f_{X,Y}(x_k, y_j) \log \frac{f_{X,Y}(x_k, y_j)}{f_X(\tilde{x}_k) f_Y(\tilde{y}_j)}$
      $\to \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y) \log \frac{f_{X,Y}(x, y)}{f_X(x) f_Y(y)}\,dx\,dy$ as $\Delta \to 0$.
      Hence $I(X;Y) = \mathbb{E}\left[ \log \frac{f_{X,Y}(X,Y)}{f_X(X) f_Y(Y)} \right]$ if the improper integral exists.
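The discretized sum can be evaluated directly for a case with a known answer. The sketch below (my own, not part of the slides) uses a bivariate Gaussian with correlation $\rho$, for which $I(X;Y) = -\frac{1}{2}\log(1-\rho^2)$ nats, and approximates the mean-value points by the cell centers:

```python
# Sketch: estimate I(X;Y) by the Delta x Delta discretization above and compare
# with the closed form for a bivariate Gaussian with correlation rho.
import numpy as np
from scipy.stats import multivariate_normal, norm

rho, Delta = 0.8, 0.05
edges = np.arange(-6.0, 6.0 + Delta, Delta)
centers = 0.5 * (edges[:-1] + edges[1:])

X, Y = np.meshgrid(centers, centers, indexing="ij")
joint = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])
pxy = joint.pdf(np.dstack([X, Y])) * Delta**2     # P_{X,Y}(x_k, y_j) ~= f_{X,Y} * Delta^2
px = norm.pdf(centers) * Delta                    # P_X(x_k) ~= f_X * Delta (same for Y)
pxpy = np.outer(px, px)

mask = pxy > 0
I_disc = np.sum(pxy[mask] * np.log(pxy[mask] / pxpy[mask]))
print(I_disc, -0.5 * np.log(1 - rho**2))          # the two values should be close
```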

  12. Mutual Information
      Unlike entropy, which is only well defined for discrete r.v.'s, in general we can define the mutual information between two real-valued r.v.'s (not necessarily continuous or discrete) as follows.
      Definition 1 (Mutual information). The mutual information between two r.v.'s $X$ and $Y$ is defined as
      $I(X;Y) = \sup_{\mathcal{P}, \mathcal{Q}} I\left( [X]_\mathcal{P}; [Y]_\mathcal{Q} \right)$,
      where the supremum is taken over all pairs of partitions $\mathcal{P}$ and $\mathcal{Q}$ of $\mathbb{R}$.
      Similar to mutual information, information divergence can also be defined between two probability measures, no matter whether the distributions are discrete, continuous, etc.
      Remark: although defining information measures in such a general way is nice, these definitions do not provide explicit ways to compute them.

  13. Measures of Information for Continuous Random Variables: Differential Entropy
      1 Measures of Information for Continuous Random Variables: Entropy and Mutual Information; Differential Entropy
      2 Channel Coding over Continuous Memoryless Channels: Continuous Memoryless Channel; Gaussian Channel Capacity
      3 Lossy Source Coding for Continuous Memoryless Sources

  14. Differential Entropy
      For continuous r.v.'s, let us define the following counterparts of entropy and conditional entropy.
      Definition 2 (Differential entropy and conditional differential entropy).
      The differential entropy of a continuous r.v. $X$ with p.d.f. $f_X$ is defined as $h(X) \triangleq \mathbb{E}\left[ \log \frac{1}{f_X(X)} \right]$ if the (improper) integral exists.
      The conditional differential entropy of a continuous r.v. $X$ given $Y$, where $(X, Y)$ has joint p.d.f. $f_{X,Y}$ and conditional p.d.f. $f_{X|Y}$, is defined as $h(X|Y) \triangleq \mathbb{E}\left[ \log \frac{1}{f_{X|Y}(X|Y)} \right]$ if the (improper) integral exists.
      We have the following theorem immediately from the previous discussion:
      Theorem 1 (Mutual information between two continuous r.v.'s).
      $I(X;Y) = \mathbb{E}\left[ \log \frac{f_{X,Y}(X,Y)}{f_X(X) f_Y(Y)} \right] = h(X) - h(X|Y)$.
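Definition 2 can be evaluated by plain numerical integration. A minimal sketch (my own, not part of the slides; the helper name diff_entropy is an assumption, logs are natural) compares the numerical values with the closed forms that appear in Examples 1 and 2 below:

```python
# Sketch: h(X) = E[log(1/f_X(X))] = -∫ f(x) log f(x) dx, evaluated numerically.
import numpy as np
from scipy.stats import norm, uniform
from scipy.integrate import quad

def diff_entropy(pdf, lo, hi):
    """Differential entropy (in nats) of a density supported on [lo, hi]."""
    integrand = lambda x: -pdf(x) * np.log(pdf(x)) if pdf(x) > 0 else 0.0
    val, _ = quad(integrand, lo, hi)
    return val

# Unif[0, 3]: h(X) = log(b - a) = log 3
print(diff_entropy(uniform(loc=0, scale=3).pdf, 0, 3), np.log(3))
# N(0, 1): h(X) = 0.5 * log(2*pi*e)
print(diff_entropy(norm.pdf, -10, 10), 0.5 * np.log(2 * np.pi * np.e))
```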

  15. Information Divergence
      Definition 3 (Information divergence for densities).
      The information divergence from density $g(x)$ to $f(x)$ is defined as
      $D(f \| g) \triangleq \mathbb{E}_{X \sim f}\left[ \log \frac{f(X)}{g(X)} \right] = \int_{x \in \mathrm{supp}\, f} f(x) \log \frac{f(x)}{g(x)}\,dx$
      if the (improper) integral exists.
      By Jensen's inequality, it is straightforward to see that the non-negativity of KL divergence remains.
      Proposition 1 (Non-negativity of information divergence).
      $D(f \| g) \ge 0$, with equality iff $f = g$ almost everywhere (i.e., except for a set of points with zero probability).
      Note: $D(f \| g)$ is finite only if the support of $f(x)$ is contained in the support of $g(x)$.
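As a quick check of Definition 3 (my own sketch, not part of the slides), the divergence between two Gaussian densities can be integrated numerically and compared against the standard closed form $D\bigl(\mathcal{N}(m_1, s_1^2) \,\|\, \mathcal{N}(m_2, s_2^2)\bigr) = \log\frac{s_2}{s_1} + \frac{s_1^2 + (m_1 - m_2)^2}{2 s_2^2} - \frac{1}{2}$:

```python
# Sketch: D(f || g) for two Gaussian densities, numerically vs. closed form (nats).
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

m1, s1, m2, s2 = 0.0, 1.0, 1.0, 2.0
f, g = norm(m1, s1).pdf, norm(m2, s2).pdf

kl_numeric, _ = quad(lambda x: f(x) * np.log(f(x) / g(x)), -20, 20)
kl_closed = np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5
print(kl_numeric, kl_closed)   # the two values should agree
```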

  16. Properties that Extend to Continuous R.V.'s
      Proposition 2 (Chain rule). $h(X, Y) = h(X) + h(Y|X)$, and $h(X^n) = \sum_{i=1}^{n} h\left( X_i \mid X^{i-1} \right)$.
      Proposition 3 (Conditioning reduces differential entropy). $h(X|Y) \le h(X)$, $h(X|Y, Z) \le h(X|Z)$.
      Proposition 4 (Non-negativity of mutual information). $I(X;Y) \ge 0$, $I(X;Y|Z) \ge 0$.
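As a concrete instance of Propositions 3 and 4 (my own working, not on the slides), take $(X, Y)$ jointly Gaussian with unit variances and correlation coefficient $\rho$; given $Y$, the conditional law of $X$ is Gaussian with variance $1 - \rho^2$, so

```latex
% Worked check (not from the slides): jointly Gaussian X, Y, unit variances, correlation rho.
\begin{align*}
  h(X)        &= \tfrac{1}{2}\log(2\pi e), \\
  h(X \mid Y) &= \tfrac{1}{2}\log\!\bigl(2\pi e\,(1-\rho^2)\bigr) \;\le\; h(X)
                 \quad\text{since } 1-\rho^2 \le 1, \\
  I(X;Y)      &= h(X) - h(X \mid Y) = -\tfrac{1}{2}\log(1-\rho^2) \;\ge\; 0 .
\end{align*}
```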

  17. Examples
      Example 1 (Differential entropy of a uniform r.v.). For a r.v. $X \sim \mathrm{Unif}[a, b]$, that is, with p.d.f. $f_X(x) = \frac{1}{b-a} \mathbf{1}\{a \le x \le b\}$, the differential entropy is $h(X) = \log(b - a)$.
      Example 2 (Differential entropy of $\mathcal{N}(0,1)$). For a r.v. $X \sim \mathcal{N}(0, 1)$, that is, with p.d.f. $f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, the differential entropy is $h(X) = \frac{1}{2} \log(2\pi e)$.
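Example 2 follows in a few lines from Definition 2; a short derivation (my own working, with $\log$ taken as the natural logarithm so that $\mathbb{E}[X^2] = 1$ contributes the factor $e$):

```latex
% Derivation of Example 2 from h(X) = E[log(1/f_X(X))]:
\begin{align*}
  h(X) &= \mathbb{E}\!\left[\log\tfrac{1}{f_X(X)}\right]
        = \mathbb{E}\!\left[\tfrac{1}{2}\log(2\pi) + \tfrac{X^2}{2}\right] \\
       &= \tfrac{1}{2}\log(2\pi) + \tfrac{1}{2}\,\mathbb{E}[X^2]
        = \tfrac{1}{2}\log(2\pi) + \tfrac{1}{2}
        = \tfrac{1}{2}\log(2\pi e).
\end{align*}
```

Example 1 is even more direct: on $[a, b]$ the density is the constant $\frac{1}{b-a}$, so $h(X) = \mathbb{E}[\log(b - a)] = \log(b - a)$.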
