 
Lecture 5: Continuous-Valued Sources and Channels

I-Hsiang Wang
Department of Electrical Engineering
National Taiwan University
ihwang@ntu.edu.tw

November 4, 2016

I-Hsiang Wang, IT Lecture 5

From Discrete to Continuous

So far we have focused on discrete (and finite-alphabet) sources and channels:
- Information measures for discrete r.v.'s.
- Lossless/lossy source coding for discrete stationary/memoryless sources.
- Channel coding over discrete memoryless channels.

In this lecture, we extend the basic principles and fundamental theorems to continuous random sources and channels. In particular:
- Mutual information and information divergence for continuous r.v.'s.
- Channel coding over continuous memoryless channels.
- Lossy source coding for continuous memoryless sources.

Outline

1. First, we investigate the basic information measures (entropy, mutual information, and KL divergence) when the r.v.'s are continuous. We will see that both mutual information and KL divergence are well defined, while the entropy of a continuous r.v. is not.
2. Then, we introduce differential entropy as the counterpart of Shannon entropy for continuous r.v.'s, and discuss its properties.

1. Measures of Information for Continuous Random Variables
    Entropy and Mutual Information
    Differential Entropy
2. Channel Coding over Continuous Memoryless Channels
    Continuous Memoryless Channel
    Gaussian Channel Capacity
3. Lossy Source Coding for Continuous Memoryless Sources

Entropy of a Continuous Random Variable

Question: What is the entropy of a continuous real-valued random variable $X$?

Suppose $X$ has probability density function (p.d.f.) $f_X(\cdot)$. Let us discretize $X$ to answer this question, as follows:
- Partition $\mathbb{R}$ into length-$\Delta$ intervals: $\mathbb{R} = \bigcup_{k=-\infty}^{\infty} [k\Delta, (k+1)\Delta)$.
- Suppose that $f_X(\cdot)$ is continuous (we drop the subscript "$X$" below). Then, by the mean-value theorem,
$$\forall\, k \in \mathbb{Z},\ \exists\, x_k \in [k\Delta, (k+1)\Delta) \ \text{such that}\ f(x_k) = \frac{1}{\Delta} \int_{k\Delta}^{(k+1)\Delta} f(x)\, dx.$$
- Set $[X]_\Delta \triangleq x_k$ if $X \in [k\Delta, (k+1)\Delta)$, with p.m.f. $P(x_k) = f(x_k)\Delta$.

[Figure: the p.d.f. $f(x) \triangleq \frac{dF_X(x)}{dx}$, where $F_X(x) \triangleq P\{X \le x\}$, plotted against $x$. The $x$-axis is partitioned into intervals of length $\Delta$, and representative points such as $x_1$, $x_3$, $x_5$ are marked inside their intervals.]

Observation: $\lim_{\Delta \to 0} H([X]_\Delta) = H(X)$ (intuitively), while
$$H([X]_\Delta) = -\sum_{k=-\infty}^{\infty} \big(f(x_k)\Delta\big) \log \big(f(x_k)\Delta\big) = -\Delta \sum_{k=-\infty}^{\infty} f(x_k) \log f(x_k) - \log \Delta$$
$$\to -\int_{-\infty}^{\infty} f(x) \log f(x)\, dx + \infty = \infty \quad \text{as } \Delta \to 0.$$
The second equality uses $\sum_{k} f(x_k)\Delta = 1$.

Hence, $H(X) = \infty$ if $-\int_{-\infty}^{\infty} f(x) \log f(x)\, dx = \mathrm{E}\left[\log \frac{1}{f(X)}\right]$ exists.

It is quite intuitive that the entropy of a continuous random variable can be arbitrarily large, because it can take infinitely many possible values.

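The divergence above can be observed numerically. The following is a minimal sketch, assuming $X \sim N(0,1)$ (a choice made here for illustration, not taken from the slide) and computing each cell probability exactly from the Gaussian CDF rather than through the mean-value points $x_k$. The function names are hypothetical and logarithms are natural, so entropies are in nats. $H([X]_\Delta)$ should grow like $-\log \Delta$, while $H([X]_\Delta) + \log \Delta$ should stay close to the finite quantity $\frac{1}{2}\log(2\pi e)$.

```python
import math

def gaussian_cdf(x):
    """CDF of N(0, 1), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def quantized_entropy(delta, span=10.0):
    """Entropy (in nats) of [X]_delta for X ~ N(0, 1).

    Cells are [k*delta, (k+1)*delta). Cells outside roughly [-span, span]
    carry negligible probability and are ignored (an approximation).
    """
    k_max = int(math.ceil(span / delta))
    entropy = 0.0
    for k in range(-k_max, k_max):
        p = gaussian_cdf((k + 1) * delta) - gaussian_cdf(k * delta)
        if p > 0.0:
            entropy -= p * math.log(p)
    return entropy

# Finite limit of H([X]_delta) + log(delta) for the standard Gaussian.
h_limit = 0.5 * math.log(2.0 * math.pi * math.e)

for delta in (1.0, 0.5, 0.1, 0.01):
    H = quantized_entropy(delta)
    print(f"delta={delta:<5} H={H:.4f}  H+log(delta)={H + math.log(delta):.4f}  limit={h_limit:.4f}")
```

The first printed column keeps increasing as the cells shrink, which is the numerical counterpart of $H(X) = \infty$.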
Mutual Information between Continuous Random Variables

How about the mutual information between two continuous r.v.'s $X$ and $Y$, with joint p.d.f. $f_{X,Y}(x,y)$ and marginal p.d.f.'s $f_X(x)$ and $f_Y(y)$? Again, we use discretization:
- Partition $\mathbb{R}^2$ into $\Delta \times \Delta$ squares: $\mathbb{R}^2 = \bigcup_{k,j=-\infty}^{\infty} I_k^\Delta \times I_j^\Delta$, where $I_k^\Delta = [k\Delta, (k+1)\Delta)$.
- Suppose that $f_{X,Y}(x,y)$ is continuous. Then, by the mean-value theorem (MVT), $\forall\, k, j \in \mathbb{Z}$, $\exists\, (x_k, y_j) \in I_k^\Delta \times I_j^\Delta$ such that
$$f_{X,Y}(x_k, y_j) = \frac{1}{\Delta^2} \int_{I_k^\Delta \times I_j^\Delta} f_{X,Y}(x,y)\, dx\, dy.$$
- Set $([X]_\Delta, [Y]_\Delta) \triangleq (x_k, y_j)$ if $(X,Y) \in I_k^\Delta \times I_j^\Delta$, with p.m.f. $P_{X,Y}(x_k, y_j) = f_{X,Y}(x_k, y_j)\Delta^2$.

By MVT, $\forall\, k, j \in \mathbb{Z}$, $\exists\, \tilde{x}_k \in I_k^\Delta$ and $\tilde{y}_j \in I_j^\Delta$ such that
$$P_X(x_k) = \int_{I_k^\Delta} f_X(x)\, dx = \Delta f_X(\tilde{x}_k), \qquad P_Y(y_j) = \int_{I_j^\Delta} f_Y(y)\, dy = \Delta f_Y(\tilde{y}_j).$$

Observation: $\lim_{\Delta \to 0} I([X]_\Delta; [Y]_\Delta) = I(X;Y)$ (intuitively), while
$$I([X]_\Delta; [Y]_\Delta) = \sum_{k,j=-\infty}^{\infty} P_{X,Y}(x_k, y_j) \log \frac{P_{X,Y}(x_k, y_j)}{P_X(x_k) P_Y(y_j)}$$
$$= \sum_{k,j=-\infty}^{\infty} \big(f_{X,Y}(x_k, y_j)\Delta^2\big) \log \frac{f_{X,Y}(x_k, y_j)\Delta^2}{\Delta^2 f_X(\tilde{x}_k) f_Y(\tilde{y}_j)}$$
$$= \Delta^2 \sum_{k,j=-\infty}^{\infty} f_{X,Y}(x_k, y_j) \log \frac{f_{X,Y}(x_k, y_j)}{f_X(\tilde{x}_k) f_Y(\tilde{y}_j)}$$
$$\to \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y) \log \frac{f_{X,Y}(x,y)}{f_X(x) f_Y(y)}\, dx\, dy \quad \text{as } \Delta \to 0.$$

Hence, $I(X;Y) = \mathrm{E}\left[\log \frac{f_{X,Y}(X,Y)}{f_X(X) f_Y(Y)}\right]$ if the improper integral exists.

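In contrast to entropy, the $\Delta^2$ factors cancel inside the logarithm, so the discretized mutual information stays finite. A minimal sketch of this, assuming a zero-mean, unit-variance bivariate Gaussian pair with correlation $\rho$ (an illustrative choice, not from the slides): the cell masses are approximated by the density at the cell midpoint times $\Delta^2$ (the midpoint stands in for the mean-value point), the plane is truncated to a square, and the masses are renormalized. The function names are hypothetical. The result is compared against the standard closed form $-\frac{1}{2}\log(1-\rho^2)$ for this Gaussian pair, in nats.

```python
import math

def joint_pdf(x, y, rho):
    """Density of a zero-mean, unit-variance bivariate Gaussian with correlation rho."""
    norm = 2.0 * math.pi * math.sqrt(1.0 - rho * rho)
    quad = (x * x - 2.0 * rho * x * y + y * y) / (1.0 - rho * rho)
    return math.exp(-0.5 * quad) / norm

def quantized_mutual_information(rho, delta, span=6.0):
    """Approximate I([X]_delta; [Y]_delta) in nats on the square [-span, span]^2.

    Cell masses are taken as f(midpoint) * delta^2 and then renormalized,
    which is an approximation of the exact cell probabilities.
    """
    n = int(round(2.0 * span / delta))
    mids = [-span + (i + 0.5) * delta for i in range(n)]
    joint = [[joint_pdf(x, y, rho) * delta * delta for y in mids] for x in mids]
    total = sum(sum(row) for row in joint)
    joint = [[p / total for p in row] for row in joint]
    # Marginal p.m.f.'s of the discretized pair.
    px = [sum(row) for row in joint]
    py = [sum(joint[i][j] for i in range(n)) for j in range(n)]
    mi = 0.0
    for i in range(n):
        for j in range(n):
            p = joint[i][j]
            if p > 0.0:
                mi += p * math.log(p / (px[i] * py[j]))
    return mi

rho = 0.8
closed_form = -0.5 * math.log(1.0 - rho * rho)
for delta in (1.0, 0.5, 0.2, 0.1):
    approx = quantized_mutual_information(rho, delta)
    print(f"delta={delta:<4} I_discretized={approx:.4f}  closed_form={closed_form:.4f}")
```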
Mutual Information

Unlike entropy, which is only well defined for discrete r.v.'s, mutual information can in general be defined between two real-valued r.v.'s (not necessarily continuous or discrete) as follows.

Definition 1 (Mutual information)
The mutual information between two r.v.'s $X$ and $Y$ is defined as
$$I(X;Y) = \sup_{\mathcal{P}, \mathcal{Q}} I\big([X]_{\mathcal{P}}; [Y]_{\mathcal{Q}}\big),$$
where the supremum is taken over all pairs of partitions $\mathcal{P}$ and $\mathcal{Q}$ of $\mathbb{R}$.

Similar to mutual information, information divergence can also be defined between two probability measures, regardless of whether the probability distributions are discrete, continuous, etc.

Remark: Although defining information measures in such a general way is nice, these definitions do not provide explicit ways to compute these information measures.

Differential Entropy

For continuous r.v.'s, let us define the following counterparts of entropy and conditional entropy.

Definition 2 (Differential entropy and conditional differential entropy)
The differential entropy of a continuous r.v. $X$ with p.d.f. $f_X$ is defined as
$$h(X) \triangleq \mathrm{E}\left[\log \frac{1}{f_X(X)}\right]$$
if the (improper) integral exists.
The conditional differential entropy of a continuous r.v. $X$ given $Y$, where $(X,Y)$ has joint p.d.f. $f_{X,Y}$ and conditional p.d.f. $f_{X|Y}$, is defined as
$$h(X|Y) \triangleq \mathrm{E}\left[\log \frac{1}{f_{X|Y}(X|Y)}\right]$$
if the (improper) integral exists.

We have the following theorem immediately from the previous discussion:

Theorem 1 (Mutual information between two continuous r.v.'s)
$$I(X;Y) = \mathrm{E}\left[\log \frac{f_{X,Y}(X,Y)}{f_X(X) f_Y(Y)}\right] = h(X) - h(X|Y).$$

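The second equality in Theorem 1 follows from factoring the joint density. A short worked derivation, assuming that $h(X)$ and $h(X|Y)$ both exist and are finite so that the expectation can be split:

```latex
\begin{align*}
I(X;Y)
  &= \mathrm{E}\left[\log \frac{f_{X,Y}(X,Y)}{f_X(X)\, f_Y(Y)}\right] \\
  &= \mathrm{E}\left[\log \frac{f_{X|Y}(X|Y)}{f_X(X)}\right]
     && \text{since } f_{X,Y}(x,y) = f_Y(y)\, f_{X|Y}(x|y) \\
  &= \mathrm{E}\left[\log \frac{1}{f_X(X)}\right]
     - \mathrm{E}\left[\log \frac{1}{f_{X|Y}(X|Y)}\right] \\
  &= h(X) - h(X|Y).
\end{align*}
```

By the symmetry of the first expression in $X$ and $Y$, the same steps give $I(X;Y) = h(Y) - h(Y|X)$.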
Information Divergence

Definition 3 (Information divergence for densities)
The information divergence from density $g(x)$ to $f(x)$ is defined as
$$D(f \,\|\, g) \triangleq \mathrm{E}_{X \sim f}\left[\log \frac{f(X)}{g(X)}\right] = \int_{x \in \operatorname{supp} f} f(x) \log \frac{f(x)}{g(x)}\, dx$$
if the (improper) integral exists.

By Jensen's inequality, it is straightforward to see that the non-negativity of KL divergence remains.

Proposition 1 (Non-negativity of information divergence)
$D(f \,\|\, g) \ge 0$, with equality iff $f = g$ almost everywhere (i.e., except for some points with zero probability).

Note: $D(f \,\|\, g)$ is finite only if the support of $f(x)$ is contained in the support of $g(x)$.

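Definition 3 can be evaluated numerically for simple densities. The following is a minimal sketch, assuming both $f$ and $g$ are Gaussian densities (an illustrative choice, not from the slides) and using a midpoint-rule integral over a truncated range. It is compared against the standard closed form for the divergence between two Gaussians, in nats. The function names are hypothetical.

```python
import math

def normal_log_pdf(x, mu, sigma):
    """Natural log of the N(mu, sigma^2) density at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def kl_numeric(mu_f, sigma_f, mu_g, sigma_g, width=10.0, steps=20000):
    """Midpoint-rule approximation of D(f || g) in nats.

    Integrates f(x) * log(f(x) / g(x)) over mu_f +/- width * sigma_f;
    the mass of f outside this range is negligible.
    """
    lo = mu_f - width * sigma_f
    dx = 2.0 * width * sigma_f / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        log_f = normal_log_pdf(x, mu_f, sigma_f)
        log_g = normal_log_pdf(x, mu_g, sigma_g)
        total += math.exp(log_f) * (log_f - log_g) * dx
    return total

def kl_closed_form(mu_f, sigma_f, mu_g, sigma_g):
    """Closed form of D(N(mu_f, sigma_f^2) || N(mu_g, sigma_g^2)) in nats."""
    return (math.log(sigma_g / sigma_f)
            + (sigma_f ** 2 + (mu_f - mu_g) ** 2) / (2.0 * sigma_g ** 2)
            - 0.5)

forward = kl_numeric(0.0, 1.0, 1.0, 2.0)   # D(f || g)
reverse = kl_numeric(1.0, 2.0, 0.0, 1.0)   # D(g || f)
print(f"D(f||g): numeric={forward:.6f}  closed form={kl_closed_form(0.0, 1.0, 1.0, 2.0):.6f}")
print(f"D(g||f): numeric={reverse:.6f}  closed form={kl_closed_form(1.0, 2.0, 0.0, 1.0):.6f}")
```

Both values are non-negative, in line with Proposition 1, and they differ from each other, which illustrates that the divergence is not symmetric in its two arguments.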
Properties that Extend to Continuous R.V.'s

Proposition 2 (Chain rule)
$$h(X,Y) = h(X) + h(Y|X), \qquad h(X^n) = \sum_{i=1}^{n} h\big(X_i \,\big|\, X^{i-1}\big).$$

Proposition 3 (Conditioning reduces differential entropy)
$$h(X|Y) \le h(X), \qquad h(X|Y,Z) \le h(X|Z).$$

Proposition 4 (Non-negativity of mutual information)
$$I(X;Y) \ge 0, \qquad I(X;Y|Z) \ge 0.$$

Examples

Example 1 (Differential entropy of a uniform r.v.)
For a r.v. $X \sim \mathrm{Unif}[a,b]$, that is, its p.d.f. $f_X(x) = \frac{1}{b-a} \mathbb{1}\{a \le x \le b\}$, its differential entropy is $h(X) = \log(b-a)$.

Example 2 (Differential entropy of $N(0,1)$)
For a r.v. $X \sim N(0,1)$, that is, its p.d.f. $f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$, its differential entropy is $h(X) = \frac{1}{2} \log(2\pi e)$.

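Both examples follow directly from the definition $h(X) = \mathrm{E}\left[\log \frac{1}{f_X(X)}\right]$. A short worked derivation, using only the densities stated in the examples and the fact that $\mathrm{E}[X^2] = 1$ for $X \sim N(0,1)$:

```latex
% Example 1: X ~ Unif[a, b]
\begin{align*}
h(X) &= \int_a^b \frac{1}{b-a} \log (b-a)\, dx = \log (b-a).
\end{align*}

% Example 2: X ~ N(0, 1)
\begin{align*}
\log \frac{1}{f_X(x)} &= \frac{1}{2} \log (2\pi) + \frac{x^2}{2} \log e, \\
h(X) &= \frac{1}{2} \log (2\pi) + \frac{\mathrm{E}[X^2]}{2} \log e
      = \frac{1}{2} \log (2\pi) + \frac{1}{2} \log e
      = \frac{1}{2} \log (2\pi e).
\end{align*}
```

Example 1 also shows that differential entropy, unlike Shannon entropy, can be negative: $\log(b-a) < 0$ whenever $b - a < 1$.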