slide-1
SLIDE 1

Lecture 5 Continuous-Valued Sources and Channels

I-Hsiang Wang

Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw

November 4, 2016

1 / 63 I-Hsiang Wang IT Lecture 5

slide-2
SLIDE 2

From Discrete to Continuous

So far we have focused on discrete (finite-alphabet) sources and channels:
  Information measures for discrete r.v.'s.
  Lossless/lossy source coding for discrete stationary/memoryless sources.
  Channel coding over discrete memoryless channels.
In this lecture, we extend the basic principles and fundamental theorems to continuous sources and channels. In particular:
  Mutual information and information divergence for continuous r.v.'s.
  Channel coding over continuous memoryless channels.
  Lossy source coding for continuous memoryless sources.

2 / 63 I-Hsiang Wang IT Lecture 5

slide-3
SLIDE 3

Outline

1 First we investigate the basic information measures – entropy, mutual information, and KL divergence – when the r.v.'s are continuous. We will see that mutual information and KL divergence remain well defined, while the entropy of a continuous r.v. is not.

2 Then, we introduce differential entropy as the continuous counterpart of Shannon entropy, and discuss its related properties.

3 / 63 I-Hsiang Wang IT Lecture 5

slide-4
SLIDE 4

Measures of Information for Continuous Random Variables Entropy and Mutual Information

1 Measures of Information for Continuous Random Variables
    Entropy and Mutual Information
    Differential Entropy
2 Channel Coding over Continuous Memoryless Channels
    Continuous Memoryless Channel
    Gaussian Channel Capacity
3 Lossy Source Coding for Continuous Memoryless Sources

4 / 63 I-Hsiang Wang IT Lecture 5

slide-5
SLIDE 5

Measures of Information for Continuous Random Variables Entropy and Mutual Information

Entropy of a Continuous Random Variable

Question: What is the entropy of a continuous real-valued random variable X ?

Suppose X has the probability density function (p.d.f.) fX (·).

Let us discretize X to answer this question, as follows:

Partition R into length-∆ intervals: R = ∪_{k=−∞}^{∞} [k∆, (k+1)∆).

Suppose that f_X(·) is continuous (drop the subscript "X" below). Then by the mean-value theorem, ∀ k ∈ Z, ∃ x_k ∈ [k∆, (k+1)∆) such that

    f(x_k) = (1/∆) ∫_{k∆}^{(k+1)∆} f(x) dx.

Set [X]_∆ ≜ x_k if X ∈ [k∆, (k+1)∆), with p.m.f. P(x_k) = f(x_k) ∆.

5 / 63 I-Hsiang Wang IT Lecture 5

slide-6
SLIDE 6

Measures of Information for Continuous Random Variables Entropy and Mutual Information

[Figure: c.d.f. and p.d.f. of X — F_X(x) ≜ P{X ≤ x}, f(x) ≜ dF_X(x)/dx.]

6 / 63 I-Hsiang Wang IT Lecture 5

slide-7
SLIDE 7

Measures of Information for Continuous Random Variables Entropy and Mutual Information

[Figure: the p.d.f. f(x) with the x-axis partitioned into length-∆ intervals.]

7 / 63 I-Hsiang Wang IT Lecture 5

slide-8
SLIDE 8

Measures of Information for Continuous Random Variables Entropy and Mutual Information

[Figure: the p.d.f. f(x) with the x-axis partitioned into length-∆ intervals; representative points x_1, x_3, x_5, ... marked inside the intervals.]

8 / 63 I-Hsiang Wang IT Lecture 5

slide-9
SLIDE 9

Measures of Information for Continuous Random Variables Entropy and Mutual Information

Observation: lim_{∆→0} H([X]_∆) = H(X) (intuitively), while

    H([X]_∆) = −∑_{k=−∞}^{∞} (f(x_k) ∆) log (f(x_k) ∆)
             = −∆ ∑_{k=−∞}^{∞} f(x_k) log f(x_k) − log ∆
             → −∫_{−∞}^{∞} f(x) log f(x) dx + ∞ = ∞   as ∆ → 0.

Hence, H(X) = ∞ whenever −∫_{−∞}^{∞} f(x) log f(x) dx = E[ log 1/f(X) ] exists. It is quite intuitive that the entropy of a continuous random variable can be arbitrarily large, because it can take infinitely many possible values.
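As a sanity check, here is a small, illustrative Python/NumPy sketch (not part of the original slides) that quantizes a standard Gaussian X with bins truncated to [−10, 10]: every halving of ∆ adds roughly one bit, consistent with H([X]_∆) ≈ h(X) − log₂ ∆ diverging as ∆ → 0.

```python
import numpy as np
from math import erf, sqrt, log2, pi, e

def Phi(t):
    # standard normal c.d.f.
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def quantized_entropy(delta, x_max=10.0):
    # H([X]_Delta) in bits for X ~ N(0,1); bins of width delta, truncated to [-x_max, x_max]
    edges = np.arange(-x_max, x_max + delta, delta)
    p = np.array([Phi(b) - Phi(a) for a, b in zip(edges[:-1], edges[1:])])
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

h_X = 0.5 * log2(2 * pi * e)   # differential entropy of N(0,1), about 2.047 bits
for delta in (1.0, 0.5, 0.25, 0.125, 0.0625):
    print(f"delta={delta:6.4f}  H={quantized_entropy(delta):6.3f}  h(X)-log2(delta)={h_X - log2(delta):6.3f}")
```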

9 / 63 I-Hsiang Wang IT Lecture 5

slide-10
SLIDE 10

Measures of Information for Continuous Random Variables Entropy and Mutual Information

Mutual Information between Continuous Random Variables

How about the mutual information between two continuous r.v.'s X and Y, with joint p.d.f. f_{X,Y}(x, y) and marginal p.d.f.'s f_X(x) and f_Y(y)? Again, we use discretization:

Partition R² into ∆ × ∆ squares: R² = ∪_{k,j=−∞}^{∞} I_k^∆ × I_j^∆, where I_k^∆ = [k∆, (k+1)∆).

Suppose that f_{X,Y}(x, y) is continuous. Then by the mean-value theorem (MVT), ∀ k, j ∈ Z, ∃ (x_k, y_j) ∈ I_k^∆ × I_j^∆ such that

    f_{X,Y}(x_k, y_j) = (1/∆²) ∫∫_{I_k^∆ × I_j^∆} f_{X,Y}(x, y) dx dy.

Set ([X]_∆, [Y]_∆) ≜ (x_k, y_j) if (X, Y) ∈ I_k^∆ × I_j^∆, with p.m.f. P_{X,Y}(x_k, y_j) = f_{X,Y}(x_k, y_j) ∆².

10 / 63 I-Hsiang Wang IT Lecture 5

slide-11
SLIDE 11

Measures of Information for Continuous Random Variables Entropy and Mutual Information

By the MVT, ∀ k, j ∈ Z, ∃ x̃_k ∈ I_k^∆ and ỹ_j ∈ I_j^∆ such that

    P_X(x_k) = ∫_{I_k^∆} f_X(x) dx = ∆ f_X(x̃_k),   P_Y(y_j) = ∫_{I_j^∆} f_Y(y) dy = ∆ f_Y(ỹ_j).

Observation: lim_{∆→0} I([X]_∆ ; [Y]_∆) = I(X ; Y) (intuitively), while

    I([X]_∆ ; [Y]_∆) = ∑_{k,j=−∞}^{∞} P_{X,Y}(x_k, y_j) log [ P_{X,Y}(x_k, y_j) / (P_X(x_k) P_Y(y_j)) ]
                     = ∑_{k,j=−∞}^{∞} (f_{X,Y}(x_k, y_j) ∆²) log [ f_{X,Y}(x_k, y_j) ∆² / (f_X(x̃_k) f_Y(ỹ_j) ∆²) ]
                     = ∆² ∑_{k,j=−∞}^{∞} f_{X,Y}(x_k, y_j) log [ f_{X,Y}(x_k, y_j) / (f_X(x̃_k) f_Y(ỹ_j)) ]
                     → ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) log [ f_{X,Y}(x, y) / (f_X(x) f_Y(y)) ] dx dy   as ∆ → 0.

Hence, I(X ; Y) = E[ log f(X,Y) / (f(X) f(Y)) ] if the improper integral exists.
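The same discretization can be carried out numerically. The illustrative sketch below (assumed parameters: correlation ρ = 0.8, bin width ∆ = 0.05, truncation at ±6) builds the p.m.f. P_{X,Y}(x_k, y_j) ≈ f_{X,Y}(x_k, y_j) ∆² for a bivariate Gaussian and checks that I([X]_∆ ; [Y]_∆) approaches the closed form −(1/2) log₂(1 − ρ²).

```python
import numpy as np

rho, delta, x_max = 0.8, 0.05, 6.0   # illustrative correlation, bin width, truncation

def joint_pdf(x, y):
    # zero-mean, unit-variance bivariate normal with correlation rho
    q = (x**2 - 2 * rho * x * y + y**2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * np.sqrt(1 - rho**2))

centers = np.arange(-x_max, x_max, delta) + delta / 2
X, Y = np.meshgrid(centers, centers, indexing="ij")
P = joint_pdf(X, Y) * delta**2               # P_{X,Y}(x_k, y_j) ~= f(x_k, y_j) * delta^2
Px, Py = P.sum(axis=1, keepdims=True), P.sum(axis=0, keepdims=True)
mask = P > 0
I_quant = np.sum(P[mask] * np.log2(P[mask] / (Px * Py)[mask]))
print("I([X]_d ; [Y]_d) ~", I_quant, "   -0.5*log2(1-rho^2) =", -0.5 * np.log2(1 - rho**2))
```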

11 / 63 I-Hsiang Wang IT Lecture 5

slide-12
SLIDE 12

Measures of Information for Continuous Random Variables Entropy and Mutual Information

Mutual Information

Unlike entropy, which is only well defined for discrete r.v.'s, mutual information can in general be defined between two real-valued r.v.'s (not necessarily continuous or discrete), as follows.

Definition 1 (Mutual information)
The mutual information between two r.v.'s X and Y is defined as

    I(X ; Y) = sup_{P,Q} I([X]_P ; [Y]_Q),

where the supremum is taken over all pairs of partitions P and Q of R.

Similarly, information divergence can be defined between any two probability measures, whether the distributions are discrete, continuous, or neither.

Remark: Although defining information measures in such generality is nice, these definitions do not provide explicit ways to compute them.

12 / 63 I-Hsiang Wang IT Lecture 5

slide-13
SLIDE 13

Measures of Information for Continuous Random Variables Differential Entropy

1 Measures of Information for Continuous Random Variables
    Entropy and Mutual Information
    Differential Entropy
2 Channel Coding over Continuous Memoryless Channels
    Continuous Memoryless Channel
    Gaussian Channel Capacity
3 Lossy Source Coding for Continuous Memoryless Sources

13 / 63 I-Hsiang Wang IT Lecture 5

slide-14
SLIDE 14

Measures of Information for Continuous Random Variables Differential Entropy

Differential Entropy

For continuous r.v.'s, let us define the following counterparts of entropy and conditional entropy.

Definition 2 (Differential entropy and conditional differential entropy)
The differential entropy of a continuous r.v. X with p.d.f. f_X is defined as

    h(X) ≜ E[ log 1/f_X(X) ]

if the (improper) integral exists. The conditional differential entropy of a continuous r.v. X given Y, where (X, Y) has joint p.d.f. f_{X,Y} and conditional p.d.f. f_{X|Y}, is defined as

    h(X | Y) ≜ E[ log 1/f_{X|Y}(X|Y) ]

if the (improper) integral exists.

We have the following theorem immediately from the previous discussion:

Theorem 1 (Mutual information between two continuous r.v.'s)

    I(X ; Y) = E[ log f_{X,Y}(X,Y) / (f_X(X) f_Y(Y)) ] = h(X) − h(X | Y).

14 / 63 I-Hsiang Wang IT Lecture 5

slide-15
SLIDE 15

Measures of Information for Continuous Random Variables Differential Entropy

Information Divergence

Definition 3 (Information divergence for densities) The information divergence from density g (x) to f (x) is defined as

    D(f ∥ g) ≜ E_{X∼f}[ log f(X)/g(X) ] = ∫_{x ∈ supp f} f(x) log [ f(x)/g(x) ] dx

if the (improper) integral exists.

By Jensen's inequality, it is straightforward to see that the non-negativity of KL divergence carries over.

Proposition 1 (Non-negativity of Information divergence)

D (f ∥g) ≥ 0, with equality iff f = g almost everywhere (i.e., except for some points with zero probability).

Note: D (f ∥g) is finite only if the support of f (x) is contained in the support of g (x).

15 / 63 I-Hsiang Wang IT Lecture 5

slide-16
SLIDE 16

Measures of Information for Continuous Random Variables Differential Entropy

Properties that Extend to Continuous R.V.'s

Proposition 2 (Chain rule)

h(X, Y) = h(X) + h(Y | X),   h(X^n) = ∑_{i=1}^{n} h(X_i | X^{i−1}).

Proposition 3 (Conditioning reduces differential entropy)

h (X |Y ) ≤ h (X ) , h (X |Y, Z ) ≤ h (X |Z ) .

Proposition 4 (Non-negativity of mutual information)

I (X ; Y ) ≥ 0, I (X ; Y |Z ) ≥ 0.

16 / 63 I-Hsiang Wang IT Lecture 5

slide-17
SLIDE 17

Measures of Information for Continuous Random Variables Differential Entropy

Examples

Example 1 (Differential entropy of a uniform r.v.)
For a r.v. X ∼ Unif[a, b], that is, with p.d.f. f_X(x) = 1/(b−a) · 1{a ≤ x ≤ b}, the differential entropy is

    h(X) = log (b − a).

Example 2 (Differential entropy of N(0, 1))
For a r.v. X ∼ N(0, 1), that is, with p.d.f. f_X(x) = (1/√(2π)) e^{−x²/2}, the differential entropy is

    h(X) = (1/2) log (2πe).
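Both closed forms can be checked by Monte Carlo, estimating h(X) = E[ log 1/f_X(X) ] from samples. A minimal, illustrative sketch (the interval endpoints a = 2, b = 5, the sample size, and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

a, b = 2.0, 5.0                                          # illustrative interval for Unif[a, b]
x = rng.uniform(a, b, n)
h_unif = np.mean(-np.log2(np.full(n, 1.0 / (b - a))))    # -E[log2 f_X(X)]; density is constant
print("Unif[a,b]: MC", h_unif, "  log2(b-a) =", np.log2(b - a))

x = rng.standard_normal(n)                               # X ~ N(0, 1)
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
print("N(0,1):    MC", np.mean(-np.log2(f)), "  0.5*log2(2*pi*e) =", 0.5 * np.log2(2 * np.pi * np.e))
```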

17 / 63 I-Hsiang Wang IT Lecture 5

slide-18
SLIDE 18

Measures of Information for Continuous Random Variables Differential Entropy

New Properties of Differential Entropy

Differential entropy can be negative. Since b − a can be made arbitrarily small, h(X) = log (b − a) can be negative. Hence, the non-negativity property of entropy cannot be extended to differential entropy.

Scaling changes the differential entropy. Consider X ∼ Unif[0, 1]. Then 2X ∼ Unif[0, 2]. Hence,

    h(X) = log 1 = 0,   h(2X) = log 2 = 1   ⇒   h(X) ≠ h(2X).

This is in sharp contrast to entropy: H(X) = H(g(X)) as long as g(·) is an invertible function.

18 / 63 I-Hsiang Wang IT Lecture 5

slide-19
SLIDE 19

Measures of Information for Continuous Random Variables Differential Entropy

Scaling and Translation

Proposition 5 (Scaling and Translation in the Scalar Case)
Let X be a continuous random variable with differential entropy h(X).
Translation does not change the differential entropy: for a constant c, h(X + c) = h(X).
Scaling shifts the differential entropy: for a constant a ≠ 0, h(aX) = h(X) + log|a|.

Proposition 6 (Scaling and Translation in the Vector Case)
Let X be a continuous random vector with differential entropy h(X).
For a constant vector c, h(X + c) = h(X).
For an invertible matrix a ∈ R^{n×n}, h(aX) = h(X) + log|det a|.

The proofs of these propositions are left as exercises (simple calculus).

19 / 63 I-Hsiang Wang IT Lecture 5

slide-20
SLIDE 20

Measures of Information for Continuous Random Variables Differential Entropy

Differential Entropy of Gaussian Random Vectors

Example 3 (Differential entropy of Gaussian random vectors)
For an n-dim random vector X ∼ N(m, k), the differential entropy is

    h(X) = (1/2) log [ (2πe)^n det k ].

sol: For an n-dim random vector X ∼ N(m, k), we can rewrite X as

    X = aW + m,

where a a⊺ = k and W consists of i.i.d. W_i ∼ N(0, 1), i = 1, . . . , n. Hence, by the translation and scaling properties of differential entropy,

    h(X) = h(W) + log|det a| = ∑_{i=1}^{n} h(W_i) + (1/2) log (det k)
         = (n/2) log (2πe) + (1/2) log (det k) = (1/2) log [ (2πe)^n det k ].
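A short numerical check of Example 3 and of Proposition 6 (a sketch, with an arbitrarily chosen covariance matrix): factor k = a a⊺ with a Cholesky factor and compare the closed form against h(W) + log₂|det a|.

```python
import numpy as np

k = np.array([[2.0, 0.6],
              [0.6, 1.0]])                       # illustrative covariance matrix (n = 2)
n = k.shape[0]

sign, logdet = np.linalg.slogdet(k)              # natural-log determinant of k
h_closed = 0.5 * (n * np.log2(2 * np.pi * np.e) + logdet / np.log(2))

a = np.linalg.cholesky(k)                        # k = a a^T, so X = a W + m with W ~ N(0, I)
h_via_prop6 = n * 0.5 * np.log2(2 * np.pi * np.e) + np.log2(abs(np.linalg.det(a)))
print(h_closed, h_via_prop6)                     # the two values agree (in bits)
```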

20 / 63 I-Hsiang Wang IT Lecture 5

slide-21
SLIDE 21

Measures of Information for Continuous Random Variables Differential Entropy

Maximum Differential Entropy

Uniform distribution maximizes entropy for a r.v. with finite support. For differential entropy, the maximization problem needs to be associated with constraints on the distribution (otherwise it is easy to make it infinite). Under a second-moment constraint, the zero-mean Gaussian maximizes the differential entropy.

Theorem 2 (Maximum Differential Entropy under Covariance Constraint)
Let X be a random vector with mean m and covariance matrix

    E[(X − m)(X − m)⊺] = k,

and let X^G be Gaussian with the same covariance k. Then,

    h(X) ≤ h(X^G) = (1/2) log [ (2πe)^n det k ].

21 / 63 I-Hsiang Wang IT Lecture 5

slide-22
SLIDE 22

Measures of Information for Continuous Random Variables Differential Entropy

pf: First, we can assume WLOG that both X and X^G are zero-mean. Let the p.d.f. of X be f(x) and the p.d.f. of X^G be f_G(x). Hence,

    0 ≤ D(f ∥ f_G) = E_f[log f(X)] − E_f[log f_G(X)] = −h(X) − E_f[log f_G(X)].

Note that log f_G(x) is a quadratic function of x, and X, X^G have the same second moments. Hence,

    E_f[log f_G(X)] = E_{f_G}[log f_G(X^G)] = −h(X^G)
    ⇒ 0 ≤ D(f ∥ f_G) = −h(X) + h(X^G)
    ⇒ h(X) ≤ h(X^G).
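The theorem can also be illustrated numerically in the scalar case: among zero-mean, unit-variance densities, the uniform and Laplace differential entropies fall below the Gaussian value (1/2) log(2πe). A hedged Monte Carlo sketch (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
gauss_bound = 0.5 * np.log2(2 * np.pi * np.e)    # h of the unit-variance Gaussian, ~2.047 bits

w = 2 * np.sqrt(3.0)                             # Unif[-sqrt(3), sqrt(3)] has variance 1
h_unif = np.mean(-np.log2(np.full(n, 1.0 / w)))

b = 1 / np.sqrt(2.0)                             # Laplace(b) has variance 2 b^2 = 1
x = rng.laplace(0.0, b, n)
h_lap = np.mean(-np.log2(np.exp(-np.abs(x) / b) / (2 * b)))

x = rng.standard_normal(n)
h_gauss = np.mean(-np.log2(np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)))

print(f"uniform {h_unif:.3f}  laplace {h_lap:.3f}  gaussian {h_gauss:.3f}  bound {gauss_bound:.3f}")
```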

22 / 63 I-Hsiang Wang IT Lecture 5

slide-23
SLIDE 23

Channel Coding over Continuous Memoryless Channels

1 Measures of Information for Continuous Random Variables
    Entropy and Mutual Information
    Differential Entropy
2 Channel Coding over Continuous Memoryless Channels
    Continuous Memoryless Channel
    Gaussian Channel Capacity
3 Lossy Source Coding for Continuous Memoryless Sources

23 / 63 I-Hsiang Wang IT Lecture 5

slide-24
SLIDE 24

Channel Coding over Continuous Memoryless Channels

We have investigated the measures of information for continuous r.v.'s:
  The amount of uncertainty (entropy) is mostly infinite.
  Mutual information and KL divergence are well defined.
  Differential entropy is useful for computing information measures of continuous r.v.'s.

Question: How about coding theorems? Is there a general way or framework to extend coding theorems from discrete sources/channels to continuous sources/channels?

24 / 63 I-Hsiang Wang IT Lecture 5

slide-25
SLIDE 25

Channel Coding over Continuous Memoryless Channels

Discrete Memoryless Channel (P_{Y|X}):
[Block diagram: w → Channel Encoder → x^N → P_{Y|X} → y^N → Channel Decoder → ŵ]

    C(B) = max_{X: E[b(X)] ≤ B} I(X ; Y).

Continuous Memoryless Channel (f_{Y|X}):
[Block diagram: w → Channel Encoder → x^N → f_{Y|X} → y^N → Channel Decoder → ŵ]

    C(B) = sup_{X: E[b(X)] ≤ B} I(X ; Y) ?

25 / 63 I-Hsiang Wang IT Lecture 5

slide-26
SLIDE 26

Channel Coding over Continuous Memoryless Channels

Coding Theorems: from Discrete to Continuous (1)

Two main techniques for extending the achievability part of coding theorems from the discrete world to the continuous world:

1 Discretization: Discretize the source and channel input/output to create a discrete system, and then make the discretization finer and finer to prove achievability.

2 New typicality: Extend weak typicality to continuous r.v.'s and repeat the arguments in a similar way. In particular, replace the entropy terms in the definitions of weakly typical sequences by differential entropy terms.

Using discretization to derive the achievability of Gaussian channel capacity follows Gallager [2] and El Gamal & Kim [6]. Cover & Thomas [1] and Yeung [5] use weak typicality for continuous r.v.'s. Moser [4] uses a threshold decoder, similar to weak typicality.

26 / 63 I-Hsiang Wang IT Lecture 5

slide-27
SLIDE 27

Channel Coding over Continuous Memoryless Channels

Coding Theorems: from Discrete to Continuous (2)

In this lecture, we use discretization for the achievability proof.
Pros: No need for new tools (e.g., typicality) for continuous r.v.'s. Extends naturally to multi-terminal settings – one can focus on discrete memoryless networks.
Cons: Technical; not much insight on how to achieve capacity.
Hence, we also use a geometric argument to provide insights on how to achieve capacity.
Disclaimer: We will not be 100% rigorous in deriving the results in this lecture. Instead, you can find rigorous treatments in the references.

27 / 63 I-Hsiang Wang IT Lecture 5

slide-28
SLIDE 28

Channel Coding over Continuous Memoryless Channels

Outline

1 First, we formulate the channel coding problem over continuous memoryless channels (CMC), state the coding theorem, and sketch the converse and achievability proofs.

2 Second, we introduce the additive Gaussian noise (AGN) channel, derive the Gaussian channel capacity, and provide insights based on geometric arguments.

28 / 63 I-Hsiang Wang IT Lecture 5

slide-29
SLIDE 29

Channel Coding over Continuous Memoryless Channels Continuous Memoryless Channel

1 Measures of Information for Continuous Random Variables
    Entropy and Mutual Information
    Differential Entropy
2 Channel Coding over Continuous Memoryless Channels
    Continuous Memoryless Channel
    Gaussian Channel Capacity
3 Lossy Source Coding for Continuous Memoryless Sources

29 / 63 I-Hsiang Wang IT Lecture 5

slide-30
SLIDE 30

Channel Coding over Continuous Memoryless Channels Continuous Memoryless Channel

Continuous Memoryless Channel

[Block diagram: w → Channel Encoder → x^N → f_{Y|X} → y^N → Channel Decoder → ŵ.]

1 Input/output alphabet X = Y = R.
2 Continuous Memoryless Channel (CMC):
  Channel law: governed by the conditional density (p.d.f.) f_{Y|X}.
  Memoryless: Y_k − X_k − (X^{k−1}, Y^{k−1}) forms a Markov chain.
3 Average input cost constraint B: (1/N) ∑_{k=1}^{N} b(x_k) ≤ B, where b : R → [0, ∞) is the (single-letter) cost function.

The definitions of error probability, achievable rate, and capacity are the same as those for the DMC.

30 / 63 I-Hsiang Wang IT Lecture 5

slide-31
SLIDE 31

Channel Coding over Continuous Memoryless Channels Continuous Memoryless Channel

Channel Coding Theorem

Theorem 3 (Continuous Memoryless Channel Capacity)
The capacity of the CMC (R, f_{Y|X}, R) with input cost constraint B is

    C = sup_{X: E[b(X)] ≤ B} I(X ; Y).        (1)

Note: The input distribution of the r.v. X need not have a density – it could also be discrete.

How to compute h(Y | X) when X has no density? Recall

    h(Y | X) = E_X[ −∫_{supp Y} f(y|X) log f(y|X) dy ],

where f(y|x) is the conditional density of Y given X.

Converse proof: Exactly the same as that in the DMC case.

31 / 63 I-Hsiang Wang IT Lecture 5

slide-32
SLIDE 32

Channel Coding over Continuous Memoryless Channels Continuous Memoryless Channel

Sketch of the Achievability (1): Discretization

[Block diagram: w → ENC → x^N → f_{Y|X} → y^N → DEC → ŵ.]

Achievability proof makes use of discretization – we can apply the result in DMC with input cost:

32 / 63 I-Hsiang Wang IT Lecture 5

slide-33
SLIDE 33

Channel Coding over Continuous Memoryless Channels Continuous Memoryless Channel

Sketch of the Achievability (1): Discretization

[Block diagram: w → ENC → Q_in → f_{Y|X} → Q_out → DEC → ŵ.]

Achievability proof makes use of discretization – we can apply the result in DMC with input cost:

Q_in: (single-letter) discretization that maps X ∈ R to X_d ∈ X_d.
Q_out: (single-letter) discretization that maps Y ∈ R to Y_d ∈ Y_d.

Note that both X_d and Y_d are discrete (countable) alphabets.

33 / 63 I-Hsiang Wang IT Lecture 5

slide-34
SLIDE 34

Channel Coding over Continuous Memoryless Channels Continuous Memoryless Channel

Sketch of the Achievability (1): Discretization

[Block diagram: w → New ENC (ENC followed by Q_in) → equivalent DMC P_{Y_d|X_d} (f_{Y|X} followed by Q_out) → DEC → ŵ.]

Achievability proof makes use of discretization – we can apply the result in DMC with input cost:

Q_in: (single-letter) discretization that maps X ∈ R to X_d ∈ X_d.
Q_out: (single-letter) discretization that maps Y ∈ R to Y_d ∈ Y_d.

Note that both X_d and Y_d are discrete (countable) alphabets.

Idea: With the two discretization blocks Q_in and Q_out, one can build an equivalent DMC (X_d, P_{Y_d|X_d}, Y_d) as shown above.

34 / 63 I-Hsiang Wang IT Lecture 5

slide-35
SLIDE 35

Channel Coding over Continuous Memoryless Channels Continuous Memoryless Channel

Sketch of the Achievability (2): Arguments

[Block diagram: w → New ENC → x_d^N → equivalent DMC P_{Y_d|X_d} → y_d^N → DEC → ŵ, with Q_in absorbed into the new encoder and Q_out absorbed into the equivalent DMC.]

1 Random codebook generation: Generate the codebook randomly based on the original (continuous) r.v. X, satisfying E[b(X)] ≤ B.

2 Choice of discretization: Choose Q_in such that the cost constraint is not violated after discretization. Specifically, E[b(X_d)] ≤ B.

3 Achievability in the equivalent DMC: By the achievability part of the channel coding theorem for the DMC with input constraint, any rate R < I(X_d ; Y_d) is achievable.

4 Achievability in the original CMC: Prove that when the discretization in Q_in and Q_out gets finer and finer, I(X_d ; Y_d) → I(X ; Y).

35 / 63 I-Hsiang Wang IT Lecture 5

slide-36
SLIDE 36

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

1 Measures of Information for Continuous Random Variables
    Entropy and Mutual Information
    Differential Entropy
2 Channel Coding over Continuous Memoryless Channels
    Continuous Memoryless Channel
    Gaussian Channel Capacity
3 Lossy Source Coding for Continuous Memoryless Sources

36 / 63 I-Hsiang Wang IT Lecture 5

slide-37
SLIDE 37

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Additive White Gaussian Noise (AWGN) Channel

[Block diagram: w → Channel Encoder → x^N → (+ z^N) → y^N → Channel Decoder → ŵ.]

1 Input/output alphabet X = Y = R.
2 AWGN Channel:
  Conditional p.d.f. f_{Y|X} given by Y = X + Z, with Z ∼ N(0, σ²) ⊥⊥ X.
  {Z_k} form an i.i.d. (white) Gaussian random process with Z_k ∼ N(0, σ²), ∀ k.
  Memoryless: Z_k ⊥⊥ (W, X^{k−1}, Z^{k−1}).
  Without feedback: Z^N ⊥⊥ X^N.
3 Average input power constraint P: (1/N) ∑_{k=1}^{N} |x_k|² ≤ P.

37 / 63 I-Hsiang Wang IT Lecture 5

slide-38
SLIDE 38

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Channel Coding Theorem for Gaussian Channel

Theorem 4 (Gaussian Channel Capacity) The capacity of the AWGN channel with input power constraint P and noise variance σ2 is given by

C = sup

X: E[|X|2]≤P

I (X ; Y ) = 1

2 log

( 1 + P

σ2

) .

(2) Note: For the AWGN channel, the supremum is actually attainable with Gaussian input

X ∼ N (0, P), that is, the input has density fX (x) =

1 √ 2πP e− x2

2P , as shown in the next slide.

38 / 63 I-Hsiang Wang IT Lecture 5

slide-39
SLIDE 39

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Evaluation of Capacity

Let us compute the capacity of the AWGN channel (2) as follows:

    I(X ; Y) = h(Y) − h(Y | X) = h(Y) − h(X + Z | X) = h(Y) − h(Z | X)
             = h(Y) − h(Z)                          (since Z ⊥⊥ X)
             = h(Y) − (1/2) log (2πe σ²)
          (a)≤ (1/2) log [2πe (P + σ²)] − (1/2) log (2πe σ²) = (1/2) log (1 + P/σ²).

Here (a) is due to h(Y) ≤ (1/2) log (2πe Var[Y]) and Var[Y] = Var[X] + Var[Z] ≤ P + σ², since Var[X] ≤ E[X²] ≤ P.

Finally, note that the above inequalities hold with equality when X ∼ N(0, P).
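A quick numerical cross-check of this evaluation (a sketch; P = 4 and σ² = 1 are illustrative choices): estimate h(Y) by Monte Carlo for the capacity-achieving input X ∼ N(0, P) and subtract h(Z).

```python
import numpy as np

P, sigma2 = 4.0, 1.0                              # illustrative power and noise variance
C = 0.5 * np.log2(1 + P / sigma2)                 # capacity in bits per channel use

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.normal(0.0, np.sqrt(P), n)                # capacity-achieving input X ~ N(0, P)
y = x + rng.normal(0.0, np.sqrt(sigma2), n)       # Y = X + Z, so Y ~ N(0, P + sigma2)

f_Y = np.exp(-y**2 / (2 * (P + sigma2))) / np.sqrt(2 * np.pi * (P + sigma2))
h_Y = np.mean(-np.log2(f_Y))                      # Monte Carlo estimate of h(Y)
h_Z = 0.5 * np.log2(2 * np.pi * np.e * sigma2)
print("h(Y) - h(Z) =", h_Y - h_Z, "   capacity formula =", C)
```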

39 / 63 I-Hsiang Wang IT Lecture 5

slide-40
SLIDE 40

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Geometric Intuition: Sphere Packing

[Figure: output space R^N; sphere of radius √(N(P + σ²)) containing most outputs y = x + z.]

By the LLN, as N → ∞, most output sequences y (= y^N) will lie inside the N-dimensional sphere of radius √(N(P + σ²)).

40 / 63 I-Hsiang Wang IT Lecture 5

slide-41
SLIDE 41

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Geometric Intuition: Sphere Packing

[Figure: output sphere of radius √(N(P + σ²)); noise sphere of radius √(Nσ²) centered at a codeword x.]

By the LLN, as N → ∞, most output sequences y (= y^N) will lie inside the N-dimensional sphere of radius √(N(P + σ²)).

Also by the LLN, as N → ∞, y will lie near the surface of the N-dimensional sphere centered at x with radius √(Nσ²).

41 / 63 I-Hsiang Wang IT Lecture 5

slide-42
SLIDE 42

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Geometric Intuition: Sphere Packing

[Figure: output sphere of radius √(N(P + σ²)) packed with noise spheres of radius √(Nσ²).]

By the LLN, as N → ∞, most output sequences y (= y^N) will lie inside the N-dimensional sphere of radius √(N(P + σ²)).

Also by the LLN, as N → ∞, y will lie near the surface of the N-dimensional sphere centered at x with radius √(Nσ²).

Vanishing error probability criterion ⇒ non-overlapping spheres.
Max. # of non-overlapping spheres = max. # of codewords that can be reliably delivered.

Question: How many non-overlapping spheres can be packed into the large sphere?

42 / 63 I-Hsiang Wang IT Lecture 5

slide-43
SLIDE 43

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Geometric Intuition: Sphere Packing

[Figure: output sphere of radius √(N(P + σ²)) packed with noise spheres of radius √(Nσ²).]

Back-of-envelope calculation:

    2^{NR} ≤ (√(N(P + σ²)))^N / (√(Nσ²))^N
    ⇒ R ≤ (1/N) log [ (√(N(P + σ²)))^N / (√(Nσ²))^N ] = (1/2) log (1 + P/σ²).

Hence, intuitively any achievable rate R cannot exceed

    C = (1/2) log (1 + P/σ²).

How to achieve it?
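The volume calculation can be made slightly more explicit. In the sketch below (P = 4 and σ² = 1 are illustrative values), the exact N-ball volume V_N(r) = π^{N/2} r^N / Γ(N/2 + 1) is used; the constants cancel in the ratio, so the per-dimension exponent equals (1/2) log₂(1 + P/σ²) for every N.

```python
import numpy as np
from math import lgamma, log, pi

P, sigma2 = 4.0, 1.0                              # illustrative values

def log2_ball_volume(N, r):
    # log2 of the volume of an N-ball: V = pi^(N/2) r^N / Gamma(N/2 + 1)
    return (0.5 * N * log(pi) + N * log(r) - lgamma(N / 2 + 1)) / log(2)

for N in (2, 10, 100, 1000):
    big = log2_ball_volume(N, np.sqrt(N * (P + sigma2)))     # output sphere
    small = log2_ball_volume(N, np.sqrt(N * sigma2))         # noise sphere
    print(N, (big - small) / N)                              # log2(# of packed spheres) / N
print("0.5*log2(1 + P/sigma2) =", 0.5 * np.log2(1 + P / sigma2))
```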

43 / 63 I-Hsiang Wang IT Lecture 5

slide-44
SLIDE 44

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Achieving Capacity via Good Packing

[Figure: "x-sphere" of radius √(NP) containing codewords x_1, x_2, ...]

Random codebook generation: Generate 2^{NR} N-dim. vectors (codewords) {x_1, . . . , x_{2^{NR}}} lying in the "x-sphere" of radius √(NP).

44 / 63 I-Hsiang Wang IT Lecture 5

slide-45
SLIDE 45

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Achieving Capacity via Good Packing

[Figure: "x-sphere" of radius √(NP) with codewords x_1, x_2 and the scaled output αy.]

Random codebook generation: Generate 2^{NR} N-dim. vectors (codewords) {x_1, . . . , x_{2^{NR}}} lying in the "x-sphere" of radius √(NP).

Decoding: with α ≜ P/(P + σ²) (the MMSE coefficient),

    y → MMSE scaling (×α) → αy → nearest-neighbor search → x̂.

45 / 63 I-Hsiang Wang IT Lecture 5

slide-46
SLIDE 46

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Achieving Capacity via Good Packing

[Figure: "x-sphere" of radius √(NP); uncertainty sphere of radius √(NPσ²/(P + σ²)) centered at αy.]

Random codebook generation: Generate 2^{NR} N-dim. vectors (codewords) {x_1, . . . , x_{2^{NR}}} lying in the "x-sphere" of radius √(NP).

Decoding: with α ≜ P/(P + σ²) (the MMSE coefficient),

    y → MMSE scaling (×α) → αy → nearest-neighbor search → x̂.

By the LLN, we have

    ∥αy − x_1∥² = ∥αz + (α − 1)x_1∥² ≈ α²Nσ² + (α − 1)²NP = NPσ²/(P + σ²).

46 / 63 I-Hsiang Wang IT Lecture 5

slide-47
SLIDE 47

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Achieving Capacity via Good Packing

[Figure: "x-sphere" of radius √(NP); uncertainty sphere of radius √(NPσ²/(P + σ²)) centered at αy.]

Performance analysis: When does an error occur? When another codeword, say x_2, falls inside the uncertainty sphere centered at αy. What is that probability? It is the ratio of the volumes of the two spheres:

    P{x_1 → x_2} = (√(NPσ²/(P + σ²)))^N / (√(NP))^N = (σ²/(P + σ²))^{N/2}.

47 / 63 I-Hsiang Wang IT Lecture 5

slide-48
SLIDE 48

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Achieving Capacity via Good Packing

[Figure: "x-sphere" of radius √(NP); uncertainty sphere of radius √(NPσ²/(P + σ²)) centered at αy.]

By the union of events bound, the total probability of error

    P{E} ≤ 2^{NR} (σ²/(P + σ²))^{N/2} = 2^{N(R − (1/2) log (1 + P/σ²))},

which vanishes as N → ∞ if

    R < (1/2) log (1 + P/σ²).

Hence, any R < (1/2) log (1 + P/σ²) is achievable.
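A small numerical illustration of this union bound (P = 4, σ² = 1 are arbitrary choices): for rates below (1/2) log₂(1 + P/σ²) ≈ 1.16 the bound decays exponentially in N, while above capacity it is vacuous.

```python
import numpy as np

P, sigma2 = 4.0, 1.0
C = 0.5 * np.log2(1 + P / sigma2)                 # ~1.161 bits per channel use

def union_bound(N, R):
    # P(E) <= 2^{NR} (sigma2/(P+sigma2))^{N/2} = 2^{N (R - C)}
    return 2.0 ** (N * (R - C))

for R in (0.8, 1.0, 1.3):                         # two rates below capacity, one above
    print(f"R={R}:", [union_bound(N, R) for N in (50, 100, 200)])
```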

48 / 63 I-Hsiang Wang IT Lecture 5

slide-49
SLIDE 49

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Practical Relevance of the Gaussian Noise Model

In communication engineering, additive Gaussian noise is the most widely used model for a noisy channel with real (or complex) input/output. Reasons:

1 Due to the CLT, a Gaussian well models noise that is the aggregation of many small perturbations.
2 Analytically, the Gaussian is highly tractable.
3 Consider an input-power-constrained channel with independent additive noise. Within the family of noise distributions with the same variance, Gaussian noise is the worst-case noise.

The last point is important: for an additive-noise channel with input power constraint P and noise variance σ², the capacity is lower bounded by the Gaussian channel capacity (1/2) log (1 + P/σ²).

49 / 63 I-Hsiang Wang IT Lecture 5

slide-50
SLIDE 50

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Gaussian Noise is the Worst-Case Noise

Proposition 7
Consider a Gaussian r.v. X^G ∼ N(0, P) and Y = X^G + Z, where Z has density f_Z(z), variance Var[Z] = σ², and Z ⊥⊥ X^G. Then,

    I(X^G ; Y) ≥ (1/2) log (1 + P/σ²).

With Proposition 7, we immediately obtain the following theorem:

Theorem 5 (Gaussian is the Worst-Case Additive Noise)
Consider a CMC f_{Y|X}: Y = X + Z, Z ⊥⊥ X, with input power constraint P and noise variance σ², where the additive noise has a density. Then the capacity C is minimized when Z ∼ N(0, σ²), and

    C ≥ C_G ≜ (1/2) log (1 + P/σ²).

pf: C ≥ I(X^G ; X^G + Z) ≥ (1/2) log (1 + P/σ²).
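Proposition 7 can be illustrated numerically. The sketch below (P = 4, σ² = 1 are illustrative values) uses uniform noise with the same variance; since f_Y is a Gaussian–uniform convolution with a closed form, I(X^G ; X^G + Z) = h(Y) − h(Z) can be estimated by Monte Carlo and compared against (1/2) log₂(1 + P/σ²).

```python
import numpy as np
from math import erf, sqrt

P, sigma2 = 4.0, 1.0                               # illustrative values
a = np.sqrt(3 * sigma2)                            # Z ~ Unif[-a, a] has variance sigma2
rng = np.random.default_rng(3)
n = 400_000

x = rng.normal(0.0, np.sqrt(P), n)                 # X^G ~ N(0, P)
z = rng.uniform(-a, a, n)                          # non-Gaussian noise with the same variance
y = x + z

Phi = np.vectorize(lambda t: 0.5 * (1 + erf(t / sqrt(2.0))))
f_Y = (Phi((y + a) / np.sqrt(P)) - Phi((y - a) / np.sqrt(P))) / (2 * a)   # Gaussian-uniform convolution

I_mc = np.mean(-np.log2(f_Y)) - np.log2(2 * a)     # h(Y) - h(Z), in bits
print("I(X^G ; X^G + Z) ~", I_mc, "   0.5*log2(1 + P/sigma2) =", 0.5 * np.log2(1 + P / sigma2))
```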

50 / 63 I-Hsiang Wang IT Lecture 5

slide-51
SLIDE 51

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Proof of Proposition 7

Let Z^G ∼ N(0, σ²), and denote Y^G ≜ X^G + Z^G. We aim to prove

    I(X^G ; Y) ≥ I(X^G ; Y^G).

First note that I(X^G ; Y) = h(Y) − h(Z) does not change if we shift Z by a constant. Hence, WLOG assume E[Z] = 0. Since both X^G and Z are zero-mean, so is Y. Note that Y^G ∼ N(0, P + σ²) and Z^G ∼ N(0, σ²). Hence,

    h(Y^G) = E_{Y^G}[ −log f_{Y^G}(Y^G) ]
           = (1/2) log [2π(P + σ²)] + (log e)/(2(P + σ²)) · E_{Y^G}[(Y^G)²]
           = (1/2) log [2π(P + σ²)] + (log e)/(2(P + σ²)) · E_Y[Y²]
           = E_Y[ −log f_{Y^G}(Y) ].

51 / 63 I-Hsiang Wang IT Lecture 5

slide-52
SLIDE 52

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

The key step above is to realize that Y and Y^G have the same variance. Similarly, h(Z^G) = E_Z[−log f_{Z^G}(Z)]. Therefore,

    I(X^G ; Y^G) − I(X^G ; Y)
      = { h(Y^G) − h(Y) } − { h(Z^G) − h(Z) }
      = { E_Y[−log f_{Y^G}(Y)] − E_Y[−log f_Y(Y)] } − { E_Z[−log f_{Z^G}(Z)] − E_Z[−log f_Z(Z)] }
      = E_Y[ log f_Y(Y)/f_{Y^G}(Y) ] − E_Z[ log f_Z(Z)/f_{Z^G}(Z) ]
      = E_{Y,Z}[ log ( f_Y(Y) f_{Z^G}(Z) ) / ( f_{Y^G}(Y) f_Z(Z) ) ]
      ≤ log ( E_{Y,Z}[ f_Y(Y) f_{Z^G}(Z) / ( f_{Y^G}(Y) f_Z(Z) ) ] ).        (Jensen's inequality)

To finish the proof, we shall prove that E_{Y,Z}[ f_Y(Y) f_{Z^G}(Z) / ( f_{Y^G}(Y) f_Z(Z) ) ] = 1.

52 / 63 I-Hsiang Wang IT Lecture 5

slide-53
SLIDE 53

Channel Coding over Continuous Memoryless Channels Gaussian Channel Capacity

Let us calculate E_{Y,Z}[ f_Y(Y) f_{Z^G}(Z) / ( f_{Y^G}(Y) f_Z(Z) ) ] as follows:

    E_{Y,Z}[ f_Y(Y) f_{Z^G}(Z) / ( f_{Y^G}(Y) f_Z(Z) ) ]
      = ∫∫ f_{Y,Z}(y, z) · f_Y(y) f_{Z^G}(z) / ( f_{Y^G}(y) f_Z(z) ) dz dy
      = ∫∫ f_Z(z) f_{X^G}(y − z) · f_Y(y) f_{Z^G}(z) / ( f_{Y^G}(y) f_Z(z) ) dz dy     (∵ Y = X^G + Z)
      = ∫∫ [ f_{X^G}(y − z) f_{Z^G}(z) ] · f_Y(y) / f_{Y^G}(y) dz dy
      = ∫∫ f_{Y^G, Z^G}(y, z) · f_Y(y) / f_{Y^G}(y) dz dy                              (∵ Y^G = X^G + Z^G)
      = ∫ f_Y(y) / f_{Y^G}(y) · ( ∫ f_{Y^G, Z^G}(y, z) dz ) dy
      = ∫ f_Y(y) / f_{Y^G}(y) · f_{Y^G}(y) dy = ∫ f_Y(y) dy = 1.

Hence, the proof is complete.

53 / 63 I-Hsiang Wang IT Lecture 5

slide-54
SLIDE 54

Lossy Source Coding for Continuous Memoryless Sources

1 Measures of Information for Continuous Random Variables
    Entropy and Mutual Information
    Differential Entropy
2 Channel Coding over Continuous Memoryless Channels
    Continuous Memoryless Channel
    Gaussian Channel Capacity
3 Lossy Source Coding for Continuous Memoryless Sources

54 / 63 I-Hsiang Wang IT Lecture 5

slide-55
SLIDE 55

Lossy Source Coding for Continuous Memoryless Sources

Lossy Source Coding Theorem

[Block diagram: source s[1:N] → Source Encoder → b[1:K] → Source Decoder → ŝ[1:N] → Destination.]

Theorem 6 (A Lossy Source Coding Theorem for CMS)
For a continuous memoryless source {S_i | i ∈ N} with p.d.f. f_S,

    R(D) = inf_{f_{Ŝ|S}: E[d(S, Ŝ)] ≤ D} I(S ; Ŝ).        (3)

Remark: One can use weak typicality or the discretization method used in channel coding to extend the lossy source coding theorem from discrete memoryless sources to continuous ones.

55 / 63 I-Hsiang Wang IT Lecture 5

slide-56
SLIDE 56

Lossy Source Coding for Continuous Memoryless Sources

Gaussian Source with Squared Error Distortion

Source (Gaussian): S_i ∈ S = R, and S_i i.i.d. ∼ N(µ, σ²), ∀ i.
Distortion (squared error): d(s, ŝ) = |s − ŝ|².

Theorem 7 (Rate distortion function of a Gaussian source with squared error distortion)
For the Gaussian source with squared error distortion (as defined above), the rate distortion function is

    R(D) = (1/2) log (σ²/D)  for 0 ≤ D ≤ σ²,   and   R(D) = 0  for D > σ².

Note: In particular, R(0) = ∞, which is quite intuitive!
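For reference, the rate distortion function is trivial to evaluate; a minimal sketch (σ² = 1 and the listed distortion values are arbitrary):

```python
import numpy as np

def rate_distortion_gaussian(D, sigma2):
    # R(D) = max{0, 0.5 * log2(sigma2 / D)} bits per source symbol
    return max(0.0, 0.5 * np.log2(sigma2 / D))

sigma2 = 1.0
for D in (0.01, 0.1, 0.5, 1.0, 2.0):
    print(f"D={D:5.2f}  R(D)={rate_distortion_gaussian(D, sigma2):.3f} bits/symbol")
```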

56 / 63 I-Hsiang Wang IT Lecture 5

slide-57
SLIDE 57

Lossy Source Coding for Continuous Memoryless Sources

pf: First step: identify D_min and D_max.
D_min = 0, because one can choose ŝ(s) = s.
D_max = σ², because one can choose ŝ = µ, the mean of S.

Next step: lower bound I(S ; Ŝ) = h(S) − h(S | Ŝ). This is equivalent to upper bounding h(S | Ŝ):

    h(S | Ŝ) = h(S − Ŝ | Ŝ) ≤ h(S − Ŝ) ≤ (1/2) log (2πe D),

where the last inequality holds since Var[S − Ŝ] ≤ E[|S − Ŝ|²] ≤ D. Hence,

    I(S ; Ŝ) ≥ (1/2) log (2πe σ²) − (1/2) log (2πe D) = (1/2) log (σ²/D).

57 / 63 I-Hsiang Wang IT Lecture 5

slide-58
SLIDE 58

Lossy Source Coding for Continuous Memoryless Sources

Final step: show that the lower bound (1/2) log (σ²/D) can be attained.

The goal is to find a conditional density f_{Ŝ|S} such that Ŝ ⊥⊥ (S − Ŝ), so that h(S − Ŝ | Ŝ) = h(S − Ŝ), and (S − Ŝ) ∼ N(0, D). Again, this can be done via an auxiliary reverse channel.

Consider a channel with input Ŝ, output S, and additive noise Z ∼ N(0, D) ⊥⊥ Ŝ:

    S = Ŝ + Z   ⇒   Z = S − Ŝ.

The reverse channel specifies the joint distribution f_{S,Ŝ} and hence f_{Ŝ|S}!

[Figure: reverse test channel — Ŝ ∼ N(µ, σ² − D) plus independent Z ∼ N(0, D) gives S ∼ N(µ, σ²), with error S − Ŝ = Z.]
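The reverse test channel can be checked by simulation. In the illustrative sketch below (µ = 0, σ² = 1, D = 0.25 are assumed values), S = Ŝ + Z with Ŝ ∼ N(µ, σ² − D) and independent Z ∼ N(0, D) indeed has variance σ² and distortion E[(S − Ŝ)²] = D.

```python
import numpy as np

mu, sigma2, D = 0.0, 1.0, 0.25                     # illustrative source mean/variance and distortion
rng = np.random.default_rng(4)
n = 1_000_000

s_hat = rng.normal(mu, np.sqrt(sigma2 - D), n)     # S_hat ~ N(mu, sigma2 - D)
z = rng.normal(0.0, np.sqrt(D), n)                 # Z ~ N(0, D), independent of S_hat
s = s_hat + z                                      # reverse channel output S ~ N(mu, sigma2)

print("Var[S]           ~", s.var(), "   target sigma2 =", sigma2)
print("E[(S - S_hat)^2] ~", np.mean((s - s_hat) ** 2), "   target D =", D)
print("R(D) = 0.5*log2(sigma2/D) =", 0.5 * np.log2(sigma2 / D), "bits")
```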

58 / 63 I-Hsiang Wang IT Lecture 5

slide-59
SLIDE 59

Lossy Source Coding for Continuous Memoryless Sources

Gaussian Source is the Hardest to Compress

Theorem 8 (Gaussian is the Worst-Case Source to Compress)
Consider a zero-mean CMS {S_i} with density f_S and variance σ². Then the rate distortion function with squared error distortion is maximized when S ∼ N(0, σ²), and

    R(D) ≤ R_G(D) ≜ max{ 0, (1/2) log (σ²/D) }.

pf: Note that R(D) = min_{f_{Ŝ|S}: E[d(S,Ŝ)] ≤ D} I(S ; Ŝ). To obtain an upper bound, we simply need to choose some f_{Ŝ|S} that yields I(S ; Ŝ) = h(Ŝ) − h(Ŝ | S) ≤ R_G(D). Note that:

The term h(Ŝ | S) can be computed if Ŝ = aS + bZ^G for a standard Gaussian Z^G ⊥⊥ S.

The term h(Ŝ) can be upper bounded by (1/2) log (2πe Var[Ŝ]) = (1/2) log [2πe (a²σ² + b²)].

59 / 63 I-Hsiang Wang IT Lecture 5

slide-60
SLIDE 60

Lossy Source Coding for Continuous Memoryless Sources

How to find the coefficients a and b? Let us reverse-engineer:

1 The distortion should be D:

    E[(S − Ŝ)²] = (1 − a)²σ² + b²,   which we set equal to D.

2 The induced mutual information should be upper bounded by R_G(D):

    I(S ; Ŝ) = h(Ŝ) − h(Ŝ | S) ≤ (1/2) log [2πe (a²σ² + b²)] − (1/2) log (2πe b²)
             = (1/2) log [(a²σ² + b²)/b²],   which we set equal to (1/2) log (σ²/D).

Combining the above two, we can solve a = (σ² − D)/σ² and b = √(D(σ² − D))/σ, for D < σ².

For D ≥ σ², it is obvious that R(D) = 0. Proof complete.
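A two-line numerical check of these coefficients (σ² = 1 and D = 0.25 are illustrative values): with a = (σ² − D)/σ² and b = √(D(σ² − D))/σ, the distortion equals D and the induced mutual information equals (1/2) log₂(σ²/D).

```python
import numpy as np

sigma2, D = 1.0, 0.25                              # illustrative values with D < sigma2
a = (sigma2 - D) / sigma2
b = np.sqrt(D * (sigma2 - D) / sigma2)             # i.e. sqrt(D*(sigma2 - D)) / sigma

distortion = (1 - a) ** 2 * sigma2 + b ** 2        # E[(S - S_hat)^2]
mi = 0.5 * np.log2((a ** 2 * sigma2 + b ** 2) / b ** 2)
print("distortion =", distortion, "  target D =", D)
print("I(S ; S_hat) =", mi, "  target 0.5*log2(sigma2/D) =", 0.5 * np.log2(sigma2 / D))
```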

60 / 63 I-Hsiang Wang IT Lecture 5

slide-61
SLIDE 61

Summary

61 / 63 I-Hsiang Wang IT Lecture 5

slide-62
SLIDE 62

Mutual information between two continuous r.v.'s X and Y with joint density f_{X,Y}:

    I(X ; Y) = E[ log f_{X,Y}(X,Y) / (f_X(X) f_Y(Y)) ].

Differential entropy and conditional differential entropy:

    h(X) ≜ E[ log 1/f_X(X) ],   h(X | Y) ≜ E[ log 1/f_{X|Y}(X|Y) ].

    I(X ; Y) = h(X) − h(X | Y) = h(Y) − h(Y | X).

Information divergence between densities f and g: D(f ∥ g) ≜ E_{X∼f}[ log f(X)/g(X) ].

Chain rule, conditioning reduces differential entropy, non-negativity of mutual information and information divergence: all continue to hold.

Differential entropy can be negative; h(X) ≤ h(X, Y) need not hold (since h(Y | X) can be negative).

62 / 63 I-Hsiang Wang IT Lecture 5

slide-63
SLIDE 63

Continuous memoryless channel capacity:

    C(B) = sup_{X: E[b(X)] ≤ B} I(X ; Y).

Rate distortion function for a continuous memoryless source:

    R(D) = inf_{f_{Ŝ|S}: E[d(S, Ŝ)] ≤ D} I(S ; Ŝ).

Gaussian channel capacity: C(P) = (1/2) log (1 + P/σ²).

Gaussian source with squared error distortion: R(D) = max{ (1/2) log (σ²/D), 0 }.

Gaussian noise is the worst-case additive noise under a second-moment constraint.
Gaussian source is the worst-case source under a second-moment constraint.

63 / 63 I-Hsiang Wang IT Lecture 5