Lecture 5: Channel Coding over Continuous Channels

I-Hsiang Wang

Department of Electrical Engineering, National Taiwan University
ihwang@ntu.edu.tw

November 14, 2014


From Discrete-Valued to Continuous-Valued (1)

So far we have focused on discrete-valued (and finite-alphabet) r.v.'s:
- Entropy and mutual information for discrete-valued r.v.'s.
- Lossless source coding for discrete stationary sources.
- Channel coding over discrete memoryless channels.

In this lecture we extend the basic principles and fundamental theorems to continuous-valued sources and channels. In particular:
- Mutual information for continuous-valued r.v.'s.
- Lossy source coding for continuous stationary sources.
- Channel coding with input cost over continuous memoryless channels (main example: Gaussian channel capacity).

We skip lossy source coding (rate distortion theory) in this course.


From Discrete-Valued to Continuous-Valued (2)

Main technique for extending coding theorems from the discrete-valued, finite-alphabet world to the continuous-valued world: discretization.

Advantages:
- No need for new tools (e.g., typicality) for continuous-valued r.v.'s.
- Extends naturally to multi-terminal settings – we can focus on discrete memoryless networks.

Outline:
1. Differential entropy
2. Channel coding with input cost over DMC
3. Gaussian channel capacity


Disclaimer: due to time constraints, we will not be 100% rigorous in deriving the results in this lecture. Instead, you can find rigorous treatment in the references.

Reading:
1. Mutual information and differential entropy: Chapter 8, Cover & Thomas [2]; Chapter 15, Moser [5]; Chapter 2.2, El Gamal & Kim [1].
2. Gaussian channel capacity: Chapter 9, Cover & Thomas [2]; Chapter 16, Moser [5]; Chapters 3.3–3.4, El Gamal & Kim [1].

Remark: using discretization to derive the achievability of Gaussian channel capacity follows [1]. [2] uses weak typicality for continuous random variables; [5] uses a threshold decoder, similar to weak typicality in spirit.


Outline:
1. Mutual Information and Differential Entropy
2. Channel Coding with Input Cost
3. Gaussian Channel Capacity
4. Summary


Entropy of a Continuous Random Variable

Question: what is the entropy of a continuous real-valued random variable X?

Suppose X has the probability density function (p.d.f.) $f(x)$. Let us discretize X to answer this question, as follows:
- Partition $\mathbb{R}$ into length-$\Delta$ intervals: $\mathbb{R} = \cup_{k=-\infty}^{\infty} [k\Delta, (k+1)\Delta)$.
- Suppose that $f(x)$ is continuous; then by the mean-value theorem, $\forall\, k \in \mathbb{Z}$, $\exists\, x_k \in [k\Delta, (k+1)\Delta)$ such that $f(x_k) = \frac{1}{\Delta}\int_{k\Delta}^{(k+1)\Delta} f(x)\,dx$.
- Set $[X]_\Delta := x_k$ if $X \in [k\Delta, (k+1)\Delta)$, with p.m.f. $p(x_k) = f(x_k)\,\Delta$.

Observation: $\lim_{\Delta\to 0} H([X]_\Delta) = H(X)$ (intuitively), while
$$H([X]_\Delta) = -\sum_{k=-\infty}^{\infty} (f(x_k)\,\Delta)\log(f(x_k)\,\Delta) = -\Delta \sum_{k=-\infty}^{\infty} f(x_k)\log f(x_k) - \log\Delta \to -\int_{-\infty}^{\infty} f(x)\log f(x)\,dx + \infty = \infty \quad\text{as } \Delta \to 0.$$

We conclude that $H(X) = \infty$ if $-\int_{-\infty}^{\infty} f(x)\log f(x)\,dx = \mathrm{E}\left[\log\frac{1}{f(X)}\right]$ exists.
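As a quick numerical illustration of this divergence (a sketch, not from the slides; it assumes numpy/scipy and truncates the Gaussian tails at ±10, where the mass is negligible): for $X \sim \mathcal{N}(0,1)$, the discrete entropy $H([X]_\Delta)$ tracks $h(X) - \log_2\Delta$ and blows up as $\Delta \to 0$.

```python
import numpy as np
from scipy.stats import norm

h_X = 0.5 * np.log2(2 * np.pi * np.e)  # differential entropy of N(0,1), in bits

for delta in [1.0, 0.1, 0.01]:
    edges = np.arange(-10.0, 10.0 + delta, delta)  # tails beyond +-10 carry negligible mass
    p = np.diff(norm.cdf(edges))                   # p.m.f. of the discretization [X]_Delta
    p = p[p > 0]
    H = -np.sum(p * np.log2(p))                    # discrete entropy H([X]_Delta)
    print(f"Delta={delta:5.2f}  H([X]_Delta)={H:7.3f}  h(X)-log2(Delta)={h_X - np.log2(delta):7.3f}")
```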


Differential Entropy

It is quite intuitive that the entropy of a continuous random variable can be arbitrarily large, because it can take infinitely many possible values. Hence, in general it is impossible to losslessly compress a continuous source with finite rate. Instead, lossy source coding is done.

Yet, for continuous r.v.'s, it turns out to be useful to define the counterparts of entropy and conditional entropy, as follows:

Definition 1 (Differential entropy and conditional differential entropy)
The differential entropy of a continuous r.v. X with p.d.f. $f(x)$ is defined as $h(X) := \mathrm{E}\left[\log\frac{1}{f(X)}\right]$, if the (improper) integral exists. The conditional differential entropy of a continuous r.v. X given Y, where (X, Y) has joint p.d.f. $f(x,y)$ and conditional p.d.f. $f(x|y)$, is defined as $h(X|Y) := \mathrm{E}\left[\log\frac{1}{f(X|Y)}\right]$, if the (improper) integral exists.
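A small utility along these lines (an illustrative sketch, not part of the lecture; the integration limits and scipy usage are my own choices) evaluates $h(X) = \mathrm{E}[\log 1/f(X)]$ by numerical quadrature and checks it against the two closed forms that appear in the examples below:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, uniform

def diff_entropy_nats(pdf, lo, hi):
    """h(X) = E[log 1/f(X)], by numerical integration over [lo, hi]."""
    integrand = lambda x: -pdf(x) * np.log(pdf(x)) if pdf(x) > 0 else 0.0
    val, _ = quad(integrand, lo, hi)
    return val

# Unif[0,2]: closed form log(b - a) = log 2
print(diff_entropy_nats(uniform(loc=0, scale=2).pdf, 0, 2), np.log(2))
# N(0,1): closed form (1/2) log(2*pi*e)
print(diff_entropy_nats(norm(0, 1).pdf, -12, 12), 0.5 * np.log(2 * np.pi * np.e))
```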


Mutual Information between Continuous Random Variables

How about mutual information between two continuous real-valued random variables X and Y, with joint p.d.f. $f_{X,Y}(x,y)$ and marginal p.d.f.'s $f_X(x)$ and $f_Y(y)$? Again, we use discretization:
- Partition the plane $\mathbb{R}^2$ into $\Delta \times \Delta$ squares: $\mathbb{R}^2 = \cup_{k,j=-\infty}^{\infty} I_k^\Delta \times I_j^\Delta$, where $I_k^\Delta := [k\Delta, (k+1)\Delta)$.
- Suppose that $f_{X,Y}(x,y)$ is continuous; then by the mean-value theorem (MVT), $\forall\, k, j \in \mathbb{Z}$, $\exists\, (x_k, y_j) \in I_k^\Delta \times I_j^\Delta$ such that $f_{X,Y}(x_k, y_j) = \frac{1}{\Delta^2}\int_{I_k^\Delta \times I_j^\Delta} f_{X,Y}(x,y)\,dx\,dy$.
- Set $([X]_\Delta, [Y]_\Delta) := \sum_{k,j} (x_k, y_j)\, \mathbb{I}\big\{(X, Y) \in I_k^\Delta \times I_j^\Delta\big\}$, with p.m.f. $p(x_k, y_j) = f_{X,Y}(x_k, y_j)\,\Delta^2$.
- By MVT, $\forall\, k, j \in \mathbb{Z}$, $\exists\, \tilde{x}_k \in I_k^\Delta$ and $\tilde{y}_j \in I_j^\Delta$ such that $p(\tilde{x}_k) := \int_{I_k^\Delta} f_X(x)\,dx = f_X(\tilde{x}_k)\,\Delta$ and $p(\tilde{y}_j) := \int_{I_j^\Delta} f_Y(y)\,dy = f_Y(\tilde{y}_j)\,\Delta$.


Mutual Information between Continuous Random Variables (cont'd)

Observation: $\lim_{\Delta\to 0} I([X]_\Delta; [Y]_\Delta) = I(X; Y)$ (intuitively), while
$$I([X]_\Delta; [Y]_\Delta) = \sum_{k,j=-\infty}^{\infty} p(x_k, y_j) \log\frac{p(x_k, y_j)}{p(\tilde{x}_k)\, p(\tilde{y}_j)} = \sum_{k,j=-\infty}^{\infty} \big(f_{X,Y}(x_k, y_j)\,\Delta^2\big) \log\frac{f_{X,Y}(x_k, y_j)\,\Delta^2}{f_X(\tilde{x}_k)\, f_Y(\tilde{y}_j)\,\Delta^2} = \Delta^2 \sum_{k,j=-\infty}^{\infty} f_{X,Y}(x_k, y_j) \log\frac{f_{X,Y}(x_k, y_j)}{f_X(\tilde{x}_k)\, f_Y(\tilde{y}_j)}$$
(the $\Delta^2$ factors inside the logarithm cancel), which converges to
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x,y) \log\frac{f_{X,Y}(x,y)}{f_X(x)\, f_Y(y)}\,dx\,dy \quad\text{as } \Delta \to 0.$$

Hence, $I(X; Y) = \mathrm{E}\Big[\log\frac{f(X,Y)}{f(X)f(Y)}\Big]$ if the improper integral exists.
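To see this convergence numerically (an assumed example, not from the slides): for jointly Gaussian (X, Y) with correlation $\rho$, the closed form is $I(X;Y) = -\frac{1}{2}\log(1 - \rho^2)$, and the discretized $I([X]_\Delta; [Y]_\Delta)$ on a truncated $\Delta$-grid approaches it from below.

```python
import numpy as np
from scipy.stats import multivariate_normal

rho = 0.8
I_closed = -0.5 * np.log2(1 - rho**2)   # closed form for jointly Gaussian (X, Y), in bits

for delta in [0.5, 0.25, 0.1]:
    centers = np.arange(-6, 6, delta) + delta / 2      # cell centers of the Delta-grid
    X, Y = np.meshgrid(centers, centers, indexing="ij")
    f = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]]).pdf(
        np.stack([X.ravel(), Y.ravel()], axis=1)).reshape(X.shape)
    pxy = f * delta**2                                 # p(x_k, y_j) ~ f(x_k, y_j) Delta^2
    pxy /= pxy.sum()                                   # absorb the truncated tail mass
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    mask = pxy > 0
    I_disc = np.sum(pxy[mask] * np.log2((pxy / np.outer(px, py))[mask]))
    print(f"Delta={delta:4.2f}  I([X]_D;[Y]_D)={I_disc:.4f}  I(X;Y)={I_closed:.4f}")
```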


Mutual Information

Unlike entropy, which is only well-defined for discrete random variables, in general we can define the mutual information between two real-valued random variables (not necessarily continuous or discrete) as follows.

Definition 2 (Mutual information)
The mutual information between two random variables X and Y is defined as
$$I(X; Y) = \sup_{\mathcal{P}, \mathcal{Q}} I\big([X]_{\mathcal{P}}; [Y]_{\mathcal{Q}}\big),$$
where the supremum is taken over all pairs of partitions $\mathcal{P}$ and $\mathcal{Q}$ of $\mathbb{R}$.

We have the following theorem immediately from the previous discussion:

Theorem 1 (Mutual information between two continuous r.v.'s)
$I(X; Y) = \mathrm{E}\Big[\log\frac{f(X,Y)}{f(X)f(Y)}\Big] = h(X) - h(X|Y)$.


Properties that Extend to Continuous R.V.’s

Proposition 1 (Chain rule)
$h(X, Y) = h(X) + h(Y|X)$, and $h(X^n) = \sum_{i=1}^{n} h\big(X_i \mid X^{i-1}\big)$.

Proposition 2 (Conditioning reduces differential entropy)
$h(X|Y) \le h(X)$, $h(X|Y, Z) \le h(X|Z)$.

Proposition 3 (Non-negativity of mutual information)
$I(X; Y) \ge 0$, $I(X; Y \mid Z) \ge 0$.


New Properties of Differential Entropy

Example 1 (Differential entropy of a uniform r.v.)
For a r.v. $X \sim \mathrm{Unif}[a, b]$, that is, with p.d.f. $f_X(x) = \frac{1}{b-a}\,\mathbb{I}\{a \le x \le b\}$, the differential entropy is $h(X) = \log(b - a)$.

- Differential entropy can be negative: since $b - a$ can be made arbitrarily small, $h(X) = \log(b - a)$ can be negative. Hence, the non-negativity property of entropy does not extend to differential entropy.
- Scaling changes the differential entropy: consider $X \sim \mathrm{Unif}[0, 1]$, so that $2X \sim \mathrm{Unif}[0, 2]$. Then $h(X) = \log 1 = 0$ and $h(2X) = \log 2 = 1$ (bit), so $h(X) \ne h(2X)$. This is in sharp contrast to entropy: $H(X) = H(g(X))$ as long as $g(\cdot)$ is an invertible function.


Scaling and Translation

Proposition 4
Let X be a continuous random variable with differential entropy h(X).
- Translation does not change the differential entropy: for a constant c, $h(X + c) = h(X)$.
- Scaling shifts the differential entropy: for a constant $a \ne 0$, $h(aX) = h(X) + \log|a|$.

Proposition 5
Let $\mathbf{X}$ be a continuous random vector with differential entropy $h(\mathbf{X})$.
- For a constant vector $\mathbf{c}$, $h(\mathbf{X} + \mathbf{c}) = h(\mathbf{X})$.
- For an invertible matrix $A \in \mathbb{R}^{n \times n}$, $h(A\mathbf{X}) = h(\mathbf{X}) + \log|\det A|$.

The proofs of these propositions are left as exercises.


Differential Entropy of Gaussian Random Vectors

Example 2 (Differential entropy of N(0,1))
For a r.v. $X \sim \mathcal{N}(0, 1)$, that is, with p.d.f. $f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$, the differential entropy is $h(X) = \frac{1}{2}\log(2\pi e)$.

For an n-dimensional random vector $\mathbf{X} \sim \mathcal{N}(\mathbf{m}, K)$, we can rewrite $\mathbf{X}$ as $\mathbf{X} = A\mathbf{W} + \mathbf{m}$, where $AA^{\mathsf{T}} = K$ and $\mathbf{W}$ consists of i.i.d. $W_i \sim \mathcal{N}(0, 1)$, $i = 1, \ldots, n$. Hence, by the translation and scaling properties of differential entropy:
$$h(\mathbf{X}) = h(\mathbf{W}) + \log|\det A| = \sum_{i=1}^{n} h(W_i) + \tfrac{1}{2}\log\det K = \tfrac{n}{2}\log(2\pi e) + \tfrac{1}{2}\log\det K = \tfrac{1}{2}\log\big((2\pi e)^n \det K\big).$$
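A direct numerical evaluation of this formula (a sketch with my own choice of K, not from the slides; `slogdet` is used for numerical stability):

```python
import numpy as np

def gaussian_diff_entropy_bits(K):
    """h(X) = (1/2) log2((2*pi*e)^n det K) for X ~ N(m, K); the mean m plays no role."""
    n = K.shape[0]
    sign, logdet = np.linalg.slogdet(K)   # numerically stable log(det K), in nats
    assert sign > 0, "K must be positive definite"
    return 0.5 * (n * np.log2(2 * np.pi * np.e) + logdet / np.log(2))

print(gaussian_diff_entropy_bits(np.eye(1)))   # 1D: 0.5*log2(2*pi*e) ~ 2.047 bits
print(gaussian_diff_entropy_bits(np.array([[2.0, 0.5],
                                           [0.5, 1.0]])))
```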


Kullback-Leibler Divergence

Definition 3 (KL divergence between densities)
The Kullback-Leibler divergence between two probability density functions $f(x)$ and $g(x)$ is defined as $D(f\,\|\,g) := \mathrm{E}\Big[\log\frac{f(X)}{g(X)}\Big]$, if the (improper) integral exists. The expectation is taken over the r.v. $X \sim f(x)$.

By Jensen's inequality, it is straightforward to see that the non-negativity of KL divergence remains:

Proposition 6 (Non-negativity of KL divergence)
$D(f\,\|\,g) \ge 0$, with equality iff $f = g$ almost everywhere (i.e., except for some points with zero probability).

Note: $D(f\,\|\,g)$ is finite only if the support of $f(x)$ is contained in the support of $g(x)$.
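As a sanity check (an assumed example, not in the slides): for two scalar Gaussian densities the divergence has the well-known closed form $D(\mathcal{N}(m_f,\sigma_f^2)\,\|\,\mathcal{N}(m_g,\sigma_g^2)) = \log\frac{\sigma_g}{\sigma_f} + \frac{\sigma_f^2 + (m_f - m_g)^2}{2\sigma_g^2} - \frac{1}{2}$ nats, which we can compare against direct numerical integration of $f \log(f/g)$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mf, sf, mg, sg = 0.0, 1.0, 1.0, 2.0   # f = N(0, 1), g = N(1, 4)

# Closed form for scalar Gaussians (in nats):
closed = np.log(sg / sf) + (sf**2 + (mf - mg) ** 2) / (2 * sg**2) - 0.5

# Direct numerical integration of f(x) * log(f(x)/g(x)):
integrand = lambda x: norm.pdf(x, mf, sf) * (norm.logpdf(x, mf, sf) - norm.logpdf(x, mg, sg))
numeric, _ = quad(integrand, -20, 20)

print(closed, numeric)   # both ~ 0.4431 nats, and >= 0 as Proposition 6 requires
```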


Maximum Differential Entropy

Theorem 2 (Maximum differential entropy under covariance constraint)
Let $\mathbf{X}$ be a random vector with mean $\mathbf{m}$ and covariance matrix $\mathrm{E}\big[(\mathbf{X} - \mathbf{m})(\mathbf{X} - \mathbf{m})^{\mathsf{T}}\big] = K$, and let $\mathbf{X}_G$ be Gaussian with the same covariance K. Then, $h(\mathbf{X}) \le h(\mathbf{X}_G) = \frac{1}{2}\log\big((2\pi e)^n \det K\big)$.

pf: First, we can assume WLOG that both $\mathbf{X}$ and $\mathbf{X}_G$ are zero-mean. Let the p.d.f. of $\mathbf{X}$ be $f(\mathbf{x})$ and the p.d.f. of $\mathbf{X}_G$ be $f_G(\mathbf{x})$. Hence,
$$0 \le D(f\,\|\,f_G) = \mathrm{E}_f[\log f(\mathbf{X})] - \mathrm{E}_f[\log f_G(\mathbf{X})] = -h(\mathbf{X}) - \mathrm{E}_f[\log f_G(\mathbf{X})].$$
Note: $\log f_G(\mathbf{x})$ is a quadratic function of $\mathbf{x}$, and $\mathbf{X}$, $\mathbf{X}_G$ have the same second moments. Hence, $\mathrm{E}_f[\log f_G(\mathbf{X})] = \mathrm{E}_{f_G}[\log f_G(\mathbf{X})] = -h(\mathbf{X}_G)$, and
$$0 \le D(f\,\|\,f_G) = -h(\mathbf{X}) + h(\mathbf{X}_G) \implies h(\mathbf{X}) \le h(\mathbf{X}_G).$$
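A minimal numeric instance of Theorem 2 (my own example, n = 1): $\mathrm{Unif}[a, b]$ has variance $(b-a)^2/12$, and the Gaussian with that same variance has strictly larger differential entropy.

```python
import numpy as np

b_minus_a = 2.0
h_unif = np.log(b_minus_a)                                      # log(b - a), in nats
h_gauss = 0.5 * np.log(2 * np.pi * np.e * b_minus_a**2 / 12)    # Gaussian, same variance
print(h_unif, h_gauss, h_unif <= h_gauss)                       # 0.693 <= 0.947: True
```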


Channel Coding with Input Cost


Input Cost

Our goal is to study channel coding over continuous memoryless channels. However, without any constraint on the channel input, it is quite easy to see that an infinite amount of information can be delivered over the channel. To make the problem valid, we shall impose a constraint on the input cost of the communication. Let us begin by defining the cost function.

Definition 4 (Input cost function)
A non-negative input cost function $b : \mathcal{X} \to [0, \infty)$ is defined over the input alphabet $\mathcal{X}$ of a DMC $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$.

We can shift b such that there exists a symbol $x_o \in \mathcal{X}$ with $b(x_o) = 0$; WLOG assume the existence of such a zero-cost symbol $x_o \in \mathcal{X}$.

Average input cost constraint for channel coding: for N channel uses, the average input cost $\frac{1}{N}\sum_{i=1}^{N} b(x_i) \le B$.


Channel Coding with Input Cost over DMC: Problem Setup

[Block diagram: message $w$ → Channel Encoder → $x^N$ → Noisy Channel → $y^N$ → Channel Decoder → $\hat{w}$]

1. A $(2^{NR}, N, B)$ channel code consists of
   - an encoding function (encoder) $\mathrm{enc}_N : [1 : 2^K] \to \mathcal{X}^N$ that maps each message w to a length-N codeword $x^N$, where $K := \lceil NR \rceil$. The codeword satisfies the input cost constraint $\frac{1}{N}\sum_{i=1}^{N} b(x_i) \le B$;
   - a decoding function (decoder) $\mathrm{dec}_N : \mathcal{Y}^N \to [1 : 2^K] \cup \{*\}$ that maps a channel output $y^N$ to a reconstructed message $\hat{w}$ or an error $*$.
2. The error probability is defined as $P_e^{(N)} := \Pr\big\{\hat{W} \ne W\big\}$.
3. A rate R is said to be achievable with input cost B if there exists a sequence of $(2^{NR}, N, B)$ codes such that $P_e^{(N)} \to 0$ as $N \to \infty$. The channel capacity is defined as $C(B) := \sup\{R : R \text{ is achievable}\}$.


Channel Coding Theorem with Average Input Cost

Theorem 3 (Channel coding theorem for DMC with average input cost)
The capacity of the DMC $p(y|x)$ with input cost B is given by
$$C(B) = \max_{p(x):\, \mathrm{E}[b(X)] \le B} I(X; Y). \tag{1}$$
Compared to the setting without an input cost constraint, the additional constraint in the extremal problem (1) is on the expected cost $\mathrm{E}[b(X)]$. The capacity-cost function C(B) is non-decreasing, concave, and continuous in B (exercise).
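To make (1) concrete (an assumed toy example, not from the lecture): for a BSC($\epsilon$) with cost $b(0) = 0$, $b(1) = 1$, we have $\mathrm{E}[b(X)] = P(X = 1)$, so C(B) can be evaluated by a grid search over feasible input distributions.

```python
import numpy as np

def Hb(q):
    """Binary entropy in bits, with the convention 0 log 0 = 0."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def C(B, eps=0.1):
    """Grid search over inputs p = P(X=1) with E[b(X)] = p <= B, for the BSC(eps)."""
    p = np.linspace(0.0, min(B, 1.0), 2001)
    I = Hb(p * (1 - eps) + (1 - p) * eps) - Hb(eps)   # I(X;Y) = H(Y) - H(Y|X)
    return I.max()

for B in [0.05, 0.1, 0.2, 0.5, 1.0]:
    print(f"B={B:4.2f}  C(B)={C(B):.4f} bits")
```

The printout is non-decreasing and concave in B, saturating at the unconstrained capacity $1 - H_b(\epsilon)$ once $B \ge 1/2$, consistent with the remark above.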


Converse Proof

pf: Following the converse proof for the DMC without input cost, we arrive at
$$R - \epsilon_N \le \frac{1}{N}\sum_{k=1}^{N} I(X_k; Y_k),$$
where $\epsilon_N \to 0$ as $N \to \infty$. By the definition of C(B), $\forall\, k \in [1 : N]$, $I(X_k; Y_k) \le C(\mathrm{E}[b(X_k)])$. Let $B_k := \mathrm{E}[b(X_k)]$. Hence,
$$R - \epsilon_N \le \frac{1}{N}\sum_{k=1}^{N} C(B_k) \overset{(a)}{\le} C\Big(\frac{1}{N}\sum_{k=1}^{N} B_k\Big) \overset{(b)}{\le} C(B).$$
(a) is due to the concavity of C(B) in B. (b) is due to $\frac{1}{N}\sum_{k=1}^{N} B_k \le B$ (each codeword satisfies the average cost constraint, so the expected per-block average cost does too) and the fact that C(B) is non-decreasing in B. Hence, every achievable rate R satisfies $R \le C(B)$.


Achievability Proof (1)

The achievability proof mostly follows that of the DMC without input cost. Twist: in random codebook generation, how do we ensure all codewords satisfy the input cost constraint?

Random codebook generation and encoding: Generate the random codebook $\mathcal{C}$ i.i.d. according to
$$p_X(x) = \arg\max_{p(x):\, \mathrm{E}[b(X)] \le \frac{B}{1+\epsilon}} I(X; Y).$$
In other words, we step back and generate random codewords with a slightly smaller average cost $\frac{B}{1+\epsilon}$.
- If the generated $x^N \in \mathcal{T}_\epsilon^{(N)}(X)$, then by Problem 6(a) of Quiz #1, we have $\frac{1}{N}\sum_{i=1}^{N} b(x_i) \le (1+\epsilon)\,\mathrm{E}[b(X)] = (1+\epsilon)\frac{B}{1+\epsilon} = B$.
- If the generated $x^N \notin \mathcal{T}_\epsilon^{(N)}(X)$, it may violate the input cost constraint, and hence we send the zero-cost codeword $[x_o \cdots x_o]$ instead.


Achievability Proof (2)

Error probability analysis: Following the same lines as in the case without input cost constraint, we arrive at upper bounding $\Pr\{\mathcal{E} \mid W = 1\} =: P_1(\mathcal{E})$, where
$$\mathcal{E} = \mathcal{A}_1^c \cup \big(\cup_{w \ne 1}\, \mathcal{A}_w\big), \qquad \mathcal{A}_w := \big\{\big(X^N(w), Y^N\big) \in \mathcal{T}_\epsilon^{(N)}(X, Y)\big\}.$$
- Upper bounding $P_1(\mathcal{A}_w)$ for $w \ne 1$ remains the same: since the actual message is $W = 1$, whether or not the codeword $X^N(1)$ violates the cost constraint is not relevant.
- Upper bounding $P_1(\mathcal{A}_1^c)$ requires a slight modification, regarding whether or not $X^N(1) \in \mathcal{T}_\epsilon^{(N)}(X)$:
$$P_1(\mathcal{A}_1^c) \le P_1\big\{\big(X^N(1), Y^N\big) \notin \mathcal{T}_\epsilon^{(N)}(X, Y),\; X^N(1) \in \mathcal{T}_\epsilon^{(N)}(X)\big\} + P_1\big\{X^N(1) \notin \mathcal{T}_\epsilon^{(N)}(X)\big\} \le 2\epsilon,$$
for N sufficiently large.


Achievability Proof (3)

Hence, we conclude that any $R < C\big(\frac{B}{1+\epsilon}\big)$ is achievable. Finally, since C(B) is continuous in B, we can make $C\big(\frac{B}{1+\epsilon}\big)$ arbitrarily close to C(B); hence any $R < C(B)$ is achievable.

Exercise 1
Show that $C(B) = \max_{p(x):\, \mathrm{E}[b(X)] \le B} I(X; Y)$ is non-decreasing, concave, and continuous in B.


Gaussian Channel Capacity


Additive White Gaussian Noise (AWGN) Channel

[Block diagram: message $w$ → Channel Encoder → $x^N$ → adder with noise $z^N$ → $y^N$ → Channel Decoder → $\hat{w}$]

1. Input/output alphabets: $\mathcal{X} = \mathcal{Y} = \mathbb{R}$.
2. AWGN channel: the conditional p.d.f. $f_{Y|X}$ is given by $Y = X + Z$, with $Z \sim \mathcal{N}(0, \sigma^2)$ independent of X.
   - $\{Z_k\}$ form an i.i.d. (white) Gaussian random process with $Z_k \sim \mathcal{N}(0, \sigma^2)$, $\forall\, k$.
   - Memoryless: $Z_k \perp (W, X^{k-1}, Z^{k-1})$.
   - Without feedback: $Z^N \perp X^N$.
3. Average input power constraint P: $\frac{1}{N}\sum_{k=1}^{N} |x_k|^2 \le P$.
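A minimal simulation sketch of one block of this channel (an assumed example, not from the slides; the codeword statistics are Gaussian merely because that turns out to be capacity-achieving below):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, sigma2 = 100_000, 1.0, 0.5

x = rng.normal(0.0, np.sqrt(P), N)      # codeword samples with Gaussian statistics
x *= np.sqrt(P / np.mean(x**2))         # rescale so (1/N) sum |x_k|^2 <= P holds exactly
z = rng.normal(0.0, np.sqrt(sigma2), N) # i.i.d. (white) Gaussian noise, independent of x
y = x + z                               # memoryless channel: y_k = x_k + z_k

print(np.mean(x**2))   # = P: power constraint met with equality
print(np.var(y))       # ~ P + sigma^2, since Z is independent of X
```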


Channel Coding Theorem for the Gaussian Channel

Theorem 4 (Gaussian channel capacity)
The capacity of the AWGN channel with input power constraint P and noise variance $\sigma^2$ is given by
$$C = \sup_{f(x):\, \mathrm{E}[|X|^2] \le P} I(X; Y) = \frac{1}{2}\log\Big(1 + \frac{P}{\sigma^2}\Big). \tag{2}$$
- Compared with (1) for the DMC with input cost constraint, in (2) the max operation is replaced by the sup operation.
- For the AWGN channel, the supremum is actually attained by the Gaussian input distribution $f(x) = \frac{1}{\sqrt{2\pi P}}\, e^{-\frac{x^2}{2P}}$, i.e., $X \sim \mathcal{N}(0, P)$.
- The capacity-power function C(P) is non-decreasing, concave, and left-continuous in P (exercise).
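Evaluating (2) directly (the formula is from the theorem; the SNR sweep is my own illustration):

```python
import numpy as np

def awgn_capacity_bits(P, sigma2):
    return 0.5 * np.log2(1 + P / sigma2)   # bits per channel use

for snr_db in [0, 10, 20, 30]:
    snr = 10 ** (snr_db / 10)              # SNR = P / sigma^2
    print(f"SNR = {snr_db:2d} dB   C = {awgn_capacity_bits(snr, 1.0):.3f} bits/use")
```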


Converse Proof and Evaluation of Capacity

Converse proof: Following the same lines as before, we arrive at: R achievable $\implies R \le \sup_{f(x):\, \mathrm{E}[|X|^2] \le P} I(X; Y)$.

Capacity evaluation: It is straightforward to see that the optimizing input X should have zero mean. Note that since $Z \perp X$,
$$I(X; Y) = h(Y) - h(Y|X) = h(Y) - h(X + Z \mid X) = h(Y) - h(Z \mid X) = h(Y) - h(Z) = h(Y) - \tfrac{1}{2}\log(2\pi e)\sigma^2 \overset{(a)}{\le} \tfrac{1}{2}\log(2\pi e)\big(P + \sigma^2\big) - \tfrac{1}{2}\log(2\pi e)\sigma^2 = \tfrac{1}{2}\log\Big(1 + \frac{P}{\sigma^2}\Big).$$
(a): because $h(Y) \le \tfrac{1}{2}\log\big((2\pi e)\,\mathrm{E}[Y^2]\big)$ and $\mathrm{E}[Y^2] = \mathrm{E}[X^2] + \mathrm{E}[Z^2] \le P + \sigma^2$.


Achievability Proof (1)

The proof of achievability for this continuous-valued channel makes use of discretization, so that we can apply the result for the DMC with input cost.

Discretization: here we use a different way to discretize the channel. $\forall\, m \in \mathbb{N}$, let $\mathcal{Q}_m := \big\{\frac{l}{\sqrt{m}} : l = 0, \pm 1, \ldots, \pm m\big\}$ be the set of quantization points. For any $r \in \mathbb{R}$, quantize r to the closest point $[r]_m \in \mathcal{Q}_m$ such that $|[r]_m| \le |r|$.

Let us use the above quantizer to discretize the channel:
- Quantize the channel input X to $[X]_m$.
- Let $Y^{(m)} := [X]_m + Z$ denote the channel output corresponding to the quantized channel input $[X]_m$. Note: $Y^{(m)}$ is continuous-valued.
- Quantize the channel output $Y^{(m)}$ to $\big[Y^{(m)}\big]_n$.
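A sketch of this quantizer (an assumed implementation consistent with the description above: round the magnitude toward zero so that $|[r]_m| \le |r|$, then saturate at the edge points $\pm\sqrt{m}$):

```python
import numpy as np

def quantize(r, m):
    """Map r to the closest point of Q_m = {l/sqrt(m): l = 0, +-1, ..., +-m} with |[r]_m| <= |r|."""
    step = 1.0 / np.sqrt(m)
    l = np.trunc(np.asarray(r, dtype=float) / step)  # round toward zero: never increases |r|
    l = np.clip(l, -m, m)                            # saturate at the edge points +-sqrt(m)
    return l * step

x = np.array([-3.7, -0.4, 0.0, 0.26, 5.0])
print(quantize(x, m=4))   # step 0.5, range [-2, 2]: [-2., -0., 0., 0., 2.]
```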


Achievability Proof (2)

Now we have a DMC with input $[X]_m$ and output $\big[Y^{(m)}\big]_n$. Note that for any p.d.f. $f_X(x)$ with $\mathrm{E}[|X|^2] \le P$, the quantized $[X]_m$ also satisfies the power constraint: $\mathrm{E}\big[\,|[X]_m|^2\big] \le \mathrm{E}\big[|X|^2\big] \le P$.

Hence, by the achievability result for the DMC with input cost constraint, any $R < I\big([X]_m; \big[Y^{(m)}\big]_n\big)$ (evaluated under $f_X(x) = \frac{1}{\sqrt{2\pi P}}\, e^{-\frac{x^2}{2P}}$) is indeed achievable for the discretized channel under power constraint P.

The only thing left to be shown is that $I\big([X]_m; \big[Y^{(m)}\big]_n\big)$ can be made arbitrarily close to $I(X; Y) = \frac{1}{2}\log\big(1 + \frac{P}{\sigma^2}\big)$ as $m, n \to \infty$.


Achievability Proof (3)

First, by the data processing inequality, since $[X]_m - Y^{(m)} - \big[Y^{(m)}\big]_n$ form a Markov chain, we have $I\big([X]_m; \big[Y^{(m)}\big]_n\big) \le I\big([X]_m; Y^{(m)}\big) = h\big(Y^{(m)}\big) - h(Z)$. Since $\mathrm{E}\big[\,|Y^{(m)}|^2\big] \le P + \sigma^2$, we have $h\big(Y^{(m)}\big) \le \frac{1}{2}\log\big(2\pi e (P + \sigma^2)\big)$, and hence
$$I\big([X]_m; \big[Y^{(m)}\big]_n\big) \le \frac{1}{2}\log\Big(1 + \frac{P}{\sigma^2}\Big).$$
Second, for the lower bound, Lemma 3.2 in El Gamal & Kim [1] gives the following:
$$\liminf_{m\to\infty}\, \lim_{n\to\infty}\, I\big([X]_m; \big[Y^{(m)}\big]_n\big) \ge \frac{1}{2}\log\Big(1 + \frac{P}{\sigma^2}\Big).$$
Combining the above, the proof is complete.


Lesson Learned

Discretization may not be the simplest way to obtain achievability results. Instead, such discretization shows how to extend the coding theorem for a DMC to a Gaussian or any other well-behaved continuous-valued channel. Similar procedures can be used to extend coding theorems for finite-alphabet multiuser channels to their Gaussian counterparts. Hence, later we may skip formal proofs of such extensions.


Summary


- Mutual information between two continuous r.v.'s X and Y with joint density $f_{X,Y}$: $I(X; Y) = \mathrm{E}\Big[\log\frac{f_{X,Y}(X,Y)}{f_X(X)\, f_Y(Y)}\Big]$.
- Differential entropy and conditional differential entropy: $h(X) := \mathrm{E}\Big[\log\frac{1}{f_X(X)}\Big]$, $h(X|Y) := \mathrm{E}\Big[\log\frac{1}{f_{X|Y}(X|Y)}\Big]$.
- $I(X; Y) = h(X) - h(X|Y) = h(Y) - h(Y|X)$.
- KL divergence between densities f and g: $D(f\,\|\,g) := \mathrm{E}_f\Big[\log\frac{f(X)}{g(X)}\Big]$.
- Chain rule, conditioning reduces differential entropy, and non-negativity of mutual information and KL divergence all continue to hold.
- Differential entropy can be negative; in particular, $h(X) \le h(X, Y)$ need not hold.
- Channel capacity with input cost constraint:
  - DMC: $C(B) = \max_{p(x):\, \mathrm{E}[b(X)] \le B} I(X; Y)$.
  - Continuous-valued: $C(B) = \sup_{f(x):\, \mathrm{E}[b(X)] \le B} I(X; Y)$.
- Gaussian channel capacity: $C(P) = \frac{1}{2}\log\big(1 + \frac{P}{\sigma^2}\big)$.
