

SLIDE 1

Lecture 6 Channel Coding over Continuous Channels

I-Hsiang Wang

Department of Electrical Engineering, National Taiwan University (ihwang@ntu.edu.tw)

November 10, 2015


SLIDE 2

We have investigated the measures of information for continuous r.v.'s:
- The amount of uncertainty (entropy) is mostly infinite.
- Mutual information and KL divergence are well defined.
- Differential entropy is a useful entity for computing and managing measures of information for continuous r.v.'s.

Question: How about coding theorems? Is there a general way or framework to extend coding theorems from discrete (memoryless) sources/channels to continuous (memoryless) sources/channels?

SLIDE 3

For a discrete memoryless channel (w → Channel Encoder → x^N → p_{Y|X} → y^N → Channel Decoder → ŵ), we have seen that the capacity with input cost constraint B is

C(B) = max_{X: E[b(X)] ≤ B} I(X; Y).

For a continuous memoryless channel (w → Channel Encoder → x^N → f_{Y|X} → y^N → Channel Decoder → ŵ), does the analogous formula

C(B) = sup_{X: E[b(X)] ≤ B} I(X; Y)

hold?

SLIDE 4

Coding Theorems: from Discrete to Continuous (1)

Two main techniques for extending the achievability part of coding theorems from the discrete world to the continuous world:

1. Discretization: Discretize the source and channel input/output to create a discrete system, and then make the discretization finer and finer to prove achievability.

2. New typicality: Extend weak typicality to continuous r.v.'s and repeat the arguments in a similar way. In particular, replace the entropy terms in the definitions of weakly typical sequences by differential entropy terms.

Using discretization to derive the achievability of the Gaussian channel capacity follows Gallager [2] and El Gamal & Kim [6]. Cover & Thomas [1] and Yeung [5] use weak typicality for continuous r.v.'s. Moser [4] uses a threshold decoder, similar in spirit to weak typicality.

SLIDE 5

Coding Theorems: from Discrete to Continuous (2)

In this lecture, we use discretization for the achievability proof.

Pros: No need for new tools (e.g., typicality) for continuous r.v.'s. Extends naturally to multi-terminal settings, so one can focus on discrete memoryless networks.

Cons: Technical; not much insight into how to achieve capacity. Hence, we also use a geometric argument to provide insight into how to achieve capacity.

Disclaimer: We will not be 100% rigorous in deriving the results in this lecture. Instead, you can find rigorous treatments in the references.

SLIDE 6

Outline

1. First, we formulate the channel coding problem over continuous memoryless channels (CMC), state the coding theorem, and sketch the converse and achievability proofs.

2. Second, we introduce the additive Gaussian noise (AGN) channel, derive the Gaussian channel capacity, and provide insights based on geometric arguments.

3. We then explore extensions, including parallel Gaussian channels (with independent and with correlated noises) and continuous-time bandlimited Gaussian channels.

SLIDE 7

1 Channel Coding over Continuous Memoryless Channels
   Continuous Memoryless Channel
   Gaussian Channel Capacity

2 Parallel Gaussian Channel
   Parallel Channel with Independent Noises
   Parallel Channel with Colored Noises

3 Bandlimited Gaussian Channel
   Bandlimited Channel with White Gaussian Noise
   Bandlimited Channel with Colored Gaussian Noise


SLIDE 9

Continuous Memoryless Channel

(Block diagram: w → Channel Encoder → x^N → f_{Y|X} → y^N → Channel Decoder → ŵ.)

1. Input/output alphabet: X = Y = ℝ.
2. Continuous Memoryless Channel (CMC):
   - Channel law: governed by the conditional density (p.d.f.) f_{Y|X}.
   - Memoryless: Y_k − X_k − (X^{k−1}, Y^{k−1}) forms a Markov chain.
3. Average input cost constraint B: (1/N) Σ_{k=1}^N b(x_k) ≤ B, where b: ℝ → [0, ∞) is the (single-letter) cost function.

The definitions of error probability, achievable rate, and capacity are the same as those for channel coding over a DMC.

SLIDE 10

Channel Coding Theorem

Theorem 1 (Continuous Memoryless Channel Capacity). The capacity of the CMC (ℝ, f_{Y|X}, ℝ) with input cost constraint B is

C = sup_{X: E[b(X)] ≤ B} I(X; Y).    (1)

Note: The input distribution of the r.v. X need not have a density; in other words, it could also be discrete. How, then, to compute h(Y|X) when X has no density? Recall

h(Y|X) = E_X[ −∫_{supp Y} f(y|X) log f(y|X) dy ],

where f(y|x) is the conditional density of Y given X.

Converse proof: exactly the same as in the DMC case.

SLIDE 11

Sketch of the Achievability (1): Discretization

(Block diagram: w → ENC → x^N → f_{Y|X} → y^N → DEC → ŵ.)

The achievability proof makes use of discretization, so that one can apply the result for the DMC with input cost:

SLIDE 12

Sketch of the Achievability (1): Discretization

(Block diagram: w → ENC → Qin → f_{Y|X} → Qout → DEC → ŵ.)

- Qin: (single-letter) discretization that maps X ∈ ℝ to X_d ∈ 𝒳_d.
- Qout: (single-letter) discretization that maps Y ∈ ℝ to Y_d ∈ 𝒴_d.

Note that both 𝒳_d and 𝒴_d are discrete (countable) alphabets.

SLIDE 13

Sketch of the Achievability (1): Discretization

(Block diagram: grouping [Qin → f_{Y|X} → Qout] yields an equivalent DMC sitting between a new encoder and the decoder.)

Idea: With the two discretization blocks Qin and Qout, one can build an equivalent DMC (𝒳_d, p_{Y_d|X_d}, 𝒴_d) as shown above.

SLIDE 14

Sketch of the Achievability (2): Arguments

(Block diagram: w → New ENC → x_d^N → p_{Y_d|X_d} → y_d^N → DEC → ŵ, with Qin and Qout absorbed into the equivalent DMC.)

1. Random codebook generation: Generate the codebook randomly based on the original (continuous) r.v. X, satisfying E[b(X)] ≤ B.
2. Choice of discretization: Choose Qin such that the cost constraint will not be violated after discretization. Specifically, E[b(X_d)] ≤ B.
3. Achievability in the equivalent DMC: By the achievability part of the channel coding theorem for the DMC with input constraint, any rate R < I(X_d; Y_d) is achievable.
4. Achievability in the original CMC: Prove that as the discretization in Qin and Qout gets finer and finer, I(X_d; Y_d) → I(X; Y).


SLIDE 16

Additive White Gaussian Noise (AWGN) Channel

(Block diagram: w → Channel Encoder → x^N → ⊕ (noise z^N) → y^N → Channel Decoder → ŵ.)

1. Input/output alphabet: X = Y = ℝ.
2. AWGN channel:
   - The conditional p.d.f. f_{Y|X} is given by Y = X + Z, Z ∼ N(0, σ²) ⊥ X.
   - {Z_k} form an i.i.d. (white) Gaussian random process with Z_k ∼ N(0, σ²) for all k.
   - Memoryless: Z_k ⊥ (W, X^{k−1}, Z^{k−1}). Without feedback: Z^N ⊥ X^N.
3. Average input power constraint P: (1/N) Σ_{k=1}^N |x_k|² ≤ P.

SLIDE 17

Channel Coding Theorem for Gaussian Channel

Theorem 2 (Gaussian Channel Capacity). The capacity of the AWGN channel with input power constraint P and noise variance σ² is given by

C = sup_{X: E[|X|²] ≤ P} I(X; Y) = ½ log(1 + P/σ²).    (2)

Note: For the AWGN channel, the supremum is actually attained by the Gaussian input X ∼ N(0, P), that is, the input has density f_X(x) = (1/√(2πP)) e^{−x²/(2P)}, as shown on the next slide.

SLIDE 18

Evaluation of Capacity

Let us compute the capacity of the AWGN channel (2) as follows:

I(X; Y) = h(Y) − h(Y|X)
        = h(Y) − h(X + Z | X)
        = h(Y) − h(Z|X)
        = h(Y) − h(Z)    (since Z ⊥ X)
        = h(Y) − ½ log(2πe σ²)
    (a) ≤ ½ log(2πe (P + σ²)) − ½ log(2πe σ²) = ½ log(1 + P/σ²).

Here (a) is due to the fact that h(Y) ≤ ½ log(2πe Var[Y]) and Var[Y] = Var[X] + Var[Z] ≤ P + σ², since Var[X] ≤ E[X²] ≤ P.

Finally, note that the above inequalities hold with equality when X ∼ N(0, P).
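As a quick numerical illustration (our own sketch, not from the lecture; it assumes numpy, and the helper names are ours), the snippet below evaluates C = ½ log₂(1 + P/σ²) and checks the chain above by Monte Carlo: a crude histogram estimate of h(Y) − h(Z) comes out close to capacity for a Gaussian input, and strictly smaller for a power-matched uniform input.

```python
import numpy as np

def awgn_capacity_bits(P, sigma2):
    """C = 1/2 log2(1 + P/sigma^2), in bits per channel use."""
    return 0.5 * np.log2(1 + P / sigma2)

def entropy_hist_bits(samples, bins=200):
    """Crude histogram estimate of differential entropy, in bits."""
    p, edges = np.histogram(samples, bins=bins, density=True)
    w = np.diff(edges)
    mask = p > 0
    return -np.sum(p[mask] * np.log2(p[mask]) * w[mask])

rng = np.random.default_rng(0)
P, sigma2, n = 4.0, 1.0, 1_000_000
Z = rng.normal(0.0, np.sqrt(sigma2), n)
X_gauss = rng.normal(0.0, np.sqrt(P), n)                   # Gaussian input, power P
X_unif = rng.uniform(-np.sqrt(3 * P), np.sqrt(3 * P), n)   # uniform input, power P

for name, X in [("Gaussian", X_gauss), ("uniform", X_unif)]:
    I_hat = entropy_hist_bits(X + Z) - entropy_hist_bits(Z)  # I(X;Y) = h(Y) - h(Z)
    print(f"{name:8s} input: I(X;Y) ~ {I_hat:.3f} bits")
print(f"capacity          C = {awgn_capacity_bits(P, sigma2):.3f} bits")
```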


SLIDE 19

Achievability Proof (1): Discretization

Here we use a simple quantizer to construct the discretization blocks Qin and Qout.

For each m ∈ ℕ, let Q_m := { l/√m : l = 0, ±1, …, ±m } be the set of quantization points. For any r ∈ ℝ, quantize r to the closest point [r]_m ∈ Q_m such that |[r]_m| ≤ |r|.

Discretization: for given m, n ∈ ℕ, define
- channel input discretization: Qin(·) = [·]_m;
- channel output discretization: Qout(·) = [·]_n.

In other words, 𝒳_d = Q_m, 𝒴_d = Q_n, X_d = [X]_m, and Y_d = [X_d + Z]_n = [[X]_m + Z]_n.
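A small Python sketch of this quantizer (ours, not from the lecture; it assumes the reading Q_m = {l/√m} above): rounding toward zero guarantees |[r]_m| ≤ |r|, so quantization never increases the cost.

```python
import numpy as np

def quantize(r, m):
    """[r]_m: the closest point l/sqrt(m), l in {0, +-1, ..., +-m},
    satisfying |[r]_m| <= |r| (round toward zero, clip to +-sqrt(m))."""
    step = 1.0 / np.sqrt(m)
    l = np.sign(r) * np.trunc(np.abs(r) / step)  # round toward zero
    l = np.clip(l, -m, m)                        # stay inside Q_m
    return l * step

rng = np.random.default_rng(1)
X = rng.normal(0, 2.0, 100_000)
Xd = quantize(X, m=100)
# The quantizer never increases magnitude, so E[|[X]_m|^2] <= E[|X|^2] <= P:
print(np.mean(Xd ** 2) <= np.mean(X ** 2))  # True
```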


SLIDE 20

Achievability Proof (2): Equivalent DMC

Now we have an equivalent DMC with
- input X_d = [X]_m, and
- output Y_d = [Y^(m)]_n, where Y^(m) ≜ [X]_m + Z.

Note that for any original input r.v. X with E[|X|²] ≤ P, the discretized [X]_m also satisfies the power constraint: E[|[X]_m|²] ≤ E[|X|²] ≤ P.

Hence, by the achievability result for the DMC with input cost constraint, any R < I([X]_m; [Y^(m)]_n) (evaluated under f_X(x) = (1/√(2πP)) e^{−x²/(2P)}) is indeed achievable for the equivalent DMC under power constraint P.

The only thing left to show is that I([X]_m; [Y^(m)]_n) can be made arbitrarily close to I(X; Y) = ½ log(1 + P/σ²) as m, n → ∞.

SLIDE 21

Achievability Proof (3): Convergence

By the data processing inequality and the Markov chain [X]_m − Y^(m) − [Y^(m)]_n, we have

I([X]_m; [Y^(m)]_n) ≤ I([X]_m; Y^(m)) = h(Y^(m)) − h(Z).

Since Var[Y^(m)] ≤ P + σ², we have h(Y^(m)) ≤ ½ log(2πe(P + σ²)), and hence the upper bound

I([X]_m; [Y^(m)]_n) ≤ ½ log(1 + P/σ²).

For the lower bound, we would like to prove

liminf_{m→∞} lim_{n→∞} I([X]_m; [Y^(m)]_n) ≥ ½ log(1 + P/σ²).

We skip the details here; see Appendix 3A of El Gamal & Kim [6].

SLIDE 22

Geometric Intuition: Sphere Packing

(Figure: the received vector y = x + z in ℝ^N, inside a sphere of radius √(N(P + σ²)).)

By the LLN, as N → ∞, most output vectors y (= y^N) will lie inside the N-dimensional sphere of radius √(N(P + σ²)).

SLIDE 23

Geometric Intuition: Sphere Packing

(Figure: as above, now also showing the small noise sphere of radius √(Nσ²) around x.)

Also by the LLN, as N → ∞, y will lie near the surface of the N-dimensional sphere centered at x with radius √(Nσ²).

SLIDE 24

Geometric Intuition: Sphere Packing

(Figure: as above.)

- Vanishing error probability criterion ⟹ non-overlapping noise spheres.
- Question: How many non-overlapping spheres can be packed into the large sphere?
- Maximum # of non-overlapping spheres = maximum # of codewords that can be reliably delivered.
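The two LLN facts from the previous slides are easy to check numerically (a sketch of ours, assuming numpy): for large N, ‖z‖²/N concentrates around σ² and ‖y‖²/N around P + σ².

```python
import numpy as np

rng = np.random.default_rng(0)
P, sigma2, N = 4.0, 1.0, 100_000

x = rng.normal(0, np.sqrt(P), N)        # a random codeword with power ~ P
z = rng.normal(0, np.sqrt(sigma2), N)   # white Gaussian noise
y = x + z

print(np.mean(z ** 2))  # ~ sigma2: y is near the sphere of radius sqrt(N*sigma2) around x
print(np.mean(y ** 2))  # ~ P + sigma2: y is inside the sphere of radius sqrt(N*(P + sigma2))
```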


SLIDE 25

Geometric Intuition: Sphere Packing

(Figure: as above.)

Back-of-the-envelope calculation:

2^{NR} ≤ (√(N(P + σ²)))^N / (√(Nσ²))^N

⟹ R ≤ (1/N) log[ (√(N(P + σ²)))^N / (√(Nσ²))^N ] = ½ log(1 + P/σ²).

Hence, intuitively, any achievable rate R cannot exceed C = ½ log(1 + P/σ²). How to achieve it?

SLIDE 26

Achieving Capacity via Good Packing

(Figure: the "x-sphere" of radius √(NP), containing codewords x₁, x₂, ….)

Random codebook generation: generate 2^{NR} N-dimensional vectors (codewords) {x₁, …, x_{2^{NR}}} lying in the "x-sphere" of radius √(NP).

SLIDE 27

Achieving Capacity via Good Packing

(Figure: as above, with the scaled output αy shown inside the x-sphere.)

Decoding: with α ≜ P/(P + σ²) (the MMSE coefficient),

y → MMSE scaling → αy → nearest-neighbor search → x̂.

SLIDE 28

Achieving Capacity via Good Packing

(Figure: as above, with the uncertainty sphere of radius √(N Pσ²/(P + σ²)) around αy.)

By the LLN, we have

‖αy − x₁‖² = ‖αz + (α − 1)x₁‖² ≈ α²Nσ² + (α − 1)²NP = N · Pσ²/(P + σ²).

SLIDE 29

Achieving Capacity via Good Packing

(Figure: as above.)

Performance analysis: When does an error occur? When another codeword, say x₂, falls inside the uncertainty sphere centered at αy. What is that probability? It is the ratio of the volumes of the two spheres:

P{x₁ → x₂} = (√(NPσ²/(P + σ²)))^N / (√(NP))^N = (σ²/(P + σ²))^{N/2}.

SLIDE 30

Achieving Capacity via Good Packing

(Figure: as above.)

By the union of events bound, the total probability of error is

P{E} ≤ 2^{NR} (σ²/(P + σ²))^{N/2} = 2^{N(R + ½ log(1/(1 + P/σ²)))} = 2^{N(R − ½ log(1 + P/σ²))},

which vanishes as N → ∞ if R < ½ log(1 + P/σ²). Hence, any R < ½ log(1 + P/σ²) is achievable.
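To make the packing argument concrete, here is a small Monte Carlo sketch (ours, not from the lecture; the block length and rate are chosen only so it runs quickly) of the random-codebook scheme with MMSE scaling and nearest-neighbor decoding at a rate below capacity.

```python
import numpy as np

rng = np.random.default_rng(0)
P, sigma2 = 4.0, 1.0
N, R = 24, 0.5                       # R = 0.5 < C ~ 1.16 bits/use
M = int(2 ** (N * R))                # 2^{NR} = 4096 codewords
alpha = P / (P + sigma2)             # MMSE coefficient

errors, trials = 0, 200
for _ in range(trials):
    codebook = rng.normal(0, np.sqrt(P), (M, N))         # random Gaussian codebook
    y = codebook[0] + rng.normal(0, np.sqrt(sigma2), N)  # send codeword 0
    # nearest neighbor to the MMSE-scaled output alpha*y:
    w_hat = np.argmin(np.sum((alpha * y - codebook) ** 2, axis=1))
    errors += (w_hat != 0)
print(errors / trials)               # small, and it shrinks further as N grows
```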


SLIDE 31

Practical Relevance of the Gaussian Noise Model

In communication engineering, additive Gaussian noise is the most widely used model for a noisy channel with real (or complex) input/output. Reasons:

1. Gaussian is a good model for noise that consists of many small perturbations, due to the Central Limit Theorem.
2. Analytically, Gaussian is highly tractable.
3. Consider an input-power-constrained channel with independent additive noise. Within the family of noise distributions with the same variance, Gaussian noise is the worst-case noise.

The last point is important: it implies that for an additive-noise channel with input power constraint P and noise variance σ², the capacity is lower bounded by the Gaussian channel capacity ½ log(1 + P/σ²).

SLIDE 32

Gaussian Noise is the Worst-Case Noise

Proposition 1. Consider a Gaussian r.v. X_G ∼ N(0, P) and Y = X_G + Z, where Z has density f_Z(z), variance Var[Z] = σ², and Z ⊥ X_G. Then

I(X_G; Y) ≥ ½ log(1 + P/σ²).

With Proposition 1, we immediately obtain the following theorem:

Theorem 3 (Gaussian is the Worst-Case Additive Noise). Consider a CMC f_{Y|X}: Y = X + Z, Z ⊥ X, with input power constraint P, where the additive noise has a density and variance σ². Then the capacity C is minimized when Z ∼ N(0, σ²), and

C ≥ C_G ≜ ½ log(1 + P/σ²).

Proof: C ≥ I(X_G; X_G + Z) ≥ ½ log(1 + P/σ²).
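Proposition 1 is easy to probe numerically (our sketch, reusing a crude histogram entropy estimate; not from the lecture): with a Gaussian input, Laplace noise of the same variance yields a larger I(X_G; Y) than Gaussian noise, which attains the bound.

```python
import numpy as np

def h_bits(s, bins=300):
    """Crude histogram estimate of differential entropy, in bits."""
    p, e = np.histogram(s, bins=bins, density=True)
    w, m = np.diff(e), p > 0
    return -np.sum(p[m] * np.log2(p[m]) * w[m])

rng = np.random.default_rng(0)
P, sigma2, n = 4.0, 1.0, 1_000_000
XG = rng.normal(0, np.sqrt(P), n)

noises = {
    "Gaussian": rng.normal(0, np.sqrt(sigma2), n),
    "Laplace": rng.laplace(0, np.sqrt(sigma2 / 2), n),  # Var = 2b^2 = sigma2
}
for name, Z in noises.items():
    print(name, h_bits(XG + Z) - h_bits(Z))  # I(XG;Y) = h(Y) - h(Z)
print("bound  ", 0.5 * np.log2(1 + P / sigma2))
```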


SLIDE 33

Proof of Proposition 1

Let Z_G ∼ N(0, σ²), and denote Y_G ≜ X_G + Z_G. We aim to prove I(X_G; Y) ≥ I(X_G; Y_G).

First note that I(X_G; Y) = h(Y) − h(Z) does not change if we shift Z by a constant. Hence, WLOG assume E[Z] = 0. Since both X_G and Z are zero-mean, so is Y.

Note that Y_G ∼ N(0, P + σ²) and Z_G ∼ N(0, σ²). Hence,

h(Y_G) = E_{Y_G}[−log f_{Y_G}(Y_G)]
       = ½ log(2π(P + σ²)) + (log e)/(2(P + σ²)) · E_{Y_G}[(Y_G)²]
       = ½ log(2π(P + σ²)) + (log e)/(2(P + σ²)) · E_Y[Y²]
       = E_Y[−log f_{Y_G}(Y)].

SLIDE 34

The key step above is to realize that Y and Y_G have the same variance. Similarly, h(Z_G) = E_Z[−log f_{Z_G}(Z)]. Therefore,

I(X_G; Y_G) − I(X_G; Y)
= {h(Y_G) − h(Y)} − {h(Z_G) − h(Z)}
= {E_Y[−log f_{Y_G}(Y)] − E_Y[−log f_Y(Y)]} − {E_Z[−log f_{Z_G}(Z)] − E_Z[−log f_Z(Z)]}
= E_Y[log(f_Y(Y)/f_{Y_G}(Y))] − E_Z[log(f_Z(Z)/f_{Z_G}(Z))]
= E_{Y,Z}[log( f_Y(Y) f_{Z_G}(Z) / (f_{Y_G}(Y) f_Z(Z)) )]
≤ log E_{Y,Z}[ f_Y(Y) f_{Z_G}(Z) / (f_{Y_G}(Y) f_Z(Z)) ].    (Jensen's inequality)

To finish the proof, we shall show that E_{Y,Z}[ f_Y(Y) f_{Z_G}(Z) / (f_{Y_G}(Y) f_Z(Z)) ] = 1.

SLIDE 35

Let us calculate E_{Y,Z}[ f_Y(Y) f_{Z_G}(Z) / (f_{Y_G}(Y) f_Z(Z)) ] as follows:

E_{Y,Z}[ f_Y(Y) f_{Z_G}(Z) / (f_{Y_G}(Y) f_Z(Z)) ]
= ∫∫ f_{Y,Z}(y, z) · f_Y(y) f_{Z_G}(z) / (f_{Y_G}(y) f_Z(z)) dz dy
= ∫∫ f_Z(z) f_{X_G}(y − z) · f_Y(y) f_{Z_G}(z) / (f_{Y_G}(y) f_Z(z)) dz dy    (∵ Y = X_G + Z)
= ∫∫ [f_{X_G}(y − z) f_{Z_G}(z)] · f_Y(y) / f_{Y_G}(y) dz dy
= ∫∫ f_{Y_G,Z_G}(y, z) · f_Y(y) / f_{Y_G}(y) dz dy    (∵ Y_G = X_G + Z_G)
= ∫ (f_Y(y)/f_{Y_G}(y)) (∫ f_{Y_G,Z_G}(y, z) dz) dy
= ∫ (f_Y(y)/f_{Y_G}(y)) f_{Y_G}(y) dy = ∫ f_Y(y) dy = 1.

Hence, the proof is complete.



SLIDE 38

Motivation

We have investigated the capacity of the (discrete-time) memoryless Gaussian channel, an elementary model in digital communications. In wireless communications, however, due to various effects such as frequency selectivity and inter-symbol interference, a single Gaussian channel may not model the system well. Instead, a parallel Gaussian channel, which consists of several Gaussian channels with a common total power constraint, is more relevant.

For example, OFDM (Orthogonal Frequency Division Multiplexing) is a widely used technique in LTE and WiFi that mitigates frequency selectivity and inter-symbol interference. The parallel Gaussian channel is the equivalent channel model under OFDM.

SLIDE 39

Model: Parallel Gaussian Channel

(Figure: L parallel branches, Y_l = X_l + Z_l with Z_l ∼ N(0, σ_l²), l = 1, …, L; the encoder maps w to X = (X₁, …, X_L)ᵀ and the decoder maps Y = (Y₁, …, Y_L)ᵀ to ŵ.)

1. Input/output alphabet: X = Y = ℝ^L, the L-dimensional space.
2. Channel law f_{Y|X}: Y = X + Z, Z ∼ N(0, diag(σ₁², …, σ_L²)) ⊥ X. Note that Z₁, …, Z_L are mutually independent.
3. Average input power constraint P: (1/N) Σ_{k=1}^N ‖x[k]‖² ≤ P, where ‖x[k]‖² = Σ_{l=1}^L |x_l[k]|².

SLIDE 40

Capacity of Parallel Gaussian Channel

Invoking Theorem 1, the capacity of the parallel Gaussian channel is

C = sup_{X: E[‖X‖²] ≤ P} I(X; Y).

The main issue is how to compute it. Let P_l ≜ E[|X_l|²]. Observe that

I(X; Y) = I(X₁, …, X_L; Y₁, …, Y_L)
= h(Y₁, …, Y_L) − h(Z₁, …, Z_L)
= h(Y₁, …, Y_L) − Σ_{l=1}^L ½ log(2πe σ_l²)
(a) ≤ Σ_{l=1}^L h(Y_l) − Σ_{l=1}^L ½ log(2πe σ_l²)
(b) ≤ Σ_{l=1}^L ½ log(1 + P_l/σ_l²).

(a) holds since the joint differential entropy is at most the sum of the marginal ones. (b) is due to h(Y_l) ≤ ½ log(2πe Var[Y_l]) ≤ ½ log(2πe (P_l + σ_l²)).

SLIDE 41

(Figure: the L-branch parallel Gaussian channel, as before.)

Hence,

I(X; Y) ≤ Σ_{l=1}^L ½ log(1 + P_l/σ_l²)

for any input X with P_l = E[|X_l|²], l = 1, …, L. Furthermore, to satisfy the power constraint,

E[‖X‖²] = E[Σ_{l=1}^L |X_l|²] ≤ P ⟹ Σ_{l=1}^L P_l ≤ P.

Question: Can we achieve this upper bound? Yes, by choosing X₁, …, X_L mutually independent with X_l ∼ N(0, P_l), that is, X ∼ N(0, diag(P₁, …, P_L)), subject to (1) Σ_{l=1}^L P_l ≤ P and (2) P_l ≥ 0 for all l = 1, 2, …, L.

SLIDE 42

Computation of Capacity: a Power Allocation Problem

Intuition: The optimal scheme is to treat each branch separately, with the l-th branch allocated transmit power P_l, for l = 1, 2, …, L. In the l-th branch (sub-channel), the input X_l ∼ N(0, P_l), and the inputs are mutually independent across the L sub-channels.

The characterization of capacity boils down to the following optimization:

Power Allocation Problem:

C(P, σ₁², …, σ_L²) = max_{(P₁, …, P_L)} Σ_{l=1}^L ½ log(1 + P_l/σ_l²)
subject to Σ_{l=1}^L P_l ≤ P and P_l ≥ 0 for all l = 1, 2, …, L.

SLIDE 43

Optimal Power Allocation: Water-Filling

The optimal solution (P₁*, …, P_L*) of the above power allocation problem turns out to be the following (notation: (x)⁺ ≜ max(x, 0)):

Water-Filling Solution:

P_l* = (ν − σ_l²)⁺ for all l = 1, …, L, where ν satisfies Σ_{l=1}^L (ν − σ_l²)⁺ = P.

(Figure: the water-filling picture; each sub-channel l has noise floor σ_l², the water level is ν, the allocated power is P_l*, and the total water area equals P.)
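A short water-filling sketch in Python (ours, not from the lecture); bisection on the water level ν is one standard way to solve for it, since the total allocated power is monotone in ν.

```python
import numpy as np

def water_filling(P, sigma2, tol=1e-12):
    """Return (P_alloc, nu) with P_l = (nu - sigma2_l)^+ and sum_l P_l = P."""
    sigma2 = np.asarray(sigma2, dtype=float)
    lo, hi = sigma2.min(), sigma2.max() + P   # nu lies in this interval
    while hi - lo > tol:
        nu = 0.5 * (lo + hi)
        if np.sum(np.maximum(nu - sigma2, 0.0)) > P:
            hi = nu                           # too much water: lower the level
        else:
            lo = nu
    nu = 0.5 * (lo + hi)
    return np.maximum(nu - sigma2, 0.0), nu

sigma2 = np.array([0.5, 1.0, 2.5])
P_star, nu = water_filling(P=3.0, sigma2=sigma2)
C = np.sum(0.5 * np.log2(1 + P_star / sigma2))
print(P_star, nu, C)   # [1.75, 1.25, 0.], nu = 2.25: the worst sub-channel gets nothing
```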


SLIDE 44

Optimal Power Allocation: Water-Filling

With the water-filling solution in hand, two regimes are worth noting:

When the power budget P ≫ max_l σ_l² (high SNR regime), the optimal allocation is roughly uniform: P_l* ≈ P/L.

When the power budget P ≪ min_l σ_l² (low SNR regime), the optimal allocation roughly picks the best sub-channel: P_l* ≈ P · 1{l = argmin_l σ_l²}.

SLIDE 45

(Figures: water-filling at (a) high SNR, where the water level ν sits well above every σ_l² and the allocation is nearly uniform, and (b) low SNR, where only the best sub-channel receives power.)

SLIDE 46

A Primer on Convex Optimization (1)

To show that the water-filling solution attains capacity (i.e., optimality in the power allocation problem), let us give a quick overview of convex optimization, the Lagrangian function, and the Karush-Kuhn-Tucker theorem.

⋆ Convex optimization:

minimize f(x)
subject to g_i(x) ≤ 0, i = 1, …, m
           h_i(x) = 0, i = 1, …, p    (3)

The above minimization problem is convex if
- the objective function f is convex,
- the inequality constraints g₁, …, g_m are convex, and
- the equality constraints h₁, …, h_p are affine, i.e., h_i(x) = a_iᵀx + b_i.

SLIDE 47

A Primer on Convex Optimization (2)

Lagrangian function: For the minimization problem (3), the Lagrangian function is a weighted sum of the objective and the constraints:

L(x, λ, µ) ≜ f(x) + Σ_{i=1}^m λ_i g_i(x) + Σ_{i=1}^p µ_i h_i(x).    (4)

Karush-Kuhn-Tucker (KKT) theorem: For a convex optimization problem with differentiable objective f and inequality constraints g₁, …, g_m, suppose there exists an x in the interior of the domain that is strictly feasible (g_i(x) < 0 for all i = 1, …, m, and h_i(x) = 0 for all i = 1, …, p). Then a feasible x* attains the optimum in (3) iff there exist (λ*, µ*) such that

λ_i* ≥ 0 and λ_i* g_i(x*) = 0 for all i = 1, 2, …, m;
∇_x L(x, λ, µ) |_{(x, λ, µ) = (x*, λ*, µ*)} = 0.    (5)

(5) together with the feasibility of x* are called the KKT conditions.

SLIDE 48

Optimality of Water-Filling

Proposition 2 (Water-Filling). For given (σ₁², …, σ_L²) ≥ 0, the maximization problem

maximize Σ_{l=1}^L log(P_l + σ_l²)
subject to Σ_{l=1}^L P_l = P and P_l ≥ 0, l = 1, …, L    (6)

has the solution P_l* = (ν − σ_l²)⁺ for all l = 1, …, L, where ν satisfies Σ_{l=1}^L (ν − σ_l²)⁺ = P.

The proof is based on evaluating the KKT conditions.

SLIDE 49

Proof: First, rewrite (6) in the following equivalent form:

minimize −Σ_{l=1}^L log(P_l + σ_l²)
subject to −P_l ≤ 0, l = 1, …, L, and Σ_{l=1}^L P_l − P = 0    (7)

It can easily be checked that (7) is a convex optimization problem. Hence, the Lagrangian function is

L(P₁, …, P_L, λ₁, …, λ_L, µ) = −Σ_{l=1}^L log(P_l + σ_l²) − Σ_{l=1}^L λ_l P_l + µ(Σ_{l=1}^L P_l − P).

SLIDE 50

The proof is completed by finding P₁, …, P_L, λ₁, …, λ_L ≥ 0 and µ such that

Σ_{l=1}^L P_l = P;
λ_l P_l = 0 for all l = 1, …, L;
∂L/∂P_l = −(log e)/(P_l + σ_l²) − λ_l + µ = 0 for all l = 1, …, L.

1. If µ < (log e)/σ_l²: the condition λ_l = µ − (log e)/(P_l + σ_l²) ≥ 0 can only hold if P_l > 0 ⟹ λ_l = 0 ⟹ µ = (log e)/(P_l + σ_l²) ⟹ P_l = (log e)/µ − σ_l².
2. If µ ≥ (log e)/σ_l²: the conditions λ_l = µ − (log e)/(P_l + σ_l²) ≥ 0 and λ_l P_l = 0 imply that P_l = 0.

Hence, P_l* = max((log e)/µ − σ_l², 0) for l = 1, 2, …, L. Finally, renaming ν ≜ (log e)/µ and plugging into the condition Σ_{l=1}^L P_l = P, we complete the proof by the KKT theorem.
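As a sanity check on the KKT derivation (our sketch; it assumes SciPy is available), a generic constrained solver applied directly to the convex problem (7) recovers the same water-filling allocation:

```python
import numpy as np
from scipy.optimize import minimize

sigma2 = np.array([0.5, 1.0, 2.5])
P = 3.0

res = minimize(
    lambda p: -np.sum(np.log(p + sigma2)),           # objective of (7)
    x0=np.full(len(sigma2), P / len(sigma2)),
    method="SLSQP",
    bounds=[(0.0, None)] * len(sigma2),              # P_l >= 0
    constraints=[{"type": "eq", "fun": lambda p: np.sum(p) - P}],
)
print(res.x)  # ~ [1.75, 1.25, 0.0] = (nu - sigma_l^2)^+ with nu = 2.25
```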



SLIDE 52

Let us now generalize the parallel Gaussian channel investigated so far to the case where the noises in the L branches are correlated. The idea behind the technique is simple: apply a pre-processor and a post-processor such that the end-to-end system is again a parallel Gaussian channel with independent noise components.

SLIDE 53

Model: Parallel Gaussian Channel with Colored Noise

(Figure: L parallel branches with correlated noises Z₁, …, Z_L; the encoder maps w to X = (X₁, …, X_L)ᵀ and the decoder maps Y = (Y₁, …, Y_L)ᵀ to ŵ.)

1. Input/output alphabet: X = Y = ℝ^L, the L-dimensional space.
2. Channel law f_{Y|X}: Y = X + Z, Z ∼ N(0, K_Z) ⊥ X. Note that (Z₁, …, Z_L) are not mutually independent anymore.
3. Average input power constraint P: (1/N) Σ_{k=1}^N ‖x[k]‖² ≤ P, where ‖x[k]‖² = Σ_{l=1}^L |x_l[k]|².

SLIDE 54

Eigenvalue Decomposition of a Covariance Matrix

To get to the main idea, we introduce some basic matrix theory.

Definition 1 (Positive Semidefinite (PSD) Matrix). A Hermitian matrix A ∈ ℂ^{L×L} is positive semidefinite (A ⪰ 0) iff xᴴAx ≥ 0 for all x ≠ 0 ∈ ℂ^L. Here (·)ᴴ denotes the conjugate transpose of a matrix, and a Hermitian matrix A is a square matrix with Aᴴ = A.

The following important lemma plays a key role in our development.

Lemma 1 (Eigenvalue Decomposition of a PSD Matrix). If A ⪰ 0, then A = QΛQᴴ, where Q is unitary (QQᴴ = QᴴQ = I) and Λ = diag(λ₁, …, λ_L), with {λ_i ≥ 0 | i = 1, …, L} the eigenvalues of A. The j-th column q_j of Q is the eigenvector of A corresponding to λ_j.

SLIDE 55

Fact 1. A valid covariance matrix is PSD.

Proof: By definition, a valid covariance matrix K = E[YYᴴ] for some complex zero-mean r.v. Y. Therefore K is Hermitian, because Kᴴ = (E[YYᴴ])ᴴ = E[(YYᴴ)ᴴ] = E[YYᴴ] = K. Moreover, it is PSD, since for all non-zero x ∈ ℂ^L,

xᴴKx = xᴴE[YYᴴ]x = E[xᴴYYᴴx] = E[|Yᴴx|²] ≥ 0.

Hence, for the covariance matrix K_Z, we can always write the decomposition K_Z = QΛ_ZQᴴ, where Λ_Z = diag(σ₁², …, σ_L²).

SLIDE 56

Pre-Processor Q and Post-Processor Qᴴ

Based on the eigenvalue decomposition K_Z = QΛ_ZQᴴ, we insert a pre-processor Q and a post-processor Qᴴ as follows:

(Figure: X̃ → Q → X → ⊕ Z, with Z ∼ N(0, K_Z) → Y → Qᴴ → Ỹ.)

The end-to-end relationship between X̃ and Ỹ is characterized by the equivalent channel

Ỹ = X̃ + Z̃, where X̃ ≜ QᴴX, Ỹ ≜ QᴴY, Z̃ ≜ QᴴZ,

and Z̃ is zero-mean Gaussian with covariance matrix QᴴK_ZQ = QᴴQΛ_ZQᴴQ = Λ_Z = diag(σ₁², …, σ_L²).
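A numerical sketch of this whitening step (ours, assuming numpy): after the post-processor Qᴴ, the noise covariance is diagonal, and the eigenvalues of K_Z become the per-branch noise variances.

```python
import numpy as np

KZ = np.array([[2.0, 0.8, 0.0],
               [0.8, 1.0, 0.3],
               [0.0, 0.3, 0.5]])       # an example (PSD) noise covariance

lam, Q = np.linalg.eigh(KZ)            # KZ = Q diag(lam) Q^H, Q unitary
rng = np.random.default_rng(0)
Z = rng.multivariate_normal(np.zeros(3), KZ, size=200_000)

Z_tilde = Z @ Q                        # rows are samples of Z~ = Q^H Z (Q is real here)
print(np.round(np.cov(Z_tilde.T), 2))  # ~ diag(lam): independent branches
print(np.round(lam, 2))                # the per-branch noise variances sigma_l^2
```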


SLIDE 57

Equivalent Parallel Channel with Independent Noises

(Figure: X̃ → ⊕ Z̃, with Z̃ ∼ N(0, Λ_Z) → Ỹ; input power constraint P.)

For the above equivalent channel f_{Ỹ|X̃}, observe that the noise terms in the L branches are now mutually independent. Furthermore, note that for this channel the input power is the same as for the original channel:

‖x̃‖² = x̃ᴴx̃ = xᴴQQᴴx = xᴴx = ‖x‖².    (∵ QQᴴ = I)

Hence, we can use the water-filling solution to find the capacity of this channel, denoted by C̃.

SLIDE 58

No Loss in Optimality of the Pre-/Post-Processors

C̃ ≤ C, since any scheme for f_{Ỹ|X̃} can be transformed into one for f_{Y|X}.

(Figure: the channel Y = X + Z, Z ∼ N(0, K_Z), wrapped by pre-processor Q and post-processor Qᴴ, is wrapped again by another pre-processor Qᴴ and post-processor Q, yielding a channel from X′ to Y′.)

On the other hand, from the figure above we can see that after inserting the extra pre-processor Qᴴ and post-processor Q, the new channel f_{Y′|X′} is the same as the original channel f_{Y|X}. Let C′ be the capacity of that channel. Hence,

C = C′ ≤ C̃ ≤ C ⟹ C̃ = C.

SLIDE 59

Summary: Capacity of Parallel Gaussian Channel

Theorem 4 (Capacity of Parallel Gaussian Channel). For the L-branch parallel Gaussian channel with average input power constraint P and noise covariance matrix K_Z, the channel capacity is

C = Σ_{l=1}^L ½ log(1 + P_l*/σ_l²),

where {σ₁², …, σ_L²} are the L eigenvalues of K_Z, and the optimal power allocation {P₁*, …, P_L*} is given by the water-filling solution:

P_l* = (ν − σ_l²)⁺ for all l = 1, …, L, where ν satisfies Σ_{l=1}^L (ν − σ_l²)⁺ = P.
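Putting the pieces together (our sketch, combining the eigendecomposition with the earlier water-filling routine; not from the lecture), the colored-noise capacity of Theorem 4 can be computed as:

```python
import numpy as np

def water_filling(P, sigma2, tol=1e-12):
    """P_l = (nu - sigma2_l)^+ with sum_l P_l = P, found by bisection on nu."""
    sigma2 = np.asarray(sigma2, dtype=float)
    lo, hi = sigma2.min(), sigma2.max() + P
    while hi - lo > tol:
        nu = 0.5 * (lo + hi)
        lo, hi = (nu, hi) if np.sum(np.maximum(nu - sigma2, 0)) <= P else (lo, nu)
    return np.maximum(0.5 * (lo + hi) - sigma2, 0)

KZ = np.array([[2.0, 0.8, 0.0],
               [0.8, 1.0, 0.3],
               [0.0, 0.3, 0.5]])
lam = np.linalg.eigvalsh(KZ)           # noise variances of the whitened branches
P_star = water_filling(3.0, lam)       # water-filling over the eigenvalues
C = np.sum(0.5 * np.log2(1 + P_star / lam))
print(np.round(P_star, 3), round(C, 3))  # bits per (vector) channel use
```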




SLIDE 62

So far in this course we have been focusing on discrete-time systems. However, in the physical world, signals are mostly continuous-time. Reasons that our focus has been on discrete time:

1. Discrete-time signals and systems are easier to handle.
2. The sampling theorem (by Nyquist and Shannon) asserts that any bandlimited waveform {s(t) : t ∈ ℝ} with (one-sided) bandwidth W can be exactly represented by the sequence {s_d[n] : n ∈ ℤ}, where s_d[n] ≜ s(n/(2W)).

In this lecture, we focus on the bandlimited waveform channel with Gaussian noise. Disclaimer: the treatment here will not be 100% rigorous; instead, we aim to provide some intuition about the capacity of the continuous-time waveform channel.

SLIDE 63

Sampling Theorem

Let us recall the sampling theorem.

Theorem 5 (Nyquist-Shannon). Let s(t) be a finite-energy waveform bandlimited to [−W, W]. Then s(t) is completely specified by the samples of s(t) spaced 1/(2W) seconds apart:

s(t) = Σ_{i=−∞}^{∞} s(i/(2W)) sinc(2Wt − i),

where sinc(t) ≜ sin(πt)/(πt) for t ≠ 0, and sinc(0) ≜ 1.
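A quick numerical check of the reconstruction formula (our sketch; the example signal and the truncation of the infinite sum are our own choices): a signal bandlimited to 3 Hz is rebuilt from its samples taken at 2W = 8 Hz.

```python
import numpy as np

W = 4.0                                      # one-sided bandwidth budget (Hz)
def s(t):                                    # finite-energy, bandlimited to 3 Hz < W
    return np.sinc(6.0 * t) + 0.5 * np.sinc(3.0 * t)   # np.sinc(x) = sin(pi x)/(pi x)

i = np.arange(-2000, 2001)                   # truncated index range of the sum
samples = s(i / (2 * W))                     # s(i / 2W)

t = np.linspace(-0.5, 0.5, 7)
s_rec = np.array([np.sum(samples * np.sinc(2 * W * tt - i)) for tt in t])
print(np.max(np.abs(s_rec - s(t))))          # small truncation error
```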


SLIDE 64

Continuous to Discrete Time: Orthogonal Expansion

The sampling theorem can be understood as an orthogonal expansion of bandlimited signals, where the set of sinc functions can be thought of as an orthogonal basis of the space of bandlimited signals. Hence, by a change of basis, one converts the continuous-time waveform into a discrete-time sequence.

To be more specific, first rewrite the sinc functions as

ϕ_i(t) ≜ √(2W) sinc(2Wt − i), i ∈ ℤ.

It is not hard to show that {ϕ_i(t) | i ∈ ℤ} forms an orthonormal set: ⟨ϕ_i(t), ϕ_j(t)⟩ = 1{i = j}. Hence, what the sampling theorem states is simply that {ϕ_i(t) | i ∈ ℤ} forms an orthonormal basis of bandlimited signals.

SLIDE 65

Bandlimited Waveform Channel

In this lecture, we focus on the following bandlimited waveform channel:

Y(t) = h_W(t) ∗ (X_a(t) + Z_a(t)) = X(t) + Z(t),    (8)

where h_W(t) is an ideal low-pass filter with H_W(f) = 1{f ∈ [−W, W]}, and X(t), Z(t) are the filtered, bandlimited versions of X_a(t), Z_a(t). Moreover,
- the input power constraint is P (watts);
- the power spectral density of the Gaussian process {Z_a(t)} is N₀/2 W/Hz.

Let us sketch an intuitive (but not rigorous) derivation of the capacity:
1. Sampling theorem: convert the waveform channel to discrete time.
2. For the discrete-time channel, the input power is P × 1/(2W) J/symbol.
3. The noise variance is (N₀/2 × 2W) × 1/(2W) J/symbol.

⟹ C = ½ log(1 + (P/(2W))/(N₀/2)) = ½ log(1 + P/(WN₀)) bits/symbol.

SLIDE 66

Equivalent Discrete-Time Gaussian Channel

To be more rigorous in understanding the capacity of the bandlimited waveform channel, let us expand the waveform signals in (8) over the sinc basis {ϕ_i(t) ≜ √(2W) sinc(2Wt − i) | i ∈ ℤ}:

X(t) = Σ_{i∈ℤ} X_i ϕ_i(t), Z(t) = Σ_{i∈ℤ} Z_i ϕ_i(t), Y(t) = Σ_{i∈ℤ} Y_i ϕ_i(t),

where

X_i ≜ (1/√(2W)) X(i/(2W)), Z_i ≜ (1/√(2W)) Z(i/(2W)), Y_i ≜ (1/√(2W)) Y(i/(2W)).

Hence, we get an equivalent discrete-time channel Y_i = X_i + Z_i.
- Input power constraint: P (J/s) × 1/(2W) (s/symbol) = P/(2W) (J/symbol).
- The samples {Z(i/(2W)) | i ∈ ℤ} are i.i.d. Gaussian with zero mean and variance (N₀/2) × 2W = N₀W.
  ⟹ {Z_i | i ∈ ℤ} are i.i.d. Gaussian with variance N₀W/(2W) = N₀/2.

Hence, the capacity is ½ log(1 + P/(WN₀)) bits/symbol.

SLIDE 67

AWGN Channel Capacity

Theorem 6 (Capacity of the AWGN Waveform Channel). Consider the AWGN waveform channel (8), bandlimited to [−W, W] Hz, constrained to average input power P watts, where the additive white Gaussian noise process has power spectral density N₀/2 watts/Hz. The capacity of this channel is given by

C = ½ log(1 + P/(WN₀)) bits/symbol = W log(1 + P/(WN₀)) bits/s.

Remark: The channel capacity is a function of bandwidth W and power P, the two most important resources in communication engineering.

SLIDE 68

Capacity vs. Power

(Figure: plot of log(1 + SNR) versus SNR.)

Fixing W, the power resource is measured by SNR ≜ P/(WN₀).
- High SNR (SNR ≫ 1): C ∼ log SNR, logarithmic growth with power.
- Low SNR (SNR ≪ 1): C ∼ (log₂ e) SNR, linear growth with power.

SLIDE 69

Capacity vs. Bandwidth

(Figure: capacity C(W) in Mbps versus bandwidth W in MHz, showing the bandwidth-limited region at small W, the power-limited region at large W, and the capacity limit (P/N₀) log₂ e as W → ∞.)

C(W) = W log(1 + P/(N₀W)) ≈ W · (P/(N₀W)) log₂ e = (P/N₀) log₂ e for large W.

SLIDE 70

Power-Limited vs. Bandwidth-Limited

Let us consider power and bandwidth together: C(P, W) = W log(1 + P/(WN₀)) bits/s, with SNR ≜ P/(WN₀).

Power-limited regime (SNR ≪ 1): C(P, W) ≈ (log₂ e) × W × P/(WN₀) = (log₂ e) P/N₀. Linear in power, insensitive to bandwidth (flat).

Bandwidth-limited regime (SNR ≫ 1): C(P, W) ≈ W log(P/(WN₀)). Almost linear in bandwidth, insensitive to power (logarithmic).
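A short numerical look at the two regimes (our sketch; the numbers are made up for illustration): as W grows with P fixed, C(P, W) saturates at (log₂ e) · P/N₀.

```python
import numpy as np

def C_bits_per_sec(P, W, N0):
    """AWGN waveform capacity C = W log2(1 + P/(W N0)), in bits/s."""
    return W * np.log2(1 + P / (W * N0))

P, N0 = 1.0, 1e-6                 # 1 W of power, N0 = 1e-6 W/Hz (illustrative values)
for W in [1e4, 1e5, 1e6, 1e7, 1e8]:
    print(f"W = {W:9.0f} Hz: C = {C_bits_per_sec(P, W, N0):12.0f} bits/s")
print("limit as W grows:", P / N0 * np.log2(np.e), "bits/s")
```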



SLIDE 72

White noise ⟺ flat power spectral density.
Colored noise ⟺ non-flat power spectral density.

SLIDE 73

(Figure: a non-flat noise power spectral density S_Z(f) over f ∈ [−W, W].)

SLIDE 74

(Figure: the spectrum S_Z(f) over [−W, W], partitioned into many narrow bands.)

Idea: Quantize the spectrum into many small bands, each with an almost flat power spectral density. Then we have a parallel waveform channel, which we know how to solve (water-filling)!

SLIDE 75

(Figure: water-filling over the spectrum; water level ν above the noise spectrum S_Z(f), f ∈ [−W, W].)

C = ∫_{−W}^{W} ½ log(1 + (ν − S_Z(f))⁺ / S_Z(f)) df bits/s,

where ν satisfies ∫_{−W}^{W} (ν − S_Z(f))⁺ df = P.
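A discretized version of this frequency-domain water-filling (our sketch; the example spectrum S_Z(f) is made up): sample S_Z(f) on a fine grid, bisect on the water level ν, and approximate the integrals by Riemann sums.

```python
import numpy as np

W, P = 1.0, 2.0
f = np.linspace(-W, W, 4001)
df = f[1] - f[0]
SZ = 0.2 + 0.8 * (f / W) ** 2          # an example colored noise spectrum S_Z(f)

lo, hi = SZ.min(), SZ.max() + P        # bisection on the water level nu
for _ in range(100):
    nu = 0.5 * (lo + hi)
    if np.sum(np.maximum(nu - SZ, 0)) * df > P:
        hi = nu
    else:
        lo = nu

alloc = np.maximum(nu - SZ, 0)         # input power spectral density (nu - S_Z(f))^+
C = np.sum(0.5 * np.log2(1 + alloc / SZ)) * df
print(round(nu, 4), round(C, 4))       # water level and capacity in bits/s
```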


SLIDE 76

Summary


SLIDE 77

Continuous channel capacity: C(B) = sup_{f(x): E[b(X)] ≤ B} I(X; Y).

Gaussian channel capacity: C(P) = ½ log(1 + P/σ²).

Parallel Gaussian channel: C = Σ_{l=1}^L ½ log(1 + P_l*/σ_l²), where {σ₁², …, σ_L²} are the L eigenvalues of K_Z, and the optimal power allocation {P₁*, …, P_L*} is given by the water-filling solution: P_l* = (ν − σ_l²)⁺ for all l, with ν satisfying Σ_{l=1}^L (ν − σ_l²)⁺ = P.

Waveform AWGN channel: C = W log(1 + P/(WN₀)) bits/s.

Waveform channel with colored Gaussian noise: C = ∫_{−W}^{W} ½ log(1 + (ν − S_Z(f))⁺/S_Z(f)) df bits/s, where ν satisfies ∫_{−W}^{W} (ν − S_Z(f))⁺ df = P.