Chapter 9 Gaussian Channel

Peng-Hua Wang

Graduate Inst. of Comm. Engineering National Taipei University


Chapter Outline

■ Chap. 9 Gaussian Channel
  9.1 Gaussian Channel: Definitions
  9.2 Converse to the Coding Theorem for Gaussian Channels
  9.3 Bandlimited Channels
  9.4 Parallel Gaussian Channels
  9.5 Channels with Colored Gaussian Noise
  9.6 Gaussian Channels with Feedback


9.1 Gaussian Channel: Definitions


Introduction

$$Y_i = X_i + Z_i, \qquad Z_i \sim \mathcal{N}(0, N)$$

■ X_i: input, Y_i: output, Z_i: noise; Z_i is independent of X_i.
■ Without further constraint, the capacity of this channel may be infinite.
  ◆ If the noise variance N is zero, the channel can transmit an arbitrary real number with no error.
  ◆ If the noise variance N is nonzero, we can choose an infinite subset of inputs arbitrarily far apart, so that they are distinguishable at the output with arbitrarily small probability of error.


Introduction

■ The most common limitation on the input is an energy or power constraint.
■ We assume an average power constraint: for any codeword (x_1, x_2, ..., x_n) transmitted over the channel, we require that

$$\frac{1}{n}\sum_{i=1}^{n} x_i^2 \le P.$$


Information Capacity

Definition 1 (Capacity) The information capacity of the Gaussian channel with power constraint P is

$$C = \max_{f(x):\, E[X^2] \le P} I(X; Y).$$

We can calculate the information capacity as follows:

$$\begin{aligned}
I(X; Y) &= h(Y) - h(Y \mid X) = h(Y) - h(X + Z \mid X) = h(Y) - h(Z \mid X) = h(Y) - h(Z) \\
&\le \frac{1}{2}\log 2\pi e(P + N) - \frac{1}{2}\log 2\pi e N = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)
\end{aligned}$$

■ Here h(X + Z | X) = h(Z | X) because, given X, the sum X + Z is just Z shifted by a constant, and differential entropy is translation invariant. Note that E[Y²] = E[(X + Z)²] = P + N, and the differential entropy of a Gaussian with variance σ² is (1/2) log 2πeσ².


Information Capacity

Therefore, the information capacity of the Gaussian channel is

$$C = \max_{E[X^2] \le P} I(X; Y) = \frac{1}{2}\log\left(1 + \frac{P}{N}\right),$$

with equality when X ∼ N(0, P).

■ Next, we will show that this capacity is achievable.
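As a quick numerical companion to this formula (not part of the original slides; the function name and example numbers are illustrative), a minimal Python sketch:

```python
import math

def gaussian_capacity(P, N):
    """Capacity in bits per transmission of the AWGN channel
    Y = X + Z with Z ~ N(0, N), under the constraint E[X^2] <= P."""
    return 0.5 * math.log2(1 + P / N)

# Example: P = 10, N = 1 gives C = 0.5 * log2(11) ≈ 1.73 bits/transmission.
print(gaussian_capacity(10.0, 1.0))
```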


Code for Gaussian Channel

Definition 2 ((M, n) code for the Gaussian channel) An (M, n) code for the Gaussian channel with power constraint P consists of the following:

1. An index set {1, 2, ..., M}.
2. An encoding function x : {1, 2, ..., M} → X^n, yielding codewords x^n(1), x^n(2), ..., x^n(M), each satisfying the power constraint

$$\frac{1}{n}\sum_{i=1}^{n} x_i^2(w) \le P, \qquad w = 1, 2, \ldots, M.$$

3. A decoding function g : Y^n → {1, 2, ..., M}.

Definitions

Definition 3 (Conditional probability of error)

$$\lambda_i = \Pr\left(g(Y^n) \ne i \mid X^n = x^n(i)\right) = \sum_{y^n:\, g(y^n) \ne i} p\left(y^n \mid x^n(i)\right) = \sum_{y^n} p\left(y^n \mid x^n(i)\right) I\left(g(y^n) \ne i\right)$$

■ I(·) is the indicator function.


Definitions

Definition 4 (Maximal probability of error)

$$\lambda^{(n)} = \max_{i \in \{1, 2, \ldots, M\}} \lambda_i$$

Definition 5 (Average probability of error)

$$P_e^{(n)} = \frac{1}{M}\sum_{i=1}^{M} \lambda_i$$

■ The decoding error is

$$\Pr(g(Y^n) \ne W) = \sum_{i=1}^{M} \Pr(W = i)\Pr(g(Y^n) \ne i \mid W = i).$$

If the index W is chosen uniformly from {1, 2, ..., M}, then P_e^{(n)} = Pr(g(Y^n) ≠ W).


Definitions

Definition 6 (Rate) The rate R of an (M, n) code is

$$R = \frac{\log M}{n} \text{ bits per transmission.}$$

Definition 7 (Achievable rate) A rate R is said to be achievable for a Gaussian channel with power constraint P if there exists a sequence of (⌈2^{nR}⌉, n) codes with codewords satisfying the power constraint such that the maximal probability of error λ^{(n)} tends to 0 as n → ∞.

Definition 8 (Channel capacity) The capacity of a channel is the supremum of all achievable rates.


Capacity of a Gaussian Channel

Theorem 1 (Capacity of a Gaussian channel) The capacity of a Gaussian channel with power constraint P and noise variance N is

$$C = \frac{1}{2}\log\left(1 + \frac{P}{N}\right) \text{ bits per transmission.}$$

Sphere Packing Argument

[Figure: decoding spheres of radius √(nN), one per codeword, packed inside the sphere of radius √(n(P + N)) containing the received vectors.]

Sphere Packing Argument

For each codeword sent, the received vector is contained (with high probability) in a sphere of radius √(nN) around the codeword. The received vectors have energy no greater than n(P + N), so they lie in a sphere of radius √(n(P + N)). How many codewords can we use so that the decoding spheres do not intersect?

$$M = \frac{A_n\left(\sqrt{n(P + N)}\right)^n}{A_n\left(\sqrt{nN}\right)^n} = \left(1 + \frac{P}{N}\right)^{n/2},$$

where A_n is the constant in the volume formula for an n-dimensional sphere; for example, A_2 = π and A_3 = (4/3)π. Therefore, the capacity is

$$\frac{1}{n}\log M = \frac{1}{2}\log\left(1 + \frac{P}{N}\right).$$
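The radii above come from the concentration of per-sample energies around their expectations. A small simulation (not in the original slides; the numbers are illustrative) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
n, P, N = 10_000, 10.0, 1.0

x = rng.normal(0.0, np.sqrt(P), n)  # codeword entries X_i ~ N(0, P)
z = rng.normal(0.0, np.sqrt(N), n)  # noise Z_i ~ N(0, N)
y = x + z

# Per-sample energies concentrate, so ||y - x|| ≈ sqrt(nN) and ||y|| ≈ sqrt(n(P + N)).
print(np.mean(z ** 2))  # ≈ N = 1
print(np.mean(y ** 2))  # ≈ P + N = 11
```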

R < C → Achievable

■ Codebook. Let X_i(w), i = 1, 2, ..., n, w = 1, 2, ..., 2^{nR}, be i.i.d. ∼ N(0, P − ε). For large n, (1/n) Σ X_i² → P − ε.
■ Encoding. The codebook is revealed to both the sender and the receiver. To send the message index w, the transmitter sends the wth codeword X^n(w) in the codebook.
■ Decoding. The receiver searches for a codeword that is jointly typical with the received vector. If there is one and only one such codeword X^n(w), the receiver declares Ŵ = w. Otherwise, the receiver declares an error. The receiver also declares an error if the chosen codeword does not satisfy the power constraint. (A toy simulation of this scheme appears below.)
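Here is a hedged Python sketch of the random-coding experiment, using nearest-neighbor (minimum-distance) decoding as a concrete surrogate for joint-typicality decoding; the block length, rate, and seed are illustrative assumptions, and at such a short block length occasional decoding errors are expected:

```python
import numpy as np

rng = np.random.default_rng(1)
P, N, eps = 10.0, 1.0, 0.5      # C = 0.5 * log2(1 + P/N) ≈ 1.73 bits/transmission
n, R = 20, 0.5                  # toy block length and a rate R < C
M = int(np.ceil(2 ** (n * R)))  # 2^{nR} = 1024 messages

# Random codebook: each row is a codeword with i.i.d. N(0, P - eps) entries.
codebook = rng.normal(0.0, np.sqrt(P - eps), size=(M, n))

# Transmit message w over the AWGN channel.
w = 37
y = codebook[w] + rng.normal(0.0, np.sqrt(N), size=n)

# Nearest-neighbor decoding: pick the codeword closest to y.
w_hat = int(np.argmin(np.sum((codebook - y) ** 2, axis=1)))
print(w_hat == w)  # True with high probability since R < C
```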


R < C → Achievable

■ Probability of error. Assume that codeword 1 was sent, so Y^n = X^n(1) + Z^n. Define the events

$$E_0 = \left\{\frac{1}{n}\sum_{j=1}^{n} X_j^2(1) > P\right\} \quad \text{and} \quad E_i = \left\{(X^n(i), Y^n) \in A_\epsilon^{(n)}\right\}.$$

Then an error occurs if
◆ the power constraint is violated ⇒ E_0 occurs;
◆ the transmitted codeword and the received sequence are not jointly typical ⇒ E_1^c occurs;
◆ a wrong codeword is jointly typical with the received sequence ⇒ E_2 ∪ E_3 ∪ ··· ∪ E_{2^{nR}} occurs.


R < C → Achievable

Let W be uniformly distributed. We have

$$\begin{aligned}
P_e^{(n)} &= \frac{1}{2^{nR}}\sum_i \lambda_i = \Pr(E) = \Pr(E \mid W = 1) \\
&= \Pr\left(E_0 \cup E_1^c \cup E_2 \cup E_3 \cup \cdots \cup E_{2^{nR}}\right) \\
&\le \Pr(E_0) + \Pr(E_1^c) + \sum_{i=2}^{2^{nR}} \Pr(E_i) \\
&\le \epsilon + \epsilon + \sum_{i=2}^{2^{nR}} 2^{-n(I(X;Y) - 3\epsilon)} \le 2\epsilon + 2^{-n(I(X;Y) - R - 3\epsilon)} \le 3\epsilon
\end{aligned}$$

for n sufficiently large and R < I(X; Y) − 3ε.
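To see how quickly the residual term 2^{−n(I(X;Y)−R−3ε)} dies off, here is a tiny numeric check (not from the slides; the values of I(X;Y), R, and ε are assumptions for illustration):

```python
# Residual term 2^(-n*(I - R - 3*eps)) in the union bound above.
I, R, eps = 1.73, 1.0, 0.1  # assumed: I(X;Y) ≈ C for P/N = 10, and R < I - 3*eps

for n in (50, 100, 200):
    print(n, 2 ** (-n * (I - R - 3 * eps)))  # shrinks exponentially in n
```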


R < C → Achievable, final part

■ Since the average probability of error over codebooks is less than 3ε, there exists at least one codebook C* such that Pr(E | C*) < 3ε.
  ◆ C* can be found by an exhaustive search over all codebooks.
■ Deleting the worst half of the codewords in C*, we obtain a code with low maximal probability of error. The codewords that violate the power constraint are certainly among those deleted. (Why?) Hence, we have constructed a code that achieves a rate arbitrarily close to C.


9.2 Converse to the Coding Theorem for Gaussian Channels


Achievable → R < C

We will prove that if P_e^{(n)} → 0, then R ≤ C = (1/2) log(1 + P/N). Let W be distributed uniformly, so that W → X^n → Y^n → Ŵ forms a Markov chain. By Fano's inequality,

$$H(W \mid \hat{W}) \le 1 + nRP_e^{(n)} = n\epsilon_n,$$

where ε_n = 1/n + RP_e^{(n)} → 0 as P_e^{(n)} → 0. Now,

$$\begin{aligned}
nR &= H(W) = I(W; \hat{W}) + H(W \mid \hat{W}) \le I(W; \hat{W}) + n\epsilon_n \\
&\le I(X^n; Y^n) + n\epsilon_n \quad \text{(data-processing inequality)} \\
&= h(Y^n) - h(Y^n \mid X^n) + n\epsilon_n = h(Y^n) - h(Z^n) + n\epsilon_n \\
&\le \sum_{i=1}^{n} h(Y_i) - h(Z^n) + n\epsilon_n = \sum_{i=1}^{n} h(Y_i) - \sum_{i=1}^{n} h(Z_i) + n\epsilon_n,
\end{aligned}$$

where the last equality holds because the Z_i are independent.


Achievable → R < C

$$\begin{aligned}
nR &\le \sum_{i=1}^{n}\left(h(Y_i) - h(Z_i)\right) + n\epsilon_n \\
&\le \sum_{i=1}^{n}\left(\frac{1}{2}\log 2\pi e(P_i + N) - \frac{1}{2}\log 2\pi e N\right) + n\epsilon_n \\
&= \sum_{i=1}^{n}\frac{1}{2}\log\left(1 + \frac{P_i}{N}\right) + n\epsilon_n \\
&\le \frac{n}{2}\log\left(1 + \frac{P}{N}\right) + n\epsilon_n,
\end{aligned}$$

where P_i is the average energy in the ith position; the last step follows from Jensen's inequality applied to the concave function log, together with (1/n) Σ P_i ≤ P, which holds since every codeword satisfies the power constraint. Thus,

$$R \le \frac{1}{2}\log\left(1 + \frac{P}{N}\right) + \epsilon_n.$$

9.3 Bandlimited Channels


Capacity of Bandlimited Channels

■ Suppose the output of a bandlimited channel can be represented by

$$Y(t) = (X(t) + Z(t)) * h(t),$$

where X(t) is the input signal, Z(t) is white Gaussian noise, and h(t) is the impulse response of the channel, an ideal filter of bandwidth W.

■ By the sampling theorem, such signals are determined by samples taken at frequency 2W. If the channel is used over the time interval [0, T], then 2WT samples are transmitted.


Capacity of Bandlimited Channels

■ If the noise has power spectral density N_0/2 watts/Hz, the noise power is (N_0/2)(2W) = N_0·W, so the noise energy per sample is N_0·W·T/(2WT) = N_0/2. If the signal power is P, the signal energy per sample is P·T/(2WT) = P/(2W).

■ The capacity is

$$\frac{1}{2}\log\left(1 + \frac{P/2W}{N_0/2}\right) \text{ bits/sample}, \quad \text{or} \quad C = W\log\left(1 + \frac{P}{N_0 W}\right) \text{ bits/second.}$$
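A minimal Python sketch of this formula (not from the slides; the bandwidth and SNR below are assumed, illustrative values):

```python
import math

def bandlimited_capacity(P, N0, W):
    """Capacity in bits/second of a bandlimited AWGN channel:
    signal power P watts, noise PSD N0/2 watts/Hz, bandwidth W Hz."""
    return W * math.log2(1 + P / (N0 * W))

# Assumed example: W = 3000 Hz with P/(N0*W) = 1000 (30 dB SNR)
# gives C = 3000 * log2(1001) ≈ 29.9 kbit/s.
print(bandlimited_capacity(P=1.0, N0=1.0 / (1000 * 3000), W=3000.0))
```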

9.4 Parallel Gaussian Channels


Parallel Gaussian Channels

■ In this section we consider k independent Gaussian channels in parallel with a common power constraint. The objective is to distribute the total power among the channels so as to maximize the total capacity. The channels are modeled as

$$Y_j = X_j + Z_j, \qquad j = 1, 2, \ldots, k,$$

with Z_j ∼ N(0, N_j). There is a common power constraint

$$E\left[\sum_{j=1}^{k} X_j^2\right] \le P.$$

Parallel Gaussian Channels

The information capacity is

$$C = \max_{f(x_1, \ldots, x_k):\, E\left[\sum_j X_j^2\right] \le P} I(X_1, X_2, \ldots, X_k; Y_1, Y_2, \ldots, Y_k).$$

Since Z_1, Z_2, ..., Z_k are independent,

$$\begin{aligned}
I(X_1, \ldots, X_k; Y_1, \ldots, Y_k) &= h(Y_1, \ldots, Y_k) - h(Y_1, \ldots, Y_k \mid X_1, \ldots, X_k) \\
&= h(Y_1, \ldots, Y_k) - h(Z_1, \ldots, Z_k \mid X_1, \ldots, X_k) \\
&= h(Y_1, \ldots, Y_k) - h(Z_1, \ldots, Z_k) = h(Y_1, \ldots, Y_k) - \sum_i h(Z_i) \\
&\le \sum_i h(Y_i) - \sum_i h(Z_i) \le \sum_i \frac{1}{2}\log\left(1 + \frac{P_i}{N_i}\right),
\end{aligned}$$

where P_i = E[X_i²] and Σ_i P_i = P.

Parallel Gaussian Channels

Therefore, we have a constrained optimization problem:

$$\max \sum_i \frac{1}{2}\log\left(1 + \frac{P_i}{N_i}\right) \quad \text{subject to} \quad \sum_i P_i \le P, \quad P_i \ge 0.$$

This can be solved by Lagrange multipliers together with the Kuhn–Tucker conditions:

$$\begin{aligned}
-\frac{1}{2}\cdot\frac{1/N_i}{1 + P_i/N_i} - \mu_i + \lambda &= 0 \\
-P_i \le 0, \qquad \sum_i P_i - P &\le 0 \\
\mu_i P_i = 0, \qquad \lambda\left(\sum_i P_i - P\right) &= 0 \\
\mu_i \ge 0, \qquad \lambda &\ge 0
\end{aligned}$$


Parallel Gaussian Channels

Case I: λ = 0. We have

$$P_i + N_i = -\frac{1}{2\mu_i}, \qquad P_i = -\frac{1}{2\mu_i} - N_i.$$

This violates the condition −P_i ≤ 0, since N_i > 0 and μ_i ≥ 0.

Case II: λ ≠ 0. We have

$$P_i + N_i = \frac{1}{2(\lambda - \mu_i)} = \begin{cases} \dfrac{1}{2\lambda} = \text{constant}, & P_i > 0 \text{ (which implies } \mu_i = 0\text{)} \\[6pt] \dfrac{1}{2(\lambda - \mu_i)}, & P_i = 0. \end{cases}$$

We can solve for λ from

$$\sum_i P_i = \sum_i \left(\frac{1}{2\lambda} - N_i\right)^+ = P,$$

where (x)^+ = max(x, 0). This is the water-filling solution.
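A minimal sketch (not in the slides) of the resulting water-filling allocation, found by bisection on the water level ν = 1/(2λ); the function name and example noise levels are assumptions:

```python
import numpy as np

def water_filling(noise, P, tol=1e-12):
    """Water-filling for parallel Gaussian channels: find the level nu = 1/(2*lambda)
    with sum_i (nu - N_i)^+ = P, then return P_i = (nu - N_i)^+ and the capacity."""
    noise = np.asarray(noise, dtype=float)
    lo, hi = noise.min(), noise.max() + P  # the water level lies in this range
    while hi - lo > tol:                   # bisection on the water level
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - noise, 0.0).sum() > P:
            hi = nu
        else:
            lo = nu
    powers = np.maximum(lo - noise, 0.0)
    capacity = 0.5 * np.sum(np.log2(1.0 + powers / noise))
    return powers, capacity

# Example: three channels with noise levels 1, 2, 6 and total power P = 10.
powers, C = water_filling([1.0, 2.0, 6.0], P=10.0)
print(powers, C)  # ≈ [5.33, 4.33, 0.33], C ≈ 2.20 bits; quiet channels get more power
```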


Parallel Gaussian Channels

[Figure: water-filling — power is poured over the noise levels N_i up to a common water level 1/(2λ); channels whose noise exceeds the water level receive no power.]

Nonlinear Optimization

For the problem

$$\min f(x_1, x_2, \ldots, x_n) \quad \text{subject to} \quad g_j(x_1, x_2, \ldots, x_n) \le 0, \quad j = 1, 2, \ldots, m,$$

the necessary conditions for optimality are

$$\begin{aligned}
\frac{\partial f}{\partial x_i} + \sum_j \mu_j \frac{\partial g_j}{\partial x_i} &= 0, & i &= 1, 2, \ldots, n \\
g_j(x_1, x_2, \ldots, x_n) &\le 0, & j &= 1, 2, \ldots, m \\
\mu_j g_j(x_1, x_2, \ldots, x_n) &= 0, & j &= 1, 2, \ldots, m \\
\mu_j &\ge 0, & j &= 1, 2, \ldots, m
\end{aligned}$$
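As a sanity check (not in the slides), the water-filling problem can be fed to a generic constrained solver and compared against the closed form; this sketch assumes SciPy is available and reuses the toy numbers from the water-filling example above:

```python
import numpy as np
from scipy.optimize import minimize

N = np.array([1.0, 2.0, 6.0])  # noise levels, as in the earlier example
P = 10.0                        # total power budget

# Minimize -sum 0.5*log2(1 + P_i/N_i) subject to -P_i <= 0 and sum P_i - P <= 0.
res = minimize(
    lambda p: -0.5 * np.sum(np.log2(1.0 + p / N)),
    x0=np.full(len(N), P / len(N)),                      # feasible starting point
    bounds=[(0.0, None)] * len(N),                       # -P_i <= 0
    constraints=[{"type": "ineq", "fun": lambda p: P - p.sum()}],  # sum P_i <= P
)
print(res.x)  # ≈ [5.33, 4.33, 0.33], matching the water-filling solution
```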