Lecture 4: Noisy Channel Coding, I-Hsiang Wang



slide-1
SLIDE 1

Channel Capacity and the Weak Converse Achievability Proof and Source-Channel Separation

Lecture 4 Noisy Channel Coding

I-Hsiang Wang

Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw

October 9, 2015

1 / 56 I-Hsiang Wang IT Lecture 4

slide-2
SLIDE 2

The Channel Coding Problem

[Figure: block diagram $w$ → Channel Encoder → $x^N$ → Noisy Channel → $y^N$ → Channel Decoder → $\hat{w}$]

Meta Description

1 Message: Random message $W \sim \mathrm{Unif}[1 : 2^K]$.

2 Channel: Consists of an input alphabet $\mathcal{X}$, an output alphabet $\mathcal{Y}$, and a family of conditional distributions $\{ p(y_k \mid x^k, y^{k-1}) \mid k \in \mathbb{N} \}$ determining the stochastic relationship between the output symbol $y_k$ and the input symbol $x_k$ along with all past signals $(x^{k-1}, y^{k-1})$.

3 Encoder: Encode the message $w$ by a length-$N$ codeword $x^N \in \mathcal{X}^N$.

4 Decoder: Reconstruct the message $\hat{w}$ from the channel output $y^N$.

5 Efficiency: Maximize the code rate $R \triangleq K/N$ bits/channel use, given a certain decoding criterion.

slide-3
SLIDE 3

Decoding Criterion: Vanishing Error Probability

[Figure: block diagram $w$ → Channel Encoder → $x^N$ → Noisy Channel → $y^N$ → Channel Decoder → $\hat{w}$]

A key performance measure: Error Probability $P_e^{(N)} \triangleq P\{\hat{W} \neq W\}$.

Question: Is it possible to get zero error probability?

Ans: Probably not, unless the channel noise has some special structure.

Following the development of lossless source coding, Shannon turned his attention to the following question: Is it possible to have a sequence of encoder/decoder pairs such that $P_e^{(N)} \to 0$ as $N \to \infty$? If so, what is the largest possible code rate $R$ at which vanishing error probability is possible?

Recall: In lossless source coding, we saw that the infimum of compression rates at which vanishing error probability is possible is $H(\{S_i\})$.

slide-4
SLIDE 4

[Figure: three-axis trade-off among rate $R$, block length $N$, and probability of error $P_e^{(N)}$]

Capacity: Take $N \to \infty$, require $P_e^{(N)} \to 0$ $\implies \sup R = C$.

Error Exponent: Take $N \to \infty$, fix rate $R$ $\implies \min P_e^{(N)} \approx 2^{-N E(R)}$.

Finite Block Length: Fix $N$, require $P_e^{(N)} \le \varepsilon$ $\implies \sup R = C - \sqrt{\frac{V}{N}}\, Q^{-1}(\varepsilon) + O\left(\frac{\log N}{N}\right)$.

Remark: For source coding, one can establish a similar framework.
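To get a feel for the finite-block-length expression, the following sketch evaluates $C - \sqrt{V/N}\,Q^{-1}(\varepsilon)$ for a binary symmetric channel and drops the $O(\log N / N)$ term. The function names are illustrative, and the dispersion formula $V = p(1-p)\log_2^2\frac{1-p}{p}$ is an assumption taken from the finite-block-length literature; it is not stated in the lecture.

```python
from math import log2, sqrt
from statistics import NormalDist


def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bsc_normal_approx(p: float, n: int, eps: float) -> float:
    """C - sqrt(V/N) * Q^{-1}(eps) for a BSC(p), 0 < p < 1, in bits/channel use.

    The O(log N / N) correction term is ignored.
    """
    capacity = 1.0 - binary_entropy(p)
    # Assumed dispersion of the BSC (not given in the lecture).
    dispersion = p * (1 - p) * log2((1 - p) / p) ** 2
    # Q^{-1}(eps) is the (1 - eps)-quantile of the standard normal.
    q_inv = NormalDist().inv_cdf(1.0 - eps)
    return capacity - sqrt(dispersion / n) * q_inv
```

For a fixed target error $\varepsilon < 1/2$ the approximate rate sits below $C$ and approaches it as $N$ grows, which matches the picture of capacity as the $N \to \infty$ limit.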

slide-5
SLIDE 5

In this lecture we focus only on capacity. In other words, we ignore the issue of finite block length (FBL). FBL performance can be obtained via techniques that extend the central limit theorem (CLT). We do not pursue a finer analysis of the error probability via large deviation techniques either.

slide-6
SLIDE 6

Discrete Memoryless Channel (DMC)

In order to demonstrate the key ideas in channel coding, in this lecture we shall focus on discrete memoryless channels (DMC), defined below.

Definition 1 (Discrete Memoryless Channel) A discrete channel $(\mathcal{X}, \{ p(y_k \mid x^k, y^{k-1}) \mid k \in \mathbb{N} \}, \mathcal{Y})$ is memoryless if $\forall k \in \mathbb{N}$, $p(y_k \mid x^k, y^{k-1}) = p_{Y|X}(y_k \mid x_k)$. In other words, $Y_k - X_k - (X^{k-1}, Y^{k-1})$ forms a Markov chain. Here the conditional p.m.f. $p_{Y|X}$ is called the channel law or channel transition function.

Question: Is our definition of a channel sufficient to specify $p(y^N \mid x^N)$, the stochastic relationship between the channel input (codeword) $x^N$ and the channel output $y^N$?

slide-7
SLIDE 7

$$p(y^N \mid x^N) = \frac{p(x^N, y^N)}{p(x^N)}$$

$$p(x^N, y^N) = \prod_{k=1}^{N} p(x_k, y_k \mid x^{k-1}, y^{k-1}) = \prod_{k=1}^{N} p(y_k \mid x^k, y^{k-1})\, p(x_k \mid x^{k-1}, y^{k-1})$$

Hence, we need to further specify $\{ p(x_k \mid x^{k-1}, y^{k-1}) \mid k \in \mathbb{N} \}$, which cannot be obtained from $p(x^N)$.

Interpretation: $\{ p(x_k \mid x^{k-1}, y^{k-1}) \mid k \in \mathbb{N} \}$ is induced by the encoding function, which implies that the encoder can potentially make use of the past channel output, i.e., feedback.

slide-8
SLIDE 8

DMC without Feedback

[Figure: two encoder/channel diagrams. No Feedback: $w$ → Channel Encoder → $x_k$ → Noisy Channel → $y_k$. With Feedback: the same chain, with $y^{k-1}$ fed back to the encoder through a delay $D$]

Suppose the encoder has no knowledge of the realization of the channel output. Then $p(x_k \mid x^{k-1}, y^{k-1}) = p(x_k \mid x^{k-1})$ for all $k \in \mathbb{N}$, and the channel is said to have no feedback. In this case, specifying $\{ p(y_k \mid x^k, y^{k-1}) \mid k \in \mathbb{N} \}$ suffices to specify $p(y^N \mid x^N)$.

Proposition 1 (DMC without Feedback) For a DMC $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ without feedback, $p(y^N \mid x^N) = \prod_{k=1}^{N} p_{Y|X}(y_k \mid x_k)$.
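The product form in Proposition 1 is easy to evaluate directly. The sketch below is only an illustration: the function name and the nested-dictionary representation `channel[x][y] = p(y|x)` are assumptions made for this example.

```python
from math import prod


def dmc_likelihood(channel, x_seq, y_seq):
    """p(y^N | x^N) for a DMC without feedback.

    channel[x][y] holds the channel law p(y|x); the result is the
    product of the per-symbol transition probabilities.
    """
    if len(x_seq) != len(y_seq):
        raise ValueError("input and output sequences must have equal length")
    return prod(channel[x][y] for x, y in zip(x_seq, y_seq))
```

For instance, over a binary symmetric channel with crossover probability 0.1, sending `[0, 1, 1]` and receiving `[0, 1, 0]` has likelihood $0.9 \times 0.9 \times 0.1$.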

slide-9
SLIDE 9

Overview

In this lecture, we would like to establish the following (informally described) noisy channel coding theorem due to Shannon:

For a DMC $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$, the maximum code rate with vanishing error probability is the channel capacity $C \triangleq \max_{p_X(\cdot)} I(X;Y)$.

The above holds regardless of the availability of feedback. To demonstrate this result, we organize the lecture as follows:

1 Give the problem formulation, state the main theorem, and visit a couple of examples to show how to compute channel capacity.

2 Prove the converse part: an achievable rate cannot exceed $C$.

3 Prove the achievability part with a random coding argument.

slide-10
SLIDE 10

Outline

1 Channel Capacity and the Weak Converse

Channel Capacity
Proof of the Weak Converse
Feedback Capacity

2 Achievability Proof and Source-Channel Separation

Achievability Proof
Source-Channel Separation


slide-12
SLIDE 12

Channel Coding without Feedback: Problem Setup

[Figure: block diagram $w$ → Channel Encoder → $x^N$ → Noisy Channel → $y^N$ → Channel Decoder → $\hat{w}$]

1 A $(2^{NR}, N)$ channel code consists of

an encoding function (encoder) $\mathrm{enc}_N : [1 : 2^K] \to \mathcal{X}^N$ that maps each message $w$ to a length-$N$ codeword $x^N$, where $K \triangleq \lceil NR \rceil$.

a decoding function (decoder) $\mathrm{dec}_N : \mathcal{Y}^N \to [1 : 2^K] \cup \{*\}$ that maps a channel output sequence $y^N$ to a reconstructed message $\hat{w}$ or an error message $*$.

2 The error probability is defined as $P_e^{(N)} \triangleq P\{\hat{W} \neq W\}$.

3 A rate $R$ is said to be achievable if there exists a sequence of $(2^{NR}, N)$ codes such that $P_e^{(N)} \to 0$ as $N \to \infty$. The channel capacity is defined as $C \triangleq \sup\{R \mid R \text{ is achievable}\}$.
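As a concrete instance of these definitions, the sketch below computes the exact error probability of a length-$n$ repetition code ($K = 1$, rate $R = 1/n$) on a binary symmetric channel with majority-vote decoding. The code, the channel, and the function name are illustrative choices, not part of the lecture; messages are labeled 0 and 1 for convenience instead of $[1 : 2^K]$.

```python
from itertools import product


def repetition_code_error(p: float, n: int = 3) -> float:
    """Exact P_e of the length-n repetition code over a BSC(p), n odd.

    The message W is uniform on {0, 1}; the decoder takes a majority vote.
    """
    codebook = {0: (0,) * n, 1: (1,) * n}  # enc_N: message -> codeword

    def decode(y):  # dec_N: channel output -> message
        return 1 if sum(y) > n / 2 else 0

    p_err = 0.0
    for w, x in codebook.items():
        for y in product((0, 1), repeat=n):
            flips = sum(a != b for a, b in zip(x, y))
            prob_y_given_x = p**flips * (1 - p) ** (n - flips)
            if decode(y) != w:
                p_err += 0.5 * prob_y_given_x  # P{W = w} = 1/2
    return p_err
```

With $p = 0.1$ and $n = 3$ this gives $3p^2(1-p) + p^3 = 0.028$. Increasing $n$ drives the error probability down, but only by letting the rate $1/n$ go to zero, which is exactly the trade-off the notion of achievable rate is meant to rule out.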

slide-13
SLIDE 13

Channel Coding Theorem for Discrete Memoryless Channel

Theorem 1 (Channel Coding Theorem for DMC without Feedback) The capacity $C$ of the DMC $p(y|x)$ without feedback is given by

$$C = \max_{p(x)} I(X;Y). \qquad (1)$$

The capacity formula (1) is intuitive, since $I(X;Y)$ represents the amount of information about the channel input $X$ that one can infer from the channel output $Y$. The maximization over $p(x)$ stands for choosing the best possible input distribution so that the amount of information transfer is maximized.

slide-14
SLIDE 14

Rest of the lecture:

1 First, we give some examples of noisy channels to show how to compute capacity.

2 Then, we prove that for any rate $R > C$, it is impossible to have vanishing error probability (converse).

3 Finally, we prove that for any $R < C$, there exists a sequence of encoding/decoding schemes such that the error probability vanishes as the blocklength tends to $\infty$ (achievability), based on a probabilistic argument called random coding.

slide-15
SLIDE 15

Binary Symmetric Channel

A binary symmetric channel (BSC) consists of

Binary input/output $\mathcal{X} = \mathcal{Y} = \{0, 1\}$.

Channel law $p(y|x) = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$.

The capacity of the BSC is $C_{\mathrm{BSC}} = 1 - H_b(p)$.

[Figure: BSC transition diagram from $X$ to $Y$; each input is received correctly with probability $1-p$ and flipped with probability $p$]

To compute the BSC capacity, observe that $I(X;Y) = H(Y) - H(Y|X)$, and

$H(Y|X=0) = H(Y|X=1) = H_b(p) \implies H(Y|X) = H_b(p)$.

$H(Y) \le \log 2 = 1$, with equality iff $Y$ is uniform.

Question: Is it possible to choose a $p(x)$ such that $Y$ is uniform?

Ans: Yes, choose $X$ to be uniform $\implies C = \max_{p(x)} I(X;Y) = 1 - H_b(p)$.
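The BSC argument can be checked numerically. The sketch below computes $I(X;Y)$ from an input p.m.f. and a channel matrix, then scans a grid of input distributions; the function names and the grid search are illustrative assumptions, not the lecture's method.

```python
from math import log2


def mutual_information(p_x, channel):
    """I(X;Y) in bits, where channel[x][y] = p(y|x) and p_x[x] = p(x)."""
    n_out = len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x))) for y in range(n_out)]
    total = 0.0
    for x, px in enumerate(p_x):
        for y, pyx in enumerate(channel[x]):
            if px > 0 and pyx > 0:  # zero-probability terms contribute nothing
                total += px * pyx * log2(pyx / p_y[y])
    return total


def bsc_best_input_on_grid(p, steps=100):
    """Return (alpha, I) maximizing I(X;Y) over alpha = P{X=0} on a grid."""
    bsc = [[1 - p, p], [p, 1 - p]]
    candidates = [a / steps for a in range(1, steps)]
    best_alpha = max(candidates, key=lambda a: mutual_information([a, 1 - a], bsc))
    return best_alpha, mutual_information([best_alpha, 1 - best_alpha], bsc)
```

On the grid the maximizer is $\alpha = 1/2$ and the maximum agrees with $1 - H_b(p)$, in line with the derivation above.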

slide-16
SLIDE 16

Binary Erasure Channel

A binary erasure channel (BEC) consists of

Binary input $\mathcal{X} = \{0, 1\}$ and output with erasure $\mathcal{Y} = \{0, 1, *\}$.

Channel law (columns ordered as $y = 0, *, 1$) $p(y|x) = \begin{bmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{bmatrix}$.

The capacity of the BEC is $C_{\mathrm{BEC}} = 1 - p$.

[Figure: BEC transition diagram from $X$ to $Y$; each input is received correctly with probability $1-p$ and erased to $*$ with probability $p$]

Suppose we begin with $I(X;Y) = H(Y) - H(Y|X)$. Then,

$H(Y|X=0) = H(Y|X=1) = H_b(p) \implies H(Y|X) = H_b(p)$.

$H(Y) \le \log 3$, with equality iff $Y$ is uniform.

Question: Is it possible to choose a $p(x)$ such that $Y$ is uniform?

Ans: No. So, we cannot say that $\max_{p(x)} H(Y) = \log 3$.

slide-17
SLIDE 17

[Figure: BEC transition diagram (left) and the reverse channel from $Y$ to $X$ (right), where $Y = 0$ and $Y = 1$ determine $X$ and $Y = *$ maps to $X = 0$ with probability $\alpha$ and to $X = 1$ with probability $1 - \alpha$]

Instead, we can start with $I(X;Y) = H(X) - H(X|Y)$. Then, we have the reverse channel law (rows ordered as $y = 0, *, 1$) $p(x|y) = \begin{bmatrix} 1 & 0 \\ \alpha & 1-\alpha \\ 0 & 1 \end{bmatrix}$, where $\alpha \triangleq P\{X = 0\}$.

$H(X|Y=0) = H(X|Y=1) = 0$, $H(X|Y=*) = H_b(\alpha) = H(X) \implies H(X|Y) = P\{Y = *\}\, H(X) = p\, H(X)$.

$H(X) \le 1$, with equality iff $X$ is uniform.

Hence, $C_{\mathrm{BEC}} = \max_{p(x)} (1-p)\, H(X) = 1 - p$.
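The identity $I(X;Y) = (1-p)H(X)$ for the BEC can be verified by computing $H(X) - H(X|Y)$ directly from the joint distribution. The sketch below does this; the function names and the use of the string `"*"` for the erasure symbol are illustrative choices.

```python
from math import log2


def entropy(pmf):
    """Entropy in bits of a p.m.f. given as a list of probabilities."""
    return -sum(q * log2(q) for q in pmf if q > 0)


def bec_mutual_information(alpha: float, p: float) -> float:
    """I(X;Y) = H(X) - H(X|Y) for a BEC(p) with P{X = 0} = alpha."""
    # Joint p.m.f. p(x, y), stored per output symbol as [p(0, y), p(1, y)].
    joint = {
        0: [alpha * (1 - p), 0.0],
        "*": [alpha * p, (1 - alpha) * p],
        1: [0.0, (1 - alpha) * (1 - p)],
    }
    h_x_given_y = 0.0
    for row in joint.values():
        p_y = sum(row)
        if p_y > 0:
            posterior = [q / p_y for q in row]  # reverse channel law p(x|y)
            h_x_given_y += p_y * entropy(posterior)
    return entropy([alpha, 1 - alpha]) - h_x_given_y
```

Only the erasure output leaves any uncertainty about $X$, so the conditional entropy collapses to $p\,H(X)$ and the result equals $(1-p)H_b(\alpha)$, maximized at $\alpha = 1/2$.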

slide-18
SLIDE 18

Erasure Channel

We can generalize the BEC to the following erasure channel:

Input $\mathcal{X}$, output $\mathcal{Y} = \mathcal{X} \cup \{*\}$.

Channel law $p(y|x) = \begin{cases} 1-p, & y = x \\ p, & y = * \\ 0, & \text{otherwise} \end{cases}$

A motivation for this model comes from networking, where the erasure $*$ models a "packet drop".

Exercise 1 Show that the capacity of the erasure channel is $C_{\mathrm{EC}} = (1-p) \log|\mathcal{X}|$.

slide-19
SLIDE 19

Symmetric Channel

In computing the capacity of the BSC, we observe that

1 $H(Y|X) = H_b(p)$ regardless of $p(x)$. Why? Because all rows of $p(y|x)$ are permutations of the same probability vector $[p \;\; 1-p]$.

2 $H(Y) = \log|\mathcal{Y}|$ can be attained, that is, $Y$ can be made uniform by choosing $X$ to be uniform. Why? Because all columns of $p(y|x)$ have the same sum $\sum_x p(y|x)$.

Definition 2 (Symmetric Channel) A symmetric channel is a channel with channel law $p(y|x)$ satisfying (1) all rows of $p(y|x)$ are permutations of the same probability vector $\mathbf{p}$, and (2) all columns of $p(y|x)$ have the same sum $\sum_x p(y|x)$.

Exercise 2 Show that the capacity of a symmetric channel is $\log|\mathcal{Y}| - H(\mathbf{p})$.
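Definition 2 translates into a short check on the channel matrix. The sketch below tests the two conditions and evaluates the formula from Exercise 2; the function names and the numerical tolerance are assumptions made for illustration.

```python
from math import isclose, log2


def is_symmetric(channel, tol=1e-12):
    """Check Definition 2 for channel[x][y] = p(y|x)."""
    # (1) every row is a permutation of the same probability vector
    reference = sorted(channel[0])
    rows_ok = all(
        all(isclose(a, b, abs_tol=tol) for a, b in zip(sorted(row), reference))
        for row in channel
    )
    # (2) every column has the same sum
    col_sums = [sum(col) for col in zip(*channel)]
    cols_ok = all(isclose(s, col_sums[0], abs_tol=tol) for s in col_sums)
    return rows_ok and cols_ok


def symmetric_capacity(channel):
    """log|Y| - H(p) in bits, valid only for a symmetric channel."""
    if not is_symmetric(channel):
        raise ValueError("channel is not symmetric in the sense of Definition 2")
    row = channel[0]
    h_row = -sum(q * log2(q) for q in row if q > 0)
    return log2(len(row)) - h_row
```

The BSC passes the check and gives back $1 - H_b(p)$. The BEC fails condition (2), since the erasure column has a different sum, which is why its capacity needed a separate argument.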

slide-20
SLIDE 20

Computing Capacity of DMC via Convex Optimization

For a DMC, we are able to find its capacity efficiently by invoking efficient algorithms for solving convex programs, since $I(X;Y)$ is a concave function of $p(x)$ for fixed $p(y|x)$.

Proposition 2 $I(X;Y)$ is a concave function of $p(x)$ for fixed $p(y|x)$.

pf: By definition, $I(X;Y) = H(Y) - H(Y|X)$.

$H(Y|X) = \sum_x p(x)\, H(Y|X=x)$ is a linear function of $p(x)$, because $H(Y|X=x) = -\sum_y p(y|x) \log p(y|x)$ is constant for fixed $p(y|x)$.

$H(Y)$ is a concave function of $p(y)$, and $p(y)$ is a linear function of $p(x)$ for fixed $p(y|x)$. Hence, $H(Y)$ is a concave function of $p(x)$ for fixed $p(y|x)$.

Putting the above together, we complete the proof.
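One way to exploit this concavity numerically is the Blahut-Arimoto iteration. It is a standard alternating-maximization method for this problem, but it is not named in the lecture, so the sketch below should be read as one possible solver under that assumption rather than the approach the slide has in mind. The function name and the fixed iteration count are illustrative.

```python
from math import log2


def blahut_arimoto(channel, iterations=200):
    """Estimate max_{p(x)} I(X;Y) in bits for channel[x][y] = p(y|x).

    Returns (capacity_estimate, input_pmf).
    """
    n_in, n_out = len(channel), len(channel[0])
    p_x = [1.0 / n_in] * n_in  # start from the uniform input distribution

    def divergences(p_in):
        # D(p(.|x) || p_Y) for every input x, with p_Y induced by p_in.
        p_y = [sum(p_in[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
        return [
            sum(q * log2(q / p_y[y]) for y, q in enumerate(channel[x]) if q > 0)
            for x in range(n_in)
        ]

    for _ in range(iterations):
        d = divergences(p_x)
        # Multiplicative update: inputs with larger divergence gain weight.
        weights = [px * 2.0**dx for px, dx in zip(p_x, d)]
        total = sum(weights)
        p_x = [w / total for w in weights]

    d = divergences(p_x)
    # I(X;Y) = sum_x p(x) D(p(.|x) || p_Y) at the final input distribution.
    return sum(px * dx for px, dx in zip(p_x, d)), p_x
```

For the BSC and the BEC the uniform starting point is already optimal, so the estimate matches $1 - H_b(p)$ and $1 - p$ immediately. For an asymmetric channel such as the Z-channel, where the optimal input is not uniform, the iteration moves the input distribution toward the maximizer.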


slide-22
SLIDE 22

Proof of the (Weak) Converse (1)

We would like to show that for every sequence of $(2^{NR}, N)$ codes such that $P_e^{(N)} \to 0$ as $N \to \infty$, the rate $R \le \max_{p(x)} I(X;Y)$.

pf: Note that $W \sim \mathrm{Unif}[1 : 2^K]$ and hence $K = H(W)$.

$$NR \le H(W) = I(W; \hat{W}) + H(W \mid \hat{W}) \qquad (2)$$

$$\le I(W; Y^N) + \left(1 + P_e^{(N)} \log(2^K + 1)\right) \qquad (3)$$

$$\le \sum_{k=1}^{N} I(W; Y_k \mid Y^{k-1}) + \left(1 + P_e^{(N)} (NR + 2)\right) \qquad (4)$$

(2) is due to $K = \lceil NR \rceil \ge NR$ and the chain rule.

(3) is due to the Markov chain $W - Y^N - \hat{W}$ and Fano's inequality.

(4) is due to the chain rule and $2^K + 1 \le 2^{NR+1} + 1 \le 2 \times 2^{NR+1}$.

slide-23
SLIDE 23

Proof of the (Weak) Converse (2)

Set $\varepsilon_N \triangleq \frac{1}{N}\left(1 + P_e^{(N)} (NR + 2)\right)$. We see that $\varepsilon_N \to 0$ as $N \to \infty$ because $\lim_{N\to\infty} P_e^{(N)} = 0$.

The next step is to relate $\sum_{k=1}^{N} I(W; Y_k \mid Y^{k-1})$ to $I(X;Y)$, by the following manipulation:

$$I(W; Y_k \mid Y^{k-1}) \le I(W, Y^{k-1}; Y_k) \le I(W, Y^{k-1}, X_k; Y_k) \qquad (5)$$

$$= I(X_k; Y_k) \le \max_{p(x)} I(X;Y) \qquad (6)$$

(5) is due to the fact that conditioning reduces entropy.

(6) is due to DMC: $p(y_k \mid x^k, y^{k-1}, w) = p(y_k \mid x^k, y^{k-1}) = p(y_k \mid x_k) \implies Y_k - X_k - (W, X^{k-1}, Y^{k-1}) \implies Y_k - X_k - (W, Y^{k-1})$.

slide-24
SLIDE 24

Proof of the (Weak) Converse (3)

Hence, we have

$$NR \le \sum_{k=1}^{N} I(W; Y_k \mid Y^{k-1}) + N\varepsilon_N \le N \max_{p(x)} I(X;Y) + N\varepsilon_N$$

$$\implies R \le \max_{p(x)} I(X;Y) + \varepsilon_N, \quad \forall N.$$

Taking $N \to \infty$, we have: $R \le \max_{p(x)} I(X;Y)$ if it is achievable.

Remark: Similar to the source coding problem, a stronger version of the converse holds in the channel coding problem as well: if $R > C$, then $P_e^{(N)} \to 1$ as $N \to \infty$ for any encoding/decoding functions.
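Rearranging the chain $NR \le N \max_{p(x)} I(X;Y) + 1 + P_e^{(N)}(NR + 2)$ gives an explicit lower bound on the error probability, $P_e^{(N)} \ge \frac{N(R - C) - 1}{NR + 2}$. The sketch below simply evaluates this rearrangement; the function name is illustrative.

```python
def weak_converse_lower_bound(rate: float, capacity: float, n: int) -> float:
    """Lower bound on P_e^{(N)} implied by the weak converse argument.

    Derived from N*R <= N*C + 1 + P_e * (N*R + 2); clipped at 0 because
    the inequality is uninformative when the right-hand side is negative.
    """
    return max(0.0, (n * (rate - capacity) - 1) / (n * rate + 2))
```

For $R > C$ the bound tends to $1 - C/R$ as $N \to \infty$, which is bounded away from 0 but stays below 1. This is why the result is called the weak converse, in contrast to the stronger statement in the remark.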


slide-26
SLIDE 26

Channel Coding with Feedback: Problem Setup

[Figure: block diagram $w$ → Channel Encoder → $x^N$ → Noisy Channel → $y^N$ → Channel Decoder → $\hat{w}$, with the channel output fed back to the encoder through a delay $D$]

1 A $(2^{NR}, N)$ channel code consists of

an encoding function (encoder) $\mathrm{enc}_N : [1 : 2^K] \times \mathcal{Y}^{N-1} \to \mathcal{X}^N$ that maps each message $w$ to a length-$N$ codeword $x^N$, where $K \triangleq \lceil NR \rceil$, and the $k$-th symbol $x_k$ is a function of $(w, y^{k-1})$ for all $k \in [1 : N]$.

a decoding function (decoder) $\mathrm{dec}_N : \mathcal{Y}^N \to [1 : 2^K] \cup \{*\}$ that maps a channel output sequence $y^N$ to a reconstructed message $\hat{w}$ or an error message $*$.

2 The error probability is defined as $P_e^{(N)} \triangleq P\{\hat{W} \neq W\}$.

3 A rate $R$ is said to be achievable if there exists a sequence of $(2^{NR}, N)$ codes such that $P_e^{(N)} \to 0$ as $N \to \infty$. The channel capacity is defined as $C \triangleq \sup\{R \mid R \text{ is achievable}\}$.

slide-27
SLIDE 27

Dependency Graph: Without vs. With Feedback

[Figure: dependency graph among $W$, $X_1, \dots, X_N$, $Y_1, \dots, Y_N$, and $\hat{W}$, linked by $\mathrm{enc}_N$, $p_{Y|X}$, and $\mathrm{dec}_N$. Panel: No Feedback]

slide-28
SLIDE 28

Dependency Graph: Without vs. With Feedback

[Figure: dependency graph among $W$, $X_1, \dots, X_N$, $Y_1, \dots, Y_N$, and $\hat{W}$, linked by $\mathrm{enc}_N$, $p_{Y|X}$, and $\mathrm{dec}_N$. Panel: With Feedback]

slide-29
SLIDE 29

Feedback Capacity

Theorem 2 (Channel Coding Theorem for DMC with Feedback) The capacity of the DMC $p(y|x)$ with feedback is given by (1), the same as that without feedback. In other words, feedback does not increase the channel capacity for DMC.

The proof is immediate because in the converse proof of the channel coding theorem without feedback, we do not make use of the assumption that there is no feedback. In other words, the converse proof is identical even with feedback. Achievability carries over as well, since an encoder with feedback may simply ignore it.

Remark: Although feedback does not increase capacity, it does improve the reliability (error exponent) and finite-blocklength performance greatly. Furthermore, the design of the coding scheme may be greatly simplified, and its complexity reduced, thanks to feedback. The details are out of scope of this lecture.


slide-32
SLIDE 32

Overview

In order to prove the achievability part of Theorem 1, we need to show the following mathematical statement:

$\forall R < C$, $R \ge 0$, $\exists$ a sequence of $(2^{NR}, N)$ codes such that $\lim_{N\to\infty} P_e^{(N)} = 0$.

In general, to prove the existence of certain objects satisfying some desirable properties, there are two possible ways:

1 Explicitly construct an object and prove that the properties hold.

2 Assume that no objects can satisfy the properties, and show a contradiction.

The achievability proof presented in this lecture is more of the second flavor, and in fact belongs to the so-called probabilistic method.

slide-33
SLIDE 33

The Probabilistic Method

What is the probabilistic method? Roughly speaking, in order to show the existence of certain objects satisfying some desirable properties,

One first imposes a particular probability distribution over the possible object space.

Then, by showing that "on average" the properties hold or the properties hold with non-zero probability, one concludes the existence of such objects.

Example 1 Given a set of $n$-dimensional unit vectors $\{v_1, v_2, \dots, v_k\}$, show that $\exists\, x_i \in \{\pm 1\}$, $i = 1, \dots, k$ such that $\left\| \sum_{i=1}^{k} x_i v_i \right\| \ge \sqrt{k}$.

slide-34
SLIDE 34

pf: Let $\{X_i\}_{i=1}^{k}$ be i.i.d. r.v.'s with $P\{X_i = 1\} = P\{X_i = -1\} = \frac{1}{2}$.

Define $V \triangleq \sum_{i=1}^{k} X_i v_i$. Compute $E[|V|^2]$ as follows:

$$E[|V|^2] = E[V^T V] = E\left[\left(\sum_{i=1}^{k} X_i v_i^T\right)\left(\sum_{i=1}^{k} X_i v_i\right)\right] = E\left[\sum_{i=1}^{k}\sum_{j=1}^{k} X_i X_j v_i^T v_j\right] = \sum_{i=1}^{k}\sum_{j=1}^{k} E[X_i X_j]\, v_i^T v_j$$

$\because$ $\{X_i\}$ are mutually independent, $E[X_i X_j] = E[X_i]\, E[X_j] = 0$ for $i \neq j$.

$\therefore$ $E[|V|^2] = \sum_{i=1}^{k} E[X_i^2]\, \|v_i\|^2 = k$.

Hence, $\exists\, x_i \in \{\pm 1\}$, $i = 1, \dots, k$ such that $\left\| \sum_{i=1}^{k} x_i v_i \right\| \ge \sqrt{k}$. Otherwise, $E[|V|^2]$ would be less than $k$, leading to a contradiction.
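For small $k$ the averaging argument can be checked by brute force over all $2^k$ sign patterns. The sketch below is an illustration with a hypothetical function name; it returns the squared norms so that their average can be compared with $k$.

```python
from itertools import product


def squared_norms_over_signs(vectors):
    """Squared norm of sum_i x_i v_i for every sign pattern x in {-1, +1}^k."""
    dim = len(vectors[0])
    results = []
    for signs in product((-1, 1), repeat=len(vectors)):
        total = [sum(s * v[d] for s, v in zip(signs, vectors)) for d in range(dim)]
        results.append(sum(c * c for c in total))
    return results
```

With unit vectors as input, the average of the returned values equals $k$, so at least one sign pattern reaches a squared norm of at least $k$. The same averaging also shows that some pattern has squared norm at most $k$.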

slide-35
SLIDE 35

[Photo: Paul Erdős (1913 – 1996)]

slide-36
SLIDE 36

Coding over Noisy Channel

Before we prove the main theorem, let us set up some notation related to coding over a noisy channel.

1 Codebook $c = \{x^N(1), x^N(2), \dots, x^N(2^K)\}$ consists of the $2^K$ codewords and is the range of the encoding function.

2 ML Decoder (maximum likelihood) is the optimal decoder that minimizes the probability of error $P_e^{(N)}$ when the messages are uniformly chosen (uniform prior): $\hat{w}_{\mathrm{ML}} = \arg\max_{w \in [1:2^K]} p(y^N \mid x^N(w))$.

3 Probability of Error of Message $m$: $\lambda_m \triangleq P\{\hat{W} \neq W \mid W = m\}$.

In principle, one can derive the ML decoding rule and compute $P_e^{(N)}$ for a given codebook. But there are some challenges in proving the channel coding theorem.

slide-37
SLIDE 37

Challenges and Work-Arounds

First, the expression for the error probability of ML decoding is usually intractable, and it is hard to obtain any insight into its asymptotic behavior. Second, it is unclear how to construct the codebook and the corresponding decoding scheme.

In summary, to prove the achievability part of the channel coding theorem, there are two main challenges we shall overcome:

1 How to show the existence of good codebooks? We circumvent the issue of explicit construction by using a random coding argument (a kind of probabilistic method).

2 How to analyze the error probability? We circumvent the issue of ML decoding error analysis by using a suboptimal decoder and deriving upper bounds on the probability of error of the chosen decoder.

slide-38
SLIDE 38

Proof Program

1 Random Codebook Generation:

Generate an ensemble of codebooks according to a certain probability distribution. Hence, the codebook $\mathcal{C}$ becomes a random object.

2 Error Probability Analysis:

Goal: Show that as $N \to \infty$, $E_{\mathcal{C}}\left[P_{e,\mathrm{ML}}^{(N)}(\mathcal{C})\right] \to 0$, and conclude that there must exist a codebook $c$ such that the decoding error probability $P_{e,\mathrm{ML}}^{(N)}(c) \to 0$.

To simplify the analysis, we shall introduce suboptimal decoders and give a tractable upper bound on the error probability using the union of events bound.

slide-39
SLIDE 39

Random Codebook Generation

A simple way is to generate $2^K$ codewords i.i.d., with each codeword $x^N \sim \prod_{i=1}^{N} p_X(x_i)$.

In other words, if we stack all $2^K$ codewords together into a $2^K \times N$ matrix $\mathcal{C}$, the elements of the matrix $\mathcal{C}$ will be i.i.d. according to $p_X$ (each row is a codeword):

$$\mathcal{C} = \begin{bmatrix} X_1(1) & X_2(1) & \cdots & X_N(1) \\ X_1(2) & X_2(2) & \cdots & X_N(2) \\ \vdots & \vdots & \ddots & \vdots \\ X_1(2^K) & X_2(2^K) & \cdots & X_N(2^K) \end{bmatrix}$$

and $p(c) \triangleq P\{\mathcal{C} = c\} = \prod_{w=1}^{2^K} \prod_{i=1}^{N} p_X(x_i(w))$.

It turns out that the symmetry in such a codebook ensemble distribution helps simplify the analysis.
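Drawing one realization from this ensemble takes only a few lines. The sketch below is illustrative: the function name, the argument layout, and the optional `seed` are assumptions made for this example.

```python
import random


def generate_codebook(k_bits, n, p_x, alphabet, seed=None):
    """Draw a 2^K x N codebook with entries i.i.d. according to p_X.

    alphabet lists the input symbols and p_x their probabilities, in the
    same order. Row w (0-based here) is the codeword of message w + 1.
    """
    rng = random.Random(seed)
    return [rng.choices(alphabet, weights=p_x, k=n) for _ in range(2**k_bits)]
```

Because every entry is drawn independently from the same distribution, every row has the same law, which is the symmetry the error analysis relies on.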

slide-40
SLIDE 40

Encoding and Decoding

For a realization $c$ of the random codebook ensemble $\mathcal{C}$, we describe the encoding and decoding methods below.

Encoding: for a message $m \in [1 : 2^K]$, choose the $m$-th row of the codebook $c$ and send it out.

Decoding: ideally one would like to use the following ML decoding rule: $\hat{w}_{\mathrm{ML}} = \arg\max_{w \in [1:2^K]} p(y^N \mid x^N(w))$.

However, the performance of the ML decoder is usually not tractable, as mentioned before. Instead, we introduce a suboptimal decoder based on typical sequences as follows:

$\hat{w}_T$ = a unique $w$ such that $(x^N(w), y^N) \in T_\varepsilon^{(N)}(X, Y)$.

Note: other suboptimal decoders, such as threshold decoders, can also be used.

slide-41
SLIDE 41

Error Probability Analysis (1)

Since the ML decoder is optimal, the error probability of the typicality decoder is an upper bound on that of the ML decoder. Hence, our goal becomes proving $\lim_{N\to\infty} E_{\mathcal{C}}\left[P_{e,T}^{(N)}(\mathcal{C})\right] = 0$.

1 The first step is to use the symmetry of the codebook ensemble to simplify $E_{\mathcal{C}}\left[P_{e,T}^{(N)}(\mathcal{C})\right]$ and argue that we can focus on analyzing the error probability of the first codeword $X^N(1)$ averaged over $\mathcal{C}$:

$$E_{\mathcal{C}}\left[P_{e,T}^{(N)}(\mathcal{C})\right] = E_{\mathcal{C}}\left[2^{-K} \sum_{m=1}^{2^K} \lambda_m(\mathcal{C})\right] = 2^{-K} \sum_{m} E_{\mathcal{C}}[\lambda_m(\mathcal{C})] = 2^{-K} \sum_{m} E_{\mathcal{C}}[\lambda_1(\mathcal{C})] = E_{\mathcal{C}}[\lambda_1(\mathcal{C})]$$

$$= P\{\text{Error, averaged over } \mathcal{C} \mid W = 1\}$$

slide-42
SLIDE 42

Error Probability Analysis (2)

2 For notational simplicity, let $E$ denote the "Error" event and drop the "averaged over $\mathcal{C}$". Our next focus is to upper bound $P\{E \mid W = 1\} \triangleq P_1(E)$.

The trick here is to distinguish two different kinds of errors: $E = E_a \cup E_t$, where

$$E_a \triangleq \left\{ (X^N(1), Y^N) \notin T_\varepsilon^{(N)} \right\}, \qquad E_t \triangleq \left\{ (X^N(w), Y^N) \in T_\varepsilon^{(N)} \text{ for some } w \neq 1 \right\}.$$

The core question is whether or not the joint sequence $(X^N(w), Y^N)$ is $\varepsilon$-typical. Let us define $A_w \triangleq \left\{ (X^N(w), Y^N) \in T_\varepsilon^{(N)} \right\}$.

We can then rewrite $E_a = A_1^c$, $E_t = \cup_{w \neq 1} A_w$, and hence $E = E_a \cup E_t = A_1^c \cup (\cup_{w \neq 1} A_w)$.

slide-43
SLIDE 43

Error Probability Analysis (3)

3 We are now ready to apply the union of events bound:

$$P_1\{E\} = P_1\{A_1^c \cup (\cup_{w \neq 1} A_w)\} \le P_1\{A_1^c\} + \sum_{w=2}^{2^K} P_1\{A_w\}.$$

Next, we shall develop upper bounds on

the probability that the actual transmitted codeword $X^N(1)$ and the actual received signal $Y^N$ are not (jointly) typical.

the probability that some other (random) codeword $X^N(w)$, $w \neq 1$, and the actual received signal $Y^N$ are (jointly) typical.

Lemma 1 (A Key Lemma) $P_1\{A_1\} \ge 1 - \varepsilon$ for $N$ large enough, and $P_1\{A_w\} \le 2^{-N(I(X;Y) - \delta(\varepsilon))}$ for all $w \neq 1$, where $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$.

slide-44
SLIDE 44

Error Probability Analysis (4)

4 Finally, let us put all the above together and apply Lemma 1:

$$E_{\mathcal{C}}\left[P_{e,T}^{(N)}(\mathcal{C})\right] \triangleq P\{E\} = P\{E \mid W = 1\} \triangleq P_1\{E\} \le P_1\{A_1^c\} + \sum_{w=2}^{2^K} P_1\{A_w\}$$

$$\le \varepsilon + \sum_{w=2}^{2^K} 2^{-N(I(X;Y) - \delta(\varepsilon))} \le \varepsilon + 2^{-N(I(X;Y) - \delta(\varepsilon) - R)}$$

As long as $R < I(X;Y) - \delta(\varepsilon)$, the exponent is negative and we are able to make $P\{E\} \le 2\varepsilon$ for $N$ large enough. Since $\varepsilon$ is arbitrary, this is equivalent to $\lim_{N\to\infty} E_{\mathcal{C}}\left[P_{e,T}^{(N)}(\mathcal{C})\right] = 0$.

slide-45
SLIDE 45

Completion of the Achievability Proof

We have shown that as long as $R < I(X;Y) - \delta(\varepsilon)$, $\lim_{N\to\infty} E_{\mathcal{C}}\left[P_{e,T}^{(N)}(\mathcal{C})\right] = 0$, and hence there must exist a realization of the codebook $c$ such that $P_{e,T}^{(N)}(c) \to 0$ as $N \to \infty$.

Finally, taking the codebook generating distribution $p_X = \arg\max_{p(x)} I(X;Y)$, we conclude that $\forall R < C = \max_{p(x)} I(X;Y)$, $R$ is achievable.
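The ensemble average can also be estimated by simulation. The sketch below is a Monte Carlo illustration on a binary symmetric channel with uniform $p_X$, and it deliberately swaps in minimum-Hamming-distance decoding (which is ML for a BSC with $p < 1/2$) in place of the typicality decoder used in the proof. Ties are counted as errors, a fresh codebook is drawn in every trial, and message 1 (row 0) is always sent, mirroring the symmetry argument. All names and parameters are assumptions for this example.

```python
import random


def ensemble_error_rate(n, k_bits, p, trials, seed=0):
    """Monte Carlo estimate of the codebook-averaged error probability.

    BSC(p), uniform i.i.d. random codebooks with 2^K codewords of length n,
    minimum-distance decoding with ties counted as errors.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        codebook = [[rng.randint(0, 1) for _ in range(n)] for _ in range(2**k_bits)]
        sent = codebook[0]
        # Each bit is flipped independently with probability p.
        received = [bit ^ (rng.random() < p) for bit in sent]

        def distance(word):
            return sum(a != b for a, b in zip(word, received))

        d_sent = distance(sent)
        if any(distance(word) <= d_sent for word in codebook[1:]):
            errors += 1
    return errors / trials
```

Running it at a rate well below capacity (for example $N = 24$, $K = 4$, $p = 0.05$) gives a small estimated error rate, while a rate well above capacity (for example $N = 12$, $K = 8$, $p = 0.25$) gives an error rate close to 1. These block lengths are far too short to be asymptotic, so the numbers only illustrate the trend.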

slide-46
SLIDE 46

Proof of Lemma 1 (1): Recap of Typicality

Recall: by definition, an $\varepsilon$-typical (vector) sequence $(x^n, y^n)$ shall satisfy

$$|\pi(a, b \mid x^n, y^n) - p_{X,Y}(a, b)| \le \varepsilon\, p_{X,Y}(a, b), \quad \forall (a, b) \in \mathcal{X} \times \mathcal{Y}.$$

(Note: we can think of $(X, Y)$ as a r.v. and apply the same definition of typicality!)

Hence, if $(X^n, Y^n) \sim \prod_{i=1}^{n} p_{X,Y}(x_i, y_i)$, then we have

0 $(x^n, y^n) \in T_\varepsilon^{(n)}(X, Y) \implies x^n \in T_\varepsilon^{(n)}(X)$, $y^n \in T_\varepsilon^{(n)}(Y)$.

1 $\forall (x^n, y^n) \in T_\varepsilon^{(n)}(X, Y)$, $\left| -\frac{1}{n} \log p(x^n, y^n) - H(X, Y) \right| \le \delta(\varepsilon)$, where $\delta(\varepsilon) = \varepsilon H(X, Y)$.

2 $p\left(T_\varepsilon^{(n)}(X, Y)\right) \ge 1 - \varepsilon$ for $n$ large enough.

3 $|T_\varepsilon^{(n)}(X, Y)| \le 2^{n(H(X,Y) + \delta(\varepsilon))}$.

4 $|T_\varepsilon^{(n)}(X, Y)| \ge (1 - \varepsilon)\, 2^{n(H(X,Y) - \delta(\varepsilon))}$ for $n$ large enough.
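The typicality condition is a direct comparison between empirical frequencies and the joint p.m.f., so it can be checked mechanically. In the sketch below, the function name and the dictionary representation of $p_{X,Y}$ (pairs missing from the dictionary are treated as having probability zero) are assumptions made for illustration.

```python
from collections import Counter


def is_jointly_typical(x_seq, y_seq, p_xy, eps):
    """Check |pi(a,b | x^n, y^n) - p(a,b)| <= eps * p(a,b) for all (a, b).

    p_xy maps pairs (a, b) to p_{X,Y}(a, b).
    """
    if len(x_seq) != len(y_seq) or len(x_seq) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x_seq)
    counts = Counter(zip(x_seq, y_seq))
    # A pair of probability zero must not occur at all in a typical sequence.
    for pair in counts:
        if p_xy.get(pair, 0.0) == 0.0:
            return False
    return all(abs(counts[pair] / n - prob) <= eps * prob for pair, prob in p_xy.items())
```

A typicality decoder would run this check for every codeword against the received sequence and declare the message only when exactly one codeword passes.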

slide-47
SLIDE 47

Proof of Lemma 1 (2): Typical with Actual Codeword

Let us first consider $P_1\{A_1\} = P\left\{ (X^N(1), Y^N) \in T_\varepsilon^{(N)} \mid W = 1 \right\}$.

We are averaging over a random codebook ensemble $\mathcal{C}$, and the random codebook is generated element-by-element i.i.d. based on $p_X$.

DMC without feedback implies $p(y^N \mid x^N) = \prod_{i=1}^{N} p_{Y|X}(y_i \mid x_i)$.

Hence, given $W = 1$, $(X^N(1), Y^N)$ has the following joint distribution:

$$p(x^N, y^N) = p(x^N) \cdot p(y^N \mid x^N) = \prod_{i=1}^{N} p_X(x_i) \cdot \prod_{i=1}^{N} p_{Y|X}(y_i \mid x_i) = \prod_{i=1}^{N} p_{X,Y}(x_i, y_i)$$

By Property 2 (LLN), we see that for $N$ large enough, $P_1\{A_1\} = P\left\{ (X^N(1), Y^N) \in T_\varepsilon^{(N)} \mid W = 1 \right\} \ge 1 - \varepsilon$.

slide-48
SLIDE 48

Channel Capacity and the Weak Converse Achievability Proof and Source-Channel Separation Achievability Proof Source-Channel Separation

Proof of Lemma 1 (3): Typical with a Wrong Codeword

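Consider $P_1\{A_w\} = P\left\{\left(X^N(w), Y^N\right) \in \mathcal{T}_\varepsilon^{(N)} \,\middle|\, W = 1\right\}$ for $w \ne 1$.

Note that we are averaging over a random codebook ensemble $\mathcal{C}$, and the random codebook is generated element-by-element i.i.d. based on $p_X$. Hence, although $X^N(1)$ and $X^N(w)$ have the same marginal distribution, they are actually independent. Due to DMC, $\left(X^N(1), Y^N\right) \perp\!\!\!\perp X^N(w)$. Hence, $Y^N \perp\!\!\!\perp X^N(w)$, and

$$\begin{aligned}
P_1\{A_w\} &= \sum_{(x^N, y^N) \in \mathcal{T}_\varepsilon^{(N)}} p\left(x^N\right) \cdot p\left(y^N\right) \\
&\le \underbrace{2^{N(1+\varepsilon)H(X,Y)}}_{\text{cardinality upper bound on typical set}} \cdot \underbrace{2^{-N(1-\varepsilon)H(X)}}_{\text{upper bound on prob. of a typical sequence}} \cdot \underbrace{2^{-N(1-\varepsilon)H(Y)}}_{\text{upper bound on prob. of a typical sequence}} \\
&= 2^{-N\left(I(X;Y) - \delta(\varepsilon)\right)},
\end{aligned}$$

where $\delta(\varepsilon) = \varepsilon\left(H(X,Y) + H(X) + H(Y)\right) \to 0$ as $\varepsilon \to 0$.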
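To get a feel for how fast this bound decays, the exponent can be evaluated for a concrete joint distribution. The sketch below assumes a joint pmf stored as a dict; the helper names and the binary symmetric channel example (crossover 0.1, uniform input) are illustrative assumptions rather than anything fixed by the lecture.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def wrong_codeword_bound(p_xy, eps, n):
    """Evaluate 2^{-N (I(X;Y) - delta(eps))} for the joint pmf p_xy.

    p_xy maps (x, y) pairs to probabilities.
    Returns the tuple (I(X;Y), delta(eps), bound).
    """
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p   # marginal of X
        p_y[y] = p_y.get(y, 0.0) + p   # marginal of Y
    h_x = entropy(p_x.values())
    h_y = entropy(p_y.values())
    h_xy = entropy(p_xy.values())
    mutual_info = h_x + h_y - h_xy
    delta = eps * (h_xy + h_x + h_y)
    bound = 2.0 ** (-n * (mutual_info - delta))
    return mutual_info, delta, bound


# Hypothetical example: BSC with crossover q = 0.1 and uniform input.
q = 0.1
bsc_joint = {(0, 0): 0.5 * (1 - q), (0, 1): 0.5 * q,
             (1, 0): 0.5 * q, (1, 1): 0.5 * (1 - q)}
mi, delta, bound = wrong_codeword_bound(bsc_joint, eps=0.05, n=1000)
print(f"I(X;Y) = {mi:.4f} bits, delta = {delta:.4f}, bound = {bound:.3e}")
```

The bound is only useful when $\varepsilon$ is small enough that $I(X;Y) - \delta(\varepsilon) > 0$; otherwise the right-hand side exceeds 1 and says nothing.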


slide-49
SLIDE 49


Some Reflections

Reflection 1: Mutual independence of codewords.

In the random coding argument of the proof, the $2^K \times N$ elements of the codebook matrix $\mathcal{C}$ are generated i.i.d., and hence the $2^K$ rows $\left\{X^N(1), \ldots, X^N\left(2^K\right)\right\}$ are mutually independent. However, in the proof we only require pairwise independence: $X^N(1) \perp\!\!\!\perp X^N(w),\ \forall\, w \ne 1$.

Reflection 2: Typicality decoder.

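We use the typicality decoder rather than the optimal ML decoder in order to obtain tractable upper bounds on the error probability. Other suboptimal decoders can also be used. For example, the following threshold decoder also works:

$$\hat{w}_{\mathrm{th}} \triangleq \text{the unique } w \text{ such that } i\left(x^N(w); y^N\right) > \beta,$$

where

$$i\left(x^N; y^N\right) \triangleq \log\frac{p\left(x^N, y^N\right)}{p\left(x^N\right) p\left(y^N\right)} = \sum_{k=1}^{N} \log\frac{p_{Y|X}(y_k \mid x_k)}{p_Y(y_k)},$$

and $\beta \triangleq N\left(I(X;Y) - \varepsilon\right)$. The threshold scales with $N$ because the information density is a sum of $N$ per-symbol terms.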
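The threshold decoder can be sketched in a few lines. The code below is an illustration under stated assumptions: the channel and input pmf are given as dicts, codewords are indexed from 0 rather than 1, the per-symbol term uses the output marginal $p_Y$ in the denominator (as the definition $\log \frac{p(x^N,y^N)}{p(x^N)p(y^N)}$ requires), and the threshold `beta` is supplied by the caller and compared against the block sum. The function names are hypothetical.

```python
import math


def information_density(x, y, p_x, p_y_given_x):
    """Sum over the block of log2 p(y_k | x_k) / p_Y(y_k)."""
    # Output marginal p_Y(b) = sum_a p_X(a) p(b | a).
    p_y = {}
    for a, pa in p_x.items():
        for b, pba in p_y_given_x[a].items():
            p_y[b] = p_y.get(b, 0.0) + pa * pba
    total = 0.0
    for xk, yk in zip(x, y):
        pyx = p_y_given_x[xk].get(yk, 0.0)
        if pyx == 0.0:
            return -math.inf   # this codeword cannot have produced y
        total += math.log2(pyx / p_y[yk])
    return total


def threshold_decode(codebook, y, p_x, p_y_given_x, beta):
    """Return the unique index w with i(x^N(w); y^N) > beta, else None."""
    hits = [w for w, x in enumerate(codebook)
            if information_density(x, y, p_x, p_y_given_x) > beta]
    return hits[0] if len(hits) == 1 else None
```

Returning `None` when zero or several codewords pass the threshold corresponds to declaring an error, which mirrors how the typicality decoder treats ambiguity.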


slide-50
SLIDE 50


1 Channel Capacity and the Weak Converse

Channel Capacity; Proof of the Weak Converse; Feedback Capacity

2 Achievability Proof and Source-Channel Separation

Achievability Proof; Source-Channel Separation


slide-51
SLIDE 51


Joint Source-Channel Coding: Problem Setup

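[Figure: block diagram. Source $\{S_i\}$ → Encoder ($s^{N_s} \mapsto x^{N_c}$) → Channel $p_{Y|X}$ ($x^{N_c} \mapsto y^{N_c}$) → Decoder ($y^{N_c} \mapsto \hat{s}^{N_s}$) → Destination.]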

Source model: discrete stationary ergodic with entropy rate $H(\{S_i\})$.

Channel model: DMC $p_{Y|X}$ with channel capacity $C\left(p_{Y|X}\right)$.

1 A $\left(|\mathcal{S}|^{N_c R}, N_c\right)$ joint source-channel code consists of

an encoding function (encoder) $\mathrm{enc}_{N_c} : \mathcal{S}^{N_s} \to \mathcal{X}^{N_c}$ that maps each source sequence $s^{N_s}$ to a length-$N_c$ codeword $x^{N_c}$, where $N_s \triangleq \lceil N_c R \rceil$;

a decoding function (decoder) $\mathrm{dec}_{N_c} : \mathcal{Y}^{N_c} \to \mathcal{S}^{N_s}$ that maps a channel output sequence $y^{N_c}$ to a reconstructed sequence $\hat{s}^{N_s}$.

2 The error probability is defined as $P_e^{(N_c)} \triangleq P\left\{S^{N_s} \ne \hat{S}^{N_s}\right\}$.

3 A rate $R$ is said to be achievable if there exists a sequence of $\left(|\mathcal{S}|^{N_c R}, N_c\right)$ codes such that $P_e^{(N_c)} \to 0$ as $N_c \to \infty$.


slide-52
SLIDE 52


Source-Channel Separation Theorem

Theorem 3 (Source-Channel Separation)

1 If $R < \frac{C}{H(\{S_i\})}$, then $R$ is achievable, i.e., lossless reconstruction of the source $\{S_i\}$ is possible via the noisy channel $p_{Y|X}$.

2 Conversely, if $R > \frac{C}{H(\{S_i\})}$, then $R$ is not achievable, i.e., lossless reconstruction is impossible.

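[Figure: separation architecture. Source → Source Encoder ($s^{N_s} \mapsto b^K$) → Channel Encoder ($b^K \mapsto x^{N_c}$) → Noisy Channel ($x^{N_c} \mapsto y^{N_c}$) → Channel Decoder ($y^{N_c} \mapsto \hat{b}^K$) → Source Decoder ($\hat{b}^K \mapsto \hat{s}^{N_s}$) → Destination. The bit sequences $b^K$ and $\hat{b}^K$ form the binary interface between the source coders and the channel coders.]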


slide-53
SLIDE 53


Proof of Achievability

pf: (Achievability Part): Choose a $\left(2^{N_s R_s}, N_s\right)$ lossless source code with $R_s = H(\{S_i\}) + \varepsilon_s$. Choose a $\left(2^{N_c R_c}, N_c\right)$ channel code with $R_c = C - \varepsilon_c$.

By the channel coding theorem, the binary sequence $b^K$ that lives at the digital interface between the source and channel coders can be decoded with vanishing error probability. By the lossless source coding theorem, the source sequence can be reconstructed with vanishing error probability as long as the bit sequence $b^K$ is successfully decoded by the channel decoder.

Concatenating the two codes, we see that as long as

$$N_s R_s < N_c R_c \iff \frac{N_s}{N_c} < \frac{R_c}{R_s} = \frac{C - \varepsilon_c}{H(\{S_i\}) + \varepsilon_s},$$

the separation scheme is able to reconstruct the source sequence with vanishing error probability. Since $\varepsilon_s, \varepsilon_c$ can be made arbitrarily small, any $R < \frac{C}{H(\{S_i\})}$ is achievable.
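A small numeric illustration of the rate condition $\frac{N_s}{N_c} < \frac{C - \varepsilon_c}{H(\{S_i\}) + \varepsilon_s}$ may help. The sketch below assumes an i.i.d. Bernoulli source (a special case of the stationary ergodic model, whose entropy rate is the binary entropy) and takes the capacity as a given number; the function names and the example values are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy function H_b(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def separation_rate(capacity, entropy_rate, eps_c=0.0, eps_s=0.0):
    """Ratio R_c / R_s = (C - eps_c) / (H + eps_s), in source symbols
    per channel use, supported by the separation scheme."""
    return (capacity - eps_c) / (entropy_rate + eps_s)


# Hypothetical example: Bernoulli(0.11) source over a BSC with crossover 0.1,
# whose capacity is 1 - H_b(0.1).
capacity = 1.0 - binary_entropy(0.1)
entropy_rate = binary_entropy(0.11)
print(separation_rate(capacity, entropy_rate))              # limiting ratio C / H
print(separation_rate(capacity, entropy_rate, 0.01, 0.01))  # with small slacks
```

Shrinking the slacks `eps_c` and `eps_s` toward zero pushes the supported ratio up toward $C / H(\{S_i\})$, which is the statement of the achievability part.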


slide-54
SLIDE 54


Proof of Converse

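pf: (Converse Part): We shall prove that for every achievable $R$, $R \le \frac{C}{H(\{S_i\})}$.

$$\begin{aligned}
N_s H(\{S_i\}) \le H\left(S^{N_s}\right) &= I\left(S^{N_s}; \hat{S}^{N_s}\right) + H\left(S^{N_s} \,\middle|\, \hat{S}^{N_s}\right) && (7) \\
&\le I\left(S^{N_s}; Y^{N_c}\right) + \left(1 + P_e^{(N_c)} N_s \log|\mathcal{S}|\right) && (8) \\
&\le \sum_{k=1}^{N_c} I\left(S^{N_s}; Y_k \,\middle|\, Y^{k-1}\right) + \left(1 + P_e^{(N_c)} N_s \log|\mathcal{S}|\right) \\
&\le N_c\left(C + \varepsilon_{N_c}\right), \quad \text{where } \varepsilon_{N_c} \to 0 \text{ as } N_c \to \infty. && (9)
\end{aligned}$$

(7) is due to the property of entropy rate and the chain rule.

(8) is due to the Markov chain $S^{N_s} - Y^{N_c} - \hat{S}^{N_s}$ (data processing) and Fano's inequality.

(9) is due to similar steps as in the channel coding converse proof.

Hence, $R \le \frac{N_s}{N_c} \le \frac{C + \varepsilon_{N_c}}{H(\{S_i\})}$ for every $N_c$, and letting $N_c \to \infty$ gives $R \le \frac{C}{H(\{S_i\})}$ if $R$ is achievable.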
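The residual term $\varepsilon_{N_c}$ in (9) can be made explicit. The following is a sketch of one way to bound it, assuming only $N_s = \lceil N_c R \rceil \le N_c R + 1$ and $P_e^{(N_c)} \to 0$ (which holds when $R$ is achievable):

```latex
\begin{align*}
\varepsilon_{N_c}
  &\triangleq \frac{1}{N_c}\Bigl(1 + P_e^{(N_c)} N_s \log|\mathcal{S}|\Bigr) \\
  &\le \frac{1}{N_c} + P_e^{(N_c)}\Bigl(R + \frac{1}{N_c}\Bigr)\log|\mathcal{S}|
  \;\xrightarrow[N_c \to \infty]{}\; 0 .
\end{align*}
```

Together with $I\left(S^{N_s}; Y^{N_c}\right) \le N_c C$ from the channel coding converse, this gives the right-hand side $N_c\left(C + \varepsilon_{N_c}\right)$.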


slide-55
SLIDE 55


Summary


slide-56
SLIDE 56


Channel coding theorem: $C = \max_{p(x)} I(X;Y)$ for a DMC $p_{Y|X}$, with or without feedback.

Weak converse: Fano's inequality, data processing inequality, and DMC assumption.

Achievability: random coding argument, typicality decoder.

Feedback does not increase the capacity of a DMC.

Symmetric channel capacity $= \log|\mathcal{Y}| - H(\mathbf{p})$, where every row of $p_{Y|X}$ is a permutation of $\mathbf{p}$.

Erasure channel capacity $= (1 - p)\log|\mathcal{X}|$.

Joint source-channel coding theorem: $R < \frac{C}{H(\{S_i\})} \implies R$ is achievable; $R > \frac{C}{H(\{S_i\})} \implies R$ is not achievable.

Source-channel separation is optimal.
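The two closed-form capacities in the summary are easy to evaluate. The sketch below is illustrative only: the function names are hypothetical, logarithms are base 2 (so results are in bits per channel use), and `row` is assumed to be one row of a symmetric channel's transition matrix.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def symmetric_capacity(row):
    """log2|Y| - H(p) for a symmetric channel whose transition-matrix
    rows are all permutations of `row` (so |Y| = len(row))."""
    return math.log2(len(row)) - entropy(row)


def erasure_capacity(p_erase, input_size):
    """(1 - p) log2|X| for an erasure channel with erasure probability p."""
    return (1 - p_erase) * math.log2(input_size)


# Hypothetical examples: BSC with crossover 0.1, binary erasure channel with p = 0.25.
print(symmetric_capacity([0.9, 0.1]))
print(erasure_capacity(0.25, 2))
```

For the binary symmetric channel the first formula reduces to $1 - H_b(q)$, and for the binary erasure channel the second reduces to $1 - p$.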
