SLIDE 1

Channel Capacity and the Weak Converse · Achievability Proof and Source-Channel Separation · Channel Coding with Input Cost

Lecture 4 Noisy Channel Coding

I-Hsiang Wang

Department of Electrical Engineering, National Taiwan University (ihwang@ntu.edu.tw)

November 2, 2015

SLIDE 2

The Channel Coding Problem

[Block diagram: w → Channel Encoder → x^N → Noisy Channel → y^N → Channel Decoder → ŵ]

Meta Description

1 Message: a random message W ∼ Unif[1 : 2^K].
2 Channel: consists of an input alphabet X, an output alphabet Y, and a family of conditional distributions { p(y_k | x^k, y^{k−1}) | k ∈ ℕ } determining the stochastic relationship between the output symbol y_k and the input symbol x_k along with all past signals (x^{k−1}, y^{k−1}).
3 Encoder: encodes the message w into a length-N codeword x^N ∈ X^N.
4 Decoder: reconstructs a message ŵ from the channel output y^N.
5 Efficiency: maximize the code rate R ≜ K/N bits/channel use, given a certain decoding criterion.

SLIDE 3

Decoding Criterion: Vanishing Error Probability

A key performance measure is the error probability P_e^{(N)} ≜ P{ Ŵ ≠ W }.

Question: Is it possible to get zero error probability?
Ans: Probably not, unless the channel noise has some special structure.

Following the development of lossless source coding, Shannon turned his attention to the following question: Is it possible to have a sequence of encoder/decoder pairs such that P_e^{(N)} → 0 as N → ∞? If so, what is the largest possible code rate R at which vanishing error probability is possible?

Recall: In lossless source coding, we saw that the infimum of compression rates at which vanishing error probability is possible is H({S_i}).

SLIDE 4

[Diagram: the three-way tradeoff among rate R, block length N, and error probability P_e^{(N)}.]

Capacity: take N → ∞ and require P_e^{(N)} → 0 ⟹ sup R = C.
Error exponent: take N → ∞ and fix the rate R ⟹ min P_e^{(N)} ≈ 2^{−N·E(R)}.
Finite block length: fix N and require P_e^{(N)} ≤ ε ⟹ sup R = C − √(V/N)·Q^{−1}(ε) + O(log N / N).

Remark: For source coding, one can establish a similar framework.
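To make the finite-block-length regime concrete, here is a small numerical sketch (added in this transcript, not from the original slides) that evaluates the normal approximation C − √(V/N)·Q^{−1}(ε) for a BSC. The dispersion expression V = p(1−p)·log₂²((1−p)/p) is an assumption taken from the finite-blocklength literature.

```python
import math
from statistics import NormalDist

def bsc_capacity(p):
    hb = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # binary entropy H_b(p)
    return 1 - hb

def bsc_dispersion(p):
    # Channel dispersion of the BSC (assumed form, from the FBL literature).
    return p * (1 - p) * math.log2((1 - p) / p) ** 2

def normal_approx_rate(p, N, eps):
    # sup R ≈ C - sqrt(V/N) * Q^{-1}(eps), ignoring the O(log N / N) term.
    q_inv = NormalDist().inv_cdf(1 - eps)  # Q^{-1}(eps)
    return bsc_capacity(p) - math.sqrt(bsc_dispersion(p) / N) * q_inv

# The back-off from capacity shrinks like 1/sqrt(N):
for N in (100, 1000, 10000):
    print(N, round(normal_approx_rate(0.11, N, 1e-3), 4))
```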

SLIDE 5

In this lecture we focus only on capacity. In other words, we ignore the issue of finite block length (FBL); FBL performance can be obtained via techniques extending the central limit theorem. We do not pursue finer analysis of the error probability via large-deviation techniques either.

SLIDE 6

Discrete Memoryless Channel (DMC)

In order to demonstrate the key ideas in channel coding, in this lecture we shall focus on discrete memoryless channels (DMC), defined below.

Definition 1 (Discrete Memoryless Channel)
A discrete channel (X, { p(y_k | x^k, y^{k−1}) | k ∈ ℕ }, Y) is memoryless if ∀ k ∈ ℕ,
p(y_k | x^k, y^{k−1}) = p_{Y|X}(y_k | x_k).
In other words, Y_k − X_k − (X^{k−1}, Y^{k−1}) forms a Markov chain. Here the conditional p.m.f. p_{Y|X} is called the channel law or channel transition function.

Question: Is our definition of a channel sufficient to specify p(y^N | x^N), the stochastic relationship between the channel input (codeword) x^N and the channel output y^N?

SLIDE 7

p(y^N | x^N) = p(x^N, y^N) / p(x^N), where

p(x^N, y^N) = ∏_{k=1}^{N} p(x_k, y_k | x^{k−1}, y^{k−1}) = ∏_{k=1}^{N} p(y_k | x^k, y^{k−1}) · p(x_k | x^{k−1}, y^{k−1}).

Hence, we need to further specify { p(x_k | x^{k−1}, y^{k−1}) | k ∈ ℕ }, which cannot be obtained from p(x^N).

Interpretation: { p(x_k | x^{k−1}, y^{k−1}) | k ∈ ℕ } is induced by the encoding function, which implies that the encoder can potentially make use of the past channel output, i.e., feedback.

SLIDE 8

DMC without Feedback

[Two block diagrams: without feedback, the encoder maps w to the input symbols x_k; with feedback, the encoder additionally sees the delayed past outputs y^{k−1}.]

Suppose the encoder has no knowledge about the realization of the channel output. Then p(x_k | x^{k−1}, y^{k−1}) = p(x_k | x^{k−1}) for all k ∈ ℕ, and the channel is said to have no feedback. In this case, specifying { p(y_k | x^k, y^{k−1}) | k ∈ ℕ } suffices to specify p(y^N | x^N).

Proposition 1 (DMC without Feedback)
For a DMC (X, p_{Y|X}, Y) without feedback, p(y^N | x^N) = ∏_{k=1}^{N} p_{Y|X}(y_k | x_k).

SLIDE 9

Overview

In this lecture, we would like to establish the following (informally described) noisy channel coding theorem due to Shannon:

For a DMC (X, p_{Y|X}, Y), the maximum code rate with vanishing error probability is the channel capacity C ≜ max_{p_X(·)} I(X ; Y).

The above holds regardless of the availability of feedback. To demonstrate this result, we organize the lecture as follows:
1 Give the problem formulation, state the main theorem, and visit a couple of examples to show how to compute channel capacity.
2 Prove the converse part: an achievable rate cannot exceed C.
3 Prove the achievability part with a random coding argument.

SLIDE 10

1 Channel Capacity and the Weak Converse
   Channel Capacity · Proof of the Weak Converse · Feedback Capacity
2 Achievability Proof and Source-Channel Separation
   Achievability Proof · Source-Channel Separation
3 Channel Coding with Input Cost
   Cost Constraints · Converse Proof · Achievability Proof

SLIDE 11

1 Channel Capacity and the Weak Converse
   Channel Capacity · Proof of the Weak Converse · Feedback Capacity
2 Achievability Proof and Source-Channel Separation
   Achievability Proof · Source-Channel Separation
3 Channel Coding with Input Cost
   Cost Constraints · Converse Proof · Achievability Proof

SLIDE 12

Channel Coding without Feedback: Problem Setup

1 A (2^{NR}, N) channel code consists of
   an encoding function (encoder) enc_N : [1 : 2^K] → X^N that maps each message w to a length-N codeword x^N, where K ≜ ⌈NR⌉;
   a decoding function (decoder) dec_N : Y^N → [1 : 2^K] ∪ {∗} that maps a channel output sequence y^N to a reconstructed message ŵ or an error message ∗.
2 The error probability is defined as P_e^{(N)} ≜ P{ Ŵ ≠ W }.
3 A rate R is said to be achievable if there exists a sequence of (2^{NR}, N) codes such that P_e^{(N)} → 0 as N → ∞. The channel capacity is defined as C ≜ sup { R | R achievable }.

SLIDE 13

Channel Coding Theorem for Discrete Memoryless Channel

Theorem 1 (Channel Coding Theorem for DMC without Feedback)
The capacity C of the DMC p(y|x) without feedback is given by

   C = max_{p(x)} I(X ; Y).   (1)

The capacity formula (1) is intuitive: I(X ; Y) represents the amount of information about the channel input X that one can infer from the channel output Y, and the maximization over p(x) stands for choosing the best possible input distribution so that the amount of information transfer is maximized.

SLIDE 14

Rest of the lecture:
1 First we give some examples of noisy channels to show how to compute capacity.
2 Then, we prove that for any rate R > C, it is impossible to have vanishing error probability (converse).
3 Finally, we prove that for any R < C, there exists a sequence of encoding/decoding schemes such that the error probability vanishes as the blocklength tends to ∞ (achievability), based on a probabilistic argument called random coding.

SLIDE 15

Binary Symmetric Channel

A binary symmetric channel (BSC) consists of:
Binary input/output X = Y = {0, 1}.
Channel law p(y|x) given by the matrix [1−p, p; p, 1−p] (rows indexed by x, columns by y).
The capacity of the BSC is C_BSC = 1 − H_b(p).

[Channel diagram: each input is received correctly with probability 1 − p and flipped with probability p.]

To compute the BSC capacity, observe I(X ; Y) = H(Y) − H(Y|X), and
H(Y|X=0) = H(Y|X=1) = H_b(p) ⟹ H(Y|X) = H_b(p);
H(Y) ≤ log 2 = 1, with equality iff Y is uniform.
Question: Is it possible to choose a p(x) such that Y is uniform?
Ans: Yes: choose X to be uniform ⟹ C = max_{p(x)} I(X ; Y) = 1 − H_b(p).
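As a quick numerical check (a sketch added in this transcript, not part of the original slides), the closed form 1 − H_b(p) can be compared against a brute-force maximization of I(X ; Y) over the input distribution:

```python
import math

def H2(q):
    """Binary entropy H_b(q) in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def mutual_information_bsc(alpha, p):
    """I(X;Y) = H(Y) - H(Y|X) for BSC(p) with input law P{X=1} = alpha."""
    return H2(alpha * (1 - p) + (1 - alpha) * p) - H2(p)

p = 0.11
closed_form = 1 - H2(p)
brute_force = max(mutual_information_bsc(a / 1000, p) for a in range(1001))
print(closed_form, brute_force)  # both ≈ 0.50; the uniform input (alpha = 1/2) is optimal
```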

SLIDE 16

Binary Erasure Channel

A binary erasure channel (BEC) consists of:
Binary input X = {0, 1} and output with erasure Y = {0, 1, ∗}.
Channel law: each input symbol is received correctly with probability 1 − p and erased (y = ∗) with probability p.
The capacity of the BEC is C_BEC = 1 − p.

[Channel diagram: 0 → 0 and 1 → 1 with probability 1 − p; both inputs go to ∗ with probability p.]

Suppose we begin with I(X ; Y) = H(Y) − H(Y|X). Then:
H(Y|X=0) = H(Y|X=1) = H_b(p) ⟹ H(Y|X) = H_b(p);
H(Y) ≤ log 3, with equality iff Y is uniform.
Question: Is it possible to choose a p(x) such that Y is uniform?
Ans: No. So we cannot conclude that max_{p(x)} H(Y) = log 3.

SLIDE 17

[Diagrams: the forward BEC, and the reverse channel p(x|y) from Y back to X; given y = 0 or y = 1 the input is determined, while given y = ∗ the input is 0 with probability α and 1 with probability 1 − α.]

Instead, we can start with I(X ; Y) = H(X) − H(X|Y). With α ≜ P{X = 0}, the reverse channel law gives:
H(X|Y=0) = H(X|Y=1) = 0, and H(X|Y=∗) = H_b(α) = H(X)
⟹ H(X|Y) = P{Y = ∗} · H(X) = p · H(X);
H(X) ≤ 1, with equality iff X is uniform.
Hence, C_BEC = max_{p(x)} (1 − p) H(X) = 1 − p.
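A small numerical sanity check (added here, not from the slides): for the BEC, I(X ; Y) = (1 − p) H_b(α), so a grid search over α ≜ P{X = 0} should peak at α = 1/2 with value 1 − p:

```python
import math

def H2(q):
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def mutual_information_bec(alpha, p):
    """I(X;Y) = H(X) - H(X|Y) = H_b(alpha) - p * H_b(alpha) for BEC(p)."""
    return (1 - p) * H2(alpha)

p = 0.3
best = max(mutual_information_bec(a / 1000, p) for a in range(1001))
print(best, 1 - p)  # both 0.7: capacity 1 - p, achieved by a uniform input
```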

SLIDE 18

Erasure Channel

We can generalize the BEC to the following erasure channel:
Input X; output Y = X ∪ {∗}.
Channel law: p(y|x) = 1 − p if y = x; p if y = ∗; 0 otherwise.

A motivation for this model comes from networking, where the erasure ∗ models a “packet drop”.

Exercise 1
Show that the capacity of the erasure channel is C_EC = (1 − p) log|X|.

SLIDE 19

Symmetric Channel

In computing the capacity of the BSC, we observed that:
1 H(Y|X) = H_b(p) regardless of p(x). Why? Because all rows of p(y|x) are permutations of the same probability vector [p, 1−p].
2 H(Y) = log|Y| can be attained, i.e., Y can be made uniform by choosing X to be uniform. Why? Because all columns of p(y|x) have the same sum ∑_x p(y|x).

Definition 2 (Symmetric Channel)
A symmetric channel is a channel with channel law p(y|x) satisfying (1) all rows of p(y|x) are permutations of the same probability vector p, and (2) all columns of p(y|x) have the same sum ∑_x p(y|x).

Exercise 2
Show that the capacity of a symmetric channel is log|Y| − H(p).
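A numerical check of Exercise 2 (a sketch added here, with a made-up ternary channel matrix): for a symmetric channel, a uniform input makes Y uniform, so the capacity is log|Y| − H(p), where p is any row of the matrix.

```python
import math

def entropy(dist):
    return -sum(q * math.log2(q) for q in dist if q > 0)

def capacity_symmetric(W):
    """Capacity of a symmetric channel with transition matrix W (rows = inputs):
    uniform input makes Y uniform, so C = log|Y| - H(row)."""
    return math.log2(len(W[0])) - entropy(W[0])

# Rows are permutations of [0.7, 0.2, 0.1]; every column sums to 1.
W = [[0.7, 0.2, 0.1],
     [0.1, 0.7, 0.2],
     [0.2, 0.1, 0.7]]
print(capacity_symmetric(W))  # log2(3) - H([0.7, 0.2, 0.1]) ≈ 0.428
```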

SLIDE 20

Computing Capacity of DMC via Convex Optimization

For a DMC, we can find the capacity efficiently by invoking algorithms for solving convex programs, since I(X ; Y) is a concave function of p(x) for fixed p(y|x).

Proposition 2
I(X ; Y) is a concave function of p(x) for fixed p(y|x).

pf: By definition, I(X ; Y) = H(Y) − H(Y|X).
H(Y|X) = ∑_x p(x) H(Y|X=x) is a linear function of p(x), because H(Y|X=x) = −∑_y p(y|x) log p(y|x) is constant for fixed p(y|x).
H(Y) is a concave function of p(y), and p(y) is a linear function of p(x) for fixed p(y|x); hence H(Y) is a concave function of p(x) for fixed p(y|x).
Putting the above together, we complete the proof.
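This concavity is exactly what iterative capacity solvers exploit. The slides do not name a specific algorithm, so as an illustration here is a minimal sketch of the classical Blahut–Arimoto iteration for C = max_{p(x)} I(X ; Y):

```python
import numpy as np

def kl_rows(W, q):
    """D( W(.|x) || q ) in bits, for each input symbol x (rows of W)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        logratio = np.where(W > 0, np.log2(W / q), 0.0)
    return np.sum(W * logratio, axis=1)

def blahut_arimoto(W, iters=500):
    """Capacity (in bits) of a DMC with transition matrix W[x, y]."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])  # start from the uniform input
    for _ in range(iters):
        D = kl_rows(W, p @ W)     # divergence of each row from the output law
        p = p * np.exp2(D)        # multiplicative Blahut-Arimoto update
        p /= p.sum()
    return float(p @ kl_rows(W, p @ W))        # equals I(X;Y) at the final p

# BSC(0.11): the result should match 1 - H_b(0.11) ≈ 0.50
W = np.array([[0.89, 0.11],
              [0.11, 0.89]])
print(blahut_arimoto(W))
```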

SLIDE 21

1 Channel Capacity and the Weak Converse
   Channel Capacity · Proof of the Weak Converse · Feedback Capacity
2 Achievability Proof and Source-Channel Separation
   Achievability Proof · Source-Channel Separation
3 Channel Coding with Input Cost
   Cost Constraints · Converse Proof · Achievability Proof

SLIDE 22

Proof of the (Weak) Converse (1)

We would like to show that for every sequence of (2^{NR}, N) codes such that P_e^{(N)} → 0 as N → ∞, the rate satisfies R ≤ max_{p(x)} I(X ; Y).

pf: Note that W ∼ Unif[1 : 2^K] and hence K = H(W).

NR ≤ H(W) = I(W ; Ŵ) + H(W | Ŵ)                                   (2)
   ≤ I(W ; Y^N) + (1 + P_e^{(N)} log(2^K + 1))                     (3)
   ≤ ∑_{k=1}^{N} I(W ; Y_k | Y^{k−1}) + (1 + P_e^{(N)} (NR + 2))   (4)

(2) is due to K = ⌈NR⌉ ≥ NR and the chain rule.
(3) is due to W − Y^N − Ŵ and Fano's inequality.
(4) is due to the chain rule and 2^K + 1 ≤ 2^{NR+1} + 1 ≤ 2 × 2^{NR+1} = 2^{NR+2}.

SLIDE 23

Proof of the (Weak) Converse (2)

Set ε_N ≜ (1/N) (1 + P_e^{(N)} (NR + 2)). We see that ε_N → 0 as N → ∞ because lim_{N→∞} P_e^{(N)} = 0.

The next step is to relate ∑_{k=1}^{N} I(W ; Y_k | Y^{k−1}) to I(X ; Y), by the following manipulation:

I(W ; Y_k | Y^{k−1}) ≤ I(W, Y^{k−1} ; Y_k) ≤ I(W, Y^{k−1}, X_k ; Y_k)   (5)
                     = I(X_k ; Y_k) ≤ max_{p(x)} I(X ; Y)               (6)

(5) is due to the fact that conditioning reduces entropy.
(6) is due to the DMC assumption: p(y_k | x^k, y^{k−1}, w) = p(y_k | x^k, y^{k−1}) = p(y_k | x_k)
⟹ Y_k − X_k − (W, X^{k−1}, Y^{k−1}) ⟹ Y_k − X_k − (W, Y^{k−1}).

SLIDE 24

Proof of the (Weak) Converse (3)

Hence, we have

NR ≤ ∑_{k=1}^{N} I(W ; Y_k | Y^{k−1}) + Nε_N ≤ N max_{p(x)} I(X ; Y) + Nε_N
⟹ R ≤ max_{p(x)} I(X ; Y) + ε_N, ∀ N.

Taking N → ∞, we have: R ≤ max_{p(x)} I(X ; Y) if R is achievable.

Remark: Similar to the source coding problem, a stronger version of the converse holds in the channel coding problem as well: if R > C, then P_e^{(N)} → 1 as N → ∞ for any encoding/decoding functions.

SLIDE 25

1 Channel Capacity and the Weak Converse
   Channel Capacity · Proof of the Weak Converse · Feedback Capacity
2 Achievability Proof and Source-Channel Separation
   Achievability Proof · Source-Channel Separation
3 Channel Coding with Input Cost
   Cost Constraints · Converse Proof · Achievability Proof

SLIDE 26

Channel Coding with Feedback: Problem Setup

[Block diagram: w → Channel Encoder → x^N → Noisy Channel → y^N → Channel Decoder → ŵ, with the output fed back to the encoder through a delay D, so the encoder sees y^{k−1}.]

1 A (2^{NR}, N) channel code consists of
   an encoding function (encoder) enc_N : [1 : 2^K] × Y^{N−1} → X^N that maps each message w to a length-N codeword x^N, where K ≜ ⌈NR⌉ and the k-th symbol x_k is a function of (w, y^{k−1}) for all k ∈ [1 : N];
   a decoding function (decoder) dec_N : Y^N → [1 : 2^K] ∪ {∗} that maps a channel output sequence y^N to a reconstructed message ŵ or an error message ∗.
2 The error probability is defined as P_e^{(N)} ≜ P{ Ŵ ≠ W }.
3 A rate R is said to be achievable if there exists a sequence of (2^{NR}, N) codes such that P_e^{(N)} → 0 as N → ∞. The channel capacity is defined as C ≜ sup { R | R achievable }.

SLIDE 27

Dependency Graph: Without vs. With Feedback

[Dependency graph of W, X^N = enc_N(W), the channel uses of p_{Y|X}, Y^N, and Ŵ = dec_N(Y^N): no feedback, so each X_k depends only on W.]

SLIDE 28

Dependency Graph: Without vs. With Feedback

[The same dependency graph with feedback: each X_k now depends on W and the past outputs Y^{k−1}.]

SLIDE 29

Feedback Capacity

Theorem 2 (Channel Coding Theorem for DMC with Feedback)
The capacity of the DMC p(y|x) with feedback is given by (1), the same as that without feedback. In other words, feedback does not increase the channel capacity of a DMC.

The proof is immediate: in the converse proof of the channel coding theorem without feedback, we never made use of the assumption that there is no feedback. In other words, the proof is identical even with feedback.

Remark: Although feedback does not increase capacity, it does greatly improve reliability (the error exponent) and finite-blocklength performance. Furthermore, the design and the complexity of the coding scheme may also be greatly simplified and reduced thanks to feedback. The details are out of the scope of this lecture.

SLIDE 30

1 Channel Capacity and the Weak Converse
   Channel Capacity · Proof of the Weak Converse · Feedback Capacity
2 Achievability Proof and Source-Channel Separation
   Achievability Proof · Source-Channel Separation
3 Channel Coding with Input Cost
   Cost Constraints · Converse Proof · Achievability Proof

SLIDE 31

1 Channel Capacity and the Weak Converse
   Channel Capacity · Proof of the Weak Converse · Feedback Capacity
2 Achievability Proof and Source-Channel Separation
   Achievability Proof · Source-Channel Separation
3 Channel Coding with Input Cost
   Cost Constraints · Converse Proof · Achievability Proof

SLIDE 32

Overview

In order to prove the achievability part of Theorem 1, we need to show the following mathematical statement:

∀ R < C with R ≥ 0, ∃ a sequence of (2^{NR}, N) codes such that lim_{N→∞} P_e^{(N)} = 0.

In general, to prove the existence of certain objects satisfying some desirable properties, there are two possible ways:
1 Explicitly construct an object and prove that the properties hold.
2 Assume that no object can satisfy the properties, and derive a contradiction.

The achievability proof presented in this lecture is more of the second flavor, and in fact belongs to the so-called probabilistic method.

SLIDE 33

The Probabilistic Method

What is the probabilistic method? Roughly speaking, in order to show the existence of certain objects satisfying some desirable properties:
One first imposes a particular probability distribution over the space of possible objects.
Then, by showing that the properties hold “on average,” or hold with non-zero probability, one concludes the existence of such objects.

Example 1
Given a set of n-dimensional unit vectors {v_1, v_2, . . . , v_k}, show that ∃ x_i ∈ {±1}, i = 1, . . . , k, such that ‖∑_{i=1}^{k} x_i v_i‖ ≥ √k.

SLIDE 34

pf: Let {X_i}_{i=1}^{k} be i.i.d. r.v.'s with P{X_i = 1} = P{X_i = −1} = 1/2. Define V ≜ ∑_{i=1}^{k} X_i v_i, and compute E[‖V‖²] as follows:

E[‖V‖²] = E[VᵀV] = E[(∑_{i=1}^{k} X_i v_iᵀ)(∑_{j=1}^{k} X_j v_j)] = ∑_{i=1}^{k} ∑_{j=1}^{k} E[X_i X_j] v_iᵀ v_j

∵ {X_i} are mutually independent, E[X_i X_j] = E[X_i] E[X_j] = 0 for i ≠ j.
∴ E[‖V‖²] = ∑_{i=1}^{k} E[X_i²] ‖v_i‖² = k.

Hence, ∃ x_i ∈ {±1}, i = 1, . . . , k, such that ‖∑_{i=1}^{k} x_i v_i‖ ≥ √k. Otherwise, E[‖V‖²] would be less than k, leading to a contradiction.
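A quick empirical illustration of this argument (added here, not from the slides): averaging ‖∑_i X_i v_i‖² over random sign patterns gives k, so some pattern must reach √k in norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 32

# Random unit vectors v_1, ..., v_k in R^n (rows of V)
V = rng.standard_normal((k, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)

# |sum_i X_i v_i|^2 over many uniform random sign patterns X in {-1, +1}^k
signs = rng.choice([-1.0, 1.0], size=(20000, k))
norms_sq = np.linalg.norm(signs @ V, axis=1) ** 2
print(norms_sq.mean())      # ≈ k = 32, matching E[|V|^2] = k
print(norms_sq.max() >= k)  # some sign pattern attains |V| >= sqrt(k)
```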

SLIDE 35

Paul Erdős (1913 – 1996)

SLIDE 36

Coding over Noisy Channel

Before we prove the main theorem, let us set up a few notations related to coding over a noisy channel.

1 Codebook: c = { x^N(1), x^N(2), . . . , x^N(2^K) } consists of the 2^K codewords and is the range of the encoding function.
2 ML decoder (maximum likelihood): the optimal decoder that minimizes the error probability P_e^{(N)} when the messages are uniformly chosen (uniform prior): ŵ_ML = arg max_{w∈[1:2^K]} p(y^N | x^N(w)).
3 Error probability of message m: λ_m ≜ P{ Ŵ ≠ W | W = m }.

In principle, one can derive the ML decoding rule and compute P_e^{(N)} for a given codebook. But there are some challenges toward proving the channel coding theorem.

SLIDE 37

Challenges and Work-Arounds

First, the expression for the ML error probability is usually intractable, and it is hard to obtain any insight into its asymptotic behavior. Second, it is unclear how to construct the codebook and the corresponding decoding scheme. In summary, to prove the achievability part of the channel coding theorem, there are two main challenges we shall overcome:

1 How to show the existence of good codebooks?
We circumvent the issue of explicit construction by using a random coding argument (a kind of probabilistic method).

2 How to analyze the error probability?
We circumvent the issue of ML decoding error analysis by using a suboptimal decoder and deriving upper bounds on the probability of error of the chosen decoder.

SLIDE 38

Proof Program

1 Random codebook generation:
Generate an ensemble of codebooks according to a certain probability distribution. Hence, the codebook C becomes a random object.

2 Error probability analysis:
Goal: show that as N → ∞, E_C[ P_{e,ML}^{(N)}(C) ] → 0, and conclude that there must exist a codebook c such that the decoding error probability P_{e,ML}^{(N)} → 0.
To simplify the analysis, we shall introduce suboptimal decoders and give a tractable upper bound on the error probability using the union of events bound.

SLIDE 39

Random Codebook Generation

A simple way is to generate the 2^K codewords i.i.d., each codeword drawn as p(x^N) = ∏_{i=1}^{N} p_X(x_i).

In other words, if we stack all 2^K codewords into a 2^K × N matrix C (each row is a codeword), the elements of C are i.i.d. according to p_X:

C = [ X_1(1)    X_2(1)    · · ·  X_N(1)
      X_1(2)    X_2(2)    · · ·  X_N(2)
        ⋮          ⋮        ⋱      ⋮
      X_1(2^K)  X_2(2^K)  · · ·  X_N(2^K) ]

and p(c) ≜ P{C = c} = ∏_{w=1}^{2^K} ∏_{i=1}^{N} p_X(x_i(w)).

It turns out that the symmetry of this codebook ensemble distribution helps simplify the analysis.

SLIDE 40

Encoding and Decoding

For a realization c of the random codebook ensemble C, we describe the encoding and decoding methods below.

Encoding: for a message m ∈ [1 : 2^K], choose the m-th row of the codebook c and send it out.

Decoding: ideally one would like to use the ML decoding rule ŵ_ML = arg max_{w∈[1:2^K]} p(y^N | x^N(w)). However, the performance of the ML decoder is usually not tractable, as mentioned before. Instead, we introduce a suboptimal decoder based on typical sequences:

ŵ_T = the unique w such that (x^N(w), y^N) ∈ T_ε^{(N)}(X, Y).

Note: other suboptimal decoders can also be used, such as threshold decoders.
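A minimal sketch of this typicality decoder (added in this transcript; it assumes finite alphabets encoded as integers and uses the robust-typicality test recapped later in this lecture):

```python
import numpy as np

def joint_type(x, y, nx, ny):
    """Empirical joint distribution pi(a, b | x^N, y^N) of a pair of sequences."""
    counts = np.zeros((nx, ny))
    np.add.at(counts, (x, y), 1)
    return counts / len(x)

def jointly_typical(x, y, p_xy, eps):
    """Robust eps-typicality: |pi(a,b) - p(a,b)| <= eps * p(a,b) for all (a,b)."""
    pi = joint_type(x, y, *p_xy.shape)
    return bool(np.all(np.abs(pi - p_xy) <= eps * p_xy))

def typicality_decode(codebook, y, p_xy, eps):
    """Return the unique message whose codeword is jointly typical with y,
    or None (the error message '*') if no unique such message exists."""
    hits = [m for m, xw in enumerate(codebook) if jointly_typical(xw, y, p_xy, eps)]
    return hits[0] if len(hits) == 1 else None

# Usage sketch: a random codebook for BSC(0.11) with uniform p_X; rate K/N << C.
rng = np.random.default_rng(1)
N, K, p = 1000, 10, 0.11
codebook = rng.integers(0, 2, size=(2 ** K, N))
y = codebook[0] ^ (rng.random(N) < p).astype(np.int64)  # send message 0 through the BSC
p_xy = np.array([[0.5 * (1 - p), 0.5 * p],
                 [0.5 * p, 0.5 * (1 - p)]])
print(typicality_decode(codebook, y, p_xy, eps=0.3))    # 0, with high probability
```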

SLIDE 41

Error Probability Analysis (1)

Since the ML decoder is optimal, we can analyze the performance of the typicality decoder and use it as an upper bound. Hence, our goal becomes proving lim_{N→∞} E_C[ P_{e,T}^{(N)}(C) ] = 0.

1 The first step is to use the symmetry of the codebook ensemble to simplify E_C[ P_{e,T}^{(N)}(C) ] and argue that we can focus on analyzing the error probability of the first codeword X^N(1), averaged over C:

E_C[ P_{e,T}^{(N)}(C) ] = E_C[ 2^{−K} ∑_{m=1}^{2^K} λ_m(C) ] = 2^{−K} ∑_m E_C[λ_m(C)]
                        = 2^{−K} ∑_m E_C[λ_1(C)] = E_C[λ_1(C)]
                        = P{ Error, averaged over C | W = 1 }

SLIDE 42

Error Probability Analysis (2)

2 For notational simplicity, use E to denote the “Error” event and drop the “averaged over C”. Our next focus is to upper bound P{E | W = 1} ≜ P_1(E). The trick here is to distinguish two different kinds of errors:

E = E_a ∪ E_t, where
E_a ≜ { (X^N(1), Y^N) ∉ T_ε^{(N)} } and E_t ≜ { (X^N(w), Y^N) ∈ T_ε^{(N)} for some w ≠ 1 }.

The core question is whether or not the joint sequence (X^N(w), Y^N) is ε-typical. Let us define A_w ≜ { (X^N(w), Y^N) ∈ T_ε^{(N)} }. We can then rewrite E_a = A_1^c, E_t = ∪_{w≠1} A_w, and hence

E = E_a ∪ E_t = A_1^c ∪ (∪_{w≠1} A_w).

SLIDE 43

Error Probability Analysis (3)

3 We are now ready to apply the union of events bound:

P_1{E} = P_1{ A_1^c ∪ (∪_{w≠1} A_w) } ≤ P_1{A_1^c} + ∑_{w=2}^{2^K} P_1{A_w}.

Next, we shall develop upper bounds on:
the probability that the actually transmitted codeword X^N(1) and the received signal Y^N are not (jointly) typical;
the probability that some other (random) codeword X^N(w), w ≠ 1, and the received signal Y^N are (jointly) typical.

Lemma 1 (A Key Lemma)
P_1{A_1} ≥ 1 − ε for N large enough, and P_1{A_w} ≤ 2^{−N(I(X ; Y) − δ(ε))} for all w ≠ 1, where δ(ε) → 0 as ε → 0.

SLIDE 44

Error Probability Analysis (4)

4 Finally, let us put all the above together and apply Lemma 1:

E_C[ P_{e,T}^{(N)}(C) ] ≜ P{E} = P{E | W = 1} ≜ P_1{E}
 ≤ P_1{A_1^c} + ∑_{w=2}^{2^K} P_1{A_w}
 ≤ ε + ∑_{w=2}^{2^K} 2^{−N(I(X ; Y) − δ(ε))}
 ≤ ε + 2^{−N(I(X ; Y) − δ(ε) − R)}

As long as R ≤ I(X ; Y) − δ(ε), we are able to make P{E} ≤ 2ε for N large enough, which establishes lim_{N→∞} E_C[ P_{e,T}^{(N)}(C) ] = 0.

SLIDE 45

Completion of the Achievability Proof

We have shown that as long as R ≤ I(X ; Y) − δ(ε), lim_{N→∞} E_C[ P_{e,T}^{(N)}(C) ] = 0, and hence there must exist a realization c of the codebook such that P_{e,T}^{(N)}(c) → 0 as N → ∞.

Finally, taking the codebook generating distribution p_X = arg max_{p(x)} I(X ; Y), we conclude that ∀ R < C = max_{p(x)} I(X ; Y), R is achievable.

SLIDE 46

Proof of Lemma 1 (1): Recap of Typicality

Recall: by definition, an ε-typical (vector) sequence (x^n, y^n) shall satisfy

|π(a, b | x^n, y^n) − p_{X,Y}(a, b)| ≤ ε · p_{X,Y}(a, b), ∀ (a, b) ∈ X × Y.

(Note: we can think of (X, Y) as a single r.v. and apply the same definition of typicality.)

Hence, if (X^n, Y^n) ∼ ∏_{i=1}^{n} p_{X,Y}(x_i, y_i), then we have:
0 (x^n, y^n) ∈ T_ε^{(n)}(X, Y) ⟹ x^n ∈ T_ε^{(n)}(X) and y^n ∈ T_ε^{(n)}(Y).
1 ∀ (x^n, y^n) ∈ T_ε^{(n)}(X, Y): |−(1/n) log p(x^n, y^n) − H(X, Y)| ≤ δ(ε), where δ(ε) = ε H(X, Y).
2 P( T_ε^{(n)}(X, Y) ) ≥ 1 − ε for n large enough.
3 |T_ε^{(n)}(X, Y)| ≤ 2^{n(H(X,Y) + δ(ε))}.
4 |T_ε^{(n)}(X, Y)| ≥ (1 − ε) 2^{n(H(X,Y) − δ(ε))} for n large enough.

SLIDE 47

Proof of Lemma 1 (2): Typical with Actual Codeword

Let us first consider P_1{A_1} = P{ (X^N(1), Y^N) ∈ T_ε^{(N)} | W = 1 }.

We are averaging over the random codebook ensemble C, and the random codebook is generated element-by-element i.i.d. based on p_X. The DMC without feedback implies p(y^N | x^N) = ∏_{i=1}^{N} p_{Y|X}(y_i | x_i). Hence, given W = 1, (X^N(1), Y^N) has the joint distribution

p(x^N, y^N) = p(x^N) · p(y^N | x^N) = ∏_{i=1}^{N} p_X(x_i) · ∏_{i=1}^{N} p_{Y|X}(y_i | x_i) = ∏_{i=1}^{N} p_{X,Y}(x_i, y_i).

By Property 2 (LLN), we see that for N large enough,

P_1{A_1} = P{ (X^N(1), Y^N) ∈ T_ε^{(N)} | W = 1 } ≥ 1 − ε.

SLIDE 48

Proof of Lemma 1 (3): Typical with a Wrong Codeword

Consider P_1{A_w} = P{ (X^N(w), Y^N) ∈ T_ε^{(N)} | W = 1 } for w ≠ 1.

Note that we are averaging over the random codebook ensemble C, and the random codebook is generated element-by-element i.i.d. based on p_X. Hence, although X^N(1) and X^N(w) have the same marginal distribution p_X, they are actually independent. Due to the DMC, (X^N(1), Y^N) ⊥⊥ X^N(w). Hence Y^N ⊥⊥ X^N(w), and

P_1{A_w} = ∑_{(x^N, y^N) ∈ T_ε^{(N)}} p(x^N) · p(y^N)
         ≤ 2^{N(1+ε)H(X,Y)} · 2^{−N(1−ε)H(X)} · 2^{−N(1−ε)H(Y)}
         = 2^{−N(I(X ; Y) − δ(ε))},

where the first factor is the cardinality upper bound on the typical set, the other two factors are upper bounds on the probability of a typical sequence, and δ(ε) = ε (H(X, Y) + H(X) + H(Y)) → 0 as ε → 0.

SLIDE 49

Some Reflections

Reflection 1: Mutual independence of codewords.
In the random coding argument of the proof, the 2^K × N elements of the codebook matrix C are generated i.i.d., and hence the 2^K rows { X^N(1), . . . , X^N(2^K) } are mutually independent. However, the proof only requires pairwise independence: X^N(1) ⊥⊥ X^N(w), ∀ w ≠ 1.

Reflection 2: Typicality decoder.
We used the typicality decoder rather than the optimal ML decoder to find tractable upper bounds on the error probability. Other suboptimal decoders can be used as well. For example, the following threshold decoder also works:

ŵ_th ≜ the unique w such that i(x^N(w) ; y^N) > β,

where i(x^N ; y^N) ≜ log [ p(x^N, y^N) / (p(x^N) p(y^N)) ] = ∑_{k=1}^{N} log [ p_{Y|X}(y_k | x_k) / p_Y(y_k) ], and β ≜ N (I(X ; Y) − ε).
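A sketch of the information density computation behind this threshold decoder (added here, not from the slides; it assumes finite alphabets and a transition matrix W[x, y]):

```python
import numpy as np

def information_density(x, y, p_x, W):
    """i(x^N; y^N) = sum_k log2( W[x_k, y_k] / p_Y(y_k) ), in bits."""
    p_y = p_x @ W  # output distribution p_Y
    return float(np.sum(np.log2(W[x, y] / p_y[y])))

def threshold_decode(codebook, y, p_x, W, beta):
    """Accept the unique w with i(x^N(w); y^N) > beta, e.g. beta = N*(I - eps)."""
    hits = [m for m, xw in enumerate(codebook)
            if information_density(xw, y, p_x, W) > beta]
    return hits[0] if len(hits) == 1 else None
```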

SLIDE 50

1 Channel Capacity and the Weak Converse
   Channel Capacity · Proof of the Weak Converse · Feedback Capacity
2 Achievability Proof and Source-Channel Separation
   Achievability Proof · Source-Channel Separation
3 Channel Coding with Input Cost
   Cost Constraints · Converse Proof · Achievability Proof

SLIDE 51

Joint Source-Channel Coding: Problem Setup

[Block diagram: Source {S_i} → s^{N_s} → Encoder → x^{N_c} → Channel p_{Y|X} → y^{N_c} → Decoder → ŝ^{N_s} → Destination]

Source model: discrete stationary ergodic with entropy rate H({S_i}).
Channel model: DMC p_{Y|X} with channel capacity C(p_{Y|X}).

1 A (|S|^{N_c R}, N_c) joint source-channel code consists of
   an encoding function (encoder) enc_{N_c} : S^{N_s} → X^{N_c} that maps each source sequence s^{N_s} to a length-N_c codeword x^{N_c}, where N_s ≜ ⌈N_c R⌉;
   a decoding function (decoder) dec_{N_c} : Y^{N_c} → S^{N_s} that maps a channel output sequence y^{N_c} to a reconstructed sequence ŝ^{N_s}.
2 The error probability is defined as P_e^{(N_c)} ≜ P{ Ŝ^{N_s} ≠ S^{N_s} }.
3 A rate R is said to be achievable if there exists a sequence of (|S|^{N_c R}, N_c) codes such that P_e^{(N_c)} → 0 as N_c → ∞.

SLIDE 52

Source-Channel Separation Theorem

Theorem 3 (Source-Channel Separation)
1 If R < C / H({S_i}), then R is achievable, i.e., lossless reconstruction of the source {S_i} is possible via the noisy channel p_{Y|X}.
2 Conversely, if R > C / H({S_i}), then R is not achievable, i.e., lossless reconstruction is impossible.

[Separation architecture: Source → s^{N_s} → Source Encoder → b^K → Channel Encoder → x^{N_c} → Noisy Channel → y^{N_c} → Channel Decoder → b̂^K → Source Decoder → ŝ^{N_s} → Destination, where the bit sequence b^K is the binary interface.]

SLIDE 53

Proof of Achievability

pf (Achievability Part): Choose a (2^{N_s R_s}, N_s) lossless source code with R_s = H({S_i}) + ε_s, and a (2^{N_c R_c}, N_c) channel code with R_c = C − ε_c.

Due to the channel coding theorem, the binary sequence b^K living at the digital interface between the source and channel coders can be decoded with vanishing error probability. Due to the lossless source coding theorem, the source sequence can be reconstructed with vanishing error probability as long as the bit sequence b^K is successfully decoded by the channel decoder.

Concatenating the two codes, we see that as long as N_s R_s < N_c R_c ⟺ N_s / N_c < R_c / R_s = (C − ε_c) / (H({S_i}) + ε_s), the separation scheme is able to reconstruct the source sequence with vanishing error probability. Since ε_s and ε_c can be made arbitrarily small, any R < C / H({S_i}) is achievable.
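A worked numerical instance (added in this transcript): for an i.i.d. Bernoulli(0.1) source sent over a BSC(0.1), the separation threshold C / H({S_i}) evaluates to about 1.13 source symbols per channel use.

```python
import math

def H2(q):
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

source_entropy = H2(0.1)        # H({S_i}) ≈ 0.469 bits/symbol, i.i.d. Bernoulli(0.1)
channel_capacity = 1 - H2(0.1)  # C ≈ 0.531 bits/use for BSC(0.1)
print(channel_capacity / source_entropy)  # ≈ 1.13 source symbols per channel use
```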

SLIDE 54

Proof of Converse

pf (Converse Part): We shall prove that every achievable R satisfies R ≤ C / H({S_i}).

N_s H({S_i}) ≤ H(S^{N_s}) = I(S^{N_s} ; Ŝ^{N_s}) + H(S^{N_s} | Ŝ^{N_s})            (7)
 ≤ I(S^{N_s} ; Y^{N_c}) + (1 + P_e^{(N_c)} N_s log|S|)                             (8)
 ≤ ∑_{k=1}^{N_c} I(S^{N_s} ; Y_k | Y^{k−1}) + (1 + P_e^{(N_c)} N_s log|S|)
 ≤ N_c (C + ε_{N_c}), where ε_{N_c} → 0 as N_c → ∞.                                (9)

(7) is due to the property of entropy rate and the chain rule.
(8) is due to S^{N_s} − Y^{N_c} − Ŝ^{N_s} and Fano's inequality.
(9) is due to similar steps as in the channel coding converse proof.

Hence, R ≤ N_s / N_c ≤ C / H({S_i}) if R is achievable.

SLIDE 55

1 Channel Capacity and the Weak Converse
   Channel Capacity · Proof of the Weak Converse · Feedback Capacity
2 Achievability Proof and Source-Channel Separation
   Achievability Proof · Source-Channel Separation
3 Channel Coding with Input Cost
   Cost Constraints · Converse Proof · Achievability Proof

SLIDE 56

1 Channel Capacity and the Weak Converse
   Channel Capacity · Proof of the Weak Converse · Feedback Capacity
2 Achievability Proof and Source-Channel Separation
   Achievability Proof · Source-Channel Separation
3 Channel Coding with Input Cost
   Cost Constraints · Converse Proof · Achievability Proof

SLIDE 57

We have investigated the fundamental limit on the data rate of reliable communication over a DMC (X, p_{Y|X}, Y). The only constraints on the design of coding schemes were those specified by the channel: channel inputs must lie in X and channel outputs must lie in Y. In practice, there may be additional constraints on certain costs, such as the power consumed in transmitting/receiving certain symbols. The channel capacity becomes larger when the cost is less constrained.

Question: What is the channel capacity under certain cost constraints?

The goal of the following treatment is to take an average cost constraint into account and derive the capacity-cost function.

SLIDE 58

Input Cost and Average Cost Constraints

We begin the treatment with input cost. Later, we will make a remark on how to extend the framework to incorporate output cost.

Definition 3 (Input Cost Function)
A non-negative input cost function b : X → [0, ∞) is defined over the input alphabet X of a DMC (X, p_{Y|X}, Y).

Remark: One can always shift b(·) so that there exists a symbol x_o ∈ X with b(x_o) = 0. Hence, without loss of generality, we assume the existence of such a zero-cost symbol x_o ∈ X.

In this lecture, we focus on average cost constraints, as follows:

Definition 4 (Average Input Cost Constraint)
For N channel uses, the average input cost constraint requires the coding scheme to satisfy (1/N) ∑_{i=1}^{N} b(x_i) ≤ B.

SLIDE 59

Channel Coding with Input Cost over DMC: Problem Setup

1 A (2^{NR}, N, B) channel code consists of
   an encoding function (encoder) enc_N : [1 : 2^K] → X^N that maps each message w to a length-N codeword x^N, where K ≜ ⌈NR⌉, and every codeword satisfies the input cost constraint (1/N) ∑_{i=1}^{N} b(x_i) ≤ B;
   a decoding function (decoder) dec_N : Y^N → [1 : 2^K] ∪ {∗} that maps a channel output y^N to a reconstructed message ŵ or an error ∗.
2 The error probability is defined as P_e^{(N)} ≜ P{ Ŵ ≠ W }.
3 A rate R is said to be achievable with input cost B if there exists a sequence of (2^{NR}, N, B) codes such that P_e^{(N)} → 0 as N → ∞. The channel capacity is defined as C(B) ≜ sup { R | R achievable }.

SLIDE 60

Channel Coding Theorem with Average Input Cost

Theorem 4 (Channel Coding Theorem for DMC with Average Input Cost)
The capacity of the DMC p(y|x) with input cost B is given by

   C(B) = max_{p(x): E[b(X)] ≤ B} I(X ; Y).   (10)

Remark: Compared to Theorem 1, the channel coding theorem without an input cost constraint, there is an additional constraint in the extremal problem (10) laid on the expected cost E[b(X)].

Before we prove this theorem using standard arguments (converse: Fano's inequality, data processing, single-letterization; achievability: random coding, typicality), let us first discuss some functional properties of C(B). These properties will be useful in proving the converse and the achievability.
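As a concrete illustration (added here, not from the slides), the sketch below brute-forces the extremal problem (10) for a BSC(0.11) with the hypothetical cost function b(0) = 0, b(1) = 1, so the constraint E[b(X)] ≤ B simply caps α = P{X = 1}:

```python
import math

def H2(q):
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def I_bsc(alpha, p):
    """I(X;Y) for BSC(p) with P{X=1} = alpha."""
    return H2(alpha * (1 - p) + (1 - alpha) * p) - H2(p)

def capacity_cost(p, B, grid=2000):
    # With b(0) = 0 and b(1) = 1, E[b(X)] = alpha, so the constraint is alpha <= B.
    return max(I_bsc(a / grid, p) for a in range(grid + 1) if a / grid <= B)

for B in (0.1, 0.3, 0.5, 1.0):
    print(B, round(capacity_cost(0.11, B), 4))
# C(B) is non-decreasing and saturates at 1 - H_b(0.11) once B >= 1/2,
# consistent with Proposition 3 below.
```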

SLIDE 61

Properties of the Capacity-Cost Function C (B)

Proposition 3
C(B) is non-decreasing, concave, and left-continuous in B.

pf: C(B) is non-decreasing in B since the maximization problem in (10) becomes more constrained when B is smaller.

To show that C(B) is concave, we would like to show that for any λ ∈ (0, 1),
λ C(B_1) + (1 − λ) C(B_2) ≤ C(λB_1 + (1 − λ)B_2).

To prove this, let p_i ≜ arg max_{p(x): E[b(X)] ≤ B_i} I(X ; Y), i = 1, 2, be the capacity-achieving distribution under cost constraint B_i. Define p_λ ≜ λ p_1 + (1 − λ) p_2. Let X_i ∼ p_i for i = 1, 2 and X_λ ∼ p_λ be the mixture of X_1 and X_2.

SLIDE 62

Since I(X ; Y) is concave in p(x) for a given p(y|x), we have
λ C(B_1) + (1 − λ) C(B_2) = λ I(X_1 ; Y) + (1 − λ) I(X_2 ; Y) ≤ I(X_λ ; Y).

Note that E[b(X_λ)] = λ E[b(X_1)] + (1 − λ) E[b(X_2)] ≤ λB_1 + (1 − λ)B_2. Hence the average cost associated with X_λ is ≤ λB_1 + (1 − λ)B_2 ⟹ I(X_λ ; Y) ≤ C(λB_1 + (1 − λ)B_2).

Finally, to prove that C(B) is left-continuous, we would like to show that lim_{B′↗B} C(B′) = C(B). Note that for B′′ ≤ B, since C(·) is non-decreasing and concave, we have
λ C(B′′) + (1 − λ) C(B) ≤ C(λB′′ + (1 − λ)B) ≤ C(B).
Since B′ ≜ λB′′ + (1 − λ)B ↗ B as λ ↘ 0, and both sides above converge to C(B) as λ ↘ 0, the proof is complete.

SLIDE 63

1 Channel Capacity and the Weak Converse
   Channel Capacity · Proof of the Weak Converse · Feedback Capacity
2 Achievability Proof and Source-Channel Separation
   Achievability Proof · Source-Channel Separation
3 Channel Coding with Input Cost
   Cost Constraints · Converse Proof · Achievability Proof

SLIDE 64

Converse Proof

pf: Following the converse proof for the DMC without input cost, we arrive at

R − ε_N ≤ (1/N) ∑_{k=1}^{N} I(X_k ; Y_k), where ε_N → 0 as N → ∞.

By (10), ∀ k ∈ [1 : N], I(X_k ; Y_k) ≤ C(E[b(X_k)]). Let B_k ≜ E[b(X_k)]. Since (1/N) ∑_{k=1}^{N} b(x_k) ≤ B for all x^N, we must have (1/N) ∑_{k=1}^{N} B_k ≤ B. Hence,

R − ε_N ≤ (1/N) ∑_{k=1}^{N} C(B_k) ≤(a) C( (1/N) ∑_{k=1}^{N} B_k ) ≤(b) C(B).

(a) is due to the concavity of C(B) in B.
(b) is due to (1/N) ∑_{k=1}^{N} B_k ≤ B and C(B) being non-decreasing.

Therefore, every achievable R satisfies R ≤ C(B).

SLIDE 65

1 Channel Capacity and the Weak Converse
   Channel Capacity · Proof of the Weak Converse · Feedback Capacity
2 Achievability Proof and Source-Channel Separation
   Achievability Proof · Source-Channel Separation
3 Channel Coding with Input Cost
   Cost Constraints · Converse Proof · Achievability Proof

SLIDE 66

Achievability proof mostly follows that of the DMC without input cost. However, there is an issue: in random codebook generation, how do we ensure that all codewords satisfy the input cost constraint? Idea: If a generated codeword violates the input cost constraint, we replace it by the zero-cost codeword x_o ≜ [x_o · · · x_o] when we send it.

SLIDE 67

[Figure: the encoder checks each codeword x_1, . . . , x_{2^K} against the cost constraint before sending; a codeword that satisfies the constraint is sent to the channel as-is, while a violating one is replaced by the zero-cost codeword x_o and then sent.]

SLIDE 68

It is then tempting to apply the analysis and bounding techniques used for the DMC without input cost constraints to this equivalent channel, which contains the “check-and-replace” procedure. However, this procedure involves the computation of (1/N) ∑_{i=1}^{N} b(x_i), which is not symbol-by-symbol and hence introduces memory into the equivalent channel ⟹ the previous analysis and bounding techniques cannot be used!

SLIDE 69

To circumvent the issue, we shall modify the proposed scheme into another one with worse error probability, using the following steps: Do not “replace” when a codeword violates the cost constraint. Instead, send it directly and take care of the violation at the decoder: if the decoded codeword violates the constraint, the decoder declares an error. This way, if a violation happens, an error always occurs under the modified scheme, while in the original scheme there is a slight chance that an error does not occur. In the following we focus on the modified scheme, where the violation of the cost constraint is taken care of by the decoder; this introduces another kind of error event. Note: Once we show the existence of a sequence of modified schemes with vanishing error probability, we can add the “check-and-replace” procedure back and obtain a valid scheme.

SLIDE 70

Achievability Proof (1)

Achievability proof mostly follows that of the DMC without input cost. Keep in mind that if a chosen codeword violates the cost constraint, it results in a decoding error. How do we control the probability of such violations? ⟹ Typicality!

Lemma 2 (Typical Average Lemma)
For any nonnegative function g(x) on X, if x^n ∈ T_ε^{(n)}(X), then
(1 − ε) E[g(X)] ≤ (1/n) ∑_{i=1}^{n} g(x_i) ≤ (1 + ε) E[g(X)].

pf: Straightforward from the definition of typical sequences.

Hence, we should step back and generate random codewords with a slightly smaller average cost B/(1 + ε).

SLIDE 71

Achievability Proof (2)

Random codebook generation: generate the random codebook C i.i.d. according to

p_X = arg max_{p(x): E[b(X)] ≤ B/(1+ε)} I(X ; Y).

Observe that:
If x^N ∈ T_ε^{(N)}(X), it satisfies the cost constraint due to Lemma 2: (1/N) ∑_{i=1}^{N} b(x_i) ≤ (1 + ε) E[b(X)] = (1 + ε) · B/(1 + ε) = B.
If the generated x^N ∉ T_ε^{(N)}(X), it may violate the constraint. Nevertheless, the probability that this happens vanishes as N → ∞, so the (relaxed) decoder can declare an error whenever the decoded x^N ∉ T_ε^{(N)}(X).

SLIDE 72

Achievability Proof (3)

Error probability analysis: Following the same line as for the DMC without input cost constraint, we arrive at upper bounding P{E | W = 1} ≜ P_1(E), where

E = E_0 ∪ A_1^c ∪ (∪_{w≠1} A_w),
A_w ≜ { (X^N(w), Y^N) ∈ T_ε^{(N)}(X, Y) }, and E_0 ≜ { X^N(1) ∉ T_ε^{(N)}(X) }.

Upper bounding P_1(A_1^c) and P_1(A_w) for w ≠ 1 remains the same. Upper bounding P_1(E_0) is simple because it vanishes as N → ∞ due to the weak law of large numbers.

Hence, we conclude that any R < C(B/(1 + ε)) is achievable. Since C(B) is left-continuous in B, C(B/(1 + ε)) can be made arbitrarily close to C(B) from below, and we conclude that ∀ R < C(B), R is achievable.

SLIDE 73

Summary

SLIDE 74

DMC p_{Y|X} with or without feedback: C = max_{p(x)} I(X ; Y).
   Weak converse: Fano's inequality, data processing, the DMC assumption.
   Achievability: random coding argument, typicality decoder.
Feedback does not increase the capacity of a DMC.
Symmetric channel capacity = log|Y| − H(p), where p is the probability vector whose permutations form the rows of p_{Y|X}.
Erasure channel capacity = (1 − p) log|X|.
Joint source-channel coding theorem: R < C / H({S_i}) ⟹ R is achievable; R > C / H({S_i}) ⟹ R is not achievable. Source-channel separation is optimal.
DMC with average input cost: C(B) = max_{p(x): E[b(X)] ≤ B} I(X ; Y).
