slide-1
SLIDE 1

Empirical Properties of Good Channel Codes

Qinghua (Devon) Ding June 8, 2020

The Chinese University of Hong Kong

slide-2
SLIDE 2

1

slide-3
SLIDE 3

Introduction

slide-4
SLIDE 4

Shannon’s Channel Coding Theorem


slide-5
SLIDE 5

Shannon’s Channel Coding Theorem

$C = \max_{P_X} I(X;Y)$

  • $R > C \Rightarrow \forall \mathcal{C},\ P(m \neq \hat{m}) \to 1$.
  • $R < C \Rightarrow \exists \mathcal{C},\ P(m \neq \hat{m}) \to 0$.

slide-6
SLIDE 6

Shannon’s Channel Coding Theorem

$C = \max_{P_X} I(X;Y)$

  • $R > C \Rightarrow \forall \mathcal{C},\ P(m \neq \hat{m}) \to 1$.
  • $R < C \Rightarrow \exists \mathcal{C},\ P(m \neq \hat{m}) \to 0$.

$P^*_X = \arg\max_{P_X} I(X;Y)$ need not be unique.

slide-7
SLIDE 7

Result I: Characterization of the CAID $P^*_X$

Consider a channel $W = (p_1, \ldots, p_{|\mathcal{X}|})$ whose columns $p_i = W(\cdot \mid i)$ are the transition distributions, and denote $r = (H(p_1), \ldots, H(p_m))$ with $m = |\mathcal{X}|$.

slide-8
SLIDE 8

Result I: Characterization of the CAID $P^*_X$

Consider a channel $W = (p_1, \ldots, p_{|\mathcal{X}|})$ and denote $r = (H(p_1), \ldots, H(p_m))$. Suppose we are given some $P^*_X \in \arg\max_{P_X} I(X;Y)$ (e.g. computed by the Blahut-Arimoto algorithm).
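As a concrete illustration, a minimal Blahut-Arimoto sketch in Python (NumPy); the BSC(0.1) test channel and the iteration count are illustrative assumptions, not from the talk:

```python
import numpy as np

def blahut_arimoto(W, n_iter=500):
    """Blahut-Arimoto iteration for a DMC.

    W[y, x] = W(y | x): columns are the transition distributions p_x.
    Returns (capacity in bits, a capacity-achieving input distribution).
    """
    m = W.shape[1]
    p = np.full(m, 1.0 / m)                  # start from the uniform input
    for _ in range(n_iter):
        q = W @ p                            # output distribution P_Y
        ratio = np.divide(W, q[:, None], out=np.ones_like(W), where=W > 0)
        d = np.sum(W * np.log2(ratio), axis=0)  # D(W(.|x) || P_Y) per input x
        p = p * np.exp2(d)                   # multiplicative update
        p /= p.sum()
    q = W @ p
    ratio = np.divide(W, q[:, None], out=np.ones_like(W), where=W > 0)
    d = np.sum(W * np.log2(ratio), axis=0)
    return float(p @ d), p

# Example: BSC(0.1), whose capacity is 1 - H(0.1), with uniform CAID
W_bsc = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
C, p_star = blahut_arimoto(W_bsc)
```

The update multiplies each input mass by $2^{D(W(\cdot|x)\,\|\,P_Y)}$ and renormalizes, which converges to a capacity-achieving input distribution.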

slide-9
SLIDE 9

Result I: Characterization of the CAID $P^*_X$

Consider a channel $W = (p_1, \ldots, p_{|\mathcal{X}|})$ and denote $r = (H(p_1), \ldots, H(p_m))$. The whole set of capacity-achieving input distributions is

$\mathcal{P}^*_X = \left\{ P^*_X + \ker \begin{pmatrix} W \\ r \end{pmatrix} \right\} \cap \mathbb{R}^m_+$.¹

¹A non-linear equation system for $P^*_X$ is developed in [Mur53] and its follow-up works. There is no analytical solution in general.
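The characterization can be checked numerically; a sketch using the 4-ary noisy typewriter channel (whose CAID is non-unique) as an assumed example, with the kernel computed from the SVD:

```python
import numpy as np

# 4-ary noisy typewriter: column x of W puts mass 1/2 on outputs x and x+1 mod 4
W = 0.5 * (np.eye(4) + np.roll(np.eye(4), 1, axis=0))
r = -np.sum(W * np.log2(np.where(W > 0, W, 1.0)), axis=0)  # r_x = H(p_x)

# ker([W; r]): right singular vectors with zero singular value
A = np.vstack([W, r])
_, S, Vt = np.linalg.svd(A)
kernel = Vt[S < 1e-10]          # here: one direction, proportional to (1,-1,1,-1)

def mi_bits(p, W):
    """I(X;Y) in bits for input p; columns of W are W(.|x)."""
    q = W @ p
    ratio = np.divide(W, q[:, None], out=np.ones_like(W), where=W > 0)
    return float(np.sum(p * np.sum(W * np.log2(ratio), axis=0)))

# Moving from the uniform CAID along the kernel (staying nonnegative)
# keeps I(X;Y) at capacity; here it reaches the CAID (1/2, 0, 1/2, 0).
p_unif = np.full(4, 0.25)
delta = kernel[0]
p_pert = p_unif + (0.25 / delta[0]) * delta
```

Both `p_unif` and `p_pert` attain the capacity of 1 bit, so the affine slice through $P^*_X$ along $\ker\binom{W}{r}$, intersected with the nonnegative orthant, is exactly the CAID set.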

slide-10
SLIDE 10

Optimizing Input Distribution

Suppose $W$ has a unique CAID $P^*_X$ with $\mathrm{supp}(P^*_X) = \mathcal{X}' \subset \mathcal{X}$.

slide-11
SLIDE 11

Optimizing Input Distribution

Suppose $W$ has a unique CAID $P^*_X$ with $\mathrm{supp}(P^*_X) = \mathcal{X}' \subset \mathcal{X}$.

  • Claim. The distributions $\{p_i, i \in \mathcal{X}'\}$ must be linearly independent.

slide-12
SLIDE 12

Optimizing Input Distribution

Suppose $W$ has a unique CAID $P^*_X$ with $\mathrm{supp}(P^*_X) = \mathcal{X}' \subset \mathcal{X}$.

  • Claim. The distributions $\{p_i, i \in \mathcal{X}'\}$ must be linearly independent.

Proof by contrapositive (details later).

slide-13
SLIDE 13

Property of Random Code Ensemble

$C = \max_{P_X} I(X;Y)$

  • $R > C \Rightarrow \forall \mathcal{C},\ P(m \neq \hat{m}) \to 1$.
  • $R < C \Rightarrow \exists \mathcal{C},\ P(m \neq \hat{m}) \to 0$.

slide-14
SLIDE 14

Property of Random Code Ensemble

$C = \max_{P_X} I(X;Y)$

  • $R > C \Rightarrow \forall \mathcal{C},\ P(m \neq \hat{m}) \to 1$.
  • $R < C \Rightarrow \exists \mathcal{C},\ P(m \neq \hat{m}) \to 0$.

The random code ensemble is capacity-achieving.

slide-15
SLIDE 15

Property of Random Code Ensemble

Random codes: each codeword symbol drawn i.i.d. from $P^*_X$, where $P^*_X \in \arg\max_{P_X} I(X;Y)$.

slide-16
SLIDE 16

Property of Random Code Ensemble

Random codes: each codeword symbol drawn i.i.d. from $P^*_X$, where $P^*_X \in \arg\max_{P_X} I(X;Y)$.

Empirical independence: $\#\{i : (x_i, x'_i) = (a, b)\} \approx n P^*_X(a) P^*_X(b)$.²

²This condition is different from [HV93, PV13, SV97].

slide-17
SLIDE 17

Property of Random Code Ensemble

Random codes: each codeword symbol drawn i.i.d. from $P^*_X$, where $P^*_X \in \arg\max_{P_X} I(X;Y)$.

Empirical independence: $\#\{i : (x_i, x'_i) = (a, b)\} \approx n P^*_X(a) P^*_X(b)$.²

  • Observation. Random codes have "most" (a $1 - o(1)$ fraction of) codeword pairs empirically independent, w.h.p.

²This condition is different from [HV93, PV13, SV97].
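A quick Monte Carlo sanity check of the observation, as a sketch; the uniform CAID on {0, 1}, the block length, and the thresholds are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 2000, 40                          # block length, number of codewords
code = rng.integers(0, 2, size=(M, n))   # symbols i.i.d. from the uniform P*_X

def joint_type_gap(x, xp):
    """L1 distance between the joint type of (x, x') and P*_X x P*_X."""
    counts = np.zeros((2, 2))
    np.add.at(counts, (x, xp), 1)        # empirical joint counts over positions
    return float(np.abs(counts / len(x) - 0.25).sum())

gaps = np.array([joint_type_gap(code[i], code[j])
                 for i in range(M) for j in range(i + 1, M)])
frac_indep = float(np.mean(gaps < 0.1))  # fraction of delta-independent pairs
```

With these parameters essentially every pair is $\delta$-independent, matching the $1 - o(1)$ claim.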

slide-18
SLIDE 18

Property of Random Code Ensemble

Random codes: each codeword symbol drawn i.i.d. from $P^*_X$, where $P^*_X \in \arg\max_{P_X} I(X;Y)$.

$\#\{i : (x_i, x'_i, x''_i) = (a, b, c)\} \approx n P^*_X(a) P^*_X(b) P^*_X(c)$.

Generalization to $k = O(1)$: random codes have "most" codeword $k$-tuples empirically independent, w.h.p.

slide-19
SLIDE 19

Result II: Necessary Conditions for Good Codes

A capacity-achieving code (or good code): $R = C - \epsilon$ and $P(m \neq \hat{m}) \to 0$.

slide-20
SLIDE 20

Result II: Necessary Conditions for Good Codes

A capacity-achieving code (or good code): $R = C - \epsilon$ and $P(m \neq \hat{m}) \to 0$.

Theorem (Property of Good Codes). For any DMC with unique $P^*_X$, any good code for it must have a $1 - o(1)$ fraction of its codeword $k$-tuples empirically independent.

slide-21
SLIDE 21

Result II: Necessary Conditions for Good Codes

A capacity-achieving code (or good code): $R = C - \epsilon$ and $P(m \neq \hat{m}) \to 0$.

Theorem (Property of Good Codes). For any DMC with unique $P^*_X$, any good code for it must have a $1 - o(1)$ fraction of its codeword $k$-tuples empirically independent.

Similar results hold for the AWGN channel.

slide-22
SLIDE 22

Advertisement: parallel work [ZVJ20] on quadratically constrained two-way adversarial channels, ISIT 2020. https://sites.google.com/view/yihan/


slide-24
SLIDE 24

Result III: Non-universality of Good Codes

Two channels $W$ and $W'$ are similar iff $P^*_X = P'^*_X$ and $C = C'$.

slide-25
SLIDE 25

Result III: Non-universality of Good Codes

Two channels $W$ and $W'$ are similar iff $P^*_X = P'^*_X$ and $C = C'$.

  • Observation. The random code ensemble with symbols i.i.d. $\sim P^*_X$ achieves capacity with vanishing error probability for all similar channels.

slide-26
SLIDE 26

Result III: Non-universality of Good Codes

Two channels $W$ and $W'$ are similar iff $P^*_X = P'^*_X$ and $C = C'$.

  • Observation. The random code ensemble with symbols i.i.d. $\sim P^*_X$ achieves capacity with vanishing error probability for all similar channels.

Theorem (Non-universality of Good Codes). There exist similar DMCs $W, W'$ and a code $\mathcal{C}$ that is capacity-achieving for $W$, such that no expurgation of $\mathcal{C}$ with the same rate is good for $W'$.

slide-27
SLIDE 27

Result III: Non-universality of Good Codes



slide-30
SLIDE 30

Proof Ideas

slide-31
SLIDE 31

Proof of the Characterization Result

For a DMC $W$, given some $P^*_X \in \arg\max_{P_X} I(X;Y)$, we have

$\mathcal{P}^*_X = \left\{ P^*_X + \ker \begin{pmatrix} W \\ r \end{pmatrix} \right\} \cap \mathbb{R}^m_+$.

slide-32
SLIDE 32

Proof of the Characterization Result

For a DMC $W$, given some $P^*_X \in \arg\max_{P_X} I(X;Y)$, we have

$\mathcal{P}^*_X = \left\{ P^*_X + \ker \begin{pmatrix} W \\ r \end{pmatrix} \right\} \cap \mathbb{R}^m_+$.

Proof by standard linear algebra.

slide-33
SLIDE 33

Proof of the Characterization Result

Generalizing to the $k$-use channel, we have

$\mathcal{P}^*_{X^k} = \left\{ P^{*\otimes k}_X + \ker \begin{pmatrix} W^{\otimes k} \\ r^{(k)} \end{pmatrix} \right\} \cap \mathbb{R}^{m^k}_+$.

slide-34
SLIDE 34

Proof of the Characterization Result

Generalizing to the $k$-use channel, we have

$\mathcal{P}^*_{X^k} = \left\{ P^{*\otimes k}_X + \ker \begin{pmatrix} W^{\otimes k} \\ r^{(k)} \end{pmatrix} \right\} \cap \mathbb{R}^{m^k}_+$.

Consider the following noisy typewriter channel:

$W = \begin{pmatrix} 1/2 & 1/2 & & \\ & 1/2 & 1/2 & \\ & & 1/2 & 1/2 \\ 1/2 & & & 1/2 \end{pmatrix}$

Although $C$ and $P^*_Y$ tensorize, $\mathcal{P}^*_{X^k}$ does not tensorize.
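The non-tensorization can be seen concretely for the 4-ary noisy typewriter: tensor products of one-use CAIDs, and any mixture of them, are CAIDs of the two-use channel, yet the mixture is not a product distribution. A numerical sketch (the specific distributions are standard examples, assumed here):

```python
import numpy as np

# 4-ary noisy typewriter: column x puts mass 1/2 on outputs x and x+1 mod 4
W = 0.5 * (np.eye(4) + np.roll(np.eye(4), 1, axis=0))
W2 = np.kron(W, W)                      # the 2-use channel W (x) W

def mi_bits(p, W):
    """I(X;Y) in bits for input p; columns of W are W(.|x)."""
    q = W @ p
    ratio = np.divide(W, q[:, None], out=np.ones_like(W), where=W > 0)
    return float(np.sum(p * np.sum(W * np.log2(ratio), axis=0)))

p_unif = np.full(4, 0.25)               # one CAID of W
p_alt = np.array([0.5, 0.0, 0.5, 0.0])  # another CAID of W
# A mixture of product CAIDs: still capacity-achieving for the 2-use channel ...
p_mix = 0.5 * np.kron(p_unif, p_unif) + 0.5 * np.kron(p_alt, p_alt)
# ... but not a product of its marginals:
marg = p_mix.reshape(4, 4).sum(axis=1)
is_product = np.allclose(p_mix, np.kron(marg, marg))
```

Since $I$ is concave in $P_X$ and both product inputs attain the 2-use capacity of 2 bits, the mixture attains it too, giving a non-product element of $\mathcal{P}^*_{X^2}$.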

slide-35
SLIDE 35

Proof of the Linear Independence Lemma

Suppose $W$ has a unique CAID $P^*_X$ with $\mathrm{supp}(P^*_X) = \mathcal{X}' \subset \mathcal{X}$.

  • Claim. The distributions $\{p_i, i \in \mathcal{X}'\}$ must be linearly independent.

Proof by contrapositive.

slide-36
SLIDE 36

Proof of the Linear Independence Lemma

Suppose linear independence does not hold.

slide-37
SLIDE 37

Proof of the Linear Independence Lemma

Suppose linear independence does not hold. We can find a feasible direction $\delta \neq 0$ such that $\langle r, \delta \rangle = 0$ and $W\delta = 0$.

slide-38
SLIDE 38

Proof of the Linear Independence Lemma

Suppose linear independence does not hold. We can find a feasible direction $\delta \neq 0$ such that $\langle r, \delta \rangle = 0$ and $W\delta = 0$. Then $I(X;Y) = C$ under the input $P^*_X + \epsilon\delta$ for small enough $\epsilon$ (both $P_Y = W P_X$ and $H(Y|X) = \langle r, P_X \rangle$ are unchanged along $\delta$), a contradiction!

slide-39
SLIDE 39

Proof of the Empirical Independence Property

Consider a discrete memoryless channel $W$ with unique $P^*_X$.

slide-40
SLIDE 40

Proof of the Empirical Independence Property

Consider a discrete memoryless channel $W$ with unique $P^*_X$.

  • Claim. Any good code $\mathcal{C}$ for $W$ has the property that $\forall \delta > 0$,

$P_{x_1,\ldots,x_k \sim \mathcal{C}}\left( \| \tau_{x_1,\ldots,x_k} - P^{*\otimes k}_X \|_1 > \delta \right) \to 0$.

slide-41
SLIDE 41

Proof of the Empirical Independence Property

Consider a discrete memoryless channel $W$ with unique $P^*_X$.

  • Claim. Any good code $\mathcal{C}$ for $W$ has the property that $\forall \delta > 0$,

$P_{x_1,\ldots,x_k \sim \mathcal{C}}\left( \| \tau_{x_1,\ldots,x_k} - P^{*\otimes k}_X \|_1 > \delta \right) \to 0$.

Proof by considering the $k$-use channel.

slide-42
SLIDE 42

Proof of the Empirical Independence Property


slide-46
SLIDE 46

Proof of the Empirical Independence Property (Cont'd)

Consider the AWGN$(P, N)$ channel, denoted $W$.

slide-47
SLIDE 47

Proof of the Empirical Independence Property (Cont'd)

Consider the AWGN$(P, N)$ channel, denoted $W$.

  • Claim. Any good code $\mathcal{C}$ for $W$ has the property that $\forall \delta > 0$,

$P_{X_1,\ldots,X_k \sim \mathcal{C}}\left( \exists i \neq j,\ |\langle X_i, X_j \rangle| > \delta n \right) \to 0$.

slide-48
SLIDE 48

Proof of the Empirical Independence Property (Cont'd)

Consider the AWGN$(P, N)$ channel, denoted $W$.

  • Claim. Any good code $\mathcal{C}$ for $W$ has the property that $\forall \delta > 0$,

$P_{X_1,\ldots,X_k \sim \mathcal{C}}\left( \exists i \neq j,\ |\langle X_i, X_j \rangle| > \delta n \right) \to 0$.

Proof by contrapositive.
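A sketch of what this concentration looks like for a random Gaussian codebook; the parameters and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, P = 10_000, 8, 1.0                      # block length, tuple size, power
X = rng.normal(0.0, np.sqrt(P), size=(k, n))  # entries i.i.d. N(0, P)

# Normalized Gram matrix: diagonal ~ P, off-diagonal ~ O(1/sqrt(n))
G = (X @ X.T) / n
off_diag = np.abs(G[~np.eye(k, dtype=bool)])
max_corr = float(off_diag.max())              # max |<X_i, X_j>| / n over pairs
```

The off-diagonal entries have standard deviation about $P/\sqrt{n} = 0.01$ here, so every pairwise normalized inner product sits far below any fixed $\delta$.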

slide-49
SLIDE 49

Properties of Good Channel Codes

Suppose the codewords are empirically correlated.

slide-50
SLIDE 50

Properties of Good Channel Codes

Suppose the codewords are empirically correlated. Then we can extract a large subcode that is good for another channel with power $P' < P$, a contradiction!

slide-51
SLIDE 51

Non-universality of Good Channel Codes

slide-52
SLIDE 52

Non-universality of Good Channel Codes

Consider the similar channels BEC$(H(p))$ and BSC$(p)$: both have the uniform CAID and capacity $1 - H(p)$.²

²The figures are from [CT12].

slide-53
SLIDE 53

Non-universality of Good Channel Codes

Consider the similar channels BEC$(H(p))$ and BSC$(p)$.²

  • Claim. There exists a good code for BEC$(H(p))$ such that no expurgated subcode of the same rate can be good for BSC$(p)$.

²The figures are from [CT12].

slide-54
SLIDE 54

Non-universality of Good Channel Codes

Consider the similar channels BEC$(H(p))$ and BSC$(p)$.²

  • Claim. There exists a good code for BEC$(H(p))$ such that no expurgated subcode of the same rate can be good for BSC$(p)$.

Proof by construction.

²The figures are from [CT12].
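The similarity of BEC$(H(p))$ and BSC$(p)$ is easy to verify numerically; a sketch with $p = 0.11$ as an arbitrary illustrative choice:

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def mi_bits(p_in, W):
    """I(X;Y) in bits for input p_in; columns of W are W(.|x)."""
    q = W @ p_in
    ratio = np.divide(W, q[:, None], out=np.ones_like(W), where=W > 0)
    return float(np.sum(p_in * np.sum(W * np.log2(ratio), axis=0)))

p = 0.11
eps = h2(p)                        # erasure probability H(p)

W_bsc = np.array([[1 - p, p],
                  [p, 1 - p]])
W_bec = np.array([[1 - eps, 0.0],  # rows: output 0, output 1, erasure
                  [0.0, 1 - eps],
                  [eps, eps]])

u = np.array([0.5, 0.5])           # the common (uniform) CAID
C_bsc, C_bec = mi_bits(u, W_bsc), mi_bits(u, W_bec)
# Both equal 1 - H(p): same CAID, same capacity, hence similar channels.
```

Despite this similarity, the claim above shows a good code for one need not contain a good subcode for the other.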

slide-55
SLIDE 55

Non-universality of Good Codes

We can show that a certain superposition code good for BEC$(H(p))$ can be bad for BSC$(p)$, even for more powerful decoders.

slide-56
SLIDE 56

Conclusions

slide-57
SLIDE 57

Conclusions

  1. Capacity-achieving input distributions.

slide-58
SLIDE 58

Conclusions

  1. Capacity-achieving input distributions.
  2. Necessary conditions on good codes.

slide-59
SLIDE 59

Conclusions

  1. Capacity-achieving input distributions.
  2. Necessary conditions on good codes.
  3. Non-universality of good codes.

slide-60
SLIDE 60

References i

References

[CT12] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, 2012.

[HV93] Te Sun Han and Sergio Verdú. Approximation theory of output statistics. IEEE Transactions on Information Theory, 39(3):752–772, 1993.

[Mur53] Saburo Muroga. On the capacity of a discrete channel. I. Mathematical expression of capacity of a channel which is disturbed by noise in its every one symbol and expressible in one state diagram. Journal of the Physical Society of Japan, 8(4):484–494, 1953.

slide-61
SLIDE 61

References ii

[PV13] Yury Polyanskiy and Sergio Verdú. Empirical distribution of good channel codes with nonvanishing error probability. IEEE Transactions on Information Theory, 60(1):5–21, 2013.

[SV97] Shlomo Shamai and Sergio Verdú. The empirical distribution of good codes. IEEE Transactions on Information Theory, 43(3):836–846, 1997.

[ZVJ20] Yihan Zhang, Shashank Vatedka, and Sidharth Jaggi. Quadratically constrained two-way adversarial channels. arXiv preprint arXiv:2001.02575, 2020.