SLIDE 1

Revisiting Zero-Rate Bounds on the Reliability Function of Discrete Memoryless Channels

Marco Bondaschi & Marco Dalai
Department of Information Engineering, University of Brescia, Italy

ISIT 2020

SLIDES 2–9

Setting

(No text beyond the slide title was extractable from these slides.)

SLIDES 10–14

Setting

Code: a set of $M$ codewords $\mathcal{C} = \{x_1, x_2, \ldots, x_M\} \subset \mathcal{X}^n$.

Rate: $R = \frac{\log M}{n}$.

Decoding regions (the decoder outputs a list $L(y)$ of at most $L$ messages):
$$\mathcal{Y}_m = \{\, y \in \mathcal{Y}^n : m \in L(y) \,\}.$$

Probability of error:
$$P_{e,m} = \sum_{y \notin \mathcal{Y}_m} P_m(y), \qquad P_{e,\max} = \max_{m \in \mathcal{M}} P_{e,m}.$$

$L$-list reliability function:
$$E_L(R) = \lim_{n \to \infty} \frac{-\log P_{e,\max}}{n}, \qquad P_{e,\max} \approx e^{-n E_L(R)}.$$

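To make the setting concrete, here is a minimal numerical sketch (my own illustration, not part of the talk): it computes the rate and the maximal error probability of a toy three-codeword code over a binary symmetric channel, using a maximum-likelihood list decoder. The channel matrix, the codewords, and the list size are arbitrary assumptions chosen only for the example.

```python
# Minimal sketch (toy assumptions): brute-force P_e,max for a small code over a DMC
# with list decoding of size L.
import itertools
import numpy as np

P = np.array([[0.9, 0.1],      # assumed P(y|x): binary symmetric channel, crossover 0.1
              [0.1, 0.9]])
code = [(0, 0, 0), (1, 1, 1), (0, 1, 1)]   # assumed M = 3 codewords of length n = 3
L = 1                                       # assumed list size

n, M = len(code[0]), len(code)
rate = np.log(M) / n

P_err = np.zeros(M)
for y in itertools.product(range(P.shape[1]), repeat=n):
    # likelihood of y under each message m
    lik = [np.prod([P[x_c, y_c] for x_c, y_c in zip(x, y)]) for x in code]
    # ML list decoder (one possible choice): the L messages with the largest likelihood
    decoded = set(np.argsort(lik)[-L:])
    for m in range(M):
        if m not in decoded:
            P_err[m] += lik[m]      # y falls outside Y_m, so add P_m(y)

print(f"R = {rate:.3f} nats, P_e,max = {P_err.max():.4f}")
```
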

SLIDES 15–19

Reliability Function

(No text beyond the slide title was extractable from these slides.)

SLIDES 20–23

Outline of the Proof

1. Lower-bound $P_{e,\max}$ for codes with $L + 1$ codewords.
   - Berlekamp and Blinovsky's approach: study of a gradient on the boundary of the probability simplex → cumbersome for $L > 1$.
   - Our approach: method of types + a trick by Shayevitz → straightforward for $L > 1$.

2. For $M \ge L + 1$ codewords, $P_{e,\max}$ is lower-bounded by the largest bound over all subsets of $L + 1$ codewords.

SLIDES 24–28

Outline of the Proof

3. Upper-bound the smallest error exponent by the average over all $(L + 1)$-subcodes.

4. Bound the average over a carefully selected subcode.
   - Berlekamp and Blinovsky's approach: selection of an ordered subcode + a complex iterative concatenation of codewords → cumbersome for $L > 1$.
   - Our approach: selection of a subcode using Ramsey theory + a theorem by Komlós (Blinovsky's idea for $L = 1$) → straightforward for $L > 1$.

5. Show that for the selected subcode $E_L(0) = E_{L,\mathrm{ex}}(0)$.

SLIDES 29–32

1. Probability of Error for L + 1 Codewords

For any vector $\mathbf{x} = (x_1, \ldots, x_{L+1}) \in \mathcal{X}^{L+1}$, let $q(\mathbf{x})$ be the fraction of columns of the code equal to $\mathbf{x}$.

Fundamental concave function: for any probability vector $\alpha$,
$$\mu(\alpha) = \sum_{\mathbf{x} \in \mathcal{X}^{L+1}} q(\mathbf{x})\, \mu_{\mathbf{x}}(\alpha), \qquad \mu_{\mathbf{x}}(\alpha) = -\log \sum_{y \in \mathcal{Y}} P(y|x_1)^{\alpha_1} \cdots P(y|x_{L+1})^{\alpha_{L+1}}.$$

Objective: prove that $P_{e,\max} \ge e^{-n D_{\mathcal{M}}}$, where $D_{\mathcal{M}} = \max_{\alpha} \mu(\alpha)$.
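As an illustration of the definition above (my own sketch, not from the talk), the following code evaluates $\mu(\alpha)$ for $L + 1 = 3$ toy codewords over an assumed binary symmetric channel and approximates $D_{\mathcal{M}} = \max_\alpha \mu(\alpha)$ by a crude grid search over the simplex.

```python
# Minimal sketch (toy assumptions): evaluate mu(alpha) and grid-search its maximum.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.1, 0.9]])            # assumed P(y|x), binary symmetric channel
codewords = [(0, 0, 0, 1), (1, 1, 0, 1), (0, 1, 1, 1)]   # assumed L + 1 = 3 codewords, n = 4

n = len(codewords[0])
columns = list(zip(*codewords))       # each column is a vector x in X^{L+1}

def mu(alpha):
    """mu(alpha) = sum_x q(x) * (-log sum_y prod_i P(y|x_i)^alpha_i)."""
    total = 0.0
    for x in columns:                 # summing over columns weights each x by q(x)
        s = sum(np.prod([P[xi, y] ** ai for xi, ai in zip(x, alpha)])
                for y in range(P.shape[1]))
        total += -np.log(s)
    return total / n

# crude grid search over the probability simplex for D_M = max_alpha mu(alpha)
best = max((mu((a, b, 1 - a - b)) for a in np.linspace(0, 1, 51)
            for b in np.linspace(0, 1 - a, max(int(51 * (1 - a)), 1))), default=0.0)
print(f"D_M ≈ {best:.4f}, lower bound P_e,max ≳ exp(-n D_M) = {np.exp(-n * best):.4f}")
```
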

SLIDES 33–35

1. Probability of Error for L + 1 Codewords

Berlekamp & Blinovsky's approach: an auxiliary distribution on $\mathcal{Y}^n$ that depends on $\nabla\mu(\alpha)$. It requires a careful analysis of the behavior of $\nabla\mu(\alpha)$ when $\mu(\alpha)$ is maximized on the boundary of the probability simplex $\{\alpha\}$: easy for $L = 1$, complicated for $L > 1$.

Our approach: method of types + a result by Shayevitz, which is much more straightforward to generalize to $L > 1$.

SLIDES 36–38

1. Probability of Error for L + 1 Codewords

Case $L = 1$: two messages, $\mathcal{M} = \{1, 2\}$.

Method of types: output sequences $y$ with the same conditional type $V$ given $(x_1, x_2)$ have the same conditional probabilities $P_1(y)$ and $P_2(y)$.

Decoding regions expressed on conditional types:
$$\mathcal{Y}_1 = \{\, y : P_1(y) > P_2(y) \,\} \;\Longrightarrow\; \mathcal{T}_1 = \{\, V : D(V\|P_1) < D(V\|P_2) \,\},$$
where
$$D(V\|P_1) = \sum_{a \in \mathcal{X}} \sum_{b \in \mathcal{X}} q(a, b)\, D\big(V(\cdot\,|\,a, b)\,\big\|\, P(\cdot\,|\,a)\big).$$
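The type-class grouping can be checked numerically. The sketch below (my illustration; the BSC and the two codewords are assumptions) groups all output sequences by their conditional type given $(x_1, x_2)$ and verifies that the pair $(P_1(y), P_2(y))$ is constant within each class.

```python
# Toy sketch (assumed channel and codewords): outputs with the same conditional type
# given (x1, x2) have identical probabilities under both codewords.
import itertools
from collections import Counter, defaultdict
import numpy as np

P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
x1, x2 = (0, 0, 1, 1), (0, 1, 0, 1)
n = len(x1)

groups = defaultdict(list)
for y in itertools.product(range(2), repeat=n):
    # conditional type of y given (x1, x2): counts of triples (x1_c, x2_c, y_c)
    V = frozenset(Counter(zip(x1, x2, y)).items())
    p1 = np.prod([P[a, yc] for a, yc in zip(x1, y)])
    p2 = np.prod([P[b, yc] for b, yc in zip(x2, y)])
    groups[V].append((p1, p2))

for vals in groups.values():          # within a type class, (P1(y), P2(y)) is constant
    assert all(np.allclose(v, vals[0]) for v in vals)
print(f"{len(groups)} conditional type classes cover all {2 ** n} output sequences")
```
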
SLIDES 39–40

1. Probability of Error for L + 1 Codewords

Binary hypothesis testing (Cover & Thomas): by the standard type-based result, the probability that the conditional type falls outside $\mathcal{T}_m$ decays with exponent $\min_{Q \notin \mathcal{T}_m} D(Q\|P_m)$, which is used on the next slides. (The derivation shown on these slides was not extractable.)

SLIDES 41–44

1. Probability of Error for L + 1 Codewords

Case $L > 1$: the same approach with $L + 1$ messages, $\mathcal{M} = \{1, 2, \ldots, L+1\}$. One message is left out from each list.

Decoding regions and the corresponding type regions:
$$\mathcal{Y}_m = \{\, y : P_m(y) > P_i(y) \text{ for some } i \,\}, \qquad \mathcal{T}_m = \{\, V : D(V\|P_m) < D(V\|P_i) \text{ for some } i \,\},$$
where
$$D(V\|P_m) = \sum_{\mathbf{x} \in \mathcal{X}^{L+1}} q(\mathbf{x})\, D\big(V(\cdot\,|\,\mathbf{x})\,\big\|\, P(\cdot\,|\,x_m)\big).$$
SLIDES 45–46

1. Probability of Error for L + 1 Codewords

When $n \to \infty$, the same dominant exponent appears in all regions:
$$D_{\mathcal{M}} = \min_{Q \notin \mathcal{T}_1} D(Q\|P_1) = \cdots = \min_{Q \notin \mathcal{T}_{L+1}} D(Q\|P_{L+1}), \qquad P_{e,\max} \ge e^{-n (D_{\mathcal{M}} + o(1))}.$$

An alternative expression for $\mu(\alpha)$ by Shayevitz (2010) + the minimax theorem give
$$D_{\mathcal{M}} = \max_{\alpha} \mu(\alpha).$$

SLIDES 47–50

2. Probability of Error for M ≥ L + 1 Codewords

For a code $\mathcal{C}$ with $M \ge L + 1$ messages, $\mathcal{M} = \{1, \ldots, M\}$:

Pick the $(L+1)$-subcode $\hat{C} \subset \mathcal{C}$ with the smallest error exponent:
$$D_{\min}(\mathcal{C}) = \min_{C \subset \mathcal{C}} \max_{\alpha} \mu_C(\alpha).$$

Pick the message $\hat{m}$ with the maximal probability of error for the code $\hat{C}$ alone:
$$P_{e,\hat{m}}(\hat{C}) = P_{e,\max}(\hat{C}) \ge e^{-n (D_{\min}(\mathcal{C}) + o(1))}.$$

When the whole code $\mathcal{C}$ is considered, $P_{e,\hat{m}}$ can only increase:
$$P_{e,\max}(\mathcal{C}) \ge P_{e,\hat{m}}(\mathcal{C}) \ge e^{-n (D_{\min}(\mathcal{C}) + o(1))}.$$
SLIDES 51–53

3. Averaging the Error Exponents

$$D_{\min}(\mathcal{C}) = \min_{C \subset \mathcal{C}} \max_{\alpha} \mu_C(\alpha) = \min_{C \subset \mathcal{C}} \max_{\alpha} \sum_{\mathbf{x} \in \mathcal{X}^{L+1}} q_C(\mathbf{x})\, \mu_{\mathbf{x}}(\alpha).$$

We need an upper bound on $D_{\min}(\mathcal{C})$ valid for all codes $\mathcal{C}$.

Standard idea: bound the minimum error exponent by the average over all $(L+1)$-subcodes $C$:
$$D_{\min}(\mathcal{C}) = \min_{C \subset \mathcal{C}} \max_{\alpha} \mu_C(\alpha) \le \mathbb{E}\Big[\max_{\alpha} \mu_C(\alpha)\Big].$$
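A quick numerical illustration of this "minimum ≤ average" step (my own sketch, not from the talk; the channel and the code are toy assumptions, with $L = 1$): for every pair of codewords compute $\max_\alpha \mu_C(\alpha)$ by a grid search, then compare the smallest value with the average over pairs.

```python
# Toy sketch (assumed BSC and code): min over (L+1)-subcodes vs. average over subcodes.
import itertools
import numpy as np

P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
code = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 1, 1), (0, 1, 0, 1)]
alphas = [(a, 1 - a) for a in np.linspace(0.01, 0.99, 99)]

def mu_pair(xi, xj, alpha):
    # mu_C(alpha) for the subcode C = {xi, xj}: average of mu_x(alpha) over columns
    vals = [-np.log(sum(P[a, y] ** alpha[0] * P[b, y] ** alpha[1]
                        for y in range(2)))
            for a, b in zip(xi, xj)]
    return float(np.mean(vals))

exponents = [max(mu_pair(xi, xj, al) for al in alphas)
             for xi, xj in itertools.combinations(code, 2)]
print(f"min over subcodes  = {min(exponents):.4f}")
print(f"mean over subcodes = {np.mean(exponents):.4f}   (min <= mean, as used in the proof)")
```
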
SLIDES 54–56

3. Averaging the Error Exponents

$$D_{\min}(\mathcal{C}) \le \left(\frac{1}{M - L}\right)^{L+1} \sum_{C \subset \mathcal{C}} \max_{\alpha} \sum_{\mathbf{x} \in \mathcal{X}^{L+1}} q_C(\mathbf{x})\, \mu_{\mathbf{x}}(\alpha).$$

Problem: the maximum sits inside the sum, and different subcodes $C$ can have very different maximizing $\alpha$.

Idea: average only over a subcode $\mathcal{C}' \subset \mathcal{C}$ for which all $C$ have the same maximizing $\alpha$:
$$D_{\min}(\mathcal{C}) = \min_{C \subset \mathcal{C}} \max_{\alpha} \mu_C(\alpha) \le \min_{C \subset \mathcal{C}'} \max_{\alpha} \mu_C(\alpha) = D_{\min}(\mathcal{C}').$$

SLIDES 57–59

3. Averaging the Error Exponents

Berlekamp & Blinovsky: ordered subcode + iterative concatenation of codewords.
This work: Ramsey theory (Blinovsky for $L = 1$) + symmetric subcode (Ramsey + Komlós).

SLIDES 60–63

4. Extraction of Symmetric Subcode

What kind of subcode do we need? A subcode $\mathcal{C}'$ such that for all $(L+1)$-subcodes $C \subset \mathcal{C}'$
$$q_C(\mathbf{x}) \simeq q_C(\mathbf{x}') \quad \forall\, \mathbf{x}' \in S(\mathbf{x}),$$
where $S(\mathbf{x})$ is the set of permutations of $\mathbf{x}$.

If this property is satisfied, then $\mu_C(\alpha) = \sum_{\mathbf{x}} q_C(\mathbf{x})\, \mu_{\mathbf{x}}(\alpha)$ is invariant to permutations of $\alpha$; since $\mu_C$ is concave, it is therefore maximized at $\tilde{\alpha} = \left(\frac{1}{L+1}, \ldots, \frac{1}{L+1}\right)$.
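A small sanity check of this symmetry argument (illustration only; the channel and the symmetric composition $q$ are toy assumptions, with $L = 1$): if $q$ is invariant under swapping the two codeword roles, $\mu_C(\alpha)$ is symmetric in $\alpha$, and a grid search locates its maximum at the uniform point $\tilde{\alpha}$.

```python
# Toy sketch (assumed channel and symmetric composition q): mu_C is symmetric in alpha
# and maximized at the uniform point.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
# assumed symmetric composition over X^{L+1} with L = 1: q(a, b) = q(b, a)
q = {(0, 0): 0.2, (1, 1): 0.2, (0, 1): 0.3, (1, 0): 0.3}

def mu(alpha):
    return sum(w * -np.log(sum(P[a, y] ** alpha[0] * P[b, y] ** alpha[1]
                               for y in range(2)))
               for (a, b), w in q.items())

grid = np.linspace(0.01, 0.99, 199)
best_a = max(grid, key=lambda a: mu((a, 1 - a)))
print(f"maximizer ≈ ({best_a:.2f}, {1 - best_a:.2f})   (uniform point is (0.50, 0.50))")
print(f"mu(0.3, 0.7) = {mu((0.3, 0.7)):.6f},  mu(0.7, 0.3) = {mu((0.7, 0.3)):.6f}")
```
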
SLIDES 64–70

4. Extraction of Symmetric Subcode

Generalization of Komlós (1990): extraction of a subcode $\mathcal{C}'$ of size $M'$.

(No further text was extractable from these slides.)

SLIDES 71–77

4. Extraction of Symmetric Subcode

Ramsey's theorem for graphs; the same holds when edges consist of $L + 1$ vertices (hypergraphs).

Color each edge $C$ with the vector $\{q_C(\mathbf{x})\}$ (quantized). A monochromatic subgraph $\mathcal{C}'$ then gives the same $q_C(\mathbf{x})$ for all $C \subset \mathcal{C}'$, hence
$$q_C(\mathbf{x}) \simeq q_C(\mathbf{x}') \quad \forall\, \mathbf{x}' \in S(\mathbf{x}),$$
and therefore
$$\max_{\alpha} \mu_C(\alpha) = \mu_C(\tilde{\alpha}).$$
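As a toy stand-in for the Ramsey argument (my illustration only; a brute-force search on an assumed five-codeword code with $L = 1$ and a coarse quantization, not the actual extraction procedure): color every pair of codewords by its quantized composition $q_C$ and look for a largest monochromatic subset, within which all pairs share the same composition.

```python
# Toy sketch (assumed code, L = 1): edge coloring by quantized q_C and brute-force
# search for a monochromatic subset of codewords.
import itertools
from collections import Counter

code = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 1), (0, 1, 0, 1)]
n = len(code[0])

def color(ci, cj, bins=4):
    # quantized joint composition q_C of the pair (ci, cj), used as the edge color
    counts = Counter(zip(ci, cj))
    return frozenset((x, round(bins * c / n)) for x, c in counts.items())

best = ()
for size in range(len(code), 1, -1):
    for subset in itertools.combinations(range(len(code)), size):
        colors = {color(code[i], code[j]) for i, j in itertools.combinations(subset, 2)}
        if len(colors) == 1:          # monochromatic: every pair has the same q_C
            best = subset
            break
    if best:
        break
print(f"monochromatic subset of codeword indices: {best}")
```
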

SLIDES 78–80

5. Upper Bound on E_L(0)

$$D_{\min}(\mathcal{C}) \le \left(\frac{1}{M' - L}\right)^{L+1} \sum_{C \subset \mathcal{C}'} \sum_{\mathbf{x}} q_C(\mathbf{x})\, \mu_{\mathbf{x}}(\tilde{\alpha}) + o(1).$$

Applying Plotkin's double-counting trick:
$$D_{\min}(\mathcal{C}) \le \left(\frac{M'}{M' - L}\right)^{L+1} \frac{1}{n} \sum_{c=1}^{n} \sum_{\mathbf{x}} \frac{M'_c(x_1)}{M'} \cdots \frac{M'_c(x_{L+1})}{M'}\, \mu_{\mathbf{x}}(\tilde{\alpha}) + o(1),$$
where $M'_c(x)$ is the number of times the symbol $x$ occurs in column $c$.
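The identity behind the double-counting step can be checked on a toy code (my illustration; the code is an arbitrary assumption, $L = 1$): averaging the pair composition $q_C(a, b)$ over all ordered pairs of codewords, repetitions included, equals $\frac{1}{n}\sum_c \frac{M'_c(a)}{M'}\frac{M'_c(b)}{M'}$. Bounding the sum over distinct pairs by the sum over all pairs and comparing the normalizations is, as I read the (garbled) slide, where the $\left(\frac{M'}{M'-L}\right)^{L+1}$ factor comes from.

```python
# Toy sketch (assumed code, L = 1): double-counting identity relating pair compositions
# to column compositions M'_c(x)/M'.
import itertools
from collections import Counter
import numpy as np

code = [(0, 0, 1, 1), (0, 1, 0, 1), (1, 1, 1, 0), (0, 0, 0, 1)]
M, n = len(code), len(code[0])

# left side: average of q_{(xi, xj)}(a, b) over all ordered pairs (i, j), repetitions included
lhs = Counter()
for xi, xj in itertools.product(code, repeat=2):
    for a, b in zip(xi, xj):
        lhs[(a, b)] += 1 / (n * M ** 2)

# right side: (1/n) * sum_c (M_c(a)/M) * (M_c(b)/M)
rhs = Counter()
for col in zip(*code):
    cnt = Counter(col)
    for a in cnt:
        for b in cnt:
            rhs[(a, b)] += (cnt[a] / M) * (cnt[b] / M) / n

assert all(np.isclose(lhs[k], rhs[k]) for k in set(lhs) | set(rhs))
print("double-counting identity verified:", dict(rhs))
```
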

SLIDES 81–84

5. Upper Bound on E_L(0)

$$D_{\min}(\mathcal{C}) \le \left(\frac{M'}{M' - L}\right)^{L+1} \frac{1}{n} \sum_{c=1}^{n} \sum_{\mathbf{x}} \frac{M'_c(x_1)}{M'} \cdots \frac{M'_c(x_{L+1})}{M'}\, \mu_{\mathbf{x}}(\tilde{\alpha}) + o(1).$$

Since $M'_c(\cdot)/M'$ is a probability distribution on $\mathcal{X}$ for each column $c$,
$$D_{\min}(\mathcal{C}) \le \left(\frac{M'}{M' - L}\right)^{L+1} \max_{Q \in \mathcal{P}(\mathcal{X})} \sum_{\mathbf{x}} Q(x_1) \cdots Q(x_{L+1})\, \mu_{\mathbf{x}}(\tilde{\alpha}) + o(1).$$

Letting $M' \to \infty$, the right-hand side converges to $E_{\mathrm{ex}}(0)$.
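A closing numerical sketch (my illustration, not the authors' code): for $L = 1$ and an assumed binary symmetric channel, the limiting expression $\max_{Q} \sum_{\mathbf{x}} Q(x_1)\cdots Q(x_{L+1})\, \mu_{\mathbf{x}}(\tilde{\alpha})$ can be evaluated by a grid search over $Q$.

```python
# Toy sketch (assumed BSC, L = 1): evaluate max_Q sum_x Q(x1)...Q(x_{L+1}) mu_x(alpha_tilde).
import itertools
import numpy as np

P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
L = 1
alpha_tilde = np.full(L + 1, 1.0 / (L + 1))

def mu_x(x):
    # mu_x(alpha_tilde) = -log sum_y prod_i P(y|x_i)^{1/(L+1)}
    return -np.log(sum(np.prod([P[xi, a] for xi in [0]] and
                               [P[xi, y] ** a for xi, a in zip(x, alpha_tilde)])
                       for y in range(P.shape[1]))) if False else \
           -np.log(sum(np.prod([P[xi, y] ** a for xi, a in zip(x, alpha_tilde)])
                       for y in range(P.shape[1])))

def objective(Q):
    return sum(np.prod([Q[xi] for xi in x]) * mu_x(x)
               for x in itertools.product(range(len(Q)), repeat=L + 1))

best = max(objective(np.array([q, 1 - q])) for q in np.linspace(0, 1, 201))
print(f"upper bound on E_L(0) as M' -> infinity: {best:.4f}")
```
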