
Polar Coding
Part 1 - Background

Erdal Arıkan
Electrical-Electronics Engineering Department, Bilkent University, Ankara, Turkey

Algorithmic Coding Theory Workshop, June 13-17, 2016, ICERM, Providence, RI


Outline

◮ Sequential decoding and the cutoff rate
◮ Guessing and cutoff rate
◮ Boosting the cutoff rate
◮ Pinsker's scheme
◮ Massey's scheme
◮ Polar coding

Sequential decoding and the cutoff rate 1 / 72

Tree coding and sequential decoding (SD)

◮ Consider a tree code (of rate 1/2)
◮ A path is chosen and transmitted
◮ Given the channel output, search the tree for the correct (transmitted) path
◮ The tree structure turns the ML decoding problem into a tree search problem
◮ A depth-first search algorithm exists, called sequential decoding (SD)

[Figure: rate-1/2 code tree with branch labels 00, 11, 10, 01; the transmitted path is highlighted]

Sequential decoding and the cutoff rate 2 / 72


Search metric

SD uses a "metric" to distinguish the correct path from the incorrect ones. Fano's metric:

  Γ(y^n, x^n) = log [ P(y^n | x^n) / P(y^n) ] − nR

where n is the path length, x^n the candidate path, y^n the received sequence, and R the code rate.

Sequential decoding and the cutoff rate 3 / 72
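To make the metric concrete, here is a minimal sketch (an addition, not from the slides) for a BSC with crossover probability p, assuming i.i.d. uniform code symbols so that P(y_i) = 1/2 per symbol; all names and parameters are illustrative.

```python
import math

def fano_metric(x, y, p, R):
    """Fano metric (in bits) of candidate path x against received y on a BSC(p).

    Per-symbol increment: log2 P(y_i|x_i) - log2 P(y_i) - R, with P(y_i) = 1/2
    under a uniform input distribution.
    """
    metric = 0.0
    for xi, yi in zip(x, y):
        p_y_given_x = 1 - p if xi == yi else p
        metric += math.log2(p_y_given_x) - math.log2(0.5) - R
    return metric

# The correct path tends to gain about I(X;Y) - R per symbol,
# while incorrect paths drift downward (see the next slide).
print(fano_metric([0, 0, 1, 1], [0, 0, 1, 0], p=0.1, R=0.5))
```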

History

◮ Tree codes were introduced by Elias (1955) with the aim of reducing the complexity of ML decoding (the tree structure makes it possible to use search heuristics for ML decoding)
◮ Sequential decoding was introduced by Wozencraft (1957) as part of his doctoral thesis
◮ Fano (1963) simplified the search algorithm and introduced the above metric

Sequential decoding and the cutoff rate 4 / 72


Drift properties of the metric

◮ On the correct path, the expected value of the metric per channel symbol is

  Σ_{x,y} p(x, y) [ log p(y|x)/p(y) − R ] = I(X; Y) − R.

◮ On any incorrect path, the expectation is

  Σ_{x,y} p(x) p(y) [ log p(y|x)/p(y) − R ] ≤ −R.

◮ A properly designed SD scheme – given enough time – identifies the correct path with probability one at any rate R < I(X; Y).

Sequential decoding and the cutoff rate 5 / 72
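Both expectations can be checked numerically. A small sketch (an addition; assumes a BSC with uniform inputs and base-2 logarithms):

```python
import math

def drift_bsc(p, R):
    """Expected per-symbol Fano-metric drift on a BSC(p) with uniform inputs.

    Correct path:   sum_{x,y} p(x,y)   [log2 p(y|x)/p(y) - R] = I(X;Y) - R
    Incorrect path: sum_{x,y} p(x)p(y) [log2 p(y|x)/p(y) - R]  (<= -R)
    """
    P = {(x, y): 0.5 * ((1 - p) if x == y else p) for x in (0, 1) for y in (0, 1)}
    py = {y: sum(P[(x, y)] for x in (0, 1)) for y in (0, 1)}
    ll = lambda x, y: math.log2(P[(x, y)] / (0.5 * py[y]))  # log2 p(y|x)/p(y)
    correct = sum(P[(x, y)] * (ll(x, y) - R) for x in (0, 1) for y in (0, 1))
    incorrect = sum(0.5 * py[y] * (ll(x, y) - R) for x in (0, 1) for y in (0, 1))
    return correct, incorrect

print(drift_bsc(p=0.1, R=0.4))  # first value = I(X;Y) - R > 0; second <= -R
```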


Computation problem in sequential decoding

◮ Computation in sequential decoding is a random quantity, depending on the code rate R and the noise realization
◮ Bursts of noise create barriers for the depth-first search algorithm, necessitating excessive backtracking in the search
◮ Still, the average computation per decoded digit in sequential decoding can be kept bounded provided the code rate R is below the cutoff rate

  R0 = − log Σ_y [ Σ_x Q(x) √W(y|x) ]²

◮ So, SD solves the coding problem for rates below R0
◮ Indeed, SD was the method of choice in space communications, albeit briefly

Sequential decoding and the cutoff rate 6 / 72
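For concreteness, a sketch (an addition) of the cutoff-rate formula above for a channel given as a transition matrix W[x][y] = W(y|x), with the input distribution Q supplied by the caller; it is checked against the BSC closed form that appears later in the talk. Names are illustrative.

```python
import math

def r0(Q, W):
    """Cutoff rate (bits) for input distribution Q and channel matrix W:
    R0 = -log2 sum_y ( sum_x Q[x] * sqrt(W[x][y]) )^2."""
    num_outputs = len(W[0])
    s = sum(
        sum(Q[x] * math.sqrt(W[x][y]) for x in range(len(Q))) ** 2
        for y in range(num_outputs)
    )
    return -math.log2(s)

# BSC(0.1) with uniform inputs: R0 = 1 - log2(1 + 2*sqrt(p*(1-p)))
p = 0.1
print(r0([0.5, 0.5], [[1 - p, p], [p, 1 - p]]))
print(1 - math.log2(1 + 2 * math.sqrt(p * (1 - p))))  # same value
```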


References on complexity of sequential decoding

◮ Achievability: Wozencraft (1957), Reiffen (1962), Fano (1963), Stiglitz and Yudkin (1964)
◮ Converse: Jacobs and Berlekamp (1967)
◮ Refinements: Wozencraft and Jacobs (1965), Savage (1966), Gallager (1968), Jelinek (1968), Forney (1974), Arıkan (1986), Arıkan (1994)

Sequential decoding and the cutoff rate 7 / 72

Guessing and cutoff rate 8 / 72

A computational model for sequential decoding

◮ SD visits nodes at level N in a certain order
◮ No "look-ahead" assumption: SD forgets what it saw beyond level N upon backtracking
◮ Complexity measure G_N: the number of nodes searched (visited) at level N until the correct node is visited for the first time

Guessing and cutoff rate 9 / 72


A bound on computational complexity

◮ Let R be a fixed code rate.
◮ There exist tree codes of rate R such that

  E[G_N] ≤ 1 + 2^{−N(R0−R)}.

◮ Conversely, for any tree code of rate R,

  E[G_N] ≳ 1 + 2^{−N(R0−R)}.

Guessing and cutoff rate 10 / 72


The Guessing Problem

◮ Alice draws a sample of a random variable X ∼ P.
◮ Bob wishes to determine X by asking questions of the form "Is X equal to x?", which are answered truthfully by Alice.
◮ Bob's goal is to minimize the expected number of questions until he gets a YES answer.

Guessing and cutoff rate 11 / 72


Guessing with Side Information

◮ Alice samples (X, Y) ∼ P(x, y).
◮ Bob observes Y and is to determine X by asking the same type of questions "Is X equal to x?"
◮ The goal is to minimize the expected number of guesses.

Guessing and cutoff rate 12 / 72


Optimal guessing strategies

◮ Let G be the number of guesses to determine X.
◮ The expected number of guesses is given by

  E[G] = Σ_{x∈X} P(x) G(x)

◮ A guessing strategy minimizes E[G] if

  P(x) > P(x′) ⇒ G(x) < G(x′).

Guessing and cutoff rate 13 / 72
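A short sketch (an addition) makes the optimal strategy concrete: sort the probabilities in decreasing order, so the i-th guess carries the i-th largest probability. It also checks the sandwich bounds derived on the following slides.

```python
import math

def expected_guesses(P):
    """E[G*] for the optimal strategy: guess values in decreasing
    probability order, so the i-th guess has the i-th largest probability."""
    probs = sorted(P, reverse=True)
    return sum(i * p for i, p in enumerate(probs, start=1))

P = [0.5, 0.25, 0.125, 0.125]
M = len(P)
eg = expected_guesses(P)
bound = sum(math.sqrt(p) for p in P) ** 2
# Sandwich from the next slides: bound/(1 + ln M) <= E[G*] <= bound
print(bound / (1 + math.log(M)), eg, bound)
```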


Upper bound on guessing effort

For any optimal guessing function,

  E[G*(X)] ≤ [ Σ_x √P(x) ]².

Proof. G*(x) ≤ Σ_{x′} √(P(x′)/P(x)), hence

  E[G*(X)] ≤ Σ_x P(x) Σ_{x′} √(P(x′)/P(x)) = [ Σ_x √P(x) ]².

Guessing and cutoff rate 14 / 72

Lower bound on guessing effort

For any guessing function for a target r.v. X with M possible values,

  E[G(X)] ≥ (1 + ln M)^{−1} [ Σ_x √P(x) ]².

For the proof we use the following variant of Hölder's inequality.

Guessing and cutoff rate 15 / 72

Lemma

Let a_i, p_i be positive numbers. Then

  Σ_i a_i p_i ≥ [ Σ_i a_i^{−1} ]^{−1} [ Σ_i √p_i ]².

Proof. Let λ = 1/2 and put A_i = a_i^{−λ}, B_i = a_i^{λ} p_i^{λ} in Hölder's inequality

  Σ_i A_i B_i ≤ [ Σ_i A_i^{1/(1−λ)} ]^{1−λ} [ Σ_i B_i^{1/λ} ]^{λ}.

Guessing and cutoff rate 16 / 72

Proof of Lower Bound

Applying the lemma with a_i = i,

  E[G(X)] = Σ_{i=1}^{M} i p_G(i)
          ≥ [ Σ_{i=1}^{M} 1/i ]^{−1} [ Σ_{i=1}^{M} √p_G(i) ]²
          = [ Σ_{i=1}^{M} 1/i ]^{−1} [ Σ_x √P(x) ]²
          ≥ (1 + ln M)^{−1} [ Σ_x √P(x) ]²

Guessing and cutoff rate 17 / 72

Essence of the inequalities

For any set of real numbers p1 ≥ p2 ≥ · · · ≥ pM > 0,

  1 ≥ [ Σ_{i=1}^{M} i p_i ] / [ Σ_{i=1}^{M} √p_i ]² ≥ (1 + ln M)^{−1}

Guessing and cutoff rate 18 / 72

Guessing Random Vectors

◮ Let X = (X1, . . . , Xn) ∼ P(x1, . . . , xn).
◮ Guessing X means asking questions of the form "Is X = x?" for possible values x = (x1, . . . , xn) of X.
◮ Notice that coordinate-wise probes of the type "Is Xi = xi?" are not allowed.

Guessing and cutoff rate 19 / 72

Complexity of Vector Guessing

Suppose Xi has Mi possible values, i = 1, . . . , n. Then

  1 ≥ E[G*(X1, . . . , Xn)] / [ Σ_{x1,...,xn} √P(x1, . . . , xn) ]² ≥ [1 + ln(M1 · · · Mn)]^{−1}

In particular, if X1, . . . , Xn are i.i.d. ∼ P with a common alphabet X,

  1 ≥ E[G*(X1, . . . , Xn)] / [ Σ_{x∈X} √P(x) ]^{2n} ≥ [1 + n ln |X|]^{−1}

Guessing and cutoff rate 20 / 72
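A quick numerical check of the i.i.d. case (an addition; the distribution P is arbitrary): the ratio E[G*] / [Σ_x √P(x)]^{2n} stays within the stated sandwich.

```python
import math
from itertools import product

def expected_guesses(P_vec):
    probs = sorted(P_vec, reverse=True)
    return sum(i * p for i, p in enumerate(probs, start=1))

P = [0.7, 0.3]  # binary alphabet, so the lower factor is 1/(1 + n ln 2)
for n in (1, 2, 4, 8):
    joint = [math.prod(c) for c in product(P, repeat=n)]  # i.i.d. joint pmf
    eg = expected_guesses(joint)
    upper = sum(math.sqrt(p) for p in P) ** (2 * n)
    print(n, eg, upper, eg / upper)  # ratio in [1/(1 + n ln 2), 1]
```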

Guessing with Side Information

◮ (X, Y) a pair of random variables with a joint distribution P(x, y).
◮ Y known. X to be guessed as before.
◮ G(x|y) is the number of guesses when X = x, Y = y.

Guessing and cutoff rate 21 / 72

Lower Bound

For any guessing strategy,

  E[G(X|Y)] ≥ (1 + ln M)^{−1} Σ_y [ Σ_x √P(x, y) ]²

where M is the number of possible values of X.

Proof.

  E[G(X|Y)] = Σ_y P(y) E[G(X|Y = y)]
            ≥ Σ_y P(y) (1 + ln M)^{−1} [ Σ_x √P(x|y) ]²
            = (1 + ln M)^{−1} Σ_y [ Σ_x √P(x, y) ]²

Guessing and cutoff rate 22 / 72

Upper bound

Optimal guessing functions satisfy

  E[G*(X|Y)] ≤ Σ_y [ Σ_x √P(x, y) ]².

Proof.

  E[G*(X|Y)] = Σ_y P(y) Σ_x P(x|y) G*(x|y)
             ≤ Σ_y P(y) [ Σ_x √P(x|y) ]²
             = Σ_y [ Σ_x √P(x, y) ]².

Guessing and cutoff rate 23 / 72

Generalization to Random Vectors

For optimal guessing functions,

  1 ≥ E[G*(X1, . . . , Xk | Y1, . . . , Yn)] / Σ_{y1,...,yn} [ Σ_{x1,...,xk} √P(x1, . . . , xk, y1, . . . , yn) ]² ≥ [1 + ln(M1 · · · Mk)]^{−1}

where Mi denotes the number of possible values of Xi.

Guessing and cutoff rate 24 / 72

A "guessing" decoder

◮ Consider a block code with M codewords x1, . . . , xM of block length N.
◮ Suppose a codeword is chosen at random and sent over a channel W.
◮ Given the channel output y, a "guessing decoder" decodes by asking questions of the form "Is the correct codeword the mth one?", to which it receives a truthful YES or NO answer.
◮ On a NO answer it repeats the question with a new m.
◮ The complexity C of this decoder is the number of questions until a YES answer.

Guessing and cutoff rate 25 / 72


Optimal guessing decoder

An optimal guessing decoder is one that minimizes the expected complexity E[C]. Clearly, E[C] is minimized by generating the guesses in decreasing order of the likelihoods W(y|xm):

  x_{i1} ← 1st guess (the most likely codeword given y)
  x_{i2} ← 2nd guess (2nd most likely codeword given y)
  . . .
  x_{iL} ← correct codeword obtained; guessing stops

The complexity C equals the number of guesses L.

Guessing and cutoff rate 26 / 72

Application to the guessing decoder

◮ A block code C = {x1, . . . , xM} with M = e^{NR} codewords of block length N.
◮ A codeword X is chosen at random and sent over a DMC W.
◮ Given the channel output vector Y, the decoder guesses X.

This is a special case of guessing with side information where

  P(X = x, Y = y) = e^{−NR} Π_{i=1}^{N} W(yi|xi),  x ∈ C

Guessing and cutoff rate 27 / 72

Cutoff rate bound

Writing Q_N for the uniform distribution on the code,

  E[G*(X|Y)] ≥ [1 + NR]^{−1} Σ_y [ Σ_x √P(x, y) ]²
            = [1 + NR]^{−1} e^{NR} Σ_y [ Σ_x Q_N(x) √W_N(y|x) ]²
            ≥ [1 + NR]^{−1} e^{N(R−R0(W))}

where

  R0(W) = max_Q { − ln Σ_y [ Σ_x Q(x) √W(y|x) ]² }

is the channel cutoff rate.

Guessing and cutoff rate 28 / 72

Boosting the cutoff rate 29 / 72

Boosting the cutoff rate

◮ It was clear almost from the beginning that R0 was at best shaky in its role as a limit to practical communications
◮ There were many attempts to boost the cutoff rate by devising clever schemes for searching a tree
◮ One striking example is Pinsker's scheme, which displayed the strange nature of R0

Boosting the cutoff rate 30 / 72


Pinsker’s scheme 31 / 72

Binary Symmetric Channel

We will describe Pinsker's scheme using the BSC example:

◮ Capacity

  C = 1 + ε log₂(ε) + (1 − ε) log₂(1 − ε)

◮ Cutoff rate

  R0 = log₂ [ 2 / (1 + 2√(ε(1 − ε))) ]

Pinsker's scheme 32 / 72
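Both closed forms tabulate directly; the sketch below (an addition) shows R0, C, and their ratio approaching 1 as ε → 0, the observation behind Pinsker's scheme.

```python
import math

def bsc_capacity(eps):
    # C = 1 + eps*log2(eps) + (1-eps)*log2(1-eps)
    if eps in (0.0, 1.0):
        return 1.0
    return 1 + eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps)

def bsc_cutoff(eps):
    # R0 = log2( 2 / (1 + 2*sqrt(eps*(1-eps))) )
    return math.log2(2 / (1 + 2 * math.sqrt(eps * (1 - eps))))

for eps in (0.2, 0.1, 0.01, 0.001):
    c, r0 = bsc_capacity(eps), bsc_cutoff(eps)
    print(f"eps={eps}: C={c:.4f}  R0={r0:.4f}  R0/C={r0/c:.4f}")
```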


Capacity and cutoff rate for the BSC

[Figure: R0 and C versus the crossover probability, and the ratio R0/C]

Pinsker's scheme 33 / 72

Pinsker's scheme

Based on the observations that, as ε → 0, R0(ε)/C(ε) → 1 and R0(ε) → 1, Pinsker (1965) proposed a concatenation scheme that achieved capacity within a constant average cost per decoded bit, irrespective of the level of reliability.

Pinsker's scheme 34 / 72

Pinsker's scheme

[Diagram: K² data streams d_i, each encoded by an identical convolutional encoder CE_i; an inner block encoder maps u_1, . . . , u_{K²} to x_1, . . . , x_{N²}, sent over N² independent copies of W; an inner block decoder (ML) feeds K² independent sequential decoders SD_i, which output the estimates d̂_i]

The inner block code does the initial clean-up at huge but finite complexity; the outer convolutional encoding (CE) and sequential decoding (SD) boost the reliability at little extra cost.

Pinsker's scheme 35 / 72

Discussion

◮ Although Pinsker's scheme made a very strong theoretical point, it was not practical.
◮ There were many more attempts to get around the R0 barrier in the 1960s:
  ◮ D. Falconer, "A Hybrid Sequential and Algebraic Decoding Scheme," Sc.D. thesis, Dept. of Elec. Eng., M.I.T., 1966.
  ◮ I. Stiglitz, "Iterative sequential decoding," IEEE Transactions on Information Theory, vol. 15, no. 6, pp. 715-721, Nov. 1969.
  ◮ F. Jelinek and J. Cocke, "Bootstrap hybrid decoding for symmetrical binary input channels," Inform. Contr., vol. 18, no. 3, pp. 261-298, Apr. 1971.
◮ It is fair to say that none of these schemes had any practical impact.

Pinsker's scheme 36 / 72


R0 as practical capacity

◮ The failure to beat the cutoff rate bound in a meaningful manner despite intense efforts elevated R0 to the status of a "realistic" limit to reliable communications
◮ R0 appears as the key figure-of-merit for communication system design in the influential works of the period:
  ◮ Wozencraft and Jacobs, Principles of Communication Engineering, 1965
  ◮ Wozencraft and Kennedy, "Modulation and demodulation for probabilistic coding," IT Trans., 1966
  ◮ Massey, "Coding and modulation in digital communications," Zürich, 1974
◮ Forney (1995) gives a first-hand account of this situation in his Shannon Lecture "Performance and Complexity"

Pinsker's scheme 37 / 72


Other attempts to boost the cutoff rate

Efforts to beat the cutoff rate continue to this day:

◮ D. J. Costello and F. Jelinek, 1972.
◮ P. R. Chevillat and D. J. Costello Jr., 1977.
◮ F. Hemmati, 1990.
◮ B. Radosavljevic, E. Arıkan, B. Hajek, 1992.
◮ J. Belzile and D. Haccoun, 1993.
◮ S. Kallel and K. Li, 1997.
◮ E. Arıkan, 2006.
◮ ...

In fact, polar coding originates from such attempts.

Pinsker's scheme 38 / 72

Massey’s scheme 39 / 72

The R0 debate

A case study by McEliece (1980) cast a big doubt on the significance of R0 as a practical limit:

◮ McEliece's study was concerned with a Pulse Position Modulation (PPM) scheme, modeled as a q-ary erasure channel
◮ Capacity: C(q) = (1 − ε) log q
◮ Cutoff rate: R0(q) = log [ q / (1 + (q − 1)ε) ]
◮ As the bandwidth (q) grew, R0(q)/C(q) → 0
◮ Algebraic coding (Reed-Solomon) scored a big win over probabilistic coding!

[Figure: q-ary erasure channel; each input is received correctly with probability 1 − ε and erased (?) with probability ε]

Massey's scheme 40 / 72
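A sketch (an addition) of McEliece's point: holding ε fixed and growing the alphabet size q, the ratio R0(q)/C(q) collapses toward 0 (logs in bits here).

```python
import math

def qec_capacity(q, eps):
    return (1 - eps) * math.log2(q)

def qec_cutoff(q, eps):
    return math.log2(q / (1 + (q - 1) * eps))

eps = 0.5
for q in (2, 4, 16, 256, 4096):
    c, r0 = qec_capacity(q, eps), qec_cutoff(q, eps)
    print(f"q={q}: C={c:.3f}  R0={r0:.3f}  R0/C={r0/c:.3f}")
# R0 saturates near log2(1/eps) while C keeps growing like log2(q).
```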


Massey meets the challenge

◮ Massey (1981) showed that there was a different way of doing coding and modulation on a q-ary erasure channel that boosted R0 effortlessly
◮ Paradoxically, as Massey restored the status of R0, he exhibited the "flaky" nature of this parameter

Massey's scheme 42 / 72

slide-86
SLIDE 86

Channel splitting to boost cutoff rate (Massey, 1981)

2 3 4 1 2 3 4 1 ? 1−ε ε 1−ε ε 00 01 10 11 11 10 01 00 ?? 1−ε ε 1 1 ? 1−ε ε 1 1 ?

◮ Begin with a quaternary erasure channel (QEC)

Massey’s scheme 43 / 72

slide-87
SLIDE 87

Channel splitting to boost cutoff rate (Massey, 1981)

2 3 4 1 2 3 4 1 ? 1−ε ε 1−ε ε 00 01 10 11 11 10 01 00 ?? 1−ε ε 1 1 ? 1−ε ε 1 1 ?

◮ Relabel the inputs

Massey’s scheme 44 / 72

slide-88
SLIDE 88

Channel splitting to boost cutoff rate (Massey, 1981)

2 3 4 1 2 3 4 1 ? 1−ε ε 1−ε ε 00 01 10 11 11 10 01 00 ?? 1−ε ε 1 1 ? 1−ε ε 1 1 ?

◮ Split the QEC into two binary erasure channels (BEC) ◮ BECs fully correlated: erasures occur jointly

Massey’s scheme 45 / 72

Capacity, cutoff rate for one QEC vs two BECs

Ordinary coding of the QEC:

  C(QEC) = 2(1 − ε)
  R0(QEC) = log [ 4 / (1 + 3ε) ]

Independent coding of the BECs:

  C(BEC) = 1 − ε
  R0(BEC) = log [ 2 / (1 + ε) ]

◮ C(QEC) = 2 × C(BEC)
◮ R0(QEC) ≤ 2 × R0(BEC), with equality iff ε = 0 or 1.

Massey's scheme 46 / 72
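The splitting gain is easy to verify numerically; the sketch below (an addition, logs in bits) tabulates R0(QEC) against 2 × R0(BEC) over a range of ε.

```python
import math

def r0_qec(eps):  # log2( 4 / (1 + 3*eps) )
    return math.log2(4 / (1 + 3 * eps))

def r0_bec(eps):  # log2( 2 / (1 + eps) )
    return math.log2(2 / (1 + eps))

for eps in (0.0, 0.1, 0.3, 0.5, 0.9):
    print(f"eps={eps}: R0(QEC)={r0_qec(eps):.3f}  2*R0(BEC)={2 * r0_bec(eps):.3f}")
# 2*R0(BEC) >= R0(QEC), strictly for 0 < eps < 1: splitting boosts the cutoff rate.
```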

Cutoff rate improvement by splitting

[Figure: QEC capacity, QEC cutoff rate, and 2 × BEC cutoff rate (bits) versus erasure probability ε]

Massey's scheme 47 / 72

Comparison of Pinsker's and Massey's schemes

◮ Pinsker
  ◮ Construct a superchannel by combining independent copies of a given DMC W
  ◮ Split the superchannel into correlated subchannels
  ◮ Ignore correlations between the subchannels; encode and decode them independently
  ◮ Can be used universally
  ◮ Can achieve capacity
  ◮ Not practical
◮ Massey
  ◮ Split the given DMC W into correlated subchannels
  ◮ Ignore correlations between the subchannels; encode and decode them independently
  ◮ Applicable only to specific channels
  ◮ Cannot achieve capacity
  ◮ Practical

Massey's scheme 48 / 72


A conservation law for the cutoff rate

[Diagram: a rate-K/N block encoder and block decoder wrapped around N uses of a memoryless channel W, creating a derived (vector) channel]

◮ "Parallel channels" theorem (Gallager, 1965):

  R0(derived vector channel) ≤ N R0(W)

◮ "Cleaning up" the channel by pre-/post-processing can only hurt R0
◮ Shows that boosting the cutoff rate requires more than one sequential decoder

Massey's scheme 49 / 72


Polar coding 50 / 72

Prescription for a new scheme

◮ Consider small constructions
◮ Retain independent encoding for the subchannels
◮ Do not ignore correlations between subchannels at the expense of capacity
◮ This points to multi-level coding and successive cancellation decoding

Polar coding 51 / 72

Multi-stage decoding architecture

[Diagram: data streams d_1, . . . , d_N pass through N convolutional encoders CE_i; a one-to-one mapper f_N turns u_1, . . . , u_N into x_1, . . . , x_N, sent over N independent copies of W; a soft-decision generator g_N produces likelihoods ℓ_i for N sequential decoders SD_i operating successively; the combination forms the channel W_N]

Polar coding 52 / 72


Notation

◮ Let V : F₂ ≜ {0, 1} → Y be an arbitrary binary-input memoryless channel
◮ Let (X, Y) be an input-output ensemble for channel V with X uniform on F₂
◮ The (symmetric) capacity is defined as

  I(V) ≜ I(X; Y) = Σ_{y∈Y} Σ_{x∈F₂} (1/2) V(y|x) log [ V(y|x) / ((1/2) V(y|0) + (1/2) V(y|1)) ]

◮ The (symmetric) cutoff rate is defined as

  R0(V) ≜ R0(X; Y) = − log Σ_{y∈Y} [ Σ_{x∈F₂} (1/2) √V(y|x) ]²

Polar coding 54 / 72
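The two definitions translate directly into code. A minimal sketch (an addition) for a channel given as a matrix V[x][y] = V(y|x), checked on a BEC, where I(V) = 1 − ε and R0(V) = log₂(2/(1 + ε)):

```python
import math

def symmetric_capacity(V):
    """I(V) in bits; V[x][y] = V(y|x) for x in {0,1}, uniform inputs."""
    total = 0.0
    for y in range(len(V[0])):
        py = 0.5 * V[0][y] + 0.5 * V[1][y]
        for x in (0, 1):
            if V[x][y] > 0:
                total += 0.5 * V[x][y] * math.log2(V[x][y] / py)
    return total

def symmetric_cutoff(V):
    """R0(V) = -log2 sum_y ( 0.5*sqrt(V(y|0)) + 0.5*sqrt(V(y|1)) )^2."""
    s = sum((0.5 * math.sqrt(V[0][y]) + 0.5 * math.sqrt(V[1][y])) ** 2
            for y in range(len(V[0])))
    return -math.log2(s)

eps = 0.3  # BEC(eps): outputs {0, 1, erasure}
bec = [[1 - eps, 0.0, eps], [0.0, 1 - eps, eps]]
print(symmetric_capacity(bec), symmetric_cutoff(bec))  # 0.7 and log2(2/1.3)
```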

The basic construction

Given two copies of a binary-input channel W : F₂ = {0, 1} → Y,

[Diagram: U1 and U2 enter a 2×2 transform; X1 = U1 + U2 and X2 = U2 are sent over the two copies of W, producing Y1 and Y2]

consider the transformation above, which generates two channels W− : F₂ → Y² and W+ : F₂ → Y² × F₂ with

  W−(y1 y2 | u1) = Σ_{u2} (1/2) W(y1 | u1 + u2) W(y2 | u2)
  W+(y1 y2 u1 | u2) = (1/2) W(y1 | u1 + u2) W(y2 | u2)

Polar coding 55 / 72
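A sketch (an addition) that builds the W− and W+ transition matrices from a given W and numerically checks the cutoff-rate inequality stated in the theorem two slides below; the helper r0 restates the symmetric cutoff rate defined on the previous slide.

```python
import math
from itertools import product

def minus_plus(W):
    """Build W- and W+ matrices from W[x][y] = W(y|x).

    W-((y1,y2) | u1)     = sum_{u2} 0.5 * W(y1|u1^u2) * W(y2|u2)
    W+((y1,y2,u1) | u2)  =          0.5 * W(y1|u1^u2) * W(y2|u2)
    """
    ny = len(W[0])
    Wm = [[sum(0.5 * W[u1 ^ u2][y1] * W[u2][y2] for u2 in (0, 1))
           for (y1, y2) in product(range(ny), repeat=2)] for u1 in (0, 1)]
    Wp = [[0.5 * W[u1 ^ u2][y1] * W[u2][y2]
           for (y1, y2, u1) in product(range(ny), range(ny), (0, 1))]
          for u2 in (0, 1)]
    return Wm, Wp

def r0(V):  # symmetric cutoff rate, as on the Notation slide
    return -math.log2(sum((0.5 * math.sqrt(V[0][y]) + 0.5 * math.sqrt(V[1][y])) ** 2
                          for y in range(len(V[0]))))

p = 0.1
W = [[1 - p, p], [p, 1 - p]]  # BSC(0.1)
Wm, Wp = minus_plus(W)
print(r0(Wm) + r0(Wp), 2 * r0(W))  # R0(W-) + R0(W+) > 2*R0(W)
```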


The 2×2 transformation is information lossless

◮ With independent, uniform U1, U2:

  I(W−) = I(U1; Y1Y2),  I(W+) = I(U2; Y1Y2U1).

◮ Thus, I(W−) + I(W+) = I(U1U2; Y1Y2) = 2 I(W),
◮ and I(W−) ≤ I(W) ≤ I(W+).

Polar coding 56 / 72

The 2×2 transformation "creates" cutoff rate

With independent, uniform U1, U2: R0(W−) = R0(U1; Y1Y2), R0(W+) = R0(U2; Y1Y2U1).

Theorem (2005)

Correlation helps create cutoff rate:

  R0(W−) + R0(W+) ≥ 2 R0(W),

with equality iff W is a perfect channel, I(W) = 1, or a pure-noise channel, I(W) = 0. Cutoff rates start polarizing:

  R0(W−) ≤ R0(W) ≤ R0(W+)

Polar coding 57 / 72


Recursive continuation

Do the same recursively: given W,

◮ duplicate W and obtain W− and W+;
◮ duplicate W− (W+) and obtain W−− and W−+ (W+− and W++);
◮ duplicate W−− (W−+, W+−, W++) and obtain W−−− and W−−+ (W−+−, W−++, W+−−, W+−+, W++−, W+++);
◮ . . .

Polar coding 58 / 72


Polarization Process

Evolution of I = I(W), I+ = I(W+), I− = I(W−), etc.

[Figure: the values I, I±, I±±, . . . plotted level by level over 8 levels of the recursion; the multiset of values spreads out and accumulates near 0 and 1]

Polar coding 59 / 72
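For the BEC the transform has an exact closed form, I(W−) = I² and I(W+) = 2I − I², so the polarization process can be simulated directly. A sketch (an addition; parameters illustrative):

```python
import random

def polarize_bec(I0, depth, trials=100_000):
    """Track I(W) along random +/- paths of the given depth for a BEC,
    using I- = I^2 and I+ = 2I - I^2; values accumulate near 0 and 1."""
    random.seed(1)
    near0 = near1 = 0
    for _ in range(trials):
        I = I0
        for _ in range(depth):
            I = I * I if random.random() < 0.5 else 2 * I - I * I
        near0 += I < 0.01
        near1 += I > 0.99
    return near0 / trials, near1 / trials

f0, f1 = polarize_bec(I0=0.6, depth=16)
print(f0, f1)  # fraction near 1 approaches I(W) = 0.6; near 0 approaches 0.4
```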


Cutoff Rate Polarization

Theorem (2006)

The cutoff rates {R0(Ui; Y^N U^{i−1})} of the channels created by the recursive transformation converge to their extremal values, i.e.,

  (1/N) #{ i : R0(Ui; Y^N U^{i−1}) ≈ 1 } → I(W)

and

  (1/N) #{ i : R0(Ui; Y^N U^{i−1}) ≈ 0 } → 1 − I(W).

Remark: {I(Ui; Y^N U^{i−1})} also polarize.

Polar coding 68 / 72

slide-141
SLIDE 141

Cutoff Rate Polarization

Theorem (2006)

The cutoff rates {R0(Ui; Y NUi−1)} of the channels created by the recursive transformation converge to their extremal values, i.e., 1 N #

  • i : R0(Ui; Y NUi−1) ≈ 1
  • → I(W )

and 1 N #

  • i : R0(Ui; Y NUi−1) ≈ 0
  • → 1 − I(W ).

Remark: {I(Ui; Y NUi−1)} also polarize.

Polar coding 68 / 72

slide-142
SLIDE 142

Cutoff Rate Polarization

Theorem (2006)

The cutoff rates {R0(Ui; Y NUi−1)} of the channels created by the recursive transformation converge to their extremal values, i.e., 1 N #

  • i : R0(Ui; Y NUi−1) ≈ 1
  • → I(W )

and 1 N #

  • i : R0(Ui; Y NUi−1) ≈ 0
  • → 1 − I(W ).

Remark: {I(Ui; Y NUi−1)} also polarize.

Polar coding 68 / 72

slide-143
SLIDE 143

Cutoff Rate Polarization

Theorem (2006)

The cutoff rates {R0(Ui; Y NUi−1)} of the channels created by the recursive transformation converge to their extremal values, i.e., 1 N #

  • i : R0(Ui; Y NUi−1) ≈ 1
  • → I(W )

and 1 N #

  • i : R0(Ui; Y NUi−1) ≈ 0
  • → 1 − I(W ).

Remark: {I(Ui; Y NUi−1)} also polarize.

Polar coding 68 / 72

Sequential decoding with successive cancellation

◮ Use the recursive construction to generate N bit-channels with cutoff rates R0(Ui; Y^N U^{i−1}), 1 ≤ i ≤ N.
◮ Encode the bit-channels independently using convolutional coding
◮ Decode the bit-channels one by one using sequential decoding and successive cancellation
◮ The achievable sum cutoff rate is Σ_{i=1}^{N} R0(Ui; Y^N U^{i−1}), which approaches N I(W) as N increases.

Polar coding 69 / 72

Final step: Doing away with sequential decoding

◮ Due to polarization, the rate loss is negligible if one does not use the "bad" bit-channels
◮ The rate of polarization is strong enough that a vanishing frame error rate can be achieved even if the "good" bit-channels are used uncoded
◮ The resulting system has no convolutional encoding and sequential decoding, only successive cancellation decoding

Polar coding 70 / 72

Polar coding

To communicate at rate R < I(W):

◮ Pick N, and K = NR good indices i such that I(Ui; Y^N U^{i−1}) is high,
◮ let the transmitter set Ui to uncoded binary data for the good indices, and set Ui to random but publicly known values for the rest,
◮ let the receiver decode the Ui successively: U1 from Y^N; Ui from Y^N Û^{i−1}.

Polar coding 71 / 72
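For the BEC the good indices can be computed exactly: the Bhattacharyya parameter (here, the bit-channel erasure probability) evolves as Z− = 2Z − Z² and Z+ = Z². A sketch (an addition; parameters illustrative) that selects the K most reliable bit-channels and reports the union bound on frame error:

```python
def bec_bhattacharyya(eps, n):
    """Bhattacharyya parameters of the N = 2^n bit-channels of a BEC(eps),
    via the exact recursion Z- = 2Z - Z^2, Z+ = Z^2."""
    Z = [eps]
    for _ in range(n):
        Z = [z for pair in ((2 * z - z * z, z * z) for z in Z) for z in pair]
    return Z  # Z[i] is the erasure probability seen by bit-channel i

eps, n, R = 0.3, 10, 0.5          # N = 1024, rate 1/2 < I(W) = 0.7
Z = bec_bhattacharyya(eps, n)
N = len(Z)
K = int(N * R)
good = sorted(range(N), key=lambda i: Z[i])[:K]   # K most reliable indices
frozen = set(range(N)) - set(good)                # the rest carry known values
print(max(Z[i] for i in good), sum(Z[i] for i in good))  # worst Z; union bound
```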


Polar coding complexity and performance

Theorem (2007)

With the particular one-to-one mapping described here and with successive cancellation decoding, polar codes achieve the capacity I(W) with

◮ encoding complexity N log N,
◮ decoding complexity N log N,
◮ and probability of frame error better than 2^{−N^{0.49}}.

Polar coding 72 / 72


Next lecture

◮ Details of the construction, encoding and decoding algorithms
◮ Survey of important results about polar codes
◮ Potential for applications

Polar coding 73 / 72



Thank you!

Polar coding 74 / 72