Lecture 6 Polar Coding I-Hsiang Wang Department of Electrical Engineering - PowerPoint PPT Presentation

SLIDE 1

Lecture 6 Polar Coding

I-Hsiang Wang

Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw

December 5, 2016

1 / 63 I-Hsiang Wang IT Lecture 6

SLIDE 2

In Pursuit of Shannon's Limit

Since 1948, Shannon's theory has drawn the sharp boundary between the possible and the impossible in data compression and data transmission. Once fundamental limits are characterized, the next natural question is:

How to achieve these limits with acceptable complexity?

For lossless source coding, optimal schemes with low complexity were found fairly quickly:

Huffman code (1952): optimal for memoryless sources
Lempel-Ziv (1977): optimal for stationary ergodic sources

For channel coding and lossy source coding, on the other hand, the problem turns out to be much harder. Finding codes that achieve Shannon's limit with low complexity has been the holy grail for coding theorists.
SLIDE 3

In Pursuit of Capacity-Achieving Codes

Two barriers in pursuing low-complexity capacity-achieving codes:

1. Lack of explicit construction. Shannon's proof only shows that capacity-achieving coding schemes exist.

2. Lack of structure to reduce complexity. The proofs of coding theorems neglect complexity issues, while structured codes are hard to prove capacity-achieving.

Since the 1990s, several practical codes have been found to approach capacity – turbo codes, low-density parity-check (LDPC) codes, etc. They perform well empirically, but lack a rigorous proof of optimality. The first provably capacity-achieving coding scheme with acceptable complexity is the polar code, introduced by Erdal Arıkan in 2007. Later, in 2012, spatially coupled LDPC codes were also shown to achieve capacity (Shrinivas Kudekar, Tom Richardson, and Rüdiger Urbanke).
SLIDE 4

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009 3051

Channel Polarization: A Method for Constructing Capacity-Achieving Codes for Symmetric Binary-Input Memoryless Channels

Erdal Arıkan, Senior Member, IEEE

The paper won the 2010 IEEE Information Theory Society Best Paper Award.
SLIDE 5

Overview

When Arıkan introduced polar codes in 2007, he focused on achieving capacity for general binary-input memoryless symmetric channels (BMSC), including the BSC, the BEC, etc. Polar codes were later shown to be optimal in many other settings, including lossy source coding, non-binary-input channels, multiple access channels, channel coding with encoder side information (Gelfand-Pinsker), source coding with side information (Wyner-Ziv), etc. Instead of giving a comprehensive introduction, we shall focus on polar coding for channel coding. The outline is as follows:

1. First, we introduce the concept of channel polarization.
2. Second, we explore polar coding for binary-input channels.
3. Finally, we briefly discuss polar coding for source coding (source polarization).
SLIDE 6

Notations

In channel coding, we use the DMC N times, where N is the blocklength of the coding scheme. Since the channel is the main focus, we use the following notations throughout this lecture:

W denotes the channel P_{Y|X}.
P denotes the input distribution P_X.
I(P, W) denotes I(X; Y).

Since we focus on BMSC, and X ∼ Ber(1/2) achieves the channel capacity of any BMSC, we shall use I(W) (a slight abuse of notation) to denote I(P, W) when the input P is Ber(1/2). In other words, the channel capacity of a BMSC W is I(W).
SLIDE 7

Polarization

1. Polarization
   Basic Channel Transformation
   Channel Polarization

2. Polar Coding
   Encoding and Decoding Architectures
   Performance Analysis
SLIDE 8

Polarization

Single use of channel W:

[Figure: X → W → Y]

N uses of channel W:

[Figure: M → ENC → (X1, ..., XN) → N parallel uses of W → (Y1, ..., YN) → DEC → M̂]
SLIDE 9

Polarization

Arıkan's Idea

[Figure: (U1, ..., UN) → Pre-Processing → (X1, ..., XN) → N parallel uses of W → (Y1, ..., YN) → Post-Processing → (V1, ..., VN)]

Apply special transforms to both input and output.
SLIDE 10

Polarization

Arıkan's Idea

[Figure: N synthetic channels Wi : Ui → Vi, i = 1, ..., N]
SLIDE 11

Polarization

Arıkan's Idea

[Figure: N synthetic channels Wi : Ui → Vi]

Roughly N·I(W) channels with capacity ≈ 1.
SLIDE 12

Polarization

Arıkan's Idea

[Figure: N synthetic channels Wi : Ui → Vi]

Roughly N·I(W) channels with capacity ≈ 1.
Roughly N·(1 − I(W)) channels with capacity ≈ 0.

Equivalently, some perfect channels and some useless channels → polarization.

Coding becomes extremely simple: use the perfect channels for uncoded transmission, and throw the useless channels away.
SLIDE 13

Polarization Basic Channel Transformation

1. Polarization
   Basic Channel Transformation
   Channel Polarization

2. Polar Coding
   Encoding and Decoding Architectures
   Performance Analysis
SLIDE 14

Polarization Basic Channel Transformation

Arıkan's Basic Channel Transformation

Consider two channel uses of W:

[Figure: X1 → W → Y1, X2 → W → Y2]
SLIDE 15

Polarization Basic Channel Transformation

Arıkan's Basic Channel Transformation

Consider two channel uses of W. Apply the pre-processor X1 = U1 ⊕ U2, X2 = U2, where U1 ⊥ U2 and U1, U2 ∼ Ber(1/2). We now have two synthetic channels induced by the above procedure:

W− : U1 → V1 ≜ (Y1, Y2)
W+ : U2 → V2 ≜ (Y1, Y2, U1)

The above transform yields the following two crucial phenomena:

I(W−) ≤ I(W) ≤ I(W+)   (Polarization)
I(W−) + I(W+) = 2I(W)   (Conservation of Information)

[Figure: U1, U2 → pre-processor → X1, X2 → two copies of W → Y1, Y2]
SLIDE 16

Polarization Basic Channel Transformation

Example: Binary Erasure Channel

Example 1
Let W be a BEC with erasure probability ε ∈ (0, 1), so that I(W) = 1 − ε. Find the values of I(W−) and I(W+), and verify the above properties.

sol: Intuitively, W− is worse than W and W+ is better than W:

For W−, the input is U1 and the output is (Y1, Y2). Only when both Y1 and Y2 are not erased can one figure out U1.
⟹ W− is a BEC with erasure probability 1 − (1 − ε)² = 2ε − ε².

For W+, the input is U2 and the output is (Y1, Y2, U1). As long as one of Y1 and Y2 is not erased, one can figure out U2.
⟹ W+ is a BEC with erasure probability ε².

Hence, I(W−) = 1 − 2ε + ε² and I(W+) = 1 − ε², and both properties are easily verified.
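The erasure-probability updates above are easy to check numerically. Here is a minimal sketch (the function names are our own, not from the lecture) applying ε− = 2ε − ε² and ε+ = ε²:

```python
def bec_minus(eps):
    # W-: U1 is recoverable only if BOTH Y1 and Y2 survive
    return 2 * eps - eps ** 2

def bec_plus(eps):
    # W+: U2 is lost only if BOTH Y1 and Y2 are erased
    return eps ** 2

def capacity(eps):
    # capacity of a BEC with erasure probability eps
    return 1 - eps

eps = 0.3
i_minus = capacity(bec_minus(eps))   # I(W-) = 0.49
i_plus = capacity(bec_plus(eps))     # I(W+) = 0.91

assert i_minus <= capacity(eps) <= i_plus                  # polarization
assert abs(i_minus + i_plus - 2 * capacity(eps)) < 1e-12   # conservation
```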

SLIDE 17

Polarization Basic Channel Transformation

Example: Binary Symmetric Channel

Example 2
Let W be a BSC with crossover probability p ∈ (0, 1), so that I(W) = 1 − Hb(p). Find the values of I(W−) and I(W+).
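A sketch of the answer (our own working; the lecture leaves it as an exercise): for the BSC, W− is equivalent to a BSC with crossover probability 2p(1 − p), so I(W−) = 1 − Hb(2p(1 − p)), and I(W+) = 2I(W) − I(W−) then follows from conservation. The resulting stretch matches the BSC boundary curve on the next slide:

```python
from math import log2

def hb(p):
    # binary entropy function Hb(p)
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

p = 0.11
i_w = 1 - hb(p)
i_minus = 1 - hb(2 * p * (1 - p))   # assumes W- ~ BSC(2p(1-p))
i_plus = 2 * i_w - i_minus          # conservation of information

assert i_minus <= i_w <= i_plus
# stretch equals 2 Hb(2p(1-p)) - 2 Hb(p), the BSC lower boundary
assert abs((i_plus - i_minus) - (2 * hb(2 * p * (1 - p)) - 2 * hb(p))) < 1e-12
```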

SLIDE 18

Polarization Basic Channel Transformation

Basic Properties

Theorem 1
For any BMSC W and the induced channels {W−, W+} from Arıkan's basic transformation, we have

I(W−) ≤ I(W) ≤ I(W+), with equality iff I(W) = 0 or 1.
I(W−) + I(W+) = 2I(W).

pf: We prove the conservation of information first:

I(W−) + I(W+) = I(U1; Y1, Y2) + I(U2; Y1, Y2, U1)
              = I(U1; Y1, Y2) + I(U2; Y1, Y2 | U1)
              = I(U1, U2; Y1, Y2)
              = I(X1, X2; Y1, Y2)
              = I(X1; Y1) + I(X2; Y2) = 2I(W).

For the first property, I(W+) = I(X2; Y1, Y2, U1) ≥ I(X2; Y2) = I(W), and conservation then gives I(W−) ≤ I(W).

(The proof of the condition for equality is left as an exercise.)
SLIDE 19

Polarization Basic Channel Transformation

Extremal Channels

[Plot: I(W+) − I(W−) [bits] vs. I(W) [bits], with curves for the BEC and the BSC. Taken from Chap. 12.1 of Moser [4].]

If we plot the "information stretch" I(W+) − I(W−) vs. the original I(W), it turns out that among all BMSC:

The BEC maximizes the stretch; the BSC minimizes the stretch.
Lower boundary: 2Hb(2p(1 − p)) − 2Hb(p), where p = Hb⁻¹(1 − I(W)).
Upper boundary: 2I(W)(1 − I(W)).
SLIDE 20

Polarization Channel Polarization

1. Polarization
   Basic Channel Transformation
   Channel Polarization

2. Polar Coding
   Encoding and Decoding Architectures
   Performance Analysis
SLIDE 21

Polarization Channel Polarization

Recursive Application of Arıkan's Transformation

Duplicate W, apply the transformation, and get W− and W+.

[Figure: two copies of W]
SLIDE 22

Polarization Channel Polarization

Recursive Application of Arıkan's Transformation

Duplicate W, apply the transformation, and get W− and W+. Duplicate W− (and W+).

[Figure: four copies of W]
SLIDE 23

Polarization Channel Polarization

Recursive Application of Arıkan's Transformation

Duplicate W, apply the transformation, and get W− and W+. Duplicate W− (and W+). Apply the transformation on W−, and get W−− and W−+.

[Figure: four copies of W, with the first pair transformed into W−− and W−+]
SLIDE 24

Polarization Channel Polarization

Recursive Application of Arıkan's Transformation

Duplicate W, apply the transformation, and get W− and W+. Duplicate W− (and W+). Apply the transformation on W−, and get W−− and W−+. Apply the transformation on W+, and get W+− and W++.

[Figure: four copies of W, with both pairs transformed]
SLIDE 25

Polarization Channel Polarization

Recursive Application of Arıkan's Transformation

Duplicate W, apply the transformation, and get W− and W+. Duplicate W− (and W+). Apply the transformation on W−, and get W−− and W−+. Apply the transformation on W+, and get W+− and W++.

. . .

We can keep going and going, until the desired blocklength is reached.

[Figure: eight copies of W]
SLIDE 26

Polarization Channel Polarization

Polarized Channels after Recursive Application

After one recursion and getting W− and W+, let us duplicate them.

[Figure: duplicated W− and W+ built from four copies of W, with outputs Y1, ..., Y4]
SLIDE 27

Polarization Channel Polarization

Polarized Channels after Recursive Application

Apply the transformation on W−:

W−− : U1 → ((Y1, Y2), (Y3, Y4)) = Y^4
W−+ : U2 → ((Y1, Y2), (Y3, Y4), U1) = (Y^4, U1)

[Figure: four copies of W with inputs derived from U1, U2 and outputs Y1, ..., Y4]
SLIDE 28

Polarization Channel Polarization

Polarized Channels after Recursive Application

Apply the transformation on W+:

W+− : U3 → ((Y1, Y2, U1 ⊕ U2), (Y3, Y4, U2)) = (Y^4, U^2)
W++ : U4 → ((Y1, Y2, U1 ⊕ U2), (Y3, Y4, U2), U3) = (Y^4, U^3)

[Figure: four copies of W with inputs U1 ⊕ U2, U2 and U3, U4, outputs Y1, ..., Y4]
SLIDE 29

Polarization Channel Polarization

Polarized Channels after Recursive Application

Putting things together, we have:

W−− : U1 → (Y^4, ∅)
W−+ : U2 → (Y^4, U^1)
W+− : U3 → (Y^4, U^2)
W++ : U4 → (Y^4, U^3)

[Figure: the N = 4 polarized channels built from four copies of W]
SLIDE 30

Polarization Channel Polarization

Recursive Application of Arıkan's Transformation

With proper naming of the inputs (do it yourself), ℓ-times recursion generates a system with N = 2^ℓ channel uses. The N polarized channels are W^{s1,...,sℓ}, sj ∈ {+, −}, where

W^{s1,...,sℓ} : Ui → (Y^N, U^{i−1}).

If we set − ↔ 0 and + ↔ 1, then the index i of channel W^{s1,...,sℓ} is i = 1 + Σ_{j=1}^{ℓ} sj 2^{ℓ−j}, i.e., one plus the number with binary representation (s1, ..., sℓ) (MSB → LSB). In the following we use W_N^{(i)} to denote the W^{s1,...,sℓ} generated above, for i = 1, ..., N, N = 2^ℓ.
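The sign-to-index rule can be sanity-checked with a tiny helper (ours, not from the lecture):

```python
def channel_index(signs):
    # index i of W^{s1,...,sl}: '-' -> 0, '+' -> 1, read MSB -> LSB
    i = 1
    for s in signs:
        i = 2 * i - 1 + (1 if s == '+' else 0)
    return i

# The N = 4 channels of the previous slides, in order U1, ..., U4:
assert [channel_index(s) for s in ('--', '-+', '+-', '++')] == [1, 2, 3, 4]
assert channel_index('+--') == 5   # 1 + 1*4 + 0*2 + 0*1
```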

[Figure: eight copies of W]
SLIDE 31

Polarization Channel Polarization

Channel Polarization

Theorem 2 (Channel Polarization)
For any BMSC W, the polarized channels {W_N^{(i)} : i = 1, ..., N} (N = 2^ℓ) satisfy the following: for all a, b such that 0 < a < b < 1,

lim_{N→∞} (1/N) |{i : I(W_N^{(i)}) ∈ [0, a)}| = 1 − I(W),
lim_{N→∞} (1/N) |{i : I(W_N^{(i)}) ∈ (b, 1]}| = I(W),
lim_{N→∞} (1/N) |{i : I(W_N^{(i)}) ∈ [a, b]}| = 0.

Interpretation: When N is sufficiently large, roughly N·I(W) of the polarized channels are noiseless (capacity ≈ 1), and roughly N·(1 − I(W)) of them are useless (capacity ≈ 0).
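Theorem 2 can be observed numerically for the BEC, where the one-step updates ε− = 2ε − ε² and ε+ = ε² are exact. A small sketch (our own) that tracks all N erasure probabilities:

```python
def polarize_bec(eps, levels):
    # erasure probabilities of the 2^levels polarized BECs
    probs = [eps]
    for _ in range(levels):
        probs = [q for p in probs for q in (2 * p - p * p, p * p)]
    return probs

caps4 = [1 - p for p in polarize_bec(0.5, 4)]     # N = 16
caps12 = [1 - p for p in polarize_bec(0.5, 12)]   # N = 4096

# conservation: the average capacity stays I(W) = 0.5 at every level
assert abs(sum(caps12) / len(caps12) - 0.5) < 1e-9
# extremes polarize: all-minus channel -> useless, all-plus -> noiseless
assert caps12[0] < 1e-6 and caps12[-1] > 1 - 1e-6
# the unpolarized middle shrinks as N grows
mid = lambda caps: sum(0.1 < c < 0.9 for c in caps) / len(caps)
assert mid(caps12) < mid(caps4)
```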

SLIDE 32

Polarization Channel Polarization

[Plot: (1/N) |{i : I(W_N^{(i)}) ≤ ε}| vs. ε, for N = 2^0]

SLIDE 33

Polarization Channel Polarization

[Plot: (1/N) |{i : I(W_N^{(i)}) ≤ ε}| vs. ε, for N = 2^1]

SLIDE 34

Polarization Channel Polarization

[Plot: (1/N) |{i : I(W_N^{(i)}) ≤ ε}| vs. ε, for N = 2^2]

SLIDE 35

Polarization Channel Polarization

[Plot: (1/N) |{i : I(W_N^{(i)}) ≤ ε}| vs. ε, for N = 2^4]

SLIDE 36

Polarization Channel Polarization

[Plot: (1/N) |{i : I(W_N^{(i)}) ≤ ε}| vs. ε, for N = 2^8]

SLIDE 37

Polarization Channel Polarization

[Plot: (1/N) |{i : I(W_N^{(i)}) ≤ ε}| vs. ε, for N = 2^12]

SLIDE 38

Polarization Channel Polarization

[Plot: (1/N) |{i : I(W_N^{(i)}) ≤ ε}| vs. ε, for N = 2^20]

SLIDE 39

Polarization Channel Polarization

[Plot: (1/N) |{i : I(W_N^{(i)}) ≤ ε}| vs. ε, for N = ∞ (step function)]
SLIDE 40

Polarization Channel Polarization

Proof of Channel Polarization

pf: Define the averaged first and second moments of {W_{2^ℓ}^{(i)} : i = 1, ..., 2^ℓ} as follows (N = 2^ℓ):

μℓ ≜ (1/2^ℓ) Σ_{i=1}^{2^ℓ} I(W_{2^ℓ}^{(i)}),    νℓ ≜ (1/2^ℓ) Σ_{i=1}^{2^ℓ} (I(W_{2^ℓ}^{(i)}))².

Due to the conservation of information in Arıkan's transformation (Theorem 1), μℓ = I(W) for all ℓ. As for the averaged second moment, note that

(1/2)((I(W+))² + (I(W−))²) = ((1/2)(I(W+) + I(W−)))² + ((1/2)(I(W+) − I(W−)))²
                           = I(W)² + ((1/2)(I(W+) − I(W−)))²
                           = I(W)² + Δ(W)²,

where Δ(W) ≜ (1/2)(I(W+) − I(W−)).
SLIDE 41

Polarization Channel Polarization

νℓ+1 = (1/2^ℓ) Σ_{i=1}^{2^ℓ} [ I(W_{2^ℓ}^{(i)})² + Δ(W_{2^ℓ}^{(i)})² ] ≥ νℓ + κ(a, b)² θℓ(a, b),    (1)

where

κ(a, b) ≜ min{Δ(W_{BSC(a)}), Δ(W_{BSC(b)})},    θℓ(a, b) ≜ (1/2^ℓ) |{i : I(W_{2^ℓ}^{(i)}) ∈ [a, b]}|,

and W_{BSC(x)} denotes the BSC with capacity x. Hence, {νℓ} forms a non-decreasing sequence. Meanwhile, since all channels are binary-input, I(W_{2^ℓ}^{(i)}) ≤ 1, and therefore νℓ ≤ 1.

[Plot: I(W+) − I(W−) [bits] vs. I(W) [bits], BEC and BSC curves. Modified from Chap. 12.1 of Moser [4].]
SLIDE 42

Polarization Channel Polarization

Hence, ν0 ≤ ν1 ≤ ... ≤ νℓ ≤ ... ≤ 1 ⟹ limℓ→∞ νℓ exists.

By (1), we have

θℓ(a, b) ≤ (νℓ+1 − νℓ) / κ(a, b)² ⟹ limℓ→∞ θℓ(a, b) = 0    (since limℓ→∞ νℓ exists).

Finally, define

αℓ(a) ≜ (1/2^ℓ) |{i : I(W_{2^ℓ}^{(i)}) ∈ [0, a)}|  and  βℓ(b) ≜ (1/2^ℓ) |{i : I(W_{2^ℓ}^{(i)}) ∈ (b, 1]}|.

Observe that

I(W) = μℓ ≤ a · αℓ(a) + b · θℓ(a, b) + 1 · βℓ(b) = a + (b − a)θℓ(a, b) + (1 − a)βℓ(b),
1 − I(W) = 1 − μℓ ≤ 1 − 0 · αℓ(a) − a · θℓ(a, b) − b · βℓ(b) = (1 − b) + (b − a)θℓ(a, b) + b·αℓ(a).

It is then not hard to show that lim infℓ→∞ βℓ(b) ≥ I(W) and lim infℓ→∞ αℓ(a) ≥ 1 − I(W). The proof is completed by the sandwich principle.
SLIDE 43

Polarization Channel Polarization

From Channel Polarization to Polar Coding

Recall the original goal:

[Figure: (U1, ..., UN) → Pre-Processing → (X1, ..., XN) → N parallel uses of W → (Y1, ..., YN) → Post-Processing → (V1, ..., VN)]

Caveat: What we have done, however, is the following: we created N = 2^ℓ polarized channels

W_N^{(i)} : Ui → Vi = (Y^N, U^{i−1}).

However, we cannot obtain the true U^{i−1} from the channel output Y^N.

This issue can be fixed by successive decoding, where Vi ≜ (Y^N, Û^{i−1}) instead of (Y^N, U^{i−1}). Encoding is based on those "synthetic" polarized channels {W_N^{(i)}}, and the i-th synthetic channel is a good approximation as long as Û^{i−1} = U^{i−1} with high probability.
SLIDE 44

Polar Coding

1. Polarization
   Basic Channel Transformation
   Channel Polarization

2. Polar Coding
   Encoding and Decoding Architectures
   Performance Analysis
SLIDE 45

Polar Coding Encoding and Decoding Architectures

1. Polarization
   Basic Channel Transformation
   Channel Polarization

2. Polar Coding
   Encoding and Decoding Architectures
   Performance Analysis
SLIDE 46

Polar Coding Encoding and Decoding Architectures

Overview of Polar Coding Architecture

1. Preparation
   Generate the N = 2^ℓ synthetic polarized channels {W_N^{(i)} | i = 1, 2, ..., N}.

2. Encoding
   To encode K information bits into an N-bit codeword, the encoder picks a subset A ⊆ [1 : N] of synthetic polarized channels from the N channels above, based on their qualities:
   For each i ∈ A, use Ui to send an information bit.
   For each i ∈ F ≜ A^c, fix Ui to a dummy bit u*_i (frozen bits).

3. Decoding is based on successive cancellation, where
   the decoded Ûi is determined by (Y^N, Û^{i−1}) if i ∈ A;
   the decoded Ûi = u*_i, the pre-fixed dummy frozen bit, if i ∈ F.
SLIDE 47

Polar Coding Encoding and Decoding Architectures

Synthetic Polarized Channels

For the i-th synthesized channel, the input is Ui, the output is (Y^N, U^{i−1}), and the channel law is

W_N^{(i)}(y^N, u^{i−1} | ui) = (1/2^{N−1}) Σ_{u_{i+1}=0}^{1} ... Σ_{u_N=0}^{1} P(y^N | u^N),

where P(y^N | u^N) = Π_{i=1}^{N} W(yi | xi). The relationship between x^N and u^N is described in the next slide.

Recursive Relation of Channel Laws

W_{2N}^{(2k−1)}(y^{2N}, u^{2k−2} | u_{2k−1}) = Σ_{u_{2k}=0,1} (1/2) · W_N^{(k)}(y^{1:N}, u_odd^{2k−2} ⊕ u_even^{2k−2} | u_{2k−1} ⊕ u_{2k}) · W_N^{(k)}(y^{N+1:2N}, u_even^{2k−2} | u_{2k})

W_{2N}^{(2k)}(y^{2N}, u^{2k−1} | u_{2k}) = (1/2) · W_N^{(k)}(y^{1:N}, u_odd^{2k−2} ⊕ u_even^{2k−2} | u_{2k−1} ⊕ u_{2k}) · W_N^{(k)}(y^{N+1:2N}, u_even^{2k−2} | u_{2k})
SLIDE 48

Polar Coding Encoding and Decoding Architectures

Relation between U^N and X^N

As mentioned before, with a proper "bit-reversal permutation" of the indices of the Ui's, one can obtain Ũi's, where the relationship between X^N and Ũ^N can be characterized by the ℓ-times Kronecker product:

X^N = Ũ^N · GN,

where GN = G2^{⊗ℓ} ≜ G2 ⊗ ... ⊗ G2 (ℓ times), N = 2^ℓ, and G2 = [1 0; 1 1]. Easy to check: (GN)^{−1} = GN over GF(2).

The bit-reversal permutation σbr is described as follows: for i = 1 + Σ_{j=1}^{ℓ} sj 2^{ℓ−j}, σbr(i) = 1 + Σ_{j=1}^{ℓ} sj 2^{j−1}. In other words, the binary representation of σbr(i) − 1 is the reverse of that of i − 1, and vice versa. We shall use RN to denote the matrix representation of σbr. Easy to check: (RN)^{−1} = RN. Hence,

X^N = U^N · RN GN  and  U^N = X^N · GN RN.
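Both "easy to check" claims can be verified numerically; below is a small self-contained sketch (helper names are ours):

```python
def kron(a, b):
    # Kronecker product of 0/1 matrices
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def mul_gf2(a, b):
    # matrix product over GF(2)
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) % 2
             for j in range(len(b[0]))] for i in range(len(a))]

G2 = [[1, 0], [1, 1]]
G = G2
for _ in range(2):
    G = kron(G, G2)            # G8 = G2 (x) G2 (x) G2

N = len(G)                     # 8
identity = [[int(i == j) for j in range(N)] for i in range(N)]
assert mul_gf2(G, G) == identity           # (GN)^{-1} = GN over GF(2)

def sigma_br(i, l):
    # bit-reversal: reverse the l-bit binary representation of i - 1
    return int(format(i - 1, '0%db' % l)[::-1], 2) + 1

perm = [sigma_br(i, 3) for i in range(1, N + 1)]
assert [sigma_br(j, 3) for j in perm] == list(range(1, N + 1))  # (RN)^{-1} = RN
```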

SLIDE 49

Polar Coding Encoding and Decoding Architectures

Encoding

Two things must be specified for polar encoding:

1. Determine the active set A and the frozen set F.
2. Determine what to send on the indices of the frozen set.

Selection of the Frozen Set
Let K denote the number of information bits to be delivered. Then, in principle, one should choose A and F such that |A| = K and, for all i ∈ A and j ∈ F, channel W_N^{(i)} has "better quality" than channel W_N^{(j)}.

*How to evaluate the "quality" of the synthetic polarized channels {W_N^{(i)}}? Discussed later.

Setting Values of the Frozen Bits
The values of the frozen bits are known to both encoder and decoder – part of the codebook design.
SLIDE 50

Polar Coding Encoding and Decoding Architectures

Encoding Architecture

x = u RN GN

where

u ≜ [u1 u2 ... uN] denotes the uncoded bits (union of information and frozen bits).
x ≜ [x1 x2 ... xN] denotes the coded bits (codeword).
GN ≜ G2 ⊗ ... ⊗ G2 (ℓ times) is the encoding matrix (N = 2^ℓ and G2 = [1 0; 1 1]).
RN is the bit-reversal permutation matrix.

[Figure: encoding circuit mapping U1, ..., UN to X1, ..., XN]

Rate R = |A| / N.
SLIDE 51

Polar Coding Encoding and Decoding Architectures

Encoding Architecture

(Same as the previous slide, with the information bits highlighted.)

[Figure: encoding circuit; the info bits enter at the positions indexed by A]

Rate R = |A| / N.
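The full map x = u·RN·GN can be sketched directly (a toy reference implementation of our own; practical encoders use the O(N log N) butterfly network instead of an explicit matrix):

```python
def kron(a, b):
    # Kronecker product of 0/1 matrices
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def polar_encode(u):
    # x = u . RN . GN over GF(2); len(u) must be a power of two
    n = len(u)
    l = n.bit_length() - 1
    assert 1 << l == n
    # apply the bit-reversal permutation RN to u
    ut = [u[int(format(i, '0%db' % l)[::-1], 2)] for i in range(n)]
    # GN = G2^{(x) l}
    g = [[1]]
    for _ in range(l):
        g = kron(g, [[1, 0], [1, 1]])
    return [sum(ut[k] * g[k][j] for k in range(n)) % 2 for j in range(n)]

assert polar_encode([0, 0, 0, 1]) == [1, 1, 1, 1]   # u4 feeds every xi
assert polar_encode([0, 1, 0, 0]) == [1, 0, 1, 0]
# RN . GN is an involution over GF(2), so encoding twice recovers u
assert polar_encode(polar_encode([1, 1, 0, 1])) == [1, 1, 0, 1]
```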

SLIDE 52

Polar Coding Encoding and Decoding Architectures

Decoding

Successive Cancellation Decoding (SC Decoding)

Upon receiving y^N, the decoder decodes ui from i = 1 to i = N sequentially, following the rule below:

ûi = u*_i if i ∈ F.
ûi = arg max_{u∈{0,1}} W_N^{(i)}(y^N, û^{i−1} | u) if i ∈ A.

In words, the decoder performs bit-wise sequential decoding.

Note: For i ∈ A, the decoding rule for that bit is not maximum likelihood decoding, because it does not make use of all frozen bits. In particular, {u*_j : j ∈ F, j > i} are not harnessed when decoding Ui.

[Figure: chain of bit-wise decoders, each fed with y^N and the previously decoded bits]
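As a toy illustration (ours, not from the lecture), here is SC decoding over a single Arıkan butterfly on a BEC, with erasures represented by None:

```python
def sc_decode_n2(y1, y2, frozen_u1=None):
    # x1 = u1 XOR u2, x2 = u2; y in {0, 1, None}, None = erasure
    if frozen_u1 is not None:
        u1 = frozen_u1                  # frozen bit is known a priori
    elif y1 is not None and y2 is not None:
        u1 = y1 ^ y2                    # W-: needs BOTH outputs
    else:
        u1 = None
    if y2 is not None:
        u2 = y2                         # W+: direct observation ...
    elif y1 is not None and u1 is not None:
        u2 = y1 ^ u1                    # ... or recovered via y1 and u1
    else:
        u2 = None
    return u1, u2

# u1 = 1 (frozen), u2 = 0 -> x = (1, 0); erase y2:
assert sc_decode_n2(1, None, frozen_u1=1) == (1, 0)
# without freezing, the same erasure makes both bits undecodable
assert sc_decode_n2(1, None) == (None, None)
```

This mirrors the BEC intuition of Example 1: the frozen bit rescues u2 exactly because W+ can exploit knowledge of u1.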

SLIDE 53

Polar Coding Performance Analysis

1. Polarization
   Basic Channel Transformation
   Channel Polarization

2. Polar Coding
   Encoding and Decoding Architectures
   Performance Analysis
SLIDE 54

Polar Coding Performance Analysis

Probability of Error

Under SC decoding, the probability of error of the proposed polar coding scheme depends on (1) the channel W, (2) the blocklength N, (3) the code rate K/N, (4) the frozen set F ⊂ [1 : N], (5) the frozen bits uF.

Notation: here we use uF and uA to denote the frozen bits and information bits respectively. First, define the average (over all codewords) probability of error with given frozen bits uF:

Pe^{(N)}(K/N, F, uF) ≜ P{U^N ≠ Û^N} = Σ_{uA ∈ {0,1}^K} 2^{−K} · P{∃ i ∈ A s.t. Ûi ≠ Ui | U_A = uA}.

Next, we further average over uniformly randomly chosen frozen bits uF and define

Pe^{(N)}(K/N, F) ≜ Σ_{uF ∈ {0,1}^{N−K}} 2^{−(N−K)} · Pe^{(N)}(K/N, F, uF).
SLIDE 55

Polar Coding Performance Analysis

Upper Bounding the Probability of Error of Polar Coding (1)

Observe that

Pe^{(N)}(K/N, F) = P{ ∪_{i∈A} {Ûi ≠ Ui, Û^{i−1} = U^{i−1}} } ≤ Σ_{i∈A} P{Ûi(Y^N, Û^{i−1}) ≠ Ui, Û^{i−1} = U^{i−1}},    (2)

where Ui ∼ i.i.d. Ber(1/2) for all i = 1, 2, ..., N. The inequality is due to the union bound. Recall the decoding function

Ûi(Y^N, Û^{i−1}) = arg max_{u∈{0,1}} W_N^{(i)}(Y^N, Û^{i−1} | u).

Hence,

P{Ûi(Y^N, Û^{i−1}) ≠ Ui, Û^{i−1} = U^{i−1}} = P{Ûi(Y^N, U^{i−1}) ≠ Ui, Û^{i−1} = U^{i−1}} ≤ P{Ûi(Y^N, U^{i−1}) ≠ Ui}.    (3)
SLIDE 56

Polar Coding Performance Analysis

Now things boil down to upper bounding P{Ûi(Y^N, U^{i−1}) ≠ Ui}, where

Ûi(y^N, u^{i−1}) = arg max_{u∈{0,1}} W_N^{(i)}(y^N, u^{i−1} | u).    (4)

Key Observation: P{Ûi(Y^N, U^{i−1}) ≠ Ui} is the optimal error probability of a binary detection problem, since the bit-wise decoder (4) above is the corresponding MAP/ML detection rule!

[Figure: Ui ∼ Ber(1/2) → W_N^{(i)} → (Y^N, U^{i−1}) → MAP (= ML) detector]

Next, we introduce Z(W) as an upper bound on the error probability of a binary detection problem with input X ∼ Ber(1/2) and observation Y following the probability transition law W(y|x). Naturally, it measures the reliability of a channel W.
SLIDE 57

Polar Coding Performance Analysis

Bit-wise Decoding Error Probability

Lemma 1
For a binary detection problem with input X ∼ Ber(1/2) and observation Y following the probability transition law W(y|x), the optimal probability of error (note: ML is optimal) satisfies

P{X̂_ML(Y) ≠ X} ≤ Z(W), where Z(W) ≜ Σ_{y∈Y} √(W(y|0) · W(y|1)).    (5)

pf: Recall the ML detection rule: X̂_ML(y) = x if W(y|x) ≥ W(y|x ⊕ 1). Hence,

P{X̂_ML(Y) ≠ X} = E_{X,Y}[1{W(Y|X) < W(Y|X ⊕ 1)}] ≤ E_{X,Y}[√(W(Y|X ⊕ 1) / W(Y|X))].

It is not hard to verify that Z(W) = E_{X,Y}[√(W(Y|X ⊕ 1) / W(Y|X))] (left as an exercise).
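Z(W) and the exact ML error probability can be compared directly for small output alphabets (a sketch with our own helper names; w[x][y] stands for W(y|x)):

```python
from math import sqrt

def z_param(w):
    # Bhattacharyya parameter Z(W)
    return sum(sqrt(w[0][y] * w[1][y]) for y in range(len(w[0])))

def ml_error(w):
    # exact error probability of ML detection with X ~ Ber(1/2):
    # for each y, the non-ML input contributes min_x W(y|x) / 2
    return sum(0.5 * min(w[0][y], w[1][y]) for y in range(len(w[0])))

p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]
assert abs(ml_error(bsc) - p) < 1e-12                   # ML error of a BSC is p
assert ml_error(bsc) <= z_param(bsc)                    # Lemma 1
assert abs(z_param(bsc) - 2 * sqrt(p * (1 - p))) < 1e-12

e = 0.25                                                # BEC, outputs {0, ?, 1}
bec = [[1 - e, e, 0.0], [0.0, e, 1 - e]]
assert abs(ml_error(bec) - e / 2) < 1e-12               # guess on erasure
assert abs(z_param(bec) - e) < 1e-12                    # Z(BEC) = eps
```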

SLIDE 58

Polar Coding Performance Analysis

Properties of the Reliability Function Z(·)

Proofs of the following properties are omitted here.

1. Range of Z: 0 ≤ Z(W) ≤ 1. (By Cauchy-Schwarz)

2. Polarization: under Arıkan's transformation,
   Z(W+) = (Z(W))², Z(W−) ≤ 2Z(W) − (Z(W))²;
   Z(W+) + Z(W−) ≤ 2Z(W) (overall reliability is improved by the transformation);
   Z(W+) ≤ Z(W) ≤ Z(W−).

3. Relation with I(W): 1 − Z(W) ≤ I(W) ≤ 1 − (Z(W))².
   I(W) ≈ 1 ⟺ Z(W) ≈ 0;  I(W) ≈ 0 ⟺ Z(W) ≈ 1.

Hence, one can expect that channel polarization (Theorem 2) still holds if we change the measure of "goodness" from capacity to the reliability function.
SLIDE 59

Polar Coding Performance Analysis

Upper Bounding the Probability of Error of Polar Coding (2)

Combining (2), (3), and Lemma 1, we arrive at a nice upper bound on Pe^{(N)}(K/N, F):

Pe^{(N)}(K/N, F) ≤ Σ_{i∈A} Z(W_N^{(i)}).    (6)

Implications of Upper Bound (6):

1. How to choose the frozen set F? If we would like to minimize (6), we should choose A and F such that Z(W_N^{(i)}) ≤ Z(W_N^{(j)}) for all i ∈ A and j ∈ F. In other words, we use Z(·) to evaluate the quality of the synthetic polarized channels.

2. Suppose we can compute the asymptotic limit of the proportion of synthetic polarized channels whose Z(·) is smaller than some δN = o(N^{−1}). If R is less than this limit, then for sufficiently large N we can further upper bound (6) by NR · δN, which vanishes as N → ∞.
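For the BEC, the Z(·) recursion on the previous slide holds with equality (Z(W−) = 2Z − Z², Z(W+) = Z²), so frozen-set selection can be implemented exactly. A sketch, with our own helper names:

```python
def bec_z(eps, levels):
    # exact Bhattacharyya parameters of the 2^levels polarized BECs
    zs = [eps]
    for _ in range(levels):
        zs = [z for p in zs for z in (2 * p - p * p, p * p)]
    return zs

def choose_frozen(eps, levels, k):
    # activate the k most reliable channels, freeze the rest
    zs = bec_z(eps, levels)
    order = sorted(range(len(zs)), key=lambda i: zs[i])
    return set(order[:k]), set(order[k:]), zs

active, frozen, zs = choose_frozen(0.5, 8, 100)  # N = 256, R = 100/256 < I(W)

assert len(active) == 100 and len(frozen) == 156
# every active channel is at least as reliable as every frozen one
assert max(zs[i] for i in active) <= min(zs[j] for j in frozen)
```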

SLIDE 60

Polar Coding Performance Analysis

Speed of Channel Polarization

In other words, we would like a theorem giving the following result:

lim_{N→∞} (1/N) |{i : Z(W_N^{(i)}) < δN}| = I(W).

This is a stronger version of channel polarization than Theorem 2. To see this, note that we can easily replace I(W_N^{(i)}) by 1 − Z(W_N^{(i)}) in Theorem 2, and the results remain valid for constants a and b, where a, b = Θ(1), invariant to N. However, the desired theorem requires replacing a and b by δN and 1 − δN respectively, where δN = o(N^{−1}). The proof of Theorem 2 presented before cannot be extended to this case.
SLIDE 61

Polar Coding Performance Analysis

Nevertheless, Arıkan and Telatar proved an even stronger result, with δN = 2^{−N^β}, β ∈ (0, 1/2). Below we present this result without proof.

Theorem 3 (Rate of Channel Polarization [Arıkan-Telatar ISIT 2009])
Direct Part: For β ∈ (0, 1/2),

lim_{N→∞} (1/N) |{i : Z(W_N^{(i)}) < 2^{−N^β}}| = I(W),    (7)
lim_{N→∞} (1/N) |{i : Z(W_N^{(i)}) > 1 − 2^{−N^β}}| = 1 − I(W).    (8)

Converse Part: For β > 1/2, if I(W) < 1,

lim_{N→∞} (1/N) |{i : Z(W_N^{(i)}) < 2^{−N^β}}| = 0.
SLIDE 62

Polar Coding Performance Analysis

Coding Theorem for Polar Coding

Theorem 4 (Polar Coding Achieves Capacity of BMSC)
Suppose the frozen set F is chosen such that Z(W_N^{(i)}) ≤ Z(W_N^{(j)}) for all i ∈ A and j ∈ F. Then

lim_{N→∞} Pe^{(N)}(R, F) · 2^{N^β} = 0    (9)

for any rate R < I(W) and β ∈ (0, 1/2). In other words, Pe^{(N)}(R, F) = o(2^{−N^β}).

Note: (9) guarantees that the probability of error vanishes as N → ∞ for some choice of frozen bits uF, as long as R < I(W), the channel capacity. Hence, it shows that polar codes can achieve the capacity of the channel W.

Remark: In fact, for symmetric channels, it can be shown that (9) remains true even if we replace Pe^{(N)}(R, F) by Pe^{(N)}(R, F, uF) for any uF ∈ {0, 1}^{N(1−R)}. This will be explored in HW4.
SLIDE 63

Polar Coding Performance Analysis

pf: Fix some β′ ∈ (β, 1/2). Since R < I(W), by Theorem 3, for N sufficiently large,

|{i : Z(W_N^{(i)}) < 2^{−N^{β′}}}| > NR.

Since we pick A and F such that |A| = NR and all synthetic polarized channels with indices in A have smaller Z(·) than those in F, we have

Z(W_N^{(i)}) < 2^{−N^{β′}}, ∀ i ∈ A.

Hence, by the upper bound (6), we conclude that

Pe^{(N)}(R, F) < NR · 2^{−N^{β′}} ⟹ Pe^{(N)}(R, F) · 2^{N^β} < NR · 2^{−(N^{β′} − N^β)}.

The proof is completed by observing that lim_{N→∞} NR · 2^{−(N^{β′} − N^β)} = 0.