

slide-1
SLIDE 1

Learned Scheduling of LDPC Decoders Based on Multi-armed Bandits

Salman Habib, Allison Beemer, and Jörg Kliewer

The Center for Wireless Information Processing, New Jersey Institute of Technology

June 2020 IEEE International Symposium on Information Theory

slide-2
SLIDE 2

Background: Reinforcement Learning (RL)

  • RL is a framework for learning sequential decision-making tasks [Sutton, 84], [Sutton, Barto, 18]
  • Typical applications include robotics, resource management in computer clusters, video games, etc.

slide-4
SLIDE 4

Background: The MAB Problem

[Jeremy Zhang: Reinforcement Learning — Multi-Arm Bandit Implementation]

  • The MAB problem refers to a special RL task
  • A gambler (learner) has to decide which arm of a multi-armed slot machine to pull next, with the goal of achieving the highest total reward in a sequence of pulls [Gittins, 79]

slide-6
SLIDE 6

Background: LDPC Decoding

(Figure: Tanner graph with check nodes (CNs) and variable nodes (VNs))

  • The traditional flooding scheme first updates all the check nodes (CNs) and then all the variable nodes (VNs) in the same iteration
  • In comparison, sequential decoding schemes update a single node per iteration and converge faster than flooding [Kfir, Kanter, 03]

slide-8
SLIDE 8

Background: Sequential Scheduling

(Figure: Tanner graph with CNs and VNs, iterations 1 and 2)

  • Sequential LDPC decoding: only one CN (and its neighboring VNs) is scheduled per iteration
  • Node-wise scheduling (NS) uses the CN residual r_{m_a→v} = |m′_{a→v} − m_{a→v}| as the scheduling criterion [Casado et al., 10]
  • The higher the residual, the less reliable the message; hence, propagating it first leads to faster decoder convergence
  • Disadvantage: the residual calculation makes NS more complex than the flooding scheme for the same total number of messages propagated
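To make the NS criterion concrete, here is a minimal Python sketch of the residual computation, assuming min-sum CN updates and a hypothetical two-CN toy example (not the authors' implementation):

```python
# Sketch of node-wise scheduling (NS): pick the check node (CN) whose
# outgoing min-sum message would change the most (largest residual).
# The toy graph and numbers below are illustrative, not from the paper.

def cn_message(extrinsic):
    """Min-sum CN update: product of signs times the minimum magnitude
    of the incoming VN-to-CN messages (excluding the target VN)."""
    sign = 1.0
    for m in extrinsic:
        sign *= 1.0 if m >= 0 else -1.0
    return sign * min(abs(m) for m in extrinsic)

def residual(old_msg, new_msg):
    """CN residual r = |m' - m|: change between new and current message."""
    return abs(new_msg - old_msg)

# Current outgoing message m_{a->v} for each CN a, and the extrinsic
# VN-to-CN inputs each CN would use to recompute it.
current = {0: 0.2, 1: -1.5}
extrinsic = {0: [0.3, -0.9], 1: [2.0, 1.8]}

residuals = {a: residual(current[a], cn_message(extrinsic[a])) for a in current}
scheduled = max(residuals, key=residuals.get)  # NS schedules this CN next
```

This also illustrates the complexity drawback: every candidate message must be recomputed just to rank the CNs.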

slide-12
SLIDE 12

Motivation

  • [Nachmani, Marciano, Lugosch, Gross, Burshtein, Be’ery 17] Deep learning for improved decoding of linear codes
  • [Carpi, Hager, Martalo, Raheli, and Pfister, 19] Deep-RL for channel coding based on hard-decision decoding

In this work: a MAB-based sequential CN scheduling (MAB-NS) scheme for soft-decoding of LDPC codes

  • Obviates real-time calculation of CN residuals
  • Utilizes a novel clustering scheme to significantly reduce the learning complexity induced by soft-decoding

slide-15
SLIDE 15

The Proposed MAB Framework

(Figure: Tanner graph with CNs and VNs)

  • The NS scheme is modeled as a finite Markov decision process (MDP)
  • Action Aℓ denotes the index of a scheduled CN out of m CNs (arms) in iteration ℓ
  • A quantized syndrome vector Sℓ = [S^(0)_ℓ, . . . , S^(m−1)_ℓ] represents the state of the MDP in iteration ℓ
  • The decision-making process leads to a future state s′ ∈ S(M) and a reward R_a = max_{v∈N(a)} r_{m_a→v} that relies on s and a
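The quantized-syndrome state can be illustrated with a small sketch. Here the soft syndrome of a CN is taken as the sum of its neighbors' LLRs, and the uniform quantizer range is an assumption for illustration (the paper's g_M may differ):

```python
# Sketch of the MDP state: one M-level quantized soft syndrome per CN.
# The quantizer range [-10, 10] is an illustrative assumption.

def soft_syndrome(h_row, llrs):
    """Soft syndrome of one CN: sum of the LLRs of its neighboring VNs."""
    return sum(l for h, l in zip(h_row, llrs) if h == 1)

def g_M(x, M=4, lo=-10.0, hi=10.0):
    """Uniform M-level quantizer g_M over [lo, hi]."""
    x = min(max(x, lo), hi)
    step = (hi - lo) / M
    return min(int((x - lo) / step), M - 1)

H = [[1, 1, 0, 1],   # toy parity-check matrix: 2 CNs, 4 VNs
     [0, 1, 1, 1]]
llrs = [2.5, -0.5, 4.0, 1.0]

# State S_l: one quantized symbol per CN (m symbols in total)
state = [g_M(soft_syndrome(row, llrs)) for row in H]
```

With M levels per CN and m CNs, the raw state space has M^m points, which is what motivates the clustering introduced later.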

slide-19
SLIDE 19

Solving the MAB Problem

  • Compute an action-value called the Gittins index (GI), where all CNs are assumed to be independent [Gittins, 79]
  • Utilize Q-learning, a model-free approach for estimating the action-value of a CN [Watkins, 89]
  • The learning complexity of this method grows exponentially with the number of CNs
  • Solution: group CNs into clusters with separate state and action spaces

slide-22
SLIDE 22

The MAB-NS Algorithm

Input: L, H
Output: reconstructed codeword

 1  Initialization:
 2    ℓ ← 0
 3    m_{c→v} ← 0                       // for all CN-to-VN messages
 4    m_{v_i→c} ← L_i                   // for all VN-to-CN messages
 5    L̂_ℓ ← L
 6    Ŝ_ℓ ← H L̂_ℓ
 7  foreach a ∈ [[m]] do
 8    s^(a)_ℓ ← g_M(ŝ^(a)_ℓ)            // M-level quantization
 9  end
    // decoding starts
10  while stopping condition not satisfied and ℓ < ℓmax do
11    s ← index of S_ℓ
12    update CN a according to an optimized scheduling policy
13    foreach v_k ∈ N(a) do
14      compute and propagate m_{a→v_k}
15      foreach c_j ∈ N(v_k) \ a do
16        compute and propagate m_{v_k→c_j}
17      end
18      L̂^(k)_ℓ ← Σ_{c∈N(v_k)} m_{c→v_k} + L_k    // update LLR of v_k
19    end
20    foreach CN j that is a neighbor of some v_k ∈ N(a) do
21      ŝ^(j)_ℓ ← Σ_{v_i∈N(j)} L̂^(i)_ℓ
22      s^(j)_ℓ ← g_M(ŝ^(j)_ℓ)          // update syndrome S_ℓ
23    end
24    ℓ ← ℓ + 1                         // update iteration
25  end

  • L is a vector of log-likelihood ratios (LLRs)
  • H is the parity-check matrix of an LDPC code
  • Steps 10-25 represent NS (no residuals computed)
  • An optimized CN scheduling policy, learned by solving the MAB problem, is invoked in Step 12
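Steps 10-25 reduce to: read the state, look up the learned policy, update one CN neighborhood, refresh the syndrome. A schematic Python skeleton of that loop, under simplifying assumptions (the policy is a plain lookup table, and the toy quantizer and CN update are stand-ins, not the paper's message-passing rules):

```python
# Schematic MAB-NS decoding loop (cf. Steps 10-25): in each iteration the
# learned policy maps the current quantized state to one CN to update.
# The policy table, quantizer, and CN update below are illustrative stand-ins.

def decode(llrs, H, policy, update_cn, quantize, l_max=25):
    llrs = list(llrs)
    for _ in range(l_max):
        hard = [1 if l < 0 else 0 for l in llrs]
        syndrome = [sum(h * b for h, b in zip(row, hard)) % 2 for row in H]
        if not any(syndrome):            # stopping condition: all checks satisfied
            break
        state = tuple(quantize(row, llrs) for row in H)
        a = policy.get(state, 0)         # Step 12: scheduled CN (fallback: CN 0)
        llrs = update_cn(a, H, llrs)     # Steps 13-19: update CN a's neighborhood
    return [1 if l < 0 else 0 for l in llrs]

# Toy single-parity-check demo: one CN covering two VNs.
H = [[1, 1]]
quantize = lambda row, llrs: 0 if sum(l for h, l in zip(row, llrs) if h) >= 0 else 1
update_cn = lambda a, H, llrs: [l + 1.0 if h else l for h, l in zip(H[a], llrs)]
policy = {(0,): 0, (1,): 0}              # always schedule the only CN

codeword = decode([-0.5, 2.0], H, policy, update_cn, quantize)
```

Note that, unlike NS, no residual is evaluated inside the loop; the scheduling decision is a table lookup.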

slide-25
SLIDE 25

MAB-NS and Sequential Decoding Performance

Remark: Sequential scheduling techniques such as the proposed MAB-NS scheme are more likely to correct errors associated with (3, 3) absorbing sets (ABSs) than a flooding-based scheme.

  • A correct (blue) belief propagated by a scheduled (black) CN, and a wrong (red) belief propagated in addition by a flooding scheme
  • The MAB-NS scheme employs a more global decoding approach than NS and is more likely to overcome undetected errors

slide-26
SLIDE 26

Learning by Estimating GIs

  • The action-value to be determined is given as

    G(ŝ, a) = max_{p_τ ∈ P}  E_{τ,ŝ′}[ Σ_{t=0}^{τ−1} β^t R_t(Ŝ_t, A_t, ŝ′) | Ŝ_0 = ŝ, A_t = a ]
                             ─────────────────────────────────────────────────────────────────
                             E_{τ,ŝ′}[ Σ_{t=0}^{τ−1} β^t | Ŝ_0 = ŝ, A_t = a ]

  • A random variable τ ∈ {1, 2, . . .} is the number of times arm a is played, p_τ is the distribution of τ, and P represents the collection of all distributions determined by the allowed stopping-time policies
  • R_t(Ŝ_t, A_t, ŝ′) is the reward obtained at time t after scheduling CN a
  • After computing an average GI G̃(ŝ, a) for all ŝ and a, the optimized CN scheduling policy is π̂_G = argmax_a G̃(ŝ, a)
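The GI above maximizes a ratio of expected discounted reward to expected discounted time over stopping-time distributions. A crude Monte Carlo sketch, restricted to deterministic stopping times τ (a simplification of the general definition; the trajectory data is made up for illustration):

```python
# Crude GI approximation for one arm: maximize, over deterministic stopping
# times tau, the ratio of expected discounted reward to discounted time,
# with the expectation estimated from sampled reward trajectories.
# Restricting tau to fixed horizons is a simplification for illustration.

def gi_estimate(trajectories, beta=0.9):
    T = min(len(t) for t in trajectories)
    best = float("-inf")
    for tau in range(1, T + 1):
        num = sum(sum(beta ** t * traj[t] for t in range(tau))
                  for traj in trajectories) / len(trajectories)
        den = sum(beta ** t for t in range(tau))  # deterministic for fixed tau
        best = max(best, num / den)
    return best

# Two made-up reward trajectories for one CN (arm)
g = gi_estimate([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]], beta=0.9)
```

Averaging such estimates over many trajectories for every (ŝ, a) pair gives the G̃(ŝ, a) table from which π̂_G is read off.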

slide-30
SLIDE 30

Clustered Q-Learning

(Figure: Tanner graph with CNs and VNs grouped into clusters)

  • The action-value for learning rate α is given by

    Q_{ℓ+1}(s_u, a_u) = (1 − α) Q_ℓ(s_u, a_u) + α [ R_ℓ(s_u, a_u, f(s_u, a_u)) + β max_{u′, a_{u′}} Q_ℓ(f(s_{u′}, a_{u′}), a_{u′}) ]

  • The new state s′_u = f(a_u, s_u) is determined by the scheduling of CN a_u in cluster u with state s_u
  • The action in optimization step ℓ is selected via an ǫ-greedy approach according to

    a_u = { uniformly random over u and A_u   w.p. ǫ
            π^(ℓ)_Q                           w.p. 1 − ǫ }

    where π^(ℓ)_Q = argmax_{a_u s.t. u ∈ {0, . . . , ⌈m/z⌉ − 1}} Q_ℓ(s_u, a_u)
  • Prediction: π^(ℓmax)_Q yields an optimized CN scheduling policy with cluster size z
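A tabular sketch of the clustered Q-learning update and the ǫ-greedy selection. The cluster/state encoding is illustrative, and the bootstrap max here is taken within one cluster's subtable (a simplification of the slide's max over all clusters):

```python
import random

# Tabular clustered Q-learning sketch: Q[u][s][a] is the subtable entry for
# cluster u, cluster state s, and action (CN) a within that cluster.

def q_update(Q, u, s, a, reward, s_next, alpha=0.1, beta=0.9):
    """Q_{l+1}(s_u,a_u) = (1-alpha) Q_l(s_u,a_u)
                          + alpha (R + beta * max_a' Q_l(s'_u, a'))."""
    best_next = max(Q[u][s_next])  # max over next actions (here: within cluster u)
    Q[u][s][a] = (1 - alpha) * Q[u][s][a] + alpha * (reward + beta * best_next)

def eps_greedy(Q, states, eps=0.6, rng=random):
    """With prob. eps, a uniformly random (cluster, action) pair; otherwise
    the greedy pair argmax over all clusters u and actions a of Q_l(s_u, a)."""
    n_clusters, n_actions = len(Q), len(Q[0][0])
    if rng.random() < eps:
        return rng.randrange(n_clusters), rng.randrange(n_actions)
    return max(((u, a) for u in range(n_clusters) for a in range(n_actions)),
               key=lambda ua: Q[ua[0]][states[ua[0]]][ua[1]])

# 2 clusters, 3 cluster states, 2 actions per cluster (toy sizes)
Q = [[[0.0] * 2 for _ in range(3)] for _ in range(2)]
q_update(Q, u=0, s=1, a=0, reward=2.0, s_next=2)
choice = eps_greedy(Q, states=[1, 0], eps=0.0)  # eps=0: purely greedy
```

Each cluster keeps its own small subtable, so the per-update cost depends on the cluster sizes rather than on the full M^m state space.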

slide-35
SLIDE 35

Clustered Q-learning Algorithm

Input: 𝓛, H
Output: estimated Q_{ℓmax}(s_u, a_u) for all u

 1  Initialization: Q_0(s_u, a_u) ← 0 for all s_u, a_u and u
 2  foreach L ∈ 𝓛 do
 3    ℓ ← 0
 4    L̂_ℓ ← L
 5    Ŝ_ℓ ← H L̂_ℓ
 6    foreach a ∈ [[m]] do
 7      s^(a)_ℓ ← g_M(ŝ^(a)_ℓ)          // M-level quantization
 8    end
 9    while ℓ < ℓmax do
10      schedule CN a_u according to the ǫ-greedy approach
11      S^(u,z)_ℓ ← [s^(uz)_ℓ, . . . , s^(uz+z−1)_ℓ]
12      s_u ← index of S^(u,z)_ℓ
13      foreach v_i ∈ N(a_u) do
14        compute and propagate m_{a_u→v_i}
15        foreach c_j ∈ N(v_i) \ a_u do
16          compute and propagate m_{v_i→c_j}
17        end
18        L̂^(i)_ℓ ← Σ_{c∈N(v_i)} m_{c→v_i} + L_i   // update LLR
19      end
20      foreach CN j that is a neighbor of some v_k ∈ N(a_u) do
21        ŝ^(j)_ℓ ← Σ_{v_i∈N(j)} L̂^(i)_ℓ
22        s^(j)_ℓ ← g_M(ŝ^(j)_ℓ)        // update syndrome S_ℓ
23      end
24      s′_u ← index of updated S^(u,z)_ℓ
25      R_ℓ(s_u, a_u, s′_u) ← highest residual of CN a_u
26      compute Q_{ℓ+1}(s_u, a_u)
27      ℓ ← ℓ + 1                       // update iteration
28    end
29  end

  • 𝓛 is a set of LLR vectors L
  • H is the parity-check matrix of the LDPC code used for RL
  • z CNs per cluster
  • The Q-table consists of ⌈m/z⌉ subtables, each of dimension M^z × z
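The clustered Q-table size follows from simple arithmetic. A quick numeric sketch, assuming m = 98 CNs (as for a rate-1/2 (3, 6)-regular code of block length 196; the value of m is an assumption, not stated on the slide):

```python
import math

# Clustered Q-table size: ceil(m/z) subtables of dimension M**z x z.
# m = 98 assumes a rate-1/2 (3,6)-regular code of block length 196.
m, z, M = 98, 7, 4
subtables = math.ceil(m / z)     # number of clusters
rows = M ** z                    # one row per quantized cluster state
entries = subtables * rows * z   # total stored Q-values
# Without clustering, a single table would need M**m rows -- infeasible.
```

Under these assumptions the whole table holds about 1.6 million entries, versus the astronomically large M^m-row table a single global state space would require.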

slide-37
SLIDE 37

Experimental Setup

  • We apply the scheduling policies π̂_G and π^(ℓmax)_Q, respectively, in Step 12 of our MAB-NS algorithm, resulting in schemes denoted as GI and Q-learning
  • We then utilize GI and Q-learning for sequential decoding of both random (3, 6)-regular and (3, 7)-array-based (AB) LDPC codes
  • For learning, we consider α = 0.1, β = 0.9, ǫ = 0.6, z = 7, M = 4, ℓmax = 25 for both codes, and |𝓛| = 2.5 × 10^7 (resp. 5 × 10^7) for the (3, 6)-regular (resp. (3, 7)-AB) code
  • We compare the performance of GI and Q-learning with the existing BP decoding schemes of flooding and NS for ℓmax = 25

slide-41
SLIDE 41

Simulation Results

(Figure: BER vs. Eb/N0 (dB), 0.5 to 2 dB)

  • Performance of (3, 6)-regular LDPC codes, block length 196
  • Q-learning is superior to the other decoding schemes in terms of BER performance

slide-42
SLIDE 42

Simulation Results

(Figure: BER vs. Eb/N0 (dB), 0.5 to 2 dB)

  • Performance of (3, 7)-AB LDPC codes, block length 196
  • Q-learning is superior to the other decoding schemes in terms of BER performance

slide-43
SLIDE 43

Enumeration Results

Scheme     | SNR: 0.6      | …             | 1.2          | 1.4          | 1.8
flooding   | 16070 (29257) | 13923 (27684) | 9274 (22339) | 7657 (17367) | 4523 (10401)
GI         |   337 (402)   |   299 (367)   |  254 (314)   |  253 (286)   |  220 (254)
NS         |   324 (389)   |   290 (358)   |  256 (299)   |  238 (278)   |  222 (251)
Q-learning |   280 (374)   |   236 (308)   |  181 (281)   |  162 (276)   |  138 (234)

  • Average number of CN-to-VN messages propagated for a (3, 6)-regular ((3, 7)-AB) LDPC code
  • Q-learning generates a lower number of CN-to-VN messages compared to the other schemes

Q-learning significantly reduces message-passing complexity for short LDPC codes by avoiding real-time residual calculation

slide-46
SLIDE 46

Takeaways

  • An RL-based sequential scheduling scheme (MAB-NS) for soft-decoding of LDPC codes
  • MAB-NS outperforms all existing sequential decoding schemes
  • MAB-NS obviates the need for computing residuals in real time
  • Future work will focus on optimized clustering-based sequential LDPC decoding

slide-50
SLIDE 50

Thank you!