
Learned Scheduling of LDPC Decoders Based on Multi-armed Bandits

1. Learned Scheduling of LDPC Decoders Based on Multi-armed Bandits. Salman Habib, Allison Beemer, and Jörg Kliewer, The Center for Wireless Information Processing, New Jersey Institute of Technology. IEEE International Symposium on Information Theory, June 2020.

2. Background: Reinforcement Learning (RL)
• RL is a framework for learning sequential decision-making tasks [Sutton, 84], [Sutton, Barto, 18]
• Typical applications include robotics, resource management in computer clusters, and video games

3. Background: The MAB Problem
[Image credit: Jeremy Zhang, "Reinforcement Learning — Multi-Arm Bandit Implementation"]
• The MAB problem refers to a special RL task
• A gambler (the learner) must decide which arm of a multi-armed slot machine to pull next, with the goal of achieving the highest total reward over a sequence of pulls [Gittins, 79]; a minimal sketch follows below
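To make the bandit setting concrete, here is a minimal ε-greedy sketch in Python; the Bernoulli arm payouts, the exploration rate, and all variable names are illustrative assumptions rather than details from the talk.

    # Minimal epsilon-greedy bandit: pull arms, track running value estimates.
    import numpy as np

    rng = np.random.default_rng(0)
    true_means = np.array([0.2, 0.5, 0.8])   # unknown Bernoulli payout per arm
    n_arms = len(true_means)

    Q = np.zeros(n_arms)          # running estimate of each arm's value
    counts = np.zeros(n_arms)     # pulls per arm
    eps = 0.1                     # exploration rate

    for t in range(10_000):
        # explore with probability eps, otherwise exploit the best arm so far
        a = int(rng.integers(n_arms)) if rng.random() < eps else int(np.argmax(Q))
        r = float(rng.random() < true_means[a])   # sample this pull's reward
        counts[a] += 1
        Q[a] += (r - Q[a]) / counts[a]            # incremental sample mean

    print(Q)   # estimates approach true_means; argmax picks the best arm

The same explore/exploit tension reappears below when a decoder must pick which CN to schedule next.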

4. Background: LDPC Decoding
[Figure: Tanner graph with check nodes (CNs) and variable nodes (VNs)]
• The traditional flooding scheme first updates all check nodes (CNs) and then all variable nodes (VNs) within the same iteration
• In comparison, sequential decoding schemes update a single node per iteration and converge faster than flooding [Kfir, Kanter, 03]

5. Background: Sequential Scheduling
[Figure: Tanner graph; a different CN is scheduled in iteration 1 and in iteration 2]
• Sequential LDPC decoding: only one CN (and its neighboring VNs) is scheduled per iteration
• Node-wise scheduling (NS) uses the CN residual r_{m_{a→v}} ≜ |m′_{a→v} − m_{a→v}| as the scheduling criterion [Casado et al., 10]
• The higher the residual, the less reliable the message, and hence propagating it first leads to faster decoder convergence
• Disadvantage: the residual calculation makes NS more complex than the flooding scheme for the same total number of messages propagated (see the sketch below)
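As a concrete, hedged illustration of NS, the Python sketch below recomputes candidate min-sum CN messages for every CN, takes the largest message change as that CN's residual, and schedules the CN with the maximum residual; the min-sum update rule, the toy code, and all names are assumptions for illustration.

    import numpy as np

    def cn_update(H, msg_v2c, a):
        """Candidate min-sum messages from CN a to each neighboring VN
        (assumes every CN has degree >= 2)."""
        nbrs = np.flatnonzero(H[a])
        out = {}
        for v in nbrs:
            others = [u for u in nbrs if u != v]
            sign = np.prod(np.sign(msg_v2c[a, others]))
            out[v] = sign * np.min(np.abs(msg_v2c[a, others]))
        return out

    def best_cn_by_residual(H, msg_v2c, msg_c2v):
        """Return the CN whose largest residual |m' - m| is maximal."""
        best, best_res = 0, -1.0
        for a in range(H.shape[0]):
            cand = cn_update(H, msg_v2c, a)
            res = max(abs(m_new - msg_c2v[a, v]) for v, m_new in cand.items())
            if res > best_res:
                best, best_res = a, res
        return best

    H = np.array([[1, 1, 0, 1], [0, 1, 1, 1]])       # toy parity-check matrix
    llr = np.array([0.8, -1.2, 0.5, 2.0])            # channel LLRs
    msg_v2c = H * llr                                # initial VN-to-CN messages
    msg_c2v = np.zeros_like(msg_v2c)                 # initial CN-to-VN messages
    print(best_cn_by_residual(H, msg_v2c, msg_c2v))  # CN scheduled next

Note that every scheduling decision requires candidate updates for all CNs, which is exactly the extra online complexity the MAB-NS scheme aims to avoid.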

6. Motivation
• [Nachmani, Marciano, Lugosch, Gross, Burshtein, Be’ery, 17]: deep learning for improved decoding of linear codes
• [Carpi, Häger, Martalò, Raheli, Pfister, 19]: deep RL for channel coding based on hard-decision decoding
In this work: a MAB-based sequential CN scheduling (MAB-NS) scheme for soft decoding of LDPC codes
• Obviates real-time calculation of CN residuals
• Utilizes a novel clustering scheme to significantly reduce the learning complexity induced by soft decoding

7. The Proposed MAB Framework
[Figure: Tanner graph with CNs and VNs]
• The NS scheme is modeled as a finite Markov decision process (MDP)
• Action A_ℓ denotes the index of the scheduled CN, out of the m CNs (arms), in iteration ℓ
• A quantized syndrome vector S_ℓ = [S_ℓ^(0), …, S_ℓ^(m−1)] represents the state of the MDP in iteration ℓ
• The decision-making process leads to a future state s′ ∈ S^(M) (the space of M-level quantized syndrome vectors) and a reward R_a = max_{v ∈ N(a)} r_{m_{a→v}} that depends on the current state s and action a (see the sketch below)
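The sketch below shows one way the state and reward could be formed; the uniform quantizer g_M, its clipping range, and the soft syndrome taken as a sum of neighboring LLRs (matching Steps 6 and 21 of the algorithm later in the talk) are reconstruction assumptions, not verified details of the paper.

    import numpy as np

    def g_M(x, M=4, clip=10.0):
        """Assumed uniform M-level quantizer over [-clip, clip]."""
        q = np.floor((np.clip(x, -clip, clip) + clip) / (2 * clip) * M)
        return np.clip(q.astype(int), 0, M - 1)

    def syndrome_state(H, llr_hat, M=4):
        """Quantized syndrome S_l = [S_l^(0), ..., S_l^(m-1)], one entry per CN."""
        soft = H @ llr_hat            # soft syndrome: sum of neighboring LLRs
        return tuple(g_M(soft, M))    # hashable, so usable as a tabular MDP state

    def reward(new_msgs, old_msgs):
        """R_a = max_{v in N(a)} |m'_{a->v} - m_{a->v}| for the scheduled CN a;
        each dict maps a neighboring VN to its message."""
        return max(abs(new_msgs[v] - old_msgs[v]) for v in new_msgs)

During training the reward still relies on residuals; the point of MAB-NS is that the learned policy no longer needs them at decoding time.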

8. Solving the MAB Problem
• Compute an action value called the Gittins index (GI) for each CN, where all CNs are assumed to be independent [Gittins, 79]
• Alternatively, utilize Q-learning, a model-free method for estimating the action value of a CN [Watkins, 89]
• The learning complexity of this method grows exponentially with the number of CNs
• Solution: group CNs into clusters with separate state and action spaces (a Q-learning sketch follows below)
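Here is a hedged sketch of tabular Q-learning for this MDP; env_step, init_state, and all hyperparameters are illustrative assumptions (the talk does not spell out the training loop on this slide).

    from collections import defaultdict
    import random

    def q_learning(env_step, init_state, n_arms, episodes=1000,
                   alpha=0.1, gamma=0.9, eps=0.1, horizon=50):
        """Learn Q(s, a) over quantized-syndrome states s and CN indices a."""
        Q = defaultdict(lambda: [0.0] * n_arms)
        for _ in range(episodes):
            s = init_state()        # e.g. syndrome state of a fresh noisy word
            for _ in range(horizon):
                # epsilon-greedy choice of the CN to schedule next
                a = (random.randrange(n_arms) if random.random() < eps
                     else max(range(n_arms), key=lambda i: Q[s][i]))
                s_next, r, done = env_step(s, a)  # schedule CN a, observe reward
                target = r + (0.0 if done else gamma * max(Q[s_next]))
                Q[s][a] += alpha * (target - Q[s][a])   # standard Q-learning update
                if done:
                    break
                s = s_next
        return Q

With M quantization levels the full state space holds M^m syndrome vectors, which is the exponential blow-up noted above; clustering instead trains one much smaller table per CN cluster over that cluster's own state and action space.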

9. The MAB-NS Algorithm
Input: L, H. Output: reconstructed codeword.
• L is a vector of log-likelihood ratios (LLRs)
• H is the parity-check matrix of an LDPC code
• Steps 10-25 represent NS (no residuals are computed)
• An optimized CN scheduling policy, learned by solving the MAB problem, is invoked in Step 12

     1  Initialization:
     2  ℓ ← 0
     3  m_{c→v} ← 0                          // for all CN-to-VN messages
     4  m_{v_i→c} ← L_i                      // for all VN-to-CN messages
     5  L̂_ℓ ← L
     6  Ŝ_ℓ ← H L̂_ℓ
     7  foreach a ∈ [[m]] do
     8      S_ℓ^(a) ← g_M(Ŝ_ℓ^(a))           // M-level quantization
     9  end
        // decoding starts
    10  while stopping condition not satisfied and ℓ < ℓ_max do
    11      s ← index of S_ℓ
    12      update CN a according to an optimum scheduling policy
    13      foreach v_k ∈ N(a) do
    14          compute and propagate m_{a→v_k}
    15          foreach c_j ∈ N(v_k) \ a do
    16              compute and propagate m_{v_k→c_j}
    17          end
    18          L̂_ℓ^(k) ← Σ_{c∈N(v_k)} m_{c→v_k} + L_k   // update LLR of v_k
    19      end
    20      foreach CN j that is a neighbor of some v_k ∈ N(a) do
    21          Ŝ_ℓ^(j) ← Σ_{v_i∈N(j)} L̂_ℓ^(i)
    22          S_ℓ^(j) ← g_M(Ŝ_ℓ^(j))       // update syndrome S_ℓ
    23      end
    24      ℓ ← ℓ + 1                         // update iteration
    25  end
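At decoding time, Step 12 then reduces to a table (or function) lookup; this small sketch assumes a Q-table like the one in the previous sketch, with names again illustrative.

    def schedule_cn(Q, state, n_arms):
        """Greedy learned policy for Step 12: pick argmax_a Q(state, a).
        No residuals are computed online, unlike NS."""
        return max(range(n_arms), key=lambda a: Q[state][a])

This lookup replaces the per-iteration residual search of NS while retaining its sequential, one-CN-per-iteration schedule.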
