

slide-1
SLIDE 1

Learned Scheduling of LDPC Decoders Based on Multi-armed Bandits

Salman Habib, Allison Beemer, and Jörg Kliewer

The Center for Wireless Information Processing, New Jersey Institute of Technology

June 2020 IEEE International Symposium on Information Theory

slide-2
SLIDE 2

Background: Reinforcement Learning (RL)

  • RL is a framework for learning sequential decision-making tasks [Sutton, 84], [Sutton, Barto, 18]
  • Typical applications include robotics, resource management in computer clusters, video games, etc.

slide-4
SLIDE 4

Background: The MAB Problem

[Jeremy Zhang: Reinforcement Learning — Multi-Arm Bandit Implementation]

  • The MAB problem refers to a special RL task
  • A gambler (learner) has to decide which arm of a multi-armed slot machine to pull next, with the goal of achieving the highest total reward in a sequence of pulls [Gittins, 79]

slide-6
SLIDE 6

Background: LDPC Decoding

(Figure: Tanner graph with check nodes (CNs) and variable nodes (VNs))

  • The traditional flooding scheme first updates all the check nodes (CNs) and then all the variable nodes (VNs) in the same iteration
  • In comparison, sequential decoding schemes update a single node per iteration and converge faster than flooding [Kfir, Kanter, 03]

slide-8
SLIDE 8

Background: Sequential Scheduling

(Figure: Tanner graph with CNs and VNs, iterations 1 and 2)

  • Sequential LDPC decoding: only one CN (and its neighboring VNs) is scheduled per iteration
  • Node-wise scheduling (NS) uses the CN residual r_{m_a→v} = |m′_{a→v} − m_{a→v}| as the scheduling criterion [Casado et al., 10]
  • The higher the residual, the less reliable the message; hence, propagating it first leads to faster decoder convergence
  • Disadvantage: the residual calculation makes NS more complex than the flooding scheme for the same total number of messages propagated
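To make the NS criterion concrete, here is a minimal Python sketch of the residual computation, assuming min-sum CN updates and a hypothetical two-CN toy example (not the authors' implementation):

```python
# Sketch of node-wise scheduling (NS): pick the check node (CN) whose
# outgoing min-sum message would change the most (largest residual).
# The toy graph and numbers below are illustrative, not from the paper.

def cn_message(extrinsic):
    """Min-sum CN update: product of signs times the minimum magnitude
    of the incoming VN-to-CN messages (excluding the target VN)."""
    sign = 1.0
    for m in extrinsic:
        sign *= 1.0 if m >= 0 else -1.0
    return sign * min(abs(m) for m in extrinsic)

def residual(old_msg, new_msg):
    """CN residual r = |m' - m|: change between new and current message."""
    return abs(new_msg - old_msg)

# Current outgoing message m_{a->v} for each CN a, and the extrinsic
# VN-to-CN inputs each CN would use to recompute it.
current = {0: 0.2, 1: -1.5}
extrinsic = {0: [0.3, -0.9], 1: [2.0, 1.8]}

residuals = {a: residual(current[a], cn_message(extrinsic[a])) for a in current}
scheduled = max(residuals, key=residuals.get)  # NS schedules this CN next
```

This also illustrates the complexity drawback: every candidate message must be recomputed just to rank the CNs.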

slide-12
SLIDE 12

Motivation

  • [Nachmani, Marciano, Lugosch, Gross, Burshtein, Be’ery 17] Deep learning for improved decoding of linear codes
  • [Carpi, Hager, Martalo, Raheli, and Pfister, 19] Deep-RL for channel coding based on hard-decision decoding

In this work: a MAB-based sequential CN scheduling (MAB-NS) scheme for soft-decoding of LDPC codes

  • Obviates real-time calculation of CN residuals
  • Utilizes a novel clustering scheme to significantly reduce the learning complexity induced by soft-decoding

slide-15
SLIDE 15

The Proposed MAB Framework

(Figure: Tanner graph with CNs and VNs)

  • The NS scheme is modeled as a finite Markov decision process (MDP)
  • Action Aℓ denotes the index of a scheduled CN out of m CNs (arms) in iteration ℓ
  • A quantized syndrome vector Sℓ = [S^(0)_ℓ, . . . , S^(m−1)_ℓ] represents the state of the MDP in iteration ℓ
  • The decision-making process leads to a future state s′ ∈ S(M) and a reward R_a = max_{v∈N(a)} r_{m_a→v} that relies on s and a
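The quantized-syndrome state can be illustrated with a small sketch. Here the soft syndrome of a CN is taken as the sum of its neighbors' LLRs, and the uniform quantizer range is an assumption for illustration (the paper's g_M may differ):

```python
# Sketch of the MDP state: one M-level quantized soft syndrome per CN.
# The quantizer range [-10, 10] is an illustrative assumption.

def soft_syndrome(h_row, llrs):
    """Soft syndrome of one CN: sum of the LLRs of its neighboring VNs."""
    return sum(l for h, l in zip(h_row, llrs) if h == 1)

def g_M(x, M=4, lo=-10.0, hi=10.0):
    """Uniform M-level quantizer g_M over [lo, hi]."""
    x = min(max(x, lo), hi)
    step = (hi - lo) / M
    return min(int((x - lo) / step), M - 1)

H = [[1, 1, 0, 1],   # toy parity-check matrix: 2 CNs, 4 VNs
     [0, 1, 1, 1]]
llrs = [2.5, -0.5, 4.0, 1.0]

# State S_l: one quantized symbol per CN (m symbols in total)
state = [g_M(soft_syndrome(row, llrs)) for row in H]
```

With M levels per CN and m CNs, the raw state space has M^m points, which is what motivates the clustering introduced later.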

slide-19
SLIDE 19

Solving the MAB Problem

  • Compute an action-value called the Gittins index (GI), where all CNs are assumed to be independent [Gittins, 79]
  • Utilize Q-learning, a model-free approach for estimating the action-value of a CN [Watkins, 89]
  • The learning complexity of this method grows exponentially with the number of CNs
  • Solution: group CNs into clusters with separate state and action spaces

slide-22
SLIDE 22

The MAB-NS Algorithm

Input: L, H
Output: reconstructed codeword

 1  Initialization:
 2    ℓ ← 0
 3    m_{c→v} ← 0                       // for all CN-to-VN messages
 4    m_{v_i→c} ← L_i                   // for all VN-to-CN messages
 5    L̂_ℓ ← L
 6    Ŝ_ℓ ← H L̂_ℓ
 7  foreach a ∈ [[m]] do
 8    s^(a)_ℓ ← g_M(ŝ^(a)_ℓ)            // M-level quantization
 9  end
    // decoding starts
10  while stopping condition not satisfied and ℓ < ℓmax do
11    s ← index of S_ℓ
12    update CN a according to an optimized scheduling policy
13    foreach v_k ∈ N(a) do
14      compute and propagate m_{a→v_k}
15      foreach c_j ∈ N(v_k) \ a do
16        compute and propagate m_{v_k→c_j}
17      end
18      L̂^(k)_ℓ ← Σ_{c∈N(v_k)} m_{c→v_k} + L_k    // update LLR of v_k
19    end
20    foreach CN j that is a neighbor of some v_k ∈ N(a) do
21      ŝ^(j)_ℓ ← Σ_{v_i∈N(j)} L̂^(i)_ℓ
22      s^(j)_ℓ ← g_M(ŝ^(j)_ℓ)          // update syndrome S_ℓ
23    end
24    ℓ ← ℓ + 1                         // update iteration
25  end

  • L is a vector of log-likelihood ratios (LLRs)
  • H is the parity-check matrix of an LDPC code
  • Steps 10-25 represent NS (no residuals computed)
  • An optimized CN scheduling policy, learned by solving the MAB problem, is invoked in Step 12
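Steps 10-25 reduce to: read the state, look up the learned policy, update one CN neighborhood, refresh the syndrome. A schematic Python skeleton of that loop, under simplifying assumptions (the policy is a plain lookup table, and the toy quantizer and CN update are stand-ins, not the paper's message-passing rules):

```python
# Schematic MAB-NS decoding loop (cf. Steps 10-25): in each iteration the
# learned policy maps the current quantized state to one CN to update.
# The policy table, quantizer, and CN update below are illustrative stand-ins.

def decode(llrs, H, policy, update_cn, quantize, l_max=25):
    llrs = list(llrs)
    for _ in range(l_max):
        hard = [1 if l < 0 else 0 for l in llrs]
        syndrome = [sum(h * b for h, b in zip(row, hard)) % 2 for row in H]
        if not any(syndrome):            # stopping condition: all checks satisfied
            break
        state = tuple(quantize(row, llrs) for row in H)
        a = policy.get(state, 0)         # Step 12: scheduled CN (fallback: CN 0)
        llrs = update_cn(a, H, llrs)     # Steps 13-19: update CN a's neighborhood
    return [1 if l < 0 else 0 for l in llrs]

# Toy single-parity-check demo: one CN covering two VNs.
H = [[1, 1]]
quantize = lambda row, llrs: 0 if sum(l for h, l in zip(row, llrs) if h) >= 0 else 1
update_cn = lambda a, H, llrs: [l + 1.0 if h else l for h, l in zip(H[a], llrs)]
policy = {(0,): 0, (1,): 0}              # always schedule the only CN

codeword = decode([-0.5, 2.0], H, policy, update_cn, quantize)
```

Note that, unlike NS, no residual is evaluated inside the loop; the scheduling decision is a table lookup.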

slide-25
SLIDE 25

MAB-NS and Sequential Decoding Performance

Remark: Sequential scheduling techniques such as the proposed MAB-NS scheme are more likely to correct errors associated with (3, 3) absorbing sets (ABSs) than a flooding-based scheme.

  • A correct (blue) belief propagated by a scheduled (black) CN, and a wrong (red) belief propagated in addition by a flooding scheme
  • The MAB-NS scheme employs a more global decoding approach than NS and is more likely to overcome undetected errors

slide-26
SLIDE 26

Learning by Estimating GIs

  • The action-value to be determined is given as

    G(ŝ, a) = max_{p_τ ∈ P}  E_{τ,ŝ′}[ Σ_{t=0}^{τ−1} β^t R_t(Ŝ_t, A_t, ŝ′) | Ŝ_0 = ŝ, A_t = a ]
                             ─────────────────────────────────────────────────────────────────
                             E_{τ,ŝ′}[ Σ_{t=0}^{τ−1} β^t | Ŝ_0 = ŝ, A_t = a ]

  • A random variable τ ∈ {1, 2, . . .} is the number of times arm a is played, p_τ is the distribution of τ, and P represents the collection of all distributions determined by the allowed stopping-time policies
  • R_t(Ŝ_t, A_t, ŝ′) is the reward obtained at time t after scheduling CN a
  • After computing an average GI G̃(ŝ, a) for all ŝ and a, the optimized CN scheduling policy is π̂_G = argmax_a G̃(ŝ, a)
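The GI above maximizes a ratio of expected discounted reward to expected discounted time over stopping-time distributions. A crude Monte Carlo sketch, restricted to deterministic stopping times τ (a simplification of the general definition; the trajectory data is made up for illustration):

```python
# Crude GI approximation for one arm: maximize, over deterministic stopping
# times tau, the ratio of expected discounted reward to discounted time,
# with the expectation estimated from sampled reward trajectories.
# Restricting tau to fixed horizons is a simplification for illustration.

def gi_estimate(trajectories, beta=0.9):
    T = min(len(t) for t in trajectories)
    best = float("-inf")
    for tau in range(1, T + 1):
        num = sum(sum(beta ** t * traj[t] for t in range(tau))
                  for traj in trajectories) / len(trajectories)
        den = sum(beta ** t for t in range(tau))  # deterministic for fixed tau
        best = max(best, num / den)
    return best

# Two made-up reward trajectories for one CN (arm)
g = gi_estimate([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]], beta=0.9)
```

Averaging such estimates over many trajectories for every (ŝ, a) pair gives the G̃(ŝ, a) table from which π̂_G is read off.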

slide-30
SLIDE 30

Clustered Q-Learning

(Figure: Tanner graph with CNs and VNs grouped into clusters)

  • The action-value for learning rate α is given by

    Q_{ℓ+1}(s_u, a_u) = (1 − α) Q_ℓ(s_u, a_u) + α [ R_ℓ(s_u, a_u, f(s_u, a_u)) + β max_{u′, a_{u′}} Q_ℓ(f(s_{u′}, a_{u′}), a_{u′}) ]

  • The new state s′_u = f(a_u, s_u) is determined by the scheduling of CN a_u in cluster u with state s_u
  • The action in optimization step ℓ is selected via an ǫ-greedy approach according to

    a_u = { uniformly random over u and A_u   w.p. ǫ
            π^(ℓ)_Q                           w.p. 1 − ǫ }

    where π^(ℓ)_Q = argmax_{a_u s.t. u ∈ {0, . . . , ⌈m/z⌉ − 1}} Q_ℓ(s_u, a_u)
  • Prediction: π^(ℓmax)_Q yields an optimized CN scheduling policy with cluster size z
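A tabular sketch of the clustered Q-learning update and the ǫ-greedy selection. The cluster/state encoding is illustrative, and the bootstrap max here is taken within one cluster's subtable (a simplification of the slide's max over all clusters):

```python
import random

# Tabular clustered Q-learning sketch: Q[u][s][a] is the subtable entry for
# cluster u, cluster state s, and action (CN) a within that cluster.

def q_update(Q, u, s, a, reward, s_next, alpha=0.1, beta=0.9):
    """Q_{l+1}(s_u,a_u) = (1-alpha) Q_l(s_u,a_u)
                          + alpha (R + beta * max_a' Q_l(s'_u, a'))."""
    best_next = max(Q[u][s_next])  # max over next actions (here: within cluster u)
    Q[u][s][a] = (1 - alpha) * Q[u][s][a] + alpha * (reward + beta * best_next)

def eps_greedy(Q, states, eps=0.6, rng=random):
    """With prob. eps, a uniformly random (cluster, action) pair; otherwise
    the greedy pair argmax over all clusters u and actions a of Q_l(s_u, a)."""
    n_clusters, n_actions = len(Q), len(Q[0][0])
    if rng.random() < eps:
        return rng.randrange(n_clusters), rng.randrange(n_actions)
    return max(((u, a) for u in range(n_clusters) for a in range(n_actions)),
               key=lambda ua: Q[ua[0]][states[ua[0]]][ua[1]])

# 2 clusters, 3 cluster states, 2 actions per cluster (toy sizes)
Q = [[[0.0] * 2 for _ in range(3)] for _ in range(2)]
q_update(Q, u=0, s=1, a=0, reward=2.0, s_next=2)
choice = eps_greedy(Q, states=[1, 0], eps=0.0)  # eps=0: purely greedy
```

Each cluster keeps its own small subtable, so the per-update cost depends on the cluster sizes rather than on the full M^m state space.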

slide-35
SLIDE 35

Clustered Q-learning Algorithm

Input: 𝓛, H
Output: estimated Q_{ℓmax}(s_u, a_u) for all u

 1  Initialization: Q_0(s_u, a_u) ← 0 for all s_u, a_u and u
 2  foreach L ∈ 𝓛 do
 3    ℓ ← 0
 4    L̂_ℓ ← L
 5    Ŝ_ℓ ← H L̂_ℓ
 6    foreach a ∈ [[m]] do
 7      s^(a)_ℓ ← g_M(ŝ^(a)_ℓ)          // M-level quantization
 8    end
 9    while ℓ < ℓmax do
10      schedule CN a_u according to the ǫ-greedy approach
11      S^(u,z)_ℓ ← [s^(uz)_ℓ, . . . , s^(uz+z−1)_ℓ]
12      s_u ← index of S^(u,z)_ℓ
13      foreach v_i ∈ N(a_u) do
14        compute and propagate m_{a_u→v_i}
15        foreach c_j ∈ N(v_i) \ a_u do
16          compute and propagate m_{v_i→c_j}
17        end
18        L̂^(i)_ℓ ← Σ_{c∈N(v_i)} m_{c→v_i} + L_i   // update LLR
19      end
20      foreach CN j that is a neighbor of some v_k ∈ N(a_u) do
21        ŝ^(j)_ℓ ← Σ_{v_i∈N(j)} L̂^(i)_ℓ
22        s^(j)_ℓ ← g_M(ŝ^(j)_ℓ)        // update syndrome S_ℓ
23      end
24      s′_u ← index of updated S^(u,z)_ℓ
25      R_ℓ(s_u, a_u, s′_u) ← highest residual of CN a_u
26      compute Q_{ℓ+1}(s_u, a_u)
27      ℓ ← ℓ + 1                       // update iteration
28    end
29  end

  • 𝓛 is a set of LLR vectors L
  • H is the parity-check matrix of the LDPC code used for RL
  • z CNs per cluster
  • The Q-table consists of ⌈m/z⌉ subtables, each of dimension M^z × z
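The clustered Q-table size follows from simple arithmetic. A quick numeric sketch, assuming m = 98 CNs (as for a rate-1/2 (3, 6)-regular code of block length 196; the value of m is an assumption, not stated on the slide):

```python
import math

# Clustered Q-table size: ceil(m/z) subtables of dimension M**z x z.
# m = 98 assumes a rate-1/2 (3,6)-regular code of block length 196.
m, z, M = 98, 7, 4
subtables = math.ceil(m / z)     # number of clusters
rows = M ** z                    # one row per quantized cluster state
entries = subtables * rows * z   # total stored Q-values
# Without clustering, a single table would need M**m rows -- infeasible.
```

Under these assumptions the whole table holds about 1.6 million entries, versus the astronomically large M^m-row table a single global state space would require.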

slide-37
SLIDE 37

Experimental Setup

  • We apply the scheduling policies π̂_G and π^(ℓmax)_Q, respectively, in Step 12 of our MAB-NS algorithm, resulting in schemes denoted as GI and Q-learning
  • We then utilize GI and Q-learning for sequential decoding of both random (3, 6)-regular and (3, 7)-array-based (AB) LDPC codes
  • For learning, we consider α = 0.1, β = 0.9, ǫ = 0.6, z = 7, M = 4, ℓmax = 25 for both codes, and |𝓛| = 2.5 × 10^7 (resp. 5 × 10^7) for the (3, 6)-regular (resp. (3, 7)-AB) code
  • We compare the performance of GI and Q-learning with the existing BP decoding schemes of flooding and NS for ℓmax = 25

slide-41
SLIDE 41

Simulation Results

(Figure: BER vs. Eb/N0 (dB), 0.5 to 2 dB)

  • Performance of (3, 6)-regular LDPC codes, block length 196
  • Q-learning is superior to the other decoding schemes in terms of BER performance

slide-42
SLIDE 42

Simulation Results

(Figure: BER vs. Eb/N0 (dB), 0.5 to 2 dB)

  • Performance of (3, 7)-AB LDPC codes, block length 196
  • Q-learning is superior to the other decoding schemes in terms of BER performance

slide-43
SLIDE 43

Enumeration Results

Scheme     | SNR: 0.6      | …             | 1.2          | 1.4          | 1.8
flooding   | 16070 (29257) | 13923 (27684) | 9274 (22339) | 7657 (17367) | 4523 (10401)
GI         |   337 (402)   |   299 (367)   |  254 (314)   |  253 (286)   |  220 (254)
NS         |   324 (389)   |   290 (358)   |  256 (299)   |  238 (278)   |  222 (251)
Q-learning |   280 (374)   |   236 (308)   |  181 (281)   |  162 (276)   |  138 (234)

  • Average number of CN-to-VN messages propagated for a (3, 6)-regular ((3, 7)-AB) LDPC code
  • Q-learning generates a lower number of CN-to-VN messages compared to the other schemes

Q-learning significantly reduces message-passing complexity for short LDPC codes by avoiding real-time residual calculation

slide-46
SLIDE 46

Takeaways

  • An RL-based sequential scheduling scheme (MAB-NS) for soft-decoding of LDPC codes
  • MAB-NS outperforms all existing sequential decoding schemes
  • MAB-NS obviates the need for computing residuals in real time
  • Future work will focus on optimized clustering-based sequential LDPC decoding

slide-50
SLIDE 50

Thank you!