

  1. MAB Learning in IoT Networks: Learning helps even in non-stationary settings!
Rémi Bonnefoi, Lilian Besson, Émilie Kaufmann, Christophe Moy, Jacques Palicot
Presented by Lilian Besson, PhD student in France, Team SCEE, IETR, CentraleSupélec, Rennes & Team SequeL, CRIStAL, Inria, Lille
20-21 Sept., CROWNCOM 2017

  2. 1. Introduction and motivation: 1.a. Objective

We want:
- a lot of IoT devices to access a gateway or base station,
- to insert them in a crowded wireless network,
- with a protocol slotted in time and frequency,
- while each device has a low duty cycle (a few messages per day).

Goal: maintain a good Quality of Service, without centralized supervision!

How? Use learning algorithms: each device will learn on which frequency it should talk!

  3. 1. Introduction and motivation: 1.b. Outline

Outline:
1 Introduction and motivation
2 Model and hypotheses
3 Baseline algorithms: naive and efficient centralized approaches to compare against
4 Two Multi-Armed Bandit algorithms: UCB, Thompson sampling
5 Experimental results
6 Perspectives and future works
7 Conclusion

  4. 2. Model and hypotheses: 2.a. Model

Model:
- Discrete time t ≥ 1 and N_c radio channels (e.g., 10) (known).
- D dynamic devices try to access the network independently.
- S = S_1 + ... + S_{N_c} static devices occupy the network: S_1, ..., S_{N_c} in each channel (unknown).

Figure 1: Protocol in time and frequency, with an Acknowledgement.

  5. 2. Model and hypotheses: 2.b. Hypotheses I

Emission model: each device has the same low emission probability: at each time step, each device sends a packet with probability p (this gives a duty cycle proportional to p).

Background traffic: each static device uses only one channel, and this allocation is fixed in time. ⟹ Background traffic, bothering the dynamic devices!

  6. 2. Model and hypotheses: 2.b. Hypotheses II

Dynamic radio reconfiguration: each dynamic device decides the channel it uses to send every packet. It has the memory and computational capacity to implement a basic decision algorithm.

Problem. Goal: minimize the packet loss ratio (i.e., maximize the number of received Acks) in a finite-space discrete-time Decision Making Problem.

Solution? Multi-Armed Bandit algorithms, decentralized and used independently by each device.

  7. 3. Baseline algorithms: 3.a. A naive strategy: uniformly random access

Uniformly random access: dynamic devices choose their channel uniformly at random in the pool of N_c channels. Natural strategy, dead simple to implement.

Simple analysis, in terms of successful transmission probability (for every message from dynamic devices):

P(\text{success} \mid \text{sent}) = \sum_{i=1}^{N_c} \frac{1}{N_c} \times \underbrace{(1 - p/N_c)^{D-1}}_{\text{no other dynamic device}} \times \underbrace{(1 - p)^{S_i}}_{\text{no static device}}.

Works fine only if all channels are similarly occupied, but it cannot learn to exploit the best (least occupied) channels.
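This closed-form probability is easy to evaluate numerically. Here is a minimal sketch (the function name and the example numbers are ours, for illustration only):

```python
def p_success_random(p, D, S):
    """Success probability of one sent packet under uniformly random access:
    sum over channels of P(pick channel i) * P(no other dynamic device
    transmits there) * P(no static device transmits there)."""
    Nc = len(S)
    return sum((1.0 / Nc) * (1 - p / Nc) ** (D - 1) * (1 - p) ** S_i for S_i in S)

# Example (our own numbers): 10 channels, 1000 dynamic devices,
# 9000 static devices spread unevenly over the channels.
S = [1800, 1600, 1400, 1200, 1000, 800, 600, 400, 200, 0]
print(p_success_random(p=1e-3, D=1000, S=S))
```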

  8. 3. Baseline algorithms: 3.b. Optimal centralized strategy I

If an oracle can decide to assign D_i dynamic devices to channel i, the successful transmission probability is:

P(\text{success} \mid \text{sent}) = \sum_{i=1}^{N_c} \underbrace{D_i / D}_{\text{sent in channel } i} \times \underbrace{(1 - p)^{D_i - 1}}_{D_i - 1 \text{ others}} \times \underbrace{(1 - p)^{S_i}}_{\text{no static device}}.

The oracle has to solve this optimization problem:

\arg\max_{D_1, \dots, D_{N_c}} \sum_{i=1}^{N_c} D_i (1 - p)^{S_i + D_i - 1} \quad \text{such that} \quad \sum_{i=1}^{N_c} D_i = D \text{ and } D_i \ge 0, \forall 1 \le i \le N_c.

We solved this quasi-convex optimization problem with Lagrange multipliers, but only numerically.
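For intuition, a simple greedy heuristic already gives a sensible allocation: assign the D devices one at a time, each to the channel where the objective increases the most. This sketch is our own illustration, not the paper's Lagrange-multiplier solution:

```python
def oracle_allocation_greedy(p, D, S):
    """Greedy approximation of the oracle allocation: add dynamic devices
    one by one to the channel with the best marginal gain in the objective
    sum_i D_i * (1 - p)**(S_i + D_i - 1). Heuristic sketch only."""
    Nc = len(S)
    alloc = [0] * Nc

    def term(i, d):
        # Contribution of channel i when it hosts d dynamic devices.
        return d * (1 - p) ** (S[i] + d - 1) if d > 0 else 0.0

    for _ in range(D):
        gains = [term(i, alloc[i] + 1) - term(i, alloc[i]) for i in range(Nc)]
        alloc[max(range(Nc), key=lambda i: gains[i])] += 1
    return alloc

# Same (assumed) setting as before: more devices end up in the freer channels.
print(oracle_allocation_greedy(p=1e-3, D=1000,
                               S=[1800, 1600, 1400, 1200, 1000, 800, 600, 400, 200, 0]))
```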

  9. 3. Baseline algorithms: 3.b. Optimal centralized strategy II

⟹ Very good performance, maximizing the transmission rate of all the D dynamic devices. But unrealistic: not achievable in practice, as there is no centralized oracle!

Let's see realistic decentralized approaches:
↪ Machine Learning?
↪ Reinforcement Learning?
↪ Multi-Armed Bandits!

  10. 4. Multi-Armed Bandit algorithm: UCB. 4.1. Multi-Armed Bandit formulation

A dynamic device tries to collect rewards when transmitting:
- it transmits following a Bernoulli process (probability p of transmitting at each time step τ),
- it chooses a channel A(τ) ∈ {1, ..., N_c},
- if Ack (no collision) ⟹ reward r_{A(τ)} = 1,
- if collision (no Ack) ⟹ reward r_{A(τ)} = 0.

Reinforcement Learning interpretation: maximizing the transmission rate is the same as maximizing the cumulated reward,

\max_{\text{algorithm } A} \sum_{\tau=1}^{\text{horizon}} r_{A(\tau)}.
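To make the reward definition concrete, here is a toy simulator of one time slot of this model. It is entirely our own sketch (names and structure are assumptions, not the authors' code):

```python
import random

def simulate_slot(choices, p, S):
    """One time slot. choices[j] is the channel picked by dynamic device j,
    p is the emission probability of every device, S[i] is the number of
    static devices in channel i. Returns one entry per dynamic device:
    None if it stayed silent, else its 0/1 reward (1 = Ack, no collision)."""
    Nc = len(S)
    sent = [random.random() < p for _ in choices]   # Bernoulli emissions
    load = [0] * Nc                                 # dynamic senders per channel
    for ch, s in zip(choices, sent):
        if s:
            load[ch] += 1
    # Each static device transmits in its own (fixed) channel with probability p.
    static_busy = [any(random.random() < p for _ in range(S[i])) for i in range(Nc)]
    return [None if not s else int(load[ch] == 1 and not static_busy[ch])
            for ch, s in zip(choices, sent)]
```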

  11. 4. Multi-Armed Bandit algorithm: UCB. 4.2. Upper Confidence Bound algorithm (UCB1)

A dynamic device keeps τ, the number of packets it has sent, N_k(τ), the number of selections of channel k, and X_k(τ), the number of successful transmissions in channel k.

1. For the first N_c steps (τ = 1, ..., N_c), try each channel once.
2. Then for the next steps τ > N_c:
   - compute the index g_k(τ) := \underbrace{X_k(τ) / N_k(τ)}_{\text{mean } \hat{\mu}_k(τ)} + \underbrace{\sqrt{\log(τ) / (2 N_k(τ))}}_{\text{Upper Confidence Bound}},
   - choose channel A(τ) = \arg\max_k g_k(τ),
   - update N_{A(τ)}(τ + 1) and X_{A(τ)}(τ + 1).

References: [Lai & Robbins, 1985], [Auer et al., 2002], [Bubeck & Cesa-Bianchi, 2012]
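The index policy above fits in a few lines of Python. A minimal sketch (our own, following the slide's notation):

```python
import math

class UCB1:
    """UCB1 policy for one dynamic device over Nc channels."""

    def __init__(self, Nc):
        self.Nc = Nc
        self.N = [0] * Nc    # N_k: selections of channel k
        self.X = [0] * Nc    # X_k: successful transmissions on channel k
        self.tau = 0         # packets sent so far

    def choose(self):
        self.tau += 1
        for k in range(self.Nc):        # first Nc steps: try each channel once
            if self.N[k] == 0:
                return k
        # then maximize g_k = empirical mean + exploration bonus
        def g(k):
            return self.X[k] / self.N[k] + math.sqrt(math.log(self.tau) / (2 * self.N[k]))
        return max(range(self.Nc), key=g)

    def update(self, k, reward):
        self.N[k] += 1
        self.X[k] += reward
```

Each device runs its own independent instance: it calls choose() when it has a packet to send, then update(k, reward) once it observes the Ack (reward 1) or the collision (reward 0).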

  12. 5. Experimental results: 5.1. Experiment setting

Simulation parameters:
- N_c = 10 channels, S + D = 10000 devices in total,
- p = 10^{-3} probability of emission,
- horizon = 10^5 time slots (≃ 100 messages per device),
- the proportion of dynamic devices D/(S + D) varies,
- various settings for the (S_1, ..., S_{N_c}) distribution of static devices.

What we show: after a short learning time, MAB algorithms are almost as efficient as the oracle solution, and never worse than the naive solution. Thompson sampling is even more efficient than UCB.
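Thompson sampling is not detailed on these slides. For completeness, here is the standard Beta-Bernoulli version, as a sketch of the textbook algorithm (the exact variant used in the paper may differ in details):

```python
import random

class ThompsonSampling:
    """Beta-Bernoulli Thompson sampling for one device over Nc channels:
    keep a Beta(a_k, b_k) posterior on each channel's success probability,
    sample from every posterior, and transmit on the channel whose sample
    is largest."""

    def __init__(self, Nc):
        self.a = [1] * Nc    # Beta(1, 1) prior: uniform on [0, 1]
        self.b = [1] * Nc

    def choose(self):
        samples = [random.betavariate(a, b) for a, b in zip(self.a, self.b)]
        return max(range(len(samples)), key=lambda k: samples[k])

    def update(self, k, reward):
        self.a[k] += reward        # one more success on channel k...
        self.b[k] += 1 - reward    # ...or one more failure
```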

  13. 5. Experimental results: 5.2. First result: 10% of dynamic devices

Figure 2: 10% of dynamic devices, 7% of gain. (Plot: successful transmission rate, between 0.82 and 0.91, vs. number of slots ×10^5, comparing UCB, Thompson sampling, Optimal, Good sub-optimal, and Random.)
