
Multi-Player Bandits Revisited: Decentralized Multi-Player Multi-Arm Bandits


  1. Multi-Player Bandits Revisited: Decentralized Multi-Player Multi-Arm Bandits. Lilian Besson, joint work with Émilie Kaufmann. PhD Student, Team SCEE, IETR, CentraleSupélec, Rennes & Team SequeL, CRIStAL, Inria, Lille. CMAP Seminar – 31st October 2018.

  2. 1. Introduction and motivation, 1.a. Objective. Motivation: we control some communicating devices that want to use a wireless access point. Insert them in a crowded wireless network, with a protocol slotted in both time and frequency. Goal: maintain a good Quality of Service, with no centralized control, as it costs network overhead. How? Devices can choose a different radio channel at each time → learn the best one with a sequential algorithm! Lilian Besson (CentraleSupélec & Inria) Multi-Player Bandits Revisited CMAP Seminar – 31 Oct 2018 2 / 45

  6. 1. Introduction and motivation, 1.b. Outline and references. Outline: (1) Introduction; (2) Our model: 3 different feedback levels; (3) Regret of the system, and our lower bound on regret; (4) Quick reminder on single-player MAB algorithms; (5) New multi-player non-coordinated decentralized algorithms; (6) Our upper bound on regret for MCTopM; (7) Experimental results; (8) Review of two more recent articles; (9) Conclusion. Based on “Multi-Player Bandits Revisited”, by Lilian Besson & Émilie Kaufmann, arXiv:1711.02317, presented at ALT 2018 (Lanzarote, Spain) in April.

  11. 2. Our model: 3 different feedback levels. Our model: (1) Our communication model; (2) With or without sensing; (3) Background traffic, and rewards; (4) Different feedback levels; (5) Goal.

  13. 2. Our model: 3 different feedback levels, 2.a. Our communication model. Our communication model: K radio channels (e.g., 10); discrete and synchronized time t ≥ 1. Dynamic device = dynamic radio reconfiguration: it decides at each time step which channel it uses to send each packet, and it can implement a simple decision algorithm.
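As a minimal sketch of such a dynamic device, assuming a single device alone on K Bernoulli channels (so no collisions can occur) and using the classic UCB1 index policy as the "simple decision algorithm", the loop below walks through synchronized slots t = 1, ..., T. The names `UCB1Device` and `run_single_device` are hypothetical, and UCB1 is only a stand-in here, not necessarily the algorithm used in the talk.

```python
import math
import random


class UCB1Device:
    """Hypothetical dynamic device choosing one of K channels per slot.

    Assumption: UCB1 is used as an example decision rule, not the talk's algorithm.
    """

    def __init__(self, n_channels: int) -> None:
        self.K = n_channels
        self.pulls = [0] * n_channels   # times each channel was used so far
        self.sums = [0.0] * n_channels  # cumulative reward on each channel

    def choose(self, t: int) -> int:
        # Use every channel once before relying on the indexes.
        for k in range(self.K):
            if self.pulls[k] == 0:
                return k

        # UCB1 index: empirical mean + exploration bonus sqrt(2 ln t / N_k).
        def index(k: int) -> float:
            mean = self.sums[k] / self.pulls[k]
            return mean + math.sqrt(2.0 * math.log(t) / self.pulls[k])

        return max(range(self.K), key=index)

    def update(self, k: int, reward: float) -> None:
        self.pulls[k] += 1
        self.sums[k] += reward


def run_single_device(mus, horizon, seed=0):
    """Synchronized slots t = 1..horizon; one device only, so no collisions."""
    rng = random.Random(seed)
    device = UCB1Device(len(mus))
    for t in range(1, horizon + 1):
        k = device.choose(t)
        reward = 1.0 if rng.random() < mus[k] else 0.0  # Bernoulli(mu_k) draw
        device.update(k, reward)
    return device.pulls
```

For example, `run_single_device([0.1, 0.5, 0.9], 2000)` returns how many slots were spent on each channel; most of them should go to the channel with the highest mean.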

  15. 2. Our model: 3 different feedback levels, 2.b. With or without sensing. Our model, “easy” case: M ≤ K devices always communicate and try to access the network, independently, without centralized supervision; background traffic is i.i.d. Two variants, with or without sensing: (1) With sensing: a device first senses for the presence of Primary Users, which have strict priority (background traffic), then uses the Ack to detect collisions. (2) Without sensing: same background traffic, but the device cannot sense, so only the Ack is used.
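The two variants can be sketched as small functions, assuming the convention that y = 1 means no Primary User occupies the channel, and that with sensing a device only transmits after sensing the channel free. The names `observe` and `collision_detected` are hypothetical, chosen for illustration.

```python
from typing import Optional, Tuple


def observe(sensing: bool, y: int, collision: bool) -> Tuple[Optional[int], int]:
    """Return (sensed_y, ack) as seen by one device in one slot.

    y: 1 if the channel is free of Primary Users (assumed convention), else 0.
    collision: True if another of our devices picked the same channel.
    """
    # An Ack comes back only if the channel was free and nobody else used it.
    ack = int(y == 1 and not collision)
    if sensing:
        return y, ack   # sensing result first, then the Ack
    return None, ack    # without sensing, only the Ack is observed


def collision_detected(sensing: bool, sensed_y: Optional[int], ack: int) -> Optional[bool]:
    """What the device can infer about a collision (None means ambiguous)."""
    if ack == 1:
        return False    # an Ack always rules out a collision
    if sensing and sensed_y == 1:
        return True     # channel sensed free, yet no Ack: must be a collision
    return None         # busy channel and/or collision: cannot tell
```

With sensing, a missing Ack on a channel sensed free can only mean a collision; without sensing, a missing Ack could come from either a Primary User or another device.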

  16. 2. Our model: 3 different feedback levels, 2.c. Background traffic, and rewards. i.i.d. background traffic: K channels, modeled as Bernoulli (0/1) distributions of mean $\mu_k$, modeling the background traffic from Primary Users that bothers the dynamic devices. M devices, each using channel $A^j(t) \in \{1, \dots, K\}$ at time t. Rewards: sensing information $Y_{A^j(t),t} \overset{\text{iid}}{\sim} \mathrm{Bern}(\mu_{A^j(t)})$; collision for device j: $C^j(t) = \mathbb{1}(\text{not alone on arm } A^j(t))$; combined reward $r^j(t) := Y_{A^j(t),t} \times \mathbb{1}(\overline{C^j(t)}) = \mathbb{1}(\text{uplink \& Ack})$, a binary reward, but not the product of two Bernoulli variables!
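One time slot of this reward model can be simulated as below, as a sketch assuming channels are indexed from 0 in code (instead of 1 to K) and without enforcing M ≤ K; `play_slot` is a hypothetical helper name. Because the collision indicator depends on the other devices' choices, it is not an independent Bernoulli draw, which is why the reward is binary without being the product of two Bernoulli variables.

```python
import random
from collections import Counter
from typing import List, Sequence


def play_slot(mus: Sequence[float], choices: Sequence[int], rng: random.Random) -> List[int]:
    """Rewards r^j(t) of all M devices in one slot t.

    mus[k]: mean of the Bernoulli variable Y_{k,t} of channel k (0-indexed here).
    choices[j]: channel A^j(t) picked by device j.
    """
    # Sensing information: Y_{k,t} ~ Bern(mu_k), i.i.d. across channels and slots.
    y = [1 if rng.random() < mu else 0 for mu in mus]
    load = Counter(choices)  # how many of our devices are on each channel
    rewards = []
    for k in choices:
        collision = load[k] > 1                         # C^j(t): not alone on channel k
        rewards.append(y[k] * (0 if collision else 1))  # r^j(t) = Y * 1(no collision)
    return rewards
```

For instance, with `mus = [1.0, 1.0, 0.0]` and `choices = [0, 0, 1, 2]`, the first two devices collide, the third is alone on a free channel, and the fourth sits on an always-busy channel.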
