Multi-Player Bandits Revisited
Decentralized Multi-Player Multi-Arm Bandits

Lilian Besson
Joint work with Émilie Kaufmann
PhD Student, Team SCEE, IETR, CentraleSupélec, Rennes & Team SequeL, CRIStAL, Inria, Lille
CMAP Seminar – 31st October 2018

1. Introduction and motivation
1.a. Objective

Motivation
We control some communicating devices; they want to use a wireless access point.
Insert them in a crowded wireless network.
With a protocol slotted in both time and frequency.

Goal
Maintain a good Quality of Service.
With no centralized control, as it costs network overhead.

How?
Devices can choose a different radio channel at each time
→ learn the best one with a sequential algorithm!

1. Introduction and motivation
1.b. Outline and references

Outline and reference
1. Introduction
2. Our model: 3 different feedback levels
3. Regret of the system, and our lower bound on regret
4. Quick reminder on single-player MAB algorithms
5. New multi-player non-coordinated decentralized algorithms
6. Our upper bound on regret for MCTopM
7. Experimental results
8. Review of two more recent articles
9. Conclusion

Based on “Multi-Player Bandits Revisited”, by Lilian Besson & Émilie Kaufmann, arXiv:1711.02317, presented at ALT 2018 (Lanzarote, Spain) in April.

2. Our model: 3 different feedback levels

Our model
1. Our communication model
2. With or without sensing
3. Background traffic, and rewards
4. Different feedback levels
5. Goal

2. Our model: 3 different feedback levels
2.a. Our communication model

Our communication model
K radio channels (e.g., 10).
Discrete and synchronized time t ≥ 1.

Dynamic device = dynamic radio reconfiguration
It decides each time the channel it uses to send each packet.
It can implement a simple decision algorithm.
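As an illustration of what "a simple decision algorithm" on a device could look like, here is a minimal sketch of a per-slot choose/update loop. The class name `Device`, its methods, and the use of a UCB1-style index are assumptions made for this example only; the slide does not say which policy the devices run. Channels are numbered from 0 in the code.

```python
import math


class Device:
    """One dynamic device choosing among K channels (hypothetical sketch).

    The UCB1-style index below is an assumption for illustration,
    not the policy described in the talk.
    """

    def __init__(self, nb_channels):
        self.nb_channels = nb_channels
        self.pulls = [0] * nb_channels    # times each channel was used
        self.sums = [0.0] * nb_channels   # cumulated reward per channel
        self.t = 0                        # current time slot

    def choose(self):
        """Return the channel to use for the next packet."""
        self.t += 1
        # Use every channel once before trusting the indexes.
        for k in range(self.nb_channels):
            if self.pulls[k] == 0:
                return k

        def index(k):
            mean = self.sums[k] / self.pulls[k]
            bonus = math.sqrt(2.0 * math.log(self.t) / self.pulls[k])
            return mean + bonus

        return max(range(self.nb_channels), key=index)

    def update(self, channel, reward):
        """Record the binary reward observed on the chosen channel."""
        self.pulls[channel] += 1
        self.sums[channel] += reward


# Toy run: only channel 2 ever yields a reward (deterministic for clarity).
device = Device(nb_channels=3)
for _ in range(200):
    k = device.choose()
    device.update(k, 1.0 if k == 2 else 0.0)
```

In this toy run the device ends up using channel 2 for most of the 200 slots, which is the "learn the best one" behaviour in its simplest single-device form.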

2. Our model: 3 different feedback levels
2.b. With or without sensing

Our model

“Easy” case
M ≤ K devices always communicate and try to access the network, independently, without centralized supervision.
Background traffic is i.i.d.

Two variants: with or without sensing
1. With sensing: the device first senses for the presence of Primary Users, which have strict priority (background traffic), then uses Ack to detect collisions.
2. Without sensing: same background traffic, but the device cannot sense, so only Ack is used.
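The difference between the two variants can be sketched as the feedback a device receives in one time slot. The names `Feedback` and `observe` are hypothetical, and one modelling choice is an assumption: with sensing, a device that detects a Primary User is taken to stay silent for that slot, so it receives no Ack information.

```python
from typing import NamedTuple, Optional


class Feedback(NamedTuple):
    sensed_free: Optional[bool]  # None when the device cannot sense
    ack: Optional[bool]          # None when no packet was sent


def observe(sensing: bool, channel_free: bool, alone: bool) -> Feedback:
    """Feedback seen by one device in one slot (illustrative sketch)."""
    if sensing:
        if not channel_free:
            # Primary User detected: assume the device does not transmit.
            return Feedback(sensed_free=False, ack=None)
        # Channel is free: the packet is sent, the Ack reveals collisions.
        return Feedback(sensed_free=True, ack=alone)
    # No sensing: the packet is always sent, and the Ack arrives only if
    # the channel was free AND no other device used it.
    return Feedback(sensed_free=None, ack=channel_free and alone)
```

Without sensing, a missing Ack is ambiguous: the device cannot tell background traffic from a collision with another device, which is what makes that variant harder.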

2. Our model: 3 different feedback levels
2.c. Background traffic, and rewards

Background traffic, and rewards

i.i.d. background traffic
K channels, modeled as Bernoulli (0/1) distributions of mean $\mu_k$ = background traffic from Primary Users, bothering the dynamic devices.
M devices, each uses channel $A^j(t) \in \{1, \dots, K\}$ at time $t$.

Rewards
$r^j(t) := Y_{A^j(t),t} \times \mathbb{1}(\overline{C^j(t)}) = \mathbb{1}(\text{uplink \& Ack})$
with sensing information $\forall k,\ Y_{k,t} \overset{\text{iid}}{\sim} \mathrm{Bern}(\mu_k) \in \{0, 1\}$,
collision for device $j$: $\overline{C^j(t)} = \mathbb{1}(\text{alone on arm } A^j(t))$.
→ $r^j(t)$ is a combined binary reward, but not from two Bernoulli!
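The reward of one time slot can be sketched as follows: each device earns the sensing value on its channel multiplied by the indicator that it is alone on that channel. The function names and the channel means in `mus` are hypothetical, channels are numbered from 0 in the code, and the value 1 is assumed to mean that the channel is free of Primary Users.

```python
import random
from collections import Counter


def sample_background(mus, rng):
    """Draw one Bernoulli(mu_k) value per channel (1 = free, by assumption)."""
    return [1 if rng.random() < mu else 0 for mu in mus]


def slot_rewards(choices, y):
    """Reward of each device: y on its channel times 1(alone on it)."""
    load = Counter(choices)  # number of devices on each chosen channel
    return [y[k] * (1 if load[k] == 1 else 0) for k in choices]


rng = random.Random(0)
mus = [0.1, 0.5, 0.9]            # hypothetical channel means
y = sample_background(mus, rng)
print(slot_rewards([0, 2], y))   # two devices on distinct channels
```

The collision term depends on the choices of the other devices, which are themselves learning, so the reward is binary but is not simply the product of two independent Bernoulli draws.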
