Multi-Player Bandits Revisited
Decentralized Multi-Player Multi-Arm Bandits

Lilian Besson, PhD Student (joint work with Émilie Kaufmann)
Team SCEE, IETR, CentraleSupélec, Rennes & Team SequeL, CRIStAL, Inria, Lille
ALT Conference, 08-04-2018

1. Introduction and motivation
1.a. Objective

Motivation
We control some communicating devices that want to use a wireless access point.
Insert them in a crowded wireless network,
with a protocol slotted in both time and frequency.

Goal
Maintain a good Quality of Service,
with no centralized control, as it costs network overhead.

How?
Devices can choose a different radio channel at each time
→ learn the best one with a sequential algorithm!

2. Our model: 3 different feedback levels
2.a. Our communication model

K radio channels (e.g., 10).
Discrete and synchronized time $t \geq 1$.

Dynamic device = dynamic radio reconfiguration
It decides each time the channel it uses to send each packet.
It can implement a simple decision algorithm.

2.b. With or without sensing

Our model ("easy" case)
$M \leq K$ devices always communicate and try to access the network, independently, without centralized supervision.
Background traffic is i.i.d.

Two variants: with or without sensing
1. With sensing: the device first senses for the presence of Primary Users that have strict priority (background traffic), then uses Ack to detect collisions.
2. Without sensing: same background traffic, but the device cannot sense, so only Ack is used.

2.c. Background traffic, and rewards

i.i.d. background traffic
K channels, modeled as Bernoulli (0/1) distributions of mean $\mu_k$ = background traffic from Primary Users, bothering the dynamic devices.
M devices, each uses channel $A^j(t) \in \{1, \dots, K\}$ at time $t$.

Rewards
$r^j(t) := Y_{A^j(t),t} \times \mathbb{1}(C^j(t)) = \mathbb{1}(\text{uplink \& Ack})$
with sensing information $\forall k,\ Y_{k,t} \overset{\text{iid}}{\sim} \mathrm{Bern}(\mu_k) \in \{0, 1\}$,
collision for device $j$: $C^j(t) = \mathbb{1}(\text{alone on arm } A^j(t))$.
→ $r^j(t)$ is a combined binary reward, but not from two Bernoulli!
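The reward model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `play_round` is hypothetical, channels are indexed from 0 (the slides use 1..K), and `mu[k]` is read as the probability that $Y_{k,t} = 1$, since a reward requires $Y = 1$.

```python
import random
from collections import Counter


def play_round(mu, actions, rng):
    """Simulate one time slot for M devices on K channels.

    mu      -- list of K Bernoulli means (assumed: mu[k] = P(Y_k = 1))
    actions -- list of M channel indices, 0-based (assumption)
    rng     -- a random.Random instance
    Returns (Y, alone, rewards).
    """
    # Y[k] ~ Bern(mu[k]), drawn independently for each channel.
    Y = [1 if rng.random() < m else 0 for m in mu]
    # A device is alone on its channel when no other device chose it.
    counts = Counter(actions)
    alone = [1 if counts[a] == 1 else 0 for a in actions]
    # Combined binary reward r^j(t) = Y_{A^j(t),t} * 1(alone on the arm).
    rewards = [Y[a] * c for a, c in zip(actions, alone)]
    return Y, alone, rewards


# Three devices: two collide on channel 2, one is alone on channel 1.
rng = random.Random(0)
Y, alone, rewards = play_round([0.2, 0.5, 0.9], [2, 2, 1], rng)
```

The two colliding devices always get reward 0, whatever the draw of `Y`, which is why the combined reward is not simply the product of two independent Bernoulli variables.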

2.d. Different feedback levels

3 feedback levels for $r^j(t) := Y_{A^j(t),t} \times \mathbb{1}(C^j(t))$
1. "Full feedback": observe both $Y_{A^j(t),t}$ and $C^j(t)$ separately.
   → Not realistic enough, we don't focus on it.
2. "Sensing": first observe $Y_{A^j(t),t}$, then $C^j(t)$ only if $Y_{A^j(t),t} \neq 0$.
   → Models licensed protocols (e.g., ZigBee), our main focus.
3. "No sensing": observe only the combined $Y_{A^j(t),t} \times \mathbb{1}(C^j(t))$.
   → Unlicensed protocols (e.g., LoRaWAN), harder to analyze!

But all consider the same instantaneous reward $r^j(t)$.
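As a rough illustration of the three feedback levels, the sketch below returns what a single device would get to see in one time slot. The function name `observe`, the level labels, and the use of `None` for "not observed" are assumptions made for this example, not notation from the slides.

```python
def observe(level, y, alone):
    """Return the observation of one device for one time slot.

    level -- "full", "sensing" or "no_sensing" (hypothetical labels)
    y     -- Y_{A^j(t),t}, the 0/1 sample of the chosen channel
    alone -- 1 if the device is alone on its channel, else 0
    Unobserved quantities are reported as None.
    """
    if level not in ("full", "sensing", "no_sensing"):
        raise ValueError("unknown feedback level: %r" % (level,))
    # The instantaneous reward is the same in all three models.
    reward = y * alone
    if level == "full":
        # Both quantities are observed separately.
        return {"Y": y, "C": alone, "r": reward}
    if level == "sensing":
        # The collision information is only seen when Y != 0.
        return {"Y": y, "C": alone if y != 0 else None, "r": reward}
    # No sensing: only the combined reward is observed.
    return {"Y": None, "C": None, "r": reward}
```

For example, `observe("no_sensing", 0, 1)` and `observe("no_sensing", 1, 0)` return the same dictionary: a zero reward does not tell the device whether the channel was busy or a collision occurred, which hints at why this case is harder to analyze.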

2.e. Goal

Goal
Minimize the packet loss ratio (= maximize the number of received Ack) in a finite-space discrete-time Decision Making Problem.

Solution?
Multi-Armed Bandit algorithms, decentralized and used independently by each dynamic device.

2.f. Centralized regret

A measure of success
Not the network throughput or collision probability: we study the centralized (expected) regret:

$$R_T(\boldsymbol{\mu}, M, \rho) := \Big( \sum_{k=1}^{M} \mu_k^* \Big) T - \mathbb{E}_{\mu} \Big[ \sum_{t=1}^{T} \sum_{j=1}^{M} r^j(t) \Big].$$

Notation: $\mu_k^*$ is the mean of the $k$-best arm ($k$-th largest in $\boldsymbol{\mu}$): $\mu_1^* := \max \boldsymbol{\mu}$, $\mu_2^* := \max \boldsymbol{\mu} \setminus \{\mu_1^*\}$, etc.

Ref: [Lai & Robbins, 1985], [Liu & Zhao, 2009], [Anandkumar et al, 2010]

Two directions of analysis
How good can a decentralized algorithm be in this setting?
→ Lower Bound on the regret, for any algorithm!
How good is my decentralized algorithm in this setting?
→ Upper Bound on the regret, for one algorithm!
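To make the definition concrete, here is a small sketch that evaluates the centralized regret of a fixed, non-random schedule of channel choices, for which the expectation can be computed exactly. The function name `centralized_regret`, the 0-based arm indices, and the `schedule` argument (one tuple of M arms per time step) are assumptions of this example; a real algorithm $\rho$ would choose its arms adaptively.

```python
from collections import Counter


def centralized_regret(mu, schedule):
    """Exact centralized regret of a deterministic schedule.

    mu       -- list of K arm means
    schedule -- list of T tuples, schedule[t][j] = arm of player j at time t
    """
    T = len(schedule)
    M = len(schedule[0])
    # Sum of the M largest means: the best achievable reward per time step.
    best_per_step = sum(sorted(mu, reverse=True)[:M])
    expected_reward = 0.0
    for arms in schedule:
        counts = Counter(arms)
        # Only a player that is alone on arm k earns mu[k] in expectation.
        expected_reward += sum(mu[k] for k, n in counts.items() if n == 1)
    return best_per_step * T - expected_reward


mu = [0.9, 0.5, 0.1]
orthogonal_best = centralized_regret(mu, [(0, 1)] * 10)   # two best arms, no collision
always_collide = centralized_regret(mu, [(0, 0)] * 10)    # both players on the same arm
```

With two players on the two best arms and no collision, the regret is zero up to floating-point rounding; when both players always share one arm, every step loses the full $\mu_1^* + \mu_2^*$.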

3. Lower bound

1. Decomposition of the regret in 3 terms,
2. Asymptotic lower bound on one term,
3. And for the regret.

3.a. Lower bound on the regret

Decomposition of the regret
For any algorithm, decentralized or not, we have

$$R_T(\boldsymbol{\mu}, M, \rho) = \sum_{k \in M\text{-worst}} (\mu_M^* - \mu_k)\, \mathbb{E}_{\mu}[T_k(T)] + \sum_{k \in M\text{-best}} (\mu_k - \mu_M^*)\, (T - \mathbb{E}_{\mu}[T_k(T)]) + \sum_{k=1}^{K} \mu_k\, \mathbb{E}_{\mu}[C_k(T)].$$

Notations for an arm $k \in \{1, \dots, K\}$:
$T_k^j(T) := \sum_{t=1}^{T} \mathbb{1}(A^j(t) = k)$ counts selections by the player $j \in \{1, \dots, M\}$,
$T_k(T) := \sum_{j=1}^{M} T_k^j(T)$ counts selections by all $M$ players,
$C_k(T) := \sum_{t=1}^{T} \mathbb{1}(\exists j_1 \neq j_2,\ A^{j_1}(t) = k = A^{j_2}(t))$ counts collisions.

Small regret can be attained if…
1. Devices can quickly identify the bad arms ($M$-worst), and not play them too much (number of sub-optimal selections),
2. Devices can quickly identify the best arms, and most surely play them (number of optimal non-selections),
3. Devices can use orthogonal channels (number of collisions).
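The counting notations above can be illustrated on a recorded sequence of channel choices. The sketch below follows the three definitions as written on the slide; the function name `selection_and_collision_counts`, the 0-based arm indices, and the `schedule` format (one tuple of M arms per time step) are assumptions of this example.

```python
from collections import Counter


def selection_and_collision_counts(K, schedule):
    """Compute T_k^j(T), T_k(T) and C_k(T) from a list of joint actions.

    K        -- number of arms
    schedule -- list of T tuples, schedule[t][j] = arm of player j at time t
    Returns (T_jk, T_k, C_k) where T_jk[j][k] is the per-player count.
    """
    M = len(schedule[0])
    T_jk = [[0] * K for _ in range(M)]
    C_k = [0] * K
    for arms in schedule:
        # T_k^j(T): one more selection of arm k by player j.
        for j, k in enumerate(arms):
            T_jk[j][k] += 1
        # C_k(T): one more time step where at least two players share arm k.
        for k, n in Counter(arms).items():
            if n >= 2:
                C_k[k] += 1
    # T_k(T): selections of arm k summed over all players.
    T_k = [sum(T_jk[j][k] for j in range(M)) for k in range(K)]
    return T_jk, T_k, C_k


# Two players, three arms, four time steps; collisions at steps 2 and 3.
T_jk, T_k, C_k = selection_and_collision_counts(3, [(0, 1), (0, 0), (2, 2), (1, 0)])
```

Here `T_k` sums to `M * T = 8`, since every player selects exactly one arm per time step.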
