
A Unifying Computation of Whittle’s Index for Markovian Bandits

Manu K. Gupta (2)
Joint work with U. Ayesta (1, 2) and I.M. Verloop (1, 2)

(1) Centre National de la Recherche Scientifique (CNRS)
(2) Institut de Recherche en Informatique de Toulouse (IRIT), Toulouse

Outline

1. Restless Bandits
   - Overview
   - Problem Description
   - Decomposition
2. Applications
   - Machine Repairman Problem
   - Content Delivery Problem
   - Congestion Control Problem
3. Summary and Future Directions

Restless Bandits: Overview

Background and overview
- A particular case of a constrained Markov decision process (MDP).
- A stochastic resource allocation problem.
- A generalization of the multi-armed bandit problem (MABP).
- A powerful modeling technique for diverse applications:
  - Routing in clusters (Niño-Mora, 2012a), sensor scheduling (Niño-Mora and Villar, 2011).
  - Machine repairman problem (Glazebrook et al., 2005), content delivery problem (Larrañaga et al., 2015).
  - Minimum job loss routing (Niño-Mora, 2012b), inventory routing (Archibald et al., 2009), processor sharing queues (Borkar and Pattathil, 2017), congestion control in TCP (Avrachenkov et al., 2013), etc.

Major challenges
- Establishing indexability and computing Whittle’s index.

Multi-armed bandit problem (MABP)
- A particular case of an MDP.
- At each decision epoch, the scheduler selects one bandit.
- The selected bandit evolves stochastically, while the remaining bandits are frozen.
- States, rewards and transition probabilities are known.
- The objective is to maximize the total average reward.
- In general, the optimal policy depends on all the input parameters.

Gittins index
- For the MABP, the optimal policy is an index rule (Gittins et al., 2011).
- For example, the cµ rule in multi-class queues.
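To make the idea of an index rule concrete, here is a minimal sketch of the cµ rule for a single server shared by several job classes. The function name, the argument names, and the reading of c as a per-class holding cost and µ as a per-class service rate are assumptions made for illustration; the slides only name the rule.

```python
def c_mu_rule(queue_lengths, costs, rates):
    """Pick the class to serve under the c-mu rule.

    Each class k gets the static index costs[k] * rates[k]; among the
    classes that currently have jobs waiting, the one with the largest
    index is served. Returns None when every queue is empty.
    All names here are illustrative, not taken from the slides.
    """
    non_empty = [k for k, n in enumerate(queue_lengths) if n > 0]
    if not non_empty:
        return None
    # Ties are broken in favour of the lowest class number.
    return max(non_empty, key=lambda k: costs[k] * rates[k])


# Class 2 has the largest index (20.0) but is empty, so class 0 (index 2.0)
# is served ahead of class 1 (index 1.0).
chosen = c_mu_rule([3, 1, 0], costs=[1.0, 2.0, 5.0], rates=[2.0, 0.5, 4.0])
```

The point of the sketch is that the decision uses one number per class, computed from that class alone, rather than a joint optimization over all classes.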

Restless Bandit Problem (RBP)
- The RBP is a generalization of the MABP.
- Any number of bandits (more than 1) can be made active.
- All bandits might evolve stochastically.
- The objective is to optimize the average performance criterion.
- Computing the optimal policy is typically out of reach.
- RBPs are PSPACE-complete (Papadimitriou and Tsitsiklis, 1999): much more convincing evidence of intractability than NP-hardness.

Whittle’s relaxation (Whittle, 1988)
- The restriction on the number of active bandits is to be respected on average only.
- The optimal solution to the relaxed problem is of index type.
- Whittle’s index recovers the Gittins index in the non-restless case.
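The relaxation can be written out as follows. This is a sketch in a discrete-time, reward-maximization form; the symbols N (number of bandits), M (number of bandits to activate), a_k(t) (action of bandit k, 1 for active), X_k(t) (its state), R_k (its reward) and W (the Lagrange multiplier) are notation assumed here, not taken from the slides.

```latex
% Original constraint: exactly M of the N bandits are active at every decision epoch t.
\sum_{k=1}^{N} a_k(t) = M \quad \text{for all } t, \qquad a_k(t) \in \{0, 1\}.

% Whittle's relaxation: the constraint is required to hold only on time average.
\limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}\!\left[ \sum_{t=0}^{T-1} \sum_{k=1}^{N} a_k(t) \right] = M.

% Lagrangian with multiplier W, read as a subsidy for passivity:
% the relaxed problem decouples into one subproblem per bandit k.
\max_{\pi_k} \; \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}\!\left[ \sum_{t=0}^{T-1}
  \Big( R_k\big(X_k(t), a_k(t)\big) + W \big(1 - a_k(t)\big) \Big) \right].
```

Under indexability, the Whittle index of a state x of bandit k is the smallest subsidy W for which being passive in x is optimal in this single-bandit subproblem.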

Whittle’s index policy
- A heuristic for the original problem.
- A bandit with the highest Whittle’s index is made active.
- Whittle’s index policy performs strikingly well (Niño-Mora, 2007).
- Asymptotically optimal under certain conditions (Weber and Weiss, 1990, 1991).
- A generalization to several classes of bandits, arrivals of new bandits and multiple actions (Verloop, 2016).

Results
- A unifying framework for obtaining Whittle’s index.
- Retrieves many Whittle’s indices available in the literature, including those for the machine repairman problem and the content delivery problem.
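The selection step of the heuristic can be sketched as below, assuming the Whittle indices have already been computed and stored per bandit and per state. The function name, the `index_tables` layout and the `num_active` parameter are hypothetical; computing the indices themselves is the subject of the talk and is not shown.

```python
def whittle_index_policy(states, index_tables, num_active):
    """Return the bandits to activate at the current decision epoch.

    states[k] is the current state of bandit k, and index_tables[k][s]
    is an (assumed precomputed) Whittle index of bandit k in state s.
    The num_active bandits with the largest current index are activated.
    All names and the data layout are illustrative assumptions.
    """
    current = [index_tables[k][s] for k, s in enumerate(states)]
    # sorted() is stable, so ties go to the lower-numbered bandit.
    ranked = sorted(range(len(states)), key=lambda k: current[k], reverse=True)
    return sorted(ranked[:num_active])


# Three bandits with made-up index values, purely for illustration.
tables = [
    {0: 1.0, 1: 2.0},  # bandit 0
    {0: 0.5, 1: 3.0},  # bandit 1
    {2: 2.5},          # bandit 2
]
active = whittle_index_policy([0, 1, 2], tables, num_active=2)
```

With `num_active=1` this reduces to the rule stated on the slide: the single bandit with the highest index is made active.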
