Adaptive combinatorial allocation: How to use limited resources while learning what works


  1. Adaptive combinatorial allocation: How to use limited resources while learning what works. Maximilian Kasy, Alexander Teytelboym. August 2020.

  2. Introduction
  Many policy problems have the following form:
  • Resources, agents, or locations need to be allocated to each other.
  • There are various feasibility constraints.
  • The returns of different options (combinations) are unknown.
  • The decision has to be made repeatedly.

  3. Examples
  1. Demographic composition of classrooms
  • Distribute students across classrooms,
  • to maximize test scores in the presence of (nonlinear) peer effects,
  • subject to overall demographic composition, classroom capacity.
  2. Foster family placement
  • Allocate foster children to foster parents,
  • to maximize child outcomes,
  • subject to parent capacity, keeping siblings together, match feasibility.
  3. Combinations of therapies
  • Allocate (multiple) therapies to patients,
  • respecting resource constraints, medical compatibility.

  4. Sketch of setup
  • There are J options (e.g., matches) available to the policymaker.
  • Every period, the policymaker's action is to choose at most M options.
  • Before the next period, the policymaker observes the outcomes of every chosen option (combinatorial semi-bandit setting).
  • The policymaker's reward is the sum of the outcomes of the chosen options.
  • The policymaker's objective is to maximize the cumulative expected rewards.
  • Equivalently, the policymaker's objective is to minimize expected regret, the shortfall of cumulative expected rewards relative to the oracle optimum.

  5. Overview of the results
  • In each example, the number of actions available to the policymaker is huge: there are C(J, M) ("J choose M") ways to choose M out of J possible options/matches.
  • The policymaker's decision problem is a computationally intractable dynamic stochastic optimization problem.
  • Our heuristic solution is Thompson sampling: in every period the policymaker chooses an action with the posterior probability that this action is optimal.
  • We derive a finite-sample, prior-independent bound on expected regret: surprisingly, per-unit regret only grows like √J and does not grow in M.
  • We illustrate the performance of our bound with simulations.
  • Work in progress: applications, experimental (MTurk) and observational (refugee resettlement).
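
To get a sense of how quickly the action space blows up, the count of size-M subsets of J options (before imposing any further feasibility constraints) can be computed directly. The J, M values below are illustrative, not taken from the paper.

```python
from math import comb

for J, M in [(16, 4), (50, 10), (100, 20)]:   # illustrative sizes, not from the paper
    print(f"J={J}, M={M}: {comb(J, M):,} feasible size-M subsets (before extra constraints)")
```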

  6. Introduction • Setup • Performance guarantee • Applications • Simulations

  7. Setup
  • Options j ∈ {1, ..., J}.
  • Only sufficient resources to select M ≤ J options.
  • Feasible combinations of options: a ∈ A ⊆ { a ∈ {0, 1}^J : ‖a‖_1 = M }.
  • Periods: t = 1, ..., T.
  • Vector of potential outcomes (i.i.d. across periods): Y_t ∈ [0, 1]^J.
  • Average potential outcomes: Θ_j = E[Y_jt | Θ].
  • Prior belief over the vector Θ ∈ [0, 1]^J, with arbitrary dependence across j.

  8. Observability
  • After period t, we observe outcomes for all chosen options: Y_t(a) = ( a_j · Y_jt : j = 1, ..., J ).
  • Thus actions in period t can condition on the information F_t = { (A_t′, Y_t′(A_t′)) : 1 ≤ t′ < t }.
  • These assumptions make our setting a "semi-bandit" problem: we observe more than just Σ_j a_j · Y_jt, as we would in a bandit problem with actions a!
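
The difference between semi-bandit and full bandit feedback can be made concrete in a few lines. The array names and sizes below are my own illustration; binary outcomes are used for simplicity, a special case of the [0, 1]-valued outcomes in the setup.

```python
import numpy as np

rng = np.random.default_rng(0)
J, M = 8, 3                                   # illustrative sizes
theta = rng.uniform(size=J)                   # unknown average outcomes Θ
a = np.zeros(J, dtype=int)                    # chosen action: M ones out of J entries
a[rng.choice(J, size=M, replace=False)] = 1
y = rng.binomial(1, theta)                    # potential outcomes Y_t, one per option

semi_bandit_feedback = a * y                  # observe Y_jt for every chosen option j
bandit_feedback = a @ y                       # a bandit would only see the total reward <a, Y_t>
```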

  9. Objective and regret
  • Reward for action a: ⟨a, Y_t⟩ = Σ_j a_j · Y_jt.
  • Expected reward: R(a) = E[⟨a, Y_t⟩ | Θ] = ⟨a, Θ⟩.
  • Optimal action: A* ∈ argmax_{a ∈ A} R(a) = argmax_{a ∈ A} ⟨a, Θ⟩.
  • Expected regret at T: E_1[ Σ_{t=1}^T ( R(A*) − R(A_t) ) ].
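
A sketch of how these definitions translate to code, assuming the feasible set A contains all size-M subsets (the paper allows richer constraints, which would require a search over A instead of a sort); the function names are my own.

```python
import numpy as np

def expected_reward(a, theta):
    # R(a) = <a, Θ>
    return a @ theta

def oracle_action(theta, M):
    # A* = argmax over size-M subsets of <a, Θ>; with no extra constraints,
    # this is just the M options with the largest Θ_j.
    a_star = np.zeros(len(theta), dtype=int)
    a_star[np.argsort(theta)[-M:]] = 1
    return a_star

def period_regret(a, theta, M):
    # Per-period expected regret R(A*) - R(a).
    return expected_reward(oracle_action(theta, M), theta) - expected_reward(a, theta)
```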

  10. Thompson sampling
  • Take a random action a ∈ A, sampled according to the distribution P_t(A_t = a) = P_t(A* = a).
  • This implies in particular that E_t[A_t] = E_t[A*].
  • Introduced by Thompson (1933) for treatment assignment in adaptive experiments.
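
A minimal sketch of one Thompson-sampling period for this setting, under two simplifying assumptions that are mine rather than the paper's: outcomes are binary, and the prior over Θ is an independent Beta prior per option (the paper allows arbitrary dependence across components).

```python
import numpy as np

def thompson_step(alpha, beta, M, rng):
    # Draw one Θ vector from the posterior and act as if it were the truth:
    # choose the M options with the largest sampled values.
    theta_draw = rng.beta(alpha, beta)
    a = np.zeros(len(alpha), dtype=int)
    a[np.argsort(theta_draw)[-M:]] = 1
    return a

def update_posterior(alpha, beta, a, y):
    # Beta-Bernoulli update using only the outcomes of the chosen options.
    return alpha + a * y, beta + a * (1 - y)
```

With the feasible set equal to all size-M subsets, maximizing against a posterior draw selects each action exactly with its posterior probability of being optimal, which is the defining property P_t(A_t = a) = P_t(A* = a).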

  11. Introduction • Setup • Performance guarantee • Applications • Simulations

  12. Regret bound
  Theorem. Under the assumptions just stated,
  E_1[ Σ_{t=1}^T ( R(A*) − R(A_t) ) ] ≤ √( (1/2) · J · T · M · ( log(J/M) + 1 ) ).
  Features of this bound:
  • It holds in finite samples; there is no remainder term.
  • It does not depend on the prior distribution for Θ.
  • It allows for prior distributions with arbitrary statistical dependence across the components of Θ.
  • It implies that Thompson sampling achieves the efficient rate of convergence.

  13. Regret bound
  Theorem. Under the assumptions just stated,
  E_1[ Σ_{t=1}^T ( R(A*) − R(A_t) ) ] ≤ √( (1/2) · J · T · M · ( log(J/M) + 1 ) ).
  Verbal description of this bound:
  • The worst-case expected regret (per unit) across all possible priors goes to 0 at a rate of 1 over the square root of the sample size, T · M.
  • The bound grows, as a function of the number of possible options J, like √J (ignoring the logarithmic term).
  • Worst-case regret per unit does not grow in the batch size M, despite the fact that action sets can be of size C(J, M)!
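
To make these rates concrete, the right-hand side of the theorem can be evaluated numerically. The J, M, T values below are illustrative, not taken from the paper; the function simply transcribes the bound above.

```python
from math import sqrt, log

def regret_bound(J, M, T):
    # Right-hand side of the theorem: sqrt( (1/2) * J * T * M * (log(J/M) + 1) ).
    return sqrt(0.5 * J * T * M * (log(J / M) + 1))

for J, M, T in [(16, 4, 40), (16, 8, 40)]:    # illustrative values
    total = regret_bound(J, M, T)
    print(f"J={J}, M={M}, T={T}: cumulative <= {total:.1f}, per unit <= {total / (T * M):.3f}")
```

Doubling M raises the cumulative bound but lowers the per-unit bound, illustrating the last bullet above.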

  14. Key steps of the proof
  1. Use Pinsker's inequality to relate expected regret to the information about the optimal action A*. Information is measured by the KL-distance between posteriors and priors. (This step draws on Russo and Van Roy (2016).)
  2. Relate the KL-distance to the entropy reduction of the events A*_j = 1. The combination of these two arguments allows us to bound the expected regret for option j in terms of the entropy reduction for the posterior of A*_j. (This step draws on Bubeck and Sellke (2020).)
  3. The total reduction of entropy across the options j, and across the time periods t, can be no more than the sum of the prior entropies of the events A*_j = 1, which is bounded by M · ( log(J/M) + 1 ).
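
A quick numerical sanity check of the entropy bound in step 3: with J options of which M are chosen, the sum of the marginal entropies of the events A*_j = 1 is largest in the symmetric case where each marginal probability equals M/J, and that value stays below M · (log(J/M) + 1). The J, M pairs below are illustrative.

```python
from math import log

def binary_entropy(p):
    # Entropy (in nats) of a Bernoulli(p) event.
    return -p * log(p) - (1 - p) * log(1 - p)

for J, M in [(16, 4), (50, 10), (100, 20)]:    # illustrative sizes
    sum_entropy = J * binary_entropy(M / J)     # symmetric case: P(A*_j = 1) = M / J
    bound = M * (log(J / M) + 1)
    print(f"J={J}, M={M}: sum of entropies {sum_entropy:.2f} <= bound {bound:.2f}")
```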

  15. MTurk Matching Experiment: Proposed Design
  • Matching message senders to receivers based on types.
  • 4 types = { Indian, American } × { Female, Male }.
  • 16 agents per batch, 4 of each type, for both senders and recipients.
  • Instruction to sender: "In your message, please share advice on how to best reconcile online work with family obligations. In doing so, please reflect on your own past experiences. [...] The person who will read your message is an Indian woman."
  • Instruction to receiver: Read the message and score it on 13 dimensions (1–5), e.g.:
  "The experiences described in this message are different from what I usually experience."
  "This message contained advice that is useful to me."
  "The person who wrote this understands the difficulties I experience at work."
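
One natural way to map this design into the setup above, which is my reading rather than an explicit statement in the slides, is to treat each (sender type, receiver type) pair as an option j, giving J = 4 × 4 = 16 options.

```python
from itertools import product

# Hypothetical mapping: one option j per (sender type, receiver type) pair.
types = [f"{nationality} {gender}"
         for nationality, gender in product(["Indian", "American"], ["Female", "Male"])]
options = list(product(types, types))
print(len(options))          # 16 options
print(options[0])            # ('Indian Female', 'Indian Female')
```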

  16. Introduction • Setup • Performance guarantee • Applications • Simulations

  17. Simulations
  [Figure: two configurations, each showing estimated vs. true average outcomes on a U × V grid (U up to 3 and 4), plus regret across batches over 40 periods.]

  18. Simulations
  [Figure: two configurations, each showing estimated vs. true average outcomes on a U × V grid (U up to 5 and 6), plus regret across batches over 40 periods.]

  19. Simulations
  [Figure: two configurations, each showing estimated vs. true average outcomes on a U × V grid (U up to 7 and 8), plus regret across batches over 40 periods.]
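
The regret panels track how per-period regret evolves across batches. Below is a toy loop in the same spirit, reusing the Beta-posterior Thompson step sketched earlier; the sizes and outcome distributions are illustrative and not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(1)
J, M, T = 16, 4, 40                         # illustrative sizes
theta = rng.uniform(size=J)                 # "true" average outcomes for the toy run
alpha, beta = np.ones(J), np.ones(J)        # independent Beta(1, 1) priors
oracle = np.sort(theta)[-M:].sum()          # expected reward of the best size-M subset

per_period_regret = []
for t in range(T):
    draw = rng.beta(alpha, beta)            # Thompson sampling: one posterior draw
    a = np.zeros(J, dtype=int)
    a[np.argsort(draw)[-M:]] = 1            # choose the top-M options under the draw
    y = rng.binomial(1, theta)              # realized binary outcomes for all options
    alpha += a * y                          # update beliefs with chosen options only
    beta += a * (1 - y)
    per_period_regret.append(oracle - a @ theta)

print(np.round(per_period_regret, 2))       # regret per period; expected to shrink over time
```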

  20. Thank you!
