

  1. Allocating Resources, in the Future
  Sid Banerjee, School of ORIE
  May 3, 2018, Simons Workshop on Mathematical and Computational Challenges in Real-Time Decision Making

  2–6. online resource allocation: basic model

  arrivals θ(1), θ(2), θ(3), ..., θ(t), ..., θ(T); B_t = remaining capacity at time t

  • single resource, initial capacity B; T agents arrive sequentially
  • agent t has type θ(t) = reward earned if agent t is allocated
  • the principal makes irrevocable decisions; the resource is non-replenishable
  • assumptions on agent types {θ(t)}_{t=1}^T: values drawn from a finite set {v_i}_{i=1}^n (e.g. θ(t) = v_i with prob. p_i, i.i.d.)
  • in general: arrivals can be time-varying, correlated

  online resource allocation problem: allocate resources to maximize the sum of rewards
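The basic model above is easy to simulate. The following minimal sketch (the particular value distribution and all parameter names are illustrative, not from the talk) compares a naive greedy online policy against the offline prophet, which simply allocates to the B highest-value agents:

```python
import random

def simulate(T=100, B=10, values=(1, 2, 5), probs=(0.5, 0.3, 0.2), seed=0):
    """Single resource with capacity B; T agents with i.i.d. values."""
    rng = random.Random(seed)
    arrivals = rng.choices(values, weights=probs, k=T)
    # OFFLINE (prophet) benchmark: allocate to the B highest-value agents.
    v_off = sum(sorted(arrivals, reverse=True)[:B])
    # naive greedy online policy: accept every agent while capacity remains.
    cap, v_greedy = B, 0
    for v in arrivals:
        if cap > 0:
            cap -= 1
            v_greedy += v
    return v_off, v_greedy
```

By construction the prophet's value always dominates the greedy policy's, and the gap illustrates the regret that the later slides set out to control.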

  7. online resource allocation: first generalization

  arrivals θ(1), θ(2), θ(3), ..., θ(t), ..., θ(T), with θ ~ (A_i, v_i) w.p. p_i

  • d resources, initial capacities (B_1, B_2, ..., B_d)
  • T agents; agent has type θ_i = (A_i, v_i) with prob. p_i
  • A_i ∈ {0, 1}^d: resource requirement; v_i: value

  also known as: network revenue management; single-minded buyer

  8. online resource allocation: second generalization

  arrivals θ(1), θ(2), θ(3), ..., θ(t), ..., θ(T), with θ ~ (v_{i1}, v_{i2}, ..., v_{id}) w.p. p_i

  • d resources, initial capacities (B_1, B_2, ..., B_d)
  • T agents arrive sequentially
  • each has type θ = (v_{i1}, v_{i2}, ..., v_{id}) and wants a single resource

  also known as: online weighted matching; unit-demand buyer

  9. online allocation across fields

  • related problems studied in Markov decision processes, online algorithms, prophet inequalities, revenue management, etc.
  • informational variants: distributional knowledge ≺ bandit settings ≺ adversarial inputs

  10. the technological zeitgeist

  the ‘deep’ learning revolution: vast improvements in machine learning for data-driven prediction

  11–12. axiomatizing the zeitgeist

  the deep learning revolution: vast improvements in machine learning for data-driven prediction
  • axiom: we have access to black-box predictive algorithms

  core question of this talk: how does having such an oracle affect online resource allocation?
  • TL;DR: new online allocation policies with strong regret bounds
  • re-examining old questions leads to surprising new insights

  13. bridging online allocation and predictive models

  The Bayesian Prophet: A Low-Regret Framework for Online Decision Making
  Alberto Vera & S.B. (2018), https://ssrn.com/abstract_id=3158062

  14. focus of talk: allocation with single-minded agents

  arrivals θ(1), θ(2), θ(3), ..., θ(t), ..., θ(T), with θ ~ (A_i, v_i) w.p. p_i

  • d resources, initial capacities (B_1, B_2, ..., B_d)
  • T agents arrive sequentially; each has type θ = (A, v), where A = resource requirement and v = value
  • agent has type θ_i with prob. p_i, i.i.d.

  online allocation problem: allocate resources to maximize the sum of rewards

  15–16. performance measure

  the optimal policy can be computed via dynamic programming
  – requires exact distributional knowledge
  – ‘curse of dimensionality’: |state-space| = T × B_1 × ... × B_d
  – does not quantify the cost of uncertainty

  ‘prophet’ benchmark V_off: the OFFLINE optimal policy, which has full knowledge of {θ(1), θ(2), ..., θ(T)}

  17. performance measure: regret

  prophet benchmark V_off:
  • OFFLINE knows the entire type sequence {θ(t) | t = 1, ..., T}
  • for the network revenue management setting, V_off is given by

      max  Σ_{i=1}^n v_i x_i
      s.t. Σ_{i=1}^n A_i x_i ≤ B
           0 ≤ x_i ≤ N_i[1:T]

  – N_i[1:T] = # of arrivals of type θ_i = (A_i, v_i) over {1, 2, ..., T}

  regret: E[Regret] = E[V_off − V_alg]
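In the single-resource special case (every A_i = 1), the offline optimum above has a closed-form greedy solution: fill capacity in decreasing order of value. A minimal sketch, with hypothetical example inputs:

```python
def offline_value(values, counts, B):
    """Offline optimum V_off for a single resource (A_i = 1):
    max sum v_i x_i  s.t.  sum x_i <= B,  0 <= x_i <= N_i.
    The LP is solved exactly by filling capacity in value order."""
    total, cap = 0.0, B
    for v, n in sorted(zip(values, counts), reverse=True):
        take = min(n, cap)  # serve as many type-v agents as capacity allows
        total += v * take
        cap -= take
        if cap == 0:
            break
    return total
```

For example, with values (3, 1), realized counts (1, 5), and capacity 2, the greedy fill serves one agent of value 3 and one of value 1, for V_off = 4. For general d, V_off is a linear program and would need an LP solver.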

  18–22. online allocation with prediction oracle

  given a black-box predictive oracle for the performance of OFFLINE (specifically, for any t, B, we have statistical information about V_off[t, T])

  • let π_t = P(V_off[t, T] decreases if OFFLINE cannot accept the t-th arrival)

  Bayes selector: accept the t-th arrival iff π_t > 0.5

  theorem [Vera & B., 2018] (under mild tail bounds on N_i[t:T]):
  the Bayes selector has E[Regret] independent of T, B_1, B_2, ..., B_d

  • arrivals can be time-varying, correlated; rewards can be discounted
  • works for general settings (single-minded, unit-demand, etc.)
  • can use an approximate oracle (e.g., one estimated from samples)
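For intuition, the Bayes selector's acceptance rule can be sketched with a Monte Carlo oracle in the single-resource case (A_i = 1). Here π_t reduces to the probability that OFFLINE allocates to agent t, i.e. that fewer than the remaining capacity of the future arrivals strictly beat its value; all parameter names and the i.i.d. sampling oracle below are illustrative, not the paper's implementation:

```python
import random

def bayes_selector_accept(v_t, cap_t, t, T, values, probs,
                          n_samples=200, seed=0):
    """Monte Carlo sketch of the Bayes selector, single resource.
    Estimate pi_t = P(OFFLINE allocates to agent t) by sampling the
    remaining arrivals, and accept iff the estimate exceeds 1/2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        future = rng.choices(values, weights=probs, k=T - t)
        # OFFLINE serves agent t iff fewer than cap_t future values beat v_t
        if sum(1 for v in future if v > v_t) < cap_t:
            hits += 1
    return hits / n_samples > 0.5
```

With values {1, 5} equally likely and one unit of capacity left early in the horizon, the sketch accepts a value-5 arrival and rejects a value-1 arrival, matching the intended behavior of the rule.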

  23–25. standard approach: randomized admission control (RAC)

  offline optimum V_off:
      max  Σ_{i=1}^n v_i x_i
      s.t. Σ_{i=1}^n A_i x_i ≤ B
           0 ≤ x_i ≤ N_i[1:T]

  (upfront) fluid LP V_fl:
      max  Σ_{i=1}^n v_i x_i
      s.t. Σ_{i=1}^n A_i x_i ≤ B
           0 ≤ x_i ≤ E[N_i[1:T]] = T p_i

  – E[V_off] ≤ V_fl (via Jensen’s inequality and concavity of V_off w.r.t. the N_i)
  – fluid RAC: accept type θ_i with prob. x_i / (T p_i)

  proposition: fluid RAC has E[Regret] = Θ(√T)
  – [Gallego & van Ryzin ’97], [Maglaras & Meissner ’06]
  – N.B. this is a static policy!
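A sketch of the static fluid RAC policy for the single-resource case, where the fluid LP can be solved greedily in value order (the value distribution and names are illustrative):

```python
import random

def fluid_rac(values, probs, B, T, seed=0):
    """Fluid RAC: solve the fluid LP once upfront
    (max sum v_i x_i, sum x_i <= B, 0 <= x_i <= T p_i),
    then accept an arriving type i with prob. x_i / (T p_i)."""
    n = len(values)
    # greedy solution of the fluid LP (exact for a single resource)
    order = sorted(range(n), key=lambda i: -values[i])
    x, cap = [0.0] * n, float(B)
    for i in order:
        x[i] = min(T * probs[i], cap)
        cap -= x[i]
    accept_prob = [x[i] / (T * probs[i]) for i in range(n)]
    # simulate T arrivals under the static acceptance probabilities
    rng = random.Random(seed)
    reward, rem = 0.0, B
    for _ in range(T):
        i = rng.choices(range(n), weights=probs, k=1)[0]
        if rem > 0 and rng.random() < accept_prob[i]:
            rem -= 1
            reward += values[i]
    return reward
```

Because the acceptance probabilities are fixed at time 0, the policy never reacts to how the realized arrivals deviate from their means, which is the source of the Θ(√T) regret in the proposition above.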

  26. RAC with re-solving

  offline optimum V_off:
      max  Σ_{i=1}^n v_i x_i
      s.t. Σ_{i=1}^n A_i x_i ≤ B
           0 ≤ x_i ≤ N_i[1:T]

  re-solved fluid LP V_fl(t):
      max  Σ_{i=1}^n v_i x_i[t]
      s.t. Σ_{i=1}^n A_i x_i[t] ≤ B[t]
           0 ≤ x_i[t] ≤ E[N_i[t:T]] = (T − t) p_i

  RAC with re-solving: at time t, accept type θ_i with prob. x_i[t] / ((T − t) p_i)
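The re-solving variant can be sketched the same way, again specializing to a single resource so each re-solved fluid LP has a greedy solution (names and parameters are illustrative):

```python
import random

def resolving_rac(values, probs, B, T, seed=0):
    """RAC with re-solving: at each time t, re-solve the fluid LP with
    the remaining capacity B[t] and horizon T - t, then accept the
    arriving type i with prob. x_i[t] / ((T - t) p_i)."""
    n = len(values)
    order = sorted(range(n), key=lambda j: -values[j])
    rng = random.Random(seed)
    reward, rem = 0.0, B
    for t in range(T):
        i = rng.choices(range(n), weights=probs, k=1)[0]
        horizon = T - t  # arrivals t, t+1, ..., T-1 remain
        # greedy solution of the re-solved fluid LP (single resource)
        x, cap = [0.0] * n, float(rem)
        for j in order:
            x[j] = min(horizon * probs[j], cap)
            cap -= x[j]
        if rem > 0 and rng.random() < x[i] / (horizon * probs[i]):
            rem -= 1
            reward += values[i]
    return reward
```

Unlike the static version, the acceptance probabilities here adapt to the realized capacity consumption at every step, at the cost of one LP solve per arrival.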
