Allocating Resources, in the Future. Sid Banerjee, School of ORIE. PowerPoint PPT Presentation



SLIDE 1

Allocating Resources, in the Future

Sid Banerjee, School of ORIE, May 3, 2018

Simons Workshop on Mathematical and Computational Challenges in Real-Time Decision Making

SLIDE 2

online resource allocation: basic model

[figure: arrival sequence θ(1), θ(2), θ(3), …, θ(t), …, θ(T); remaining capacity B_1 = 3]

  • single resource, initial capacity B; T agents arrive sequentially
  • agent t has type θ(t) = reward earned if agent is allocated

SLIDE 3

online resource allocation: basic model

[figure: arrival sequence θ(1), θ(2), θ(3), …, θ(t), …, θ(T); remaining capacity B_2 = 3]

  • single resource, initial capacity B; T agents arrive sequentially
  • agent t has type θ(t) = reward earned if agent is allocated
  • principal makes irrevocable decisions

SLIDE 4

online resource allocation: basic model

[figure: arrival sequence θ(1), θ(2), θ(3), …, θ(t), …, θ(T); remaining capacity B_3 = 2]

  • single resource, initial capacity B; T agents arrive sequentially
  • agent t has type θ(t) = reward earned if agent is allocated
  • principal makes irrevocable decisions; resource is non-replenishable

SLIDE 5

online resource allocation: basic model

[figure: arrival sequence θ(1), θ(2), θ(3), …, θ(t), …, θ(T); remaining capacity B_t = 1]

  • single resource, initial capacity B; T agents arrive sequentially
  • agent t has type θ(t) = reward earned if agent is allocated
  • principal makes irrevocable decisions; resource is non-replenishable
  • assumptions on agent types {θ(t)}:
    – finite set of values {v_i}_{i=1}^n (e.g., θ(t) = v_i with prob p_i, i.i.d.)
    – in general: arrivals can be time-varying, correlated

SLIDE 6

online resource allocation: basic model

[figure: arrival sequence θ(1), θ(2), θ(3), …, θ(t), …, θ(T); remaining capacity B_t = 1]

  • single resource, initial capacity B; T agents arrive sequentially
  • agent t has type θ(t) = reward earned if agent is allocated
  • principal makes irrevocable decisions; resource is non-replenishable
  • assumptions on agent types {θ(t)}:
    – finite set of values {v_i}_{i=1}^n (e.g., θ(t) = v_i with prob p_i, i.i.d.)
    – in general: arrivals can be time-varying, correlated
  • online resource allocation problem: allocate resources to maximize the sum of rewards
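The basic model above can be sketched as a tiny simulator. This is an illustration, not code from the talk; the function names and the naive greedy baseline are hypothetical.

```python
import random

def simulate(policy, values, probs, B, T, seed=0):
    """One sample path of the basic model: T i.i.d. arrivals, a single
    non-replenishable resource with capacity B, irrevocable decisions."""
    rng = random.Random(seed)
    budget, reward = B, 0.0
    for t in range(T):
        v = rng.choices(values, weights=probs)[0]  # agent t's type theta(t)
        if budget > 0 and policy(t, budget, v):    # principal's irrevocable choice
            budget -= 1                            # resource cannot be replenished
            reward += v
    return reward

# a naive baseline policy: accept every agent while capacity remains
greedy = lambda t, budget, v: True
```

A policy is just a function of the time, the remaining budget, and the observed type; later slides plug smarter rules into the same loop.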

SLIDE 7

online resource allocation: first generalization

[figure: arrival sequence θ(1), …, θ(T); each arrival has type θ_i = (A_i, v_i) w.p. p_i]

  • d resources, initial capacities (B_1, B_2, …, B_d)
  • T agents; each has type θ_i = (A_i, v_i)
  • A_i ∈ {0, 1}^d: resource requirement; v_i: value
  • agent has type θ_i with prob p_i

also known as: network revenue management; single-minded buyer

SLIDE 8

online resource allocation: second generalization

[figure: arrival sequence θ(1), …, θ(T); each arrival has type θ = (v_i1, v_i2, …) w.p. p_i]

  • d resources, initial capacities (B_1, B_2, …, B_d)
  • T agents arrive sequentially
  • each has type θ = (v_i1, v_i2, …, v_id); wants a single resource

also known as: online weighted matching; unit-demand buyer

SLIDE 9

online allocation across fields

  • related problems studied in Markov decision processes, online algorithms, prophet inequalities, revenue management, etc.
  • informational variants: distributional knowledge ≺ bandit settings ≺ adversarial inputs

SLIDE 10

the technological zeitgeist

the ‘deep’ learning revolution: vast improvements in machine learning for data-driven prediction

SLIDE 11

axiomatizing the zeitgeist

the deep learning revolution: vast improvements in machine learning for data-driven prediction

  • axiom: have access to black-box predictive algorithms

SLIDE 12

axiomatizing the zeitgeist

the deep learning revolution: vast improvements in machine learning for data-driven prediction

  • axiom: have access to black-box predictive algorithms

core question of this talk: how does having such an oracle affect online resource allocation?

  • TL;DR - new online allocation policies with strong regret bounds
  • re-examining old questions leads to surprising new insights

SLIDE 13

bridging online allocation and predictive models

The Bayesian Prophet: A Low-Regret Framework for Online Decision Making. Alberto Vera & S.B. (2018). https://ssrn.com/abstract_id=3158062

SLIDE 14

focus of talk: allocation with single-minded agents

[figure: arrival sequence θ(1), …, θ(T); each arrival has type θ = (A_i, v_i) w.p. p_i]

  • d resources, initial capacities (B_1, B_2, …, B_d)
  • T agents arrive sequentially; each has type θ = (A, v)
  • A = resource requirement; v = value
  • agent has type θ_i with prob p_i, i.i.d.
  • online allocation problem: allocate resources to maximize the sum of rewards

SLIDE 15

performance measure

[figure: arrival sequence θ(1), …, θ(T); each arrival has type θ = (A_i, v_i) w.p. p_i]

  • optimal policy: can be computed via dynamic programming
    – requires exact distributional knowledge
    – ‘curse of dimensionality’: |state space| = T × B_1 × … × B_d
    – does not quantify the cost of uncertainty

SLIDE 16

performance measure

[figure: arrival sequence θ(1), …, θ(T); each arrival has type θ = (A_i, v_i) w.p. p_i]

  • optimal policy: can be computed via dynamic programming
    – requires exact distributional knowledge
    – ‘curse of dimensionality’: |state space| = T × B_1 × … × B_d
    – does not quantify the cost of uncertainty
  • ‘prophet’ benchmark V^off: OFFLINE optimal policy; has full knowledge of {θ_1, θ_2, …, θ_T}

SLIDE 17

performance measure: regret

prophet benchmark: V^off

  • OFFLINE knows the entire type sequence {θ_t : t = 1, …, T}
  • for the network revenue management setting, V^off is given by

    max Σ_{i=1}^n v_i x_i   s.t.   Σ_{i=1}^n A_i x_i ≤ B,   0 ≤ x_i ≤ N_i[1:T]

    – N_i[1:T] = # of arrivals of type θ_i = (A_i, v_i) over {1, 2, …, T}

regret: E[Regret] = E[V^off − V^alg]
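In the special case of a single resource with unit demands, OFFLINE's LP collapses to serving the B highest-value agents in hindsight, so the prophet benchmark and the regret are easy to compute directly. A sketch under that simplification (the function names are mine, not the paper's):

```python
def offline_value(rewards, B):
    """Prophet benchmark V_off for one unit-cost resource: with the full
    type sequence {theta_1, ..., theta_T} in hand, OFFLINE simply serves
    the B highest-value agents."""
    return sum(sorted(rewards, reverse=True)[:B])

def regret(rewards, B, online_reward):
    """Per-sample-path regret V_off - V_alg; E[Regret] averages this
    over sample paths."""
    return offline_value(rewards, B) - online_reward
```

In the general multi-resource setting the benchmark is the hindsight LP on the slide rather than a simple top-B selection.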

SLIDE 18

online allocation with prediction oracle

given a black-box predictive oracle about the performance of OFFLINE (specifically, for any t, B, have statistical info about V^off[t, T])

SLIDE 19

online allocation with prediction oracle

given a black-box predictive oracle about the performance of OFFLINE (specifically, for any t, B, have statistical info about V^off[t, T])

  • let π_t = P[V^off[t, T] decreases if OFFLINE accepts the t-th arrival]
SLIDE 20

online allocation with prediction oracle

given a black-box predictive oracle about the performance of OFFLINE (specifically, for any t, B, have statistical info about V^off[t, T])

  • let π_t = P[V^off[t, T] decreases if OFFLINE accepts the t-th arrival]
  • Bayes selector: accept the t-th arrival iff π_t > 0.5

SLIDE 21

online allocation with prediction oracle

given a black-box predictive oracle about the performance of OFFLINE (specifically, for any t, B, have statistical info about V^off[t, T])

  • let π_t = P[V^off[t, T] decreases if OFFLINE accepts the t-th arrival]
  • Bayes selector: accept the t-th arrival iff π_t > 0.5

theorem [Vera & B, 2018]: (under mild tail bounds on N_i[t:T]) the Bayes selector has E[Regret] independent of T, B_1, B_2, …, B_d

SLIDE 22

online allocation with prediction oracle

given a black-box predictive oracle about the performance of OFFLINE (specifically, for any t, B, have statistical info about V^off[t, T])

  • let π_t = P[V^off[t, T] decreases if OFFLINE accepts the t-th arrival]
  • Bayes selector: accept the t-th arrival iff π_t > 0.5

theorem [Vera & B, 2018]: (under mild tail bounds on N_i[t:T]) the Bayes selector has E[Regret] independent of T, B_1, B_2, …, B_d

  • arrivals can be time-varying, correlated; discounted rewards
  • works for general settings (single-minded, unit-demand, etc.)
  • can use an approximate oracle (e.g., from samples)

SLIDE 23

standard approach: randomized admission control (RAC)

  • offline optimum V^off:

    max Σ_{i=1}^n v_i x_i   s.t.   Σ_{i=1}^n A_i x_i ≤ B,   0 ≤ x_i ≤ N_i[1:T]

SLIDE 24

standard approach: randomized admission control (RAC)

  • offline optimum V^off:

    max Σ_{i=1}^n v_i x_i   s.t.   Σ_{i=1}^n A_i x_i ≤ B,   0 ≤ x_i ≤ N_i[1:T]

  • (upfront) fluid LP V^fl:

    max Σ_{i=1}^n v_i x_i   s.t.   Σ_{i=1}^n A_i x_i ≤ B,   0 ≤ x_i ≤ E[N_i[1:T]] = T p_i

    – E[V^off] ≤ V^fl (via Jensen's inequality; concavity of V^off w.r.t. N_i)
    – fluid RAC: accept type θ_i with prob x_i / (T p_i)

SLIDE 25

standard approach: randomized admission control (RAC)

  • offline optimum V^off:

    max Σ_{i=1}^n v_i x_i   s.t.   Σ_{i=1}^n A_i x_i ≤ B,   0 ≤ x_i ≤ N_i[1:T]

  • (upfront) fluid LP V^fl:

    max Σ_{i=1}^n v_i x_i   s.t.   Σ_{i=1}^n A_i x_i ≤ B,   0 ≤ x_i ≤ E[N_i[1:T]] = T p_i

    – E[V^off] ≤ V^fl (via Jensen's inequality; concavity of V^off w.r.t. N_i)
    – fluid RAC: accept type θ_i with prob x_i / (T p_i)

proposition: fluid RAC has E[Regret] = Θ(√T)
  – [Gallego & van Ryzin ’97], [Maglaras & Meissner ’06]
  – N.B. this is a static policy!
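For a single resource, the fluid LP can be solved in closed form by filling the capacity greedily in decreasing order of value, which makes the static fluid RAC easy to sketch. This is an illustration under that simplification, not the paper's code; function names are hypothetical, and types are assumed to have distinct values.

```python
import random

def fluid_lp(values, probs, B, horizon):
    """Closed-form fluid LP for one resource: fill the capacity greedily
    by value, capping each x_i at E[N_i] = horizon * p_i."""
    x, cap = {}, float(B)
    for v, p in sorted(zip(values, probs), reverse=True):
        x[v] = min(horizon * p, cap)
        cap -= x[v]
    return x

def fluid_rac(values, probs, B, T, seed=0):
    """Static fluid RAC: accept type i with the fixed probability
    x_i / (T p_i), computed once upfront and never re-solved."""
    x = fluid_lp(values, probs, B, T)
    acc = {v: x[v] / (T * p) for v, p in zip(values, probs)}
    rng = random.Random(seed)
    budget, reward = B, 0.0
    for _ in range(T):
        v = rng.choices(values, weights=probs)[0]
        if budget > 0 and rng.random() < acc[v]:
            budget, reward = budget - 1, reward + v
    return reward
```

The acceptance probabilities are frozen at time 0, which is exactly what the proposition's Θ(√T) regret penalizes.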

SLIDE 26

RAC with re-solving

  • offline optimum V^off:

    max Σ_{i=1}^n v_i x_i   s.t.   Σ_{i=1}^n A_i x_i ≤ B,   0 ≤ x_i ≤ N_i

  • re-solved fluid LP V^fl(t):

    max Σ_{i=1}^n v_i x_i[t]   s.t.   Σ_{i=1}^n A_i x_i[t] ≤ B[t],   0 ≤ x_i[t] ≤ E[N_i[t:T]] = (T − t) p_i

  • RAC with re-solving: at time t, accept type θ_i with prob x_i[t] / ((T − t) p_i)

SLIDE 27

RAC with re-solving

  • offline optimum V^off:

    max Σ_{i=1}^n v_i x_i   s.t.   Σ_{i=1}^n A_i x_i ≤ B,   0 ≤ x_i ≤ N_i

  • re-solved fluid LP V^fl(t):

    max Σ_{i=1}^n v_i x_i[t]   s.t.   Σ_{i=1}^n A_i x_i[t] ≤ B[t],   0 ≤ x_i[t] ≤ E[N_i[t:T]] = (T − t) p_i

  • RAC with re-solving: at time t, accept type θ_i with prob x_i[t] / ((T − t) p_i)

  – regret improves to o(√T) [Reiman & Wang ’08]
  – O(1) regret under (dual) non-degeneracy [Jasin & Kumar ’12]
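The re-solving variant differs from the static RAC only in that the fluid LP is recomputed at every arrival with the current budget B[t] and remaining horizon T − t. A self-contained single-resource sketch (a closed-form greedy solve stands in for the LP, names are mine, distinct values assumed):

```python
import random

def fluid_x(values, probs, budget, horizon):
    # closed-form fluid LP for one resource: fill the current capacity
    # by value, capping x_i at E[N_i[t:T]] = horizon * p_i
    x, cap = {}, float(budget)
    for v, p in sorted(zip(values, probs), reverse=True):
        x[v] = min(horizon * p, cap)
        cap -= x[v]
    return x

def resolving_rac(values, probs, B, T, seed=0):
    """RAC with re-solving: recompute the fluid LP at every arrival with
    budget B[t] and horizon T - t, then accept the observed type i
    with probability x_i[t] / ((T - t) p_i)."""
    rng = random.Random(seed)
    budget, reward = B, 0.0
    for t in range(T):
        v = rng.choices(values, weights=probs)[0]
        if budget == 0:
            continue
        x = fluid_x(values, probs, budget, T - t)  # re-solve each period
        p = probs[values.index(v)]
        if rng.random() < x[v] / ((T - t) * p):
            budget, reward = budget - 1, reward + v
    return reward
```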

SLIDE 28

RAC with re-solving

  • offline optimum V^off:

    max Σ_{i=1}^n v_i x_i   s.t.   Σ_{i=1}^n A_i x_i ≤ B,   0 ≤ x_i ≤ N_i

  • re-solved fluid LP V^fl(t):

    max Σ_{i=1}^n v_i x_i[t]   s.t.   Σ_{i=1}^n A_i x_i[t] ≤ B[t],   0 ≤ x_i[t] ≤ E[N_i[t:T]] = (T − t) p_i

  • RAC with re-solving: at time t, accept type θ_i with prob x_i[t] / ((T − t) p_i)

  – regret improves to o(√T) [Reiman & Wang ’08]
  – O(1) regret under (dual) non-degeneracy [Jasin & Kumar ’12]
  – most results use V^fl as the benchmark (including the ‘prophet inequality’)

proposition [Vera & B ’18]: for degenerate instances, V^fl − E[V^off] = Ω(√T)

SLIDE 29

Bayes selector for i.i.d. arrivals

Bayes selector:
  – π_t = P[V^off[t, T] decreases if OFFLINE accepts the t-th arrival]
  – accept the t-th arrival iff π_t > 0.5

SLIDE 30

Bayes selector for i.i.d. arrivals

Bayes selector:
  – π_t = P[V^off[t, T] decreases if OFFLINE accepts the t-th arrival]
  – accept the t-th arrival iff π_t > 0.5

re-solved fluid LP:

    max Σ_{i=1}^n v_i x_i[t]   s.t.   A x[t] ≤ B[t],   0 ≤ x_i[t] ≤ E[N_i[t:T]]

SLIDE 31

Bayes selector for i.i.d. arrivals

Bayes selector:
  – π_t = P[V^off[t, T] decreases if OFFLINE accepts the t-th arrival]
  – accept the t-th arrival iff π_t > 0.5

re-solved fluid LP:

    max Σ_{i=1}^n v_i x_i[t]   s.t.   A x[t] ≤ B[t],   0 ≤ x_i[t] ≤ E[N_i[t:T]]

the re-solved LP gives an approximate admission oracle

fluid Bayes selector: accept type θ_i iff x_i[t] / E[N_i[t:T]] > 0.5

SLIDE 32

Bayes selector for i.i.d. arrivals

Bayes selector:
  – π_t = P[V^off[t, T] decreases if OFFLINE accepts the t-th arrival]
  – accept the t-th arrival iff π_t > 0.5

re-solved fluid LP:

    max Σ_{i=1}^n v_i x_i[t]   s.t.   A x[t] ≤ B[t],   0 ≤ x_i[t] ≤ E[N_i[t:T]]

the re-solved LP gives an approximate admission oracle

fluid Bayes selector: accept type θ_i iff x_i[t] / E[N_i[t:T]] > 0.5

proposition [Vera & B, 2018]: the fluid Bayes selector has E[Regret] ≤ 2 v_max Σ_{i=1}^n p_i^{−1}

SLIDE 33

Bayes selector for i.i.d. arrivals

Bayes selector:
  – π_t = P[V^off[t, T] decreases if OFFLINE accepts the t-th arrival]
  – accept the t-th arrival iff π_t > 0.5

re-solved fluid LP:

    max Σ_{i=1}^n v_i x_i[t]   s.t.   A x[t] ≤ B[t],   0 ≤ x_i[t] ≤ E[N_i[t:T]]

the re-solved LP gives an approximate admission oracle

fluid Bayes selector: accept type θ_i iff x_i[t] / E[N_i[t:T]] > 0.5

proposition [Vera & B, 2018]: the fluid Bayes selector has E[Regret] ≤ 2 v_max Σ_{i=1}^n p_i^{−1}

  – proposed for multi-secretary by [Gurvich & Arlotto, 2017]
  – NRM via partial re-solving [Bumpensanti & Wang, 2018]
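In the same single-resource simplification, the fluid Bayes selector replaces the randomized acceptance of RAC with a deterministic threshold on x_i[t] / E[N_i[t:T]]. A hypothetical sketch (names are mine; a closed-form greedy solve stands in for the LP, distinct values assumed):

```python
import random

def fluid_x(values, probs, budget, horizon):
    # closed-form re-solved fluid LP for one resource
    x, cap = {}, float(budget)
    for v, p in sorted(zip(values, probs), reverse=True):
        x[v] = min(horizon * p, cap)
        cap -= x[v]
    return x

def bayes_selector(values, probs, B, T, seed=0):
    """Fluid Bayes selector: re-solve the fluid LP each period and accept
    type i iff x_i[t] / E[N_i[t:T]] > 0.5, i.e. a threshold rule rather
    than a coin flip."""
    rng = random.Random(seed)
    budget, reward = B, 0.0
    for t in range(T):
        v = rng.choices(values, weights=probs)[0]
        if budget == 0:
            continue
        x = fluid_x(values, probs, budget, T - t)
        p = probs[values.index(v)]
        if x[v] / ((T - t) * p) > 0.5:   # accept iff the oracle says pi_t > 1/2
            budget, reward = budget - 1, reward + v
    return reward
```

The threshold makes the policy deterministic given the state, which is what the compensated-coupling analysis of the next slides exploits.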

SLIDE 34

proof outline

the proof comprises two parts:

  1. compensated coupling: a regret bound for the Bayes selector for a generic online decision problem
  2. bound the compensation for online packing problems via LP sensitivity and measure concentration

SLIDE 35

the compensated coupling: make OFFLINE follow ONLINE

for any time t, budget B[t]:

  • let V^off(t, B[t]) = value of OFFLINE starting from the current state

SLIDE 36

the compensated coupling: make OFFLINE follow ONLINE

for any time t, budget B[t]:

  • let V^off(t, B[t]) = value of OFFLINE starting from the current state
  • for any action a, the disagreement set Q_t(a) = set of sample paths ω where a is sub-optimal (given B[t])

SLIDE 37

the compensated coupling: make OFFLINE follow ONLINE

for any time t, budget B[t]:

  • let V^off(t, B[t]) = value of OFFLINE starting from the current state
  • for any action a, the disagreement set Q_t(a) = set of sample paths ω where a is sub-optimal (given B[t])
  • can compensate OFFLINE to follow the same action a as ONLINE:

    V^off(t, B[t]) ≤ R^alg_t + v_max · 1{ω ∈ Q_t(a)} + V^off(t+1, B[t+1])

SLIDE 38

the compensated coupling: make OFFLINE follow ONLINE

for any time t, budget B[t]:

  • let V^off(t, B[t]) = value of OFFLINE starting from the current state
  • for any action a, the disagreement set Q_t(a) = set of sample paths ω where a is sub-optimal (given B[t])
  • can compensate OFFLINE to follow the same action a as ONLINE:

    V^off(t, B[t]) ≤ R^alg_t + v_max · 1{ω ∈ Q_t(a)} + V^off(t+1, B[t+1])

  • iterating, we get

    E[V^off] ≤ E[V^alg] + v_max Σ_{t=1}^T P[Q_t(a_t)]

    – note: the Bayes selector picks a_t = argmin_a P[Q_t(a)]
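Iterating the one-step compensation inequality from t = 1 to T, with V^off(T+1, ·) = 0 and Σ_t R^alg_t = V^alg, the telescoping step can be written out as:

```latex
% one-step compensation, summed over t = 1, \dots, T; the V^{off} terms telescope
V^{\mathrm{off}}(1, B)
  \;\le\; \sum_{t=1}^{T} R^{\mathrm{alg}}_{t}
        + v_{\max} \sum_{t=1}^{T} \mathbf{1}\{\omega \in Q_t(a_t)\}
        + V^{\mathrm{off}}(T+1, \cdot)
  \;=\; V^{\mathrm{alg}} + v_{\max} \sum_{t=1}^{T} \mathbf{1}\{\omega \in Q_t(a_t)\}.

% taking expectations over sample paths \omega:
\mathbb{E}[V^{\mathrm{off}}]
  \;\le\; \mathbb{E}[V^{\mathrm{alg}}]
        + v_{\max} \sum_{t=1}^{T} \Pr[Q_t(a_t)].
```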

SLIDE 39

compensated coupling for single resource allocation

for any time t, budget B[t]:

  • if the Bayes selector rejects type θ_i, assume OFFLINE front-loads θ_i
    – error only if OFFLINE rejects all future θ_i
  • if the Bayes selector accepts type θ_i, assume OFFLINE back-loads θ_i
    – error only if OFFLINE accepts all future θ_i

SLIDE 40

compensated coupling for single resource allocation

for any time t, budget B[t]:

  • if the Bayes selector rejects type θ_i, assume OFFLINE front-loads θ_i
    – error only if OFFLINE rejects all future θ_i
  • if the Bayes selector accepts type θ_i, assume OFFLINE back-loads θ_i
    – error only if OFFLINE accepts all future θ_i
  • claim: the smaller of the two events has probability e^{−c(T−t)}

SLIDE 41

summary

  • online allocation via the Bayes selector
  • a new online allocation policy with horizon-independent regret
  • a way to use black-box predictive algorithms
  • generic regret bounds for any online decision problem

SLIDE 42

Thanks!