


Learning with Global Cost in Stochastic Environments

Eyal Even-dar, Shie Mannor and Yishay Mansour

Technion

COLT, June 2010 (You didn’t hear it last year. Really.)

Shie Mannor (Technion) Learning with Global Cost in Stochastic Environments COLT, June 2010 1 / 26


Table of contents

1. Introduction
2. The Framework
3. Natural algorithms that don’t work
4. Algorithms that sort of work
5. Analysis
6. Conclusions and open problems


Regret Minimization

Let L be a sequence of losses of length T. Then

R(T, L) = E[max(Cost(alg, L) − Cost(opt in hindsight, L), 0)],   R(T) = max_L R(T, L).

An algorithm is no-regret if R(T) is sublinear in T. The cost is in general not additive.


Regret Minimization (biased view)

So we have come a long way:

- N experts (full and partial information)
- Shortest path (full and partial information)
- Strongly convex functions (better bounds)
- Many more... (40% of papers this year)

But there is some room to grow:

- There is no memory/state (in most works).
- Losses are assumed to be additive across time (in almost all works).
- Most algorithms are essentially greedy (bad for job talks).

Regret Minimization with State

- Routing [AK ...]
- MDPs [EKM, YMS]
- Paging [BBK]
- Data structures [BCK]
- Load balancing – this talk


Are we optimizing the true loss function?

- Predicting click-through rates (calibration)
- Handwriting recognition (calibration)
- Relevant documents, viral marketing (submodular functions)
- Load balancing

Model

- N alternatives.
- The algorithm chooses a distribution $\bar p_t$ over the alternatives and then observes a loss vector $\bar\ell_t$.
- Algorithm accumulated loss: $\bar L^A_t = \sum_{\tau=1}^{t} \bar\ell_\tau \cdot \bar p_\tau$ (componentwise).
- Overall loss: $\bar L_t = \sum_{\tau=1}^{t} \bar\ell_\tau$.
- Algorithm cost: $C(\bar L^A_t)$, where $C$ is a global cost function.
- OPT cost: $C^*(\bar L_t) = \min_{\alpha \in \Delta(N)} C(\alpha \cdot \bar L_t)$.
- Regret: $\max\{C(\bar L^A_t) - C^*(\bar L_t),\, 0\}$.
- Assume $C$ is an $L_d$ norm ($d \ge 1 \Rightarrow C$ is convex and $C^*$ concave).
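For concreteness, the model can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code; the two-machine source and the makespan closed form $C^*(\bar L) = 1/\sum_i 1/L_i$ used below are taken from the later slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 2, 10_000

# IID source: each step the loss vector is one standard basis vector.
losses = np.eye(N)[rng.integers(0, N, size=T)]   # T x N loss vectors
p = np.full((T, N), 1.0 / N)                     # a fixed distribution (.5, .5)

L_alg = (losses * p).sum(axis=0)   # algorithm's accumulated load (componentwise)
L_tot = losses.sum(axis=0)         # overall loss vector

cost_alg = L_alg.max()                 # makespan C(L^A_T)
cost_opt = 1.0 / (1.0 / L_tot).sum()   # C*(L_T) for the makespan
regret = max(cost_alg - cost_opt, 0.0)
print(cost_alg, cost_opt, regret)
```

With the fixed (.5, .5) distribution the algorithm's load vector is exactly half the overall loss vector, which makes the comparison to OPT transparent.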


Model - load balancing with makespan

Assume makespan: $C = \|\cdot\|_\infty$.

Time | Loss  | Dist.     | Alg Accu.   | C(Alg) | Overall loss | C*
  1  | (1,1) | (.5,.5)   | (.5,.5)     |  .5    | (1,1)        | .5
  2  | (1,0) | (.5,.5)   | (1,.5)      |  1     | (2,1)        | .66
  3  | (1,0) | (.33,.66) | (1.33,.5)   |  1.33  | (3,1)        | .75
  4  | (0,1) | (.25,.75) | (1.33,1.25) |  1.33  | (3,2)        | 1.2

Minimizing the sum of losses does not minimize C* and vice versa.
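As a sanity check, the table can be replayed mechanically. A sketch; the closed form 1/(1/L₁ + 1/L₂) for C* is from the next slide.

```python
# Replay the four rounds of the table above and recompute each column.
rows = [((1, 1), (0.5, 0.5)),
        ((1, 0), (0.5, 0.5)),
        ((1, 0), (1 / 3, 2 / 3)),
        ((0, 1), (0.25, 0.75))]
acc = [0.0, 0.0]   # algorithm's accumulated load
tot = [0.0, 0.0]   # overall loss vector
for loss, dist in rows:
    acc = [a + q * l for a, q, l in zip(acc, dist, loss)]
    tot = [t + l for t, l in zip(tot, loss)]
    c_alg = max(acc)
    c_opt = 1.0 / (1.0 / tot[0] + 1.0 / tot[1])
    print(acc, c_alg, tot, round(c_opt, 2))
```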

Model - load balancing with makespan

Let’s focus on the makespan ($L_\infty$) for now. The optimal policy in hindsight for the load vector $\bar L$ is

$p_i = \frac{1/L_i}{\sum_{j=1}^N 1/L_j}$.

The cost of the optimal policy is

$C^*(\bar L) = \frac{1}{\sum_{j=1}^N 1/L_j} = \frac{\prod_{j=1}^N L_j}{\sum_{j=1}^N \prod_{i \neq j} L_i}$.
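A small sketch checking that the allocation $p_i \propto 1/L_i$ equalizes the per-machine loads at exactly $C^*(\bar L)$ (the load vector (3, 2) is the last row of the table above):

```python
def opt_allocation(L):
    # weights proportional to 1/L_i (hindsight-optimal for the makespan)
    inv = [1.0 / x for x in L]
    s = sum(inv)
    return [w / s for w in inv]

L = [3.0, 2.0]
p = opt_allocation(L)
loads = [pi * Li for pi, Li in zip(p, L)]
c_star = 1.0 / sum(1.0 / x for x in L)
print(p, loads, c_star)   # every machine carries the same load, C*(L) = 1.2
```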


The Loss Model

The loss sequence is generated by a stochastic source. In the talk we consider a very simple case; the results, however, hold in general.

The loss vectors are drawn IID from some measure D. (Note: within a vector, the arms are possibly correlated.)

Known D and unknown D are both interesting. (We thought known D would be easy - how hard can the stochastic case be if you solved the adversarial case and you know the source?)

Known source - a simple example

Consider two machines: at each time, w.p. 1/2 the loss vector is (1, 0) and w.p. 1/2 it is (0, 1). W.h.p. the cost of the best fixed policy in hindsight is T/4 − O(1). What is the optimal policy?
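A Monte Carlo sketch of this claim (illustrative, not from the paper): since L₁ + L₂ = T, the gap between T/4 and the hindsight cost equals (L₁ − T/2)²/T, whose expectation is Var(L₁)/T = 1/4, i.e. O(1).

```python
import numpy as np

rng = np.random.default_rng(1)
T, runs = 10_000, 200
gaps = []
for _ in range(runs):
    L1 = rng.integers(0, 2, size=T).sum()   # load on machine 1
    L2 = T - L1                             # each step puts its unit on one machine
    c_star = (L1 * L2) / T                  # = 1 / (1/L1 + 1/L2) since L1 + L2 = T
    gaps.append(T / 4 - c_star)
print(np.mean(gaps))   # about 1/4, independent of T
```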


“Naive model based”

A standard technique in control/machine learning:

1. Learn the model.
2. Compute the optimal policy for the learned model.

AKA “certainty equivalence”. Here we already know the model, so let’s do the following:

Naive model based algorithm: at each time-step assign half of the job to machine 1 and half to machine 2.

How good is it? Is it optimal?

“Naive model based” - Simulation

Performance analysis

Cost ingredients:
- The sum of the actual loads on the two machines.
- The difference between the machines.

$\max(x, y) = \frac{x + y}{2} + \frac{|x - y|}{2}$


“Naive model based” - Analysis

- Expected sum: T/2 (like every algorithm...).
- Difference: w.h.p. the load on one machine is at least T/4 + √T/2 and on the second machine it is at most T/4 − √T/2; thus the difference is √T.
- Cost: at least T/4 + √T/2.
- Regret: O(√T).

Can we do better?
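The √T behaviour is easy to see numerically (a sketch): under the (.5, .5) policy the algorithm's load vector is exactly half the overall loss vector, so its cost is max(L₁, L₂)/2.

```python
import numpy as np

rng = np.random.default_rng(2)
T, runs = 10_000, 200
regrets = []
for _ in range(runs):
    L1 = rng.integers(0, 2, size=T).sum()
    L2 = T - L1
    cost_alg = max(L1, L2) / 2        # alg load vector is (L1/2, L2/2)
    c_star = L1 * L2 / T
    regrets.append(max(cost_alg - c_star, 0.0))
print(np.mean(regrets) / np.sqrt(T))   # bounded away from 0: regret grows like sqrt(T)
```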


Least loaded machine

The algorithm: at every time-step assign the next job to the least loaded machine.

Analysis:
- Expected sum: T/2
- Expected difference: < 1
- Expected cost: T/4
- Expected regret: O(√T)
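A sketch of the least-loaded rule on the same source (illustrative code, not the authors'): the gap between the machines stays bounded, yet the realized cost still fluctuates around T/4 on the order of √T.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 10_000
loads = [0.0, 0.0]   # algorithm's accumulated loads
tot = [0, 0]         # overall loss vector
for _ in range(T):
    hit = int(rng.random() < 0.5)         # machine receiving this step's unit loss
    tot[hit] += 1
    i = 0 if loads[0] <= loads[1] else 1  # put all weight on the least loaded machine
    if i == hit:
        loads[i] += 1.0
gap = abs(loads[0] - loads[1])
c_alg = max(loads)
c_star = tot[0] * tot[1] / T
print(gap)              # stays at 0 or 1
print(c_alg - c_star)   # still fluctuates on the order of sqrt(T)
```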

Comparison - Simulation


Regret and bias-variance tradeoffs

Least loaded machine: in terms of expected loss, the algorithm is optimal, yet the regret is still O(√T). In this setting the regret measures both the bias and the variance! Can we lower the regret while maintaining the optimal expected loss?


Balance at the end

End balance:
- Until time T − 4√T: play at random (.5, .5).
- After time T − 4√T: use the least loaded machine algorithm.

Analysis:
- Expected sum: T/2
- Expected difference: < 1
- Expected cost: T/4
- Expected regret: O(T^{1/4})
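A sketch of end balance (illustrative): random halves until T − 4√T, then the least-loaded rule, which w.h.p. has enough steps left to close the accumulated gap.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 10_000
switch = T - int(4 * np.sqrt(T))   # the last 4*sqrt(T) steps balance the gap
loads = [0.0, 0.0]
for t in range(T):
    hit = int(rng.random() < 0.5)
    if t < switch:
        loads[hit] += 0.5                      # play (.5, .5)
    else:
        i = 0 if loads[0] <= loads[1] else 1   # least loaded machine
        if i == hit:
            loads[i] += 1.0
print(abs(loads[0] - loads[1]))   # driven back near 0 by the final phase
```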

Comparison - Simulation


Improved algorithms

Recursive balance:
- Partition time into blocks of sizes T/2, T/4, T/8, ..., 1. (Yes: the blocks become smaller.)
- In every block play 1/2 + ε so as to balance the “offset” from the previous block.
- “Offset”: the deviation of the process from its true probability (not influenced by the algorithm).
- Regret of the algorithm is O(log T).

Anytime:
- Set ε = 1/T^{1/4}.
- At every time step put weight 1/2 + ε on the least loaded machine.
- Regret of the algorithm is O(T^{1/4}) at any time.
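A sketch of the anytime rule with ε = T^{−1/4} (illustrative code; the constant tilt toward the lagging machine is the whole idea):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 10_000
eps = T ** -0.25                     # the tilt toward the lagging machine
loads = np.zeros(2)
for _ in range(T):
    loss = np.zeros(2)
    loss[rng.integers(0, 2)] = 1.0
    p = np.full(2, 0.5 - eps)
    p[np.argmin(loads)] = 0.5 + eps  # weight 1/2 + eps on the least loaded machine
    loads += p * loss
print(abs(loads[0] - loads[1]))      # kept on the order of T^(1/4) or below
```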

Comparison - Simulation

Analysis

We define generic properties of a desired phased algorithm:
- P1: The empirical frequencies of the losses are close to their true expectations.
- P2: The base weights are close to the optimal weights for all actions.
- P3: The phase length does not shrink too fast.

We analyze a generic algorithm with the above properties: the regret is small if the properties hold for most phases with high probability.

The generic algorithm uses a base weight vector $w^*$. During phase k the weight of action i does not change, and it equals

$w_k(i) = w^*(i) + \frac{T_{k-1}}{T_k}\,(\mathrm{opt}_{k-1}(i) - w^*(i))$
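The phase-k weights, as reconstructed above, in code. The numbers below are illustrative, not from the paper, and `phase_weights` is a hypothetical helper name.

```python
def phase_weights(w_star, opt_prev, T_prev, T_k):
    # w_k(i) = w*(i) + (T_{k-1}/T_k) * (opt_{k-1}(i) - w*(i))
    return [w + (T_prev / T_k) * (o - w) for w, o in zip(w_star, opt_prev)]

w_star   = [0.5, 0.5]    # base weight vector
opt_prev = [0.55, 0.45]  # empirical optimum observed in the previous phase
print(phase_weights(w_star, opt_prev, T_prev=2, T_k=1))
```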

Makespan: known probabilities

Theorem: For known probabilities the regret is at most O(log T).

- Set $w^*(i) = \frac{1/p(i)}{P}$, where $P = \sum_{i=1}^n 1/p(i)$, i.e., the optimal allocation for p.
- Set the number of phases m = log T.
- Set the length of phase k to $T_k = T/2^k$ for k ∈ [1, m].
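The schedule in code (a sketch; the log is taken base 2 here, which is an assumption matching the halving phase lengths):

```python
import math

def schedule(p, T):
    P = sum(1.0 / pi for pi in p)
    w_star = [(1.0 / pi) / P for pi in p]             # optimal allocation for p
    m = int(math.log2(T))                             # number of phases
    phases = [T // 2 ** k for k in range(1, m + 1)]   # T_k = T / 2^k
    return w_star, phases

w_star, phases = schedule([0.5, 0.5], 1024)
print(w_star, phases)
```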


Makespan: Unknown probabilities

Theorem: For unknown probabilities, w.h.p. the regret is at most O(log² T).

We don’t have the true p, and estimating it from the entire past leads to a difficult analysis (we couldn’t solve it). Instead we divide the time horizon into blocks, and each block into phases:
- Partition T into log(T/2) blocks, where the r-th block, $B_r$, has $2^r$ time steps.
- Set the reference $w^{r,*}(i)$ using the observed probabilities in block $B_{r-1}$.
- In block $B_r$ we have m = r phases, where the duration of phase k is $T_{r,k} = |B_r|/2^k = 2^{r-k}$. (Decreasing phase lengths.)

Not known if tight.
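The block/phase layout in code (a sketch of the arithmetic above; `blocks_and_phases` is an illustrative helper name):

```python
import math

def blocks_and_phases(T):
    R = int(math.log2(T / 2))   # log(T/2) blocks
    # block r has 2^r steps, split into r phases of lengths 2^(r-1), ..., 1
    return {r: [2 ** (r - k) for k in range(1, r + 1)] for r in range(1, R + 1)}

layout = blocks_and_phases(64)
print(layout[3])   # [4, 2, 1]
```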


Conclusions and open problems

- Stochastic vs adversarial model: different rates. (Not really surprising.)
- The information model here is specific; other information models are possible.
- Next COLT? Calibration without calibrating.

Still open:
- Lower bounds. (Looks really hard.)
- For which other global functions is no-regret possible?
- Relaxed goals for global functions.

Thank you!