Discounted UCB: Levente Kocsis and Csaba Szepesvári, MTA SZTAKI



SLIDE 1

Contents UCB1-tuned Discounted UCB1-tuned Experiments Other algorithms Conclusions

Discounted UCB

Levente Kocsis and Csaba Szepesvári

MTA SZTAKI, Hungary

Levente Kocsis and Csaba Szepesvári Discounted UCB

SLIDE 2

Contents

◮ UCB1-tuned
◮ Discounted UCB1-tuned
◮ Experiments
◮ Other algorithms
◮ Conclusions

SLIDE 3

UCB1-tuned+

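\[ s_{it} = \sum_{\tau=0}^{t} \mathbb{I}(I_\tau = i)\, x_\tau, \qquad n_{it} = \sum_{\tau=0}^{t} \mathbb{I}(I_\tau = i) \]

\[ \mu_{it} = s_{it}/n_{it}, \qquad n_t = \sum_i n_{it} \]

\[ I_{t+1} = \arg\max_i \left( \mu_{it} + \sqrt{\frac{\max(\mu_{it}(1 - \mu_{it}),\, 0.002)\, \ln n_t}{n_{it}}} \right) \]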
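The index rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the class name `UCB1TunedPlus`, the `var_floor` parameter name, and the choice to pull every arm once before applying the index are assumptions, and rewards are assumed to lie in [0, 1] so that the term mu * (1 - mu) acts as a variance estimate.

```python
import math
import random


class UCB1TunedPlus:
    """Hypothetical sketch of the UCB1-tuned+ index rule for rewards in [0, 1]."""

    def __init__(self, n_arms, var_floor=0.002):
        self.s = [0.0] * n_arms  # s_it: summed reward of arm i
        self.n = [0.0] * n_arms  # n_it: number of pulls of arm i
        self.var_floor = var_floor  # the 0.002 floor from the slide

    def select(self):
        # Assumption: each arm is pulled once before the index is used,
        # so that mu_it = s_it / n_it is well defined.
        for i, n_i in enumerate(self.n):
            if n_i == 0.0:
                return i
        n_total = sum(self.n)  # n_t

        def index(i):
            mu = self.s[i] / self.n[i]
            var = max(mu * (1.0 - mu), self.var_floor)
            return mu + math.sqrt(var * math.log(n_total) / self.n[i])

        return max(range(len(self.n)), key=index)

    def update(self, arm, reward):
        self.s[arm] += reward
        self.n[arm] += 1.0


if __name__ == "__main__":
    # Illustrative Bernoulli arms; the means are made up for the demo.
    rng = random.Random(0)
    means = [0.3, 0.5]
    bandit = UCB1TunedPlus(len(means))
    for _ in range(1000):
        arm = bandit.select()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        bandit.update(arm, reward)
    print(bandit.n)
```
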

SLIDE 4

Discounted UCB1-tuned+

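\[ s_{it} = \sum_{\tau=0}^{t} \mathbb{I}(I_\tau = i)\, \gamma^{t-\tau} x_\tau, \qquad n_{it} = \sum_{\tau=0}^{t} \mathbb{I}(I_\tau = i)\, \gamma^{t-\tau} \]

\[ \mu_{it} = s_{it}/n_{it}, \qquad n_t = \sum_i n_{it} \]

\[ I_{t+1} = \arg\max_i \left( \mu_{it} + \sqrt{\frac{\max(\mu_{it}(1 - \mu_{it}),\, 0.002)\, \ln n_t}{n_{it}}} \right) \]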
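The discounted sums need not be recomputed from scratch at every step: from the definitions, s_it = gamma * s_i,t-1 + I(I_t = i) * x_t, and likewise for n_it. The sketch below uses that recursion. It is an illustration under assumptions, not the authors' implementation: the class name `DiscountedUCB1TunedPlus` is hypothetical, rewards are assumed to lie in [0, 1], and arms whose discounted count is exactly zero are pulled first.

```python
import math


class DiscountedUCB1TunedPlus:
    """Hypothetical sketch of the discounted index rule, rewards in [0, 1]."""

    def __init__(self, n_arms, gamma, var_floor=0.002):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.gamma = gamma
        self.s = [0.0] * n_arms  # discounted reward sums s_it
        self.n = [0.0] * n_arms  # discounted pull counts n_it
        self.var_floor = var_floor

    def select(self):
        # Assumption: an arm with zero discounted count is pulled first.
        for i, n_i in enumerate(self.n):
            if n_i == 0.0:
                return i
        # n_t >= 1 here, because the most recent pull has weight gamma^0 = 1,
        # so the logarithm below is non-negative.
        n_total = sum(self.n)

        def index(i):
            mu = self.s[i] / self.n[i]
            var = max(mu * (1.0 - mu), self.var_floor)
            return mu + math.sqrt(var * math.log(n_total) / self.n[i])

        return max(range(len(self.n)), key=index)

    def update(self, arm, reward):
        # Age every statistic by one step, then add the new observation.
        self.s = [self.gamma * v for v in self.s]
        self.n = [self.gamma * v for v in self.n]
        self.s[arm] += reward
        self.n[arm] += 1.0
```

With `gamma=1.0` the updates reduce to the undiscounted counts and sums of UCB1-tuned+, which matches the `gamma=1.0` curve being listed alongside the discounted variants in the experiments.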

SLIDE 5

Experiments: Task 1 (averaged over 1000 seeds)

[Figure: regret (log scale, 0.01 to 1000) against iteration (log scale, 10 to 100000). Curves: UCB1-tuned, Exp3, gamma=1.0, gamma=0.99999, gamma=0.9999.]
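One way to read the gamma values in the legend is as memory lengths: the discounted weights gamma^k sum to 1/(1 - gamma) in the limit, so the total count n_t saturates near that value. The helper names below are hypothetical and the snippet only evaluates this geometric sum; it says nothing about how the authors chose gamma.

```python
def effective_window(gamma):
    """Limit of sum_{k>=0} gamma**k: the total weight given to the past."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1) for a finite window")
    return 1.0 / (1.0 - gamma)


def total_weight(gamma, t):
    """n_t = sum_{tau=0}^{t} gamma**(t - tau), in closed form."""
    if gamma == 1.0:
        return float(t + 1)
    return (1.0 - gamma ** (t + 1)) / (1.0 - gamma)


if __name__ == "__main__":
    for gamma in (0.9999, 0.99999):
        print(gamma, round(effective_window(gamma)))
```

Under this reading, gamma=0.9999 weights roughly the last 10000 plays and gamma=0.99999 roughly the last 100000, the latter being the same order as the longest run shown on the iteration axis.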

SLIDE 6

Experiments: Task 1 (averaged over test seeds)

[Figure: regret (log scale, 0.1 to 1000) against iteration (log scale, 10 to 100000). Curves: UCB1-tuned, Exp3, gamma=1.0, gamma=0.99999, gamma=0.9999.]

SLIDE 7

Experiments: Task 2 (averaged over 1000 seeds)

[Figure: regret (log scale, 10 to 100000) against iteration (linear scale, 10000 to 100000). Curves: UCB1-tuned, Exp3, gamma=0.999, gamma=0.99, periodic with gamma=0.999.]

SLIDE 8

Experiments: Task 2 (averaged over test seeds)

[Figure: regret (log scale, 10 to 100000) against iteration (linear scale, 10000 to 100000). Curves: UCB1-tuned, Exp3, gamma=0.999, gamma=0.99, periodic with gamma=0.999.]

SLIDE 9

Experiments: Task 3 (averaged over 1000 seeds)

[Figure: regret (log scale, 0.1 to 1000) against iteration (log scale, 10 to 100000). Curves: UCB1-tuned, Exp3, gamma=1.0, gamma=0.99999, gamma=0.9999.]

SLIDE 10

Experiments: Task 3 (averaged over test seeds)

[Figure: regret (log scale, 0.1 to 1000) against iteration (log scale, 10 to 100000). Curves: UCB1-tuned, Exp3, gamma=1.0, gamma=0.99999, gamma=0.9999.]

SLIDE 11

Other algorithms

◮ line fitting
◮ discounted UCB + exploiting periodicity
◮ adaptive discounted UCB

SLIDE 12

Conclusions

◮ Challenging challenge
◮ Task 4(?): mixing Tasks 1 and 2
◮ Regret bounds depending on how fast the response rate varies?
◮ Universal algorithms (algorithms adapting to the response rate)
