Old Dog Learns New Tricks: Randomized UCB for Bandit Problems
AISTATS 2020
Motivating example: clinical trials. We do not have complete information about the effectiveness or side effects of the drugs. Aim: infer the best drug by experimenting on a sequence of patients.
In each round t ∈ [T], the learner picks an arm a_t ∈ [K] and observes a noisy reward. UCB-type algorithms play a_t = argmax_{i∈[K]} [μ̂_i(t) + β σ_i(t)], where μ̂_i(t) is the empirical mean of arm i and σ_i(t) its confidence width. The width is justified by sub-Gaussian concentration: after m pulls of arm i, P(|μ̂_i − μ_i| ≥ ε) ≤ 2 exp(−ε²m/2σ²).
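The randomized-UCB idea for the K-armed setting can be sketched as follows. This is a minimal sketch, not the authors' reference implementation: the function and parameter names are ours, and the uniform discrete distribution for Z_t and the Gaussian reward noise are assumptions for illustration.

```python
import numpy as np

def randucb_mab(means, T, alpha_max=3.0, M=20, sigma=1.0, seed=0):
    """Sketch of a randomized-UCB bandit: the fixed confidence scaling
    beta is replaced by a scaling Z_t sampled each round from a discrete
    distribution on [0, alpha_max] (uniform here, as an assumption)."""
    rng = np.random.default_rng(seed)
    K = len(means)
    support = np.linspace(0.0, alpha_max, M)  # support of Z_t
    counts = np.zeros(K)
    sums = np.zeros(K)
    regret = 0.0
    best = max(means)
    for t in range(T):
        if t < K:
            a = t  # play each arm once to initialize the estimates
        else:
            mu_hat = sums / counts
            # sub-Gaussian confidence width after counts[i] pulls
            width = sigma * np.sqrt(2.0 * np.log(T) / counts)
            z = rng.choice(support)  # one shared sample Z_t per round
            a = int(np.argmax(mu_hat + z * width))
        reward = means[a] + rng.normal(0.0, sigma)
        counts[a] += 1
        sums[a] += reward
        regret += best - means[a]
    return regret
```

With Z_t ≡ β this reduces to standard UCB; sampling Z_t injects the exploration randomness that the slides motivate.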
Slides 8–9: gap-dependent regret bound for RandUCB, a sum over arms with ∆_i > 0 of O(log T/∆_i) terms, with constants involving α_M and α² (parameters of the randomization distribution).
Linear bandits (slide 10): Gram matrix Σ_t = λI + Σ_{ℓ=1}^t X_ℓX_ℓ^T. Mean estimate θ̂_t = Σ_t^{-1} Σ_{ℓ=1}^t Y_ℓX_ℓ. The algorithm plays a_t = argmax_{i∈[K]} [⟨x_i, θ̂_t⟩ + Z_t‖x_i‖_{Σ_t^{-1}}].
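The linear-bandit quantities above can be sketched in code. Function names are ours, and the arm-selection rule assumes the randomized-width index form from the slides; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def ridge_estimate(X, Y, lam=1.0):
    """Regularized least-squares estimate used in linear bandits:
    Gram matrix Sigma_t = lam*I + sum_l X_l X_l^T,
    estimate theta_hat = Sigma_t^{-1} sum_l Y_l X_l."""
    d = X.shape[1]
    Sigma = lam * np.eye(d) + X.T @ X   # lam*I + sum of outer products
    b = X.T @ Y                          # sum of Y_l * X_l
    theta_hat = np.linalg.solve(Sigma, b)
    return Sigma, theta_hat

def randucb_index(arms, Sigma, theta_hat, z):
    """Pick argmax_i <x_i, theta_hat> + z * ||x_i||_{Sigma^{-1}},
    where z is the sampled randomization scale."""
    Sigma_inv = np.linalg.inv(Sigma)
    means = arms @ theta_hat
    # widths[i] = sqrt(x_i^T Sigma^{-1} x_i)
    widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, Sigma_inv, arms))
    return int(np.argmax(means + z * widths))
```

With z = 0 this is greedy play on the least-squares estimate; sampling z each round recovers the randomized exploration described in the slides.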
Slide 11: the corresponding confidence width in the linear setting scales with the dimension d and the regularizer λ.