Neural Contextual Bandits with UCB-based Exploration
Dongruo Zhou (1), Lihong Li (2), Quanquan Gu (1)
(1) Department of Computer Science, UCLA; (2) Google Research
Outline
  Background
    Contextual bandit problem
    Deep neural networks
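Regret: $R_T = \mathbb{E}\left[\sum_{t=1}^{T} \left(r_{t,a_t^*} - r_{t,a_t}\right)\right]$,
where $a_t^* = \arg\max_{a \in [K]} \mathbb{E}[r_{t,a}]$ is the optimal action at round $t$.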
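To make the regret notion concrete, here is a minimal sketch that sums, over rounds, the gap between the expected reward of the optimal action and that of the chosen action. It assumes the expected rewards are known, as they would be in a simulation; the names `cumulative_regret`, `expected_rewards`, and `chosen_actions` are illustrative and do not come from the slides.

```python
def cumulative_regret(expected_rewards, chosen_actions):
    """Sum over rounds of (best expected reward - expected reward of chosen action).

    expected_rewards[t][a] stands in for E[r_{t,a}] (assumed known, e.g. in a simulation).
    chosen_actions[t] is the index a_t of the action played at round t.
    """
    total = 0.0
    for rewards_t, a_t in zip(expected_rewards, chosen_actions):
        best = max(rewards_t)           # expected reward of the optimal action at round t
        total += best - rewards_t[a_t]  # per-round gap, always >= 0
    return total


# Three rounds, K = 3 actions per round (toy numbers chosen for illustration).
expected_rewards = [
    [0.25, 1.0, 0.5],
    [0.75, 0.0, 0.25],
    [0.5, 0.5, 1.0],
]
chosen_actions = [1, 2, 0]  # optimal in round 1, suboptimal in rounds 2 and 3
regret = cumulative_regret(expected_rewards, chosen_actions)
```

A policy that always plays the optimal action has zero regret under this computation; any suboptimal choice adds its per-round gap.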
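Contexts at round $t$: $\{x_{t,a}\}_{a=1}^{K}$
Exploration term in NeuralUCB: $g(x_{t,a}; \theta_{t-1})^\top Z_{t-1}^{-1}\, g(x_{t,a}; \theta_{t-1}) / m$
Linear counterpart: $x_{t,a}^\top Z_{t-1}^{-1}\, x_{t,a}$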
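The quadratic form in $g(x_{t,a}; \theta_{t-1})$ and $Z_{t-1}^{-1}$, scaled by $1/m$, can be turned into an upper confidence bound by adding its square root to a reward prediction. The sketch below assumes the usual UCB shape (prediction plus a scaled bonus). The scaling factor `gamma`, the per-action predictions `preds`, and the helper names are assumptions for illustration and are not recoverable from the lines above. The caller supplies `Z_inv`, and `grads[a]` plays the role of $g(x_{t,a}; \theta_{t-1})$.

```python
import math


def quad_form(v, M):
    """Return v^T M v for a vector v (list) and a square matrix M (list of rows)."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))


def ucb_scores(preds, grads, Z_inv, gamma, m):
    """Score each action as prediction + gamma * sqrt(g^T Z^{-1} g / m).

    preds[a]  : assumed reward prediction for action a
    grads[a]  : feature vector standing in for g(x_{t,a}; theta_{t-1})
    Z_inv     : inverse of the matrix Z_{t-1}, as a list of rows
    gamma, m  : assumed bonus scale and the normaliser m from the formula
    """
    return [p + gamma * math.sqrt(quad_form(g, Z_inv) / m)
            for p, g in zip(preds, grads)]


def select_action(scores):
    """Return the index of the action with the largest score."""
    return max(range(len(scores)), key=lambda a: scores[a])


# Toy example with K = 2 actions and 2-dimensional features.
Z_inv = [[1.0, 0.0], [0.0, 1.0]]
preds = [1.0, 0.5]
grads = [[0.0, 0.0], [3.0, 4.0]]
scores = ucb_scores(preds, grads, Z_inv, gamma=2.0, m=25)
action = select_action(scores)
```

In this toy case the second action has the lower prediction but the larger bonus, so it is selected; with `gamma=0.0` the rule reduces to greedy selection on `preds`.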
[...] $\{x^i\}_{i=1}^{TK}$.
[...] $\{x^i\}_{i=1}^{TK}$ are parallel.
[...] $\{x^i\}_{i=1}^{TK}$ is defined as [...]
[...]$_{i=1}^{TK} \in \mathbb{R}^{TK}$. Set $J =$ [...]
[Figure: Regret (y-axis, ticks 200 to 1600) versus Round (x-axis, ticks 2000 to 10000) for LinUCB, KernelUCB, BootstrappedNN, Neural ε-Greedy0, NeuralUCB0, Neural ε-Greedy, and NeuralUCB.]