Reminder from last week Goals Lower bounds on the weak regret
The Nonstochastic Multi Armed Bandit Problem Part 2 and counting...
Shahaf Nacson
TAU
Nov 15, 2017
Shahaf Nacson The Nonstochastic Multi Armed Bandit Problem
Assume K is known to player in advance
Generalized trivially to [a, b] by (b − a)x + a
Player learns only rewards of arms he chose
Can even be adversarial
I.e., before the first arm is pulled
Assignments can be picked after the player's strategy is already known
Actions, denoted usually by $i \in \{1, \dots, K\}$
Trials, denoted usually by time $t \in \{1, \dots, T\}$; one action $i$ per time $t$
Rewards $x_i(t) \in [0, 1]$
Choose arm $i_t$ at time $t$ (and receive reward $x_{i_t}(t)$)
Player only knows the rewards $x_{i_1}(1), \dots, x_{i_t}(t)$ of the previously chosen actions $i_1, \dots, i_t$
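A minimal sketch of the game protocol above, assuming the rewards are stored as a T-by-K table with 0-based arm indices. The names `play`, `always_arm` and `rewards` are illustrative and not from the slides. The strategy sees only the history of (arm, reward) pairs for arms it actually chose.

```python
def play(strategy, rewards):
    """Run one game. rewards[t][i] is the reward of arm i at round t, in [0, 1]."""
    history = []  # (chosen arm, observed reward) pairs, one per past round
    for x_t in rewards:
        arm = strategy(history, len(x_t))  # the choice depends only on the history
        history.append((arm, x_t[arm]))    # only the chosen arm's reward is revealed
    return history


def always_arm(j):
    """Hypothetical baseline strategy: ignore the history, always pull arm j."""
    return lambda history, K: j


rewards = [[0.0, 1.0, 0.5],
           [1.0, 0.0, 0.5],
           [0.2, 0.9, 0.5]]
history = play(always_arm(1), rewards)
```

Here `history` holds the three observed rewards of arm 1; the rewards of arms 0 and 2 are never seen by the player.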
The player's strategy can be viewed as a sequence $I_1, I_2, \dots$ where each $I_t$ is a mapping $(\{1, \dots, K\} \times [0, 1])^{t-1} \to \{1, \dots, K\}$, that is, from the previous action indices and rewards to the set of action indices
Return of algorithm $A$ up to time $T$: $G_A(T) = \sum_{t=1}^{T} x_{i_t}(t)$, denoted as $G_A$ when context of $T$ is obvious
Return of the best single action: $G_{\max}(T) = \max_j \sum_{t=1}^{T} x_j(t)$, denoted as $G_{\max}$ as well
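A small worked example of these quantities, assuming the usual definitions: the player's return sums the rewards of the arms it chose, the best single arm's return is the largest column sum, and the weak regret is their difference. The function names and the toy reward table are illustrative.

```python
def return_of_play(rewards, chosen):
    """G_A: total reward collected along the chosen arms."""
    return sum(rewards[t][i] for t, i in enumerate(chosen))


def best_arm_return(rewards):
    """G_max: total reward of the best single arm in hindsight."""
    K = len(rewards[0])
    return max(sum(x_t[j] for x_t in rewards) for j in range(K))


rewards = [[0.0, 1.0],
           [1.0, 1.0],
           [0.0, 1.0]]
chosen = [0, 0, 1]                    # arms pulled at rounds 1, 2, 3 (0-based arms)

g_a = return_of_play(rewards, chosen)  # 0.0 + 1.0 + 1.0
g_max = best_arm_return(rewards)       # arm 1 collects 3.0
weak_regret = g_max - g_a
```

The comparison is against one fixed arm, not against the best arm per round, which is what makes the regret "weak".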
Does not match the upper bound of previous week’s algorithm
Closing the gap is still an open problem (today??)
Build a reward distribution s.t. every strategy incurs an expected regret of at least our lower bound
Pretty straightforward
Here is where all the magic happens
$\epsilon \in (0, \tfrac{1}{2}]$ to be chosen later down the road
Expected gain of the good arm: $(\tfrac{1}{2} + \epsilon)T$
$P_i\{\cdot\} = P_*\{\,\cdot \mid I = i\}$
$\mathbf{r} = \mathbf{r}^{T}$: the entire reward sequence
3) ≈ 0.925, but let’s keep it simple).
Expected reward at time $t$ is $\tfrac{1}{2} + \epsilon$ if $i_t = I$ and $\tfrac{1}{2}$ if $i_t \ne I$:
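A minimal sketch of such a random reward assignment, assuming {0, 1}-valued rewards: one "good" arm is drawn uniformly at random and pays 1 with probability 1/2 + ε, while every other arm pays 1 with probability 1/2. The function name, the seed and the parameter values are illustrative.

```python
import random


def sample_rewards(K, T, eps, rng):
    """Draw the good arm uniformly, then a T-by-K table of 0/1 rewards."""
    good = rng.randrange(K)
    rewards = [
        [1 if rng.random() < (0.5 + eps if i == good else 0.5) else 0
         for i in range(K)]
        for _ in range(T)
    ]
    return good, rewards


rng = random.Random(0)  # fixed seed so the sketch is reproducible
K, T, eps = 4, 20000, 0.1
good, rewards = sample_rewards(K, T, eps, rng)

# Empirical mean reward of each arm over the T rounds.
means = [sum(x_t[i] for x_t in rewards) / T for i in range(K)]
```

With a large T the empirical mean of the good arm sits near 1/2 + ε and the others near 1/2; with small ε and few pulls the arms are hard to tell apart, which is what the lower bound exploits.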
$\sum_{i=1}^{K} E_{\mathrm{unif}}[N_i] = T$ and so, $\sum_{i=1}^{K} \sqrt{E_{\mathrm{unif}}[N_i]} \le \sqrt{TK}$
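A quick numeric check of one standard consequence of the pull counts summing to T, assumed here to be the step used at this point: by Cauchy-Schwarz, nonnegative numbers $n_1, \dots, n_K$ with $\sum_i n_i = T$ satisfy $\sum_i \sqrt{n_i} \le \sqrt{KT}$, with equality for the even split. The variable names and the random split are illustrative.

```python
import math
import random


def sum_of_roots(counts):
    return sum(math.sqrt(n) for n in counts)


rng = random.Random(1)
K, T = 5, 1000

# Split T into K nonnegative parts at random cut points.
cuts = sorted(rng.uniform(0, T) for _ in range(K - 1))
counts = [b - a for a, b in zip([0.0] + cuts, cuts + [float(T)])]

lhs = sum_of_roots(counts)         # sum of sqrt(n_i) for a random split
rhs = math.sqrt(K * T)             # the Cauchy-Schwarz bound
even = sum_of_roots([T / K] * K)   # the even split attains the bound
```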
Choosing $\epsilon = c\sqrt{K/T}$ for a suitable constant $c$ gives a lower bound of $\Omega(\sqrt{KT})$
Using $-\ln(1 - x) \le 4\ln(\tfrac{4}{3})\,x$ for $x \in [0, \tfrac{1}{4}]$, the Lemma will
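A numeric sanity check, assuming the inequality meant here is $-\ln(1 - x) \le 4\ln(\tfrac{4}{3})\,x$ on $[0, \tfrac{1}{4}]$. Since $-\ln(1 - x)$ is convex, it lies below the chord through $x = 0$ and $x = \tfrac{1}{4}$, and that chord has slope $4\ln(\tfrac{4}{3})$. The grid resolution is an arbitrary choice.

```python
import math

C = 4 * math.log(4 / 3)  # slope of the chord, roughly 1.15


def gap(x):
    """Right-hand side minus left-hand side; nonnegative where the bound holds."""
    return C * x - (-math.log(1 - x))


xs = [k / 4000 for k in range(1001)]  # grid on [0, 1/4]
min_gap = min(gap(x) for x in xs)     # zero (up to rounding) at both endpoints
gap_outside = gap(0.3)                # negative: the bound fails past 1/4
```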
$KL(\tfrac{1}{2} \,\|\, \tfrac{1}{2})$? $0$
$KL(\tfrac{1}{2} \,\|\, \tfrac{1}{2} + \epsilon)$? $-\tfrac{1}{2}\ln(1 - 4\epsilon^2)$
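Both values can be checked directly from the definition of the KL divergence between two Bernoulli distributions. The sketch below works in nats, to match the `ln` in the closed form; `kl_bernoulli` is an illustrative helper, and ε = 0.1 is an arbitrary test value.

```python
import math


def kl_bernoulli(p, q):
    """KL(p || q) in nats between Bernoulli(p) and Bernoulli(q), for 0 < q < 1."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:  # a term with zero probability contributes nothing
            total += a * math.log(a / b)
    return total


eps = 0.1
same = kl_bernoulli(0.5, 0.5)
shifted = kl_bernoulli(0.5, 0.5 + eps)
closed_form = -0.5 * math.log(1 - 4 * eps ** 2)
```

The closed form follows from $\tfrac{1}{2}\ln\frac{1/2}{1/2 + \epsilon} + \tfrac{1}{2}\ln\frac{1/2}{1/2 - \epsilon} = \tfrac{1}{2}\ln\frac{1/4}{1/4 - \epsilon^2}$.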
$\|P_{\mathrm{unif}} - P_i\|_1^2 \le (2\ln 2)\,KL(P_{\mathrm{unif}} \,\|\, P_i)$
$\tfrac{1}{2}$ and $\tfrac{1}{2} + \epsilon$
$\|P - Q\|_1^2 \le (2\ln 2)\,KL(P \,\|\, Q)$
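A numeric check of this Pinsker-type inequality for Bernoulli distributions. The factor $2\ln 2$ corresponds to measuring KL in bits, so the helper below converts from nats; that reading of the units is an assumption. The helper names and the grid are illustrative.

```python
import math


def kl_bits(p, q):
    """KL(p || q) in bits between Bernoulli(p) and Bernoulli(q), for 0 < p, q < 1."""
    nats = p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    return nats / math.log(2)


def l1_squared(p, q):
    """Squared L1 distance: |p - q| + |(1 - p) - (1 - q)| = 2|p - q|."""
    return (2 * abs(p - q)) ** 2


grid = [k / 20 for k in range(1, 20)]  # 0.05, 0.10, ..., 0.95
worst = max(
    l1_squared(p, q) - 2 * math.log(2) * kl_bits(p, q)
    for p in grid
    for q in grid
)
```

`worst` is the largest violation over the grid; it is never positive, and it reaches zero exactly when p = q.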
For Bernoulli distributions with parameters $p$ and $q$: $\|p - q\|_1^2 = 4(p - q)^2$
$\|p - q\|_1^2 \le (2\ln 2)\,KL(p \,\|\, q)$
4 and q ≤ p. For q = p, g(p, q) = 0, hence
$\|P_{\mathrm{unif}} - P_i\|_1^2 \le (2\ln 2)\,KL(P_{\mathrm{unif}} \,\|\, P_i)$
$KL(P_{\mathrm{unif}} \,\|\, P_i) = \sum_{t=1}^{T} KL\big(P_{\mathrm{unif}}\{r_t \mid \mathbf{r}^{t-1}\} \,\|\, P_i\{r_t \mid \mathbf{r}^{t-1}\}\big)$
$= \sum_{t=1}^{T} \big(P_{\mathrm{unif}}\{i_t \ne i\}\,KL(\tfrac{1}{2} \,\|\, \tfrac{1}{2}) + P_{\mathrm{unif}}\{i_t = i\}\,KL(\tfrac{1}{2} \,\|\, \tfrac{1}{2} + \epsilon)\big)$
Under $P_{\mathrm{unif}}$ the conditional reward distribution is $\tfrac{1}{2}$ regardless of past results. Under $P_i$ it is $\tfrac{1}{2}$ if we didn't pull arm $i$ and $\tfrac{1}{2} + \epsilon$ if we did.
($KL(\tfrac{1}{2} \,\|\, \tfrac{1}{2}) = 0$, $KL(\tfrac{1}{2} \,\|\, \tfrac{1}{2} + \epsilon) = -\tfrac{1}{2}\ln(1 - 4\epsilon^2)$)
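$KL(P_{\mathrm{unif}} \,\|\, P_i) = \sum_{t=1}^{T} P_{\mathrm{unif}}\{i_t = i\}\,\big(-\tfrac{1}{2}\ln(1 - 4\epsilon^2)\big) = E_{\mathrm{unif}}[N_i]\,\big(-\tfrac{1}{2}\ln(1 - 4\epsilon^2)\big)$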