SLIDE 1 Advanced Algorithms (IV)
Shanghai Jiao Tong University
Chihao Zhang
March 23rd, 2020
SLIDE 2-4
Review
We learnt the Markov inequality: for a nonnegative random variable X and a > 0,
Pr[X ≥ a] ≤ E[X]/a
We can choose an increasing function f so that
Pr[X ≥ a] = Pr[f(X) ≥ f(a)] ≤ E[f(X)]/f(a)
SLIDE 5-9
f(x) = x² yields Chebyshev's inequality
Pr[|X − E[X]| ≥ a] ≤ Var[X]/a² = (E[X²] − E[X]²)/a²
What is a good choice of f?
- f grows fast
- E[f(X)] is bounded and easy to calculate
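The review inequalities are easy to check empirically. The following is a minimal sketch (distribution and parameters chosen arbitrarily for illustration) that estimates a tail probability of X ∼ Bin(100, 0.5) and compares it with the Chebyshev bound Var[X]/a².

```python
import random

# Empirical check of Chebyshev's inequality on X ~ Bin(n, p):
#   Pr[|X - E[X]| >= a] <= Var[X] / a^2
random.seed(0)
n, p, a, trials = 100, 0.5, 10, 20_000

mean = n * p            # E[X] = np = 50
var = n * p * (1 - p)   # Var[X] = np(1-p) = 25

deviations = 0
for _ in range(trials):
    x = sum(random.random() < p for _ in range(n))  # one sample of Bin(n, p)
    if abs(x - mean) >= a:
        deviations += 1

empirical = deviations / trials
bound = var / a**2      # Chebyshev bound = 25/100 = 0.25
assert empirical <= bound
```

Note that the bound holds with considerable slack here; this looseness is what motivates the search for a better choice of f.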
SLIDE 10-13
Moment Generating Function
The function f(x) = e^{tx} is a natural choice
The function E[f(X)] = E[e^{tX}] is called the moment generating function
In some cases, E[e^{tX}] is easy to calculate…
SLIDE 14-17
Chernoff Bound
Assume X = ∑_{i=1}^n Xi, where each Xi ∼ Ber(pi) is an independent Bernoulli variable with mean pi
E[e^{tX}] = E[e^{t ∑_{i=1}^n Xi}] = ∏_{i=1}^n E[e^{tXi}] = ∏_{i=1}^n (pi·e^t + 1 − pi) ≤ ∏_{i=1}^n e^{pi(e^t−1)} = e^{E[X](e^t−1)}
(the inequality uses 1 + x ≤ e^x with x = pi(e^t − 1))
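This chain of (in)equalities can be verified numerically. The sketch below (random pi and arbitrary values of t, for illustration only) compares the exact MGF of a sum of independent Bernoullis with the bound e^{μ(e^t−1)}.

```python
import math
import random

# Numeric sanity check of the MGF computation for X = sum of Ber(p_i):
#   E[e^{tX}] = prod_i (p_i e^t + 1 - p_i) <= e^{mu (e^t - 1)},
# where the inequality uses 1 + x <= e^x with x = p_i (e^t - 1).
random.seed(1)
ps = [random.random() for _ in range(20)]
mu = sum(ps)

for t in (0.1, 0.5, 1.0):
    exact = math.prod(p * math.exp(t) + 1 - p for p in ps)  # exact MGF
    bound = math.exp(mu * (math.exp(t) - 1))                # upper bound
    assert exact <= bound
```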
SLIDE 18-22
Let μ = E[X] = ∑_{i=1}^n pi
For t > 0, we can deduce
Pr[X > (1 + δ)μ] = Pr[e^{tX} ≥ e^{t(1+δ)μ}] ≤ E[e^{tX}]/e^{t(1+δ)μ} = e^{(e^t−1)μ}/e^{t(1+δ)μ}
In order to obtain a tight bound, we optimize t to minimize it
SLIDE 23-27
Since e^{(e^t−1)μ}/e^{t(1+δ)μ} = e^{μ(e^t−1−t(1+δ))}, we can choose t = log(1 + δ) > 0
So Pr[X > (1 + δ)μ] ≤ (e^δ/(1 + δ)^{1+δ})^μ
We can similarly obtain (using t < 0)
Pr[X < (1 − δ)μ] ≤ (e^{−δ}/(1 − δ)^{1−δ})^μ
SLIDE 28-30
To summarize, for X = ∑_{i=1}^n Xi, we have
Pr[X ≥ (1 + δ)μ] ≤ (e^δ/(1 + δ)^{1+δ})^μ
Pr[X ≤ (1 − δ)μ] ≤ (e^{−δ}/(1 − δ)^{1−δ})^μ
A more useful expression is that for 0 < δ ≤ 1
Pr[X ≥ (1 + δ)μ] ≤ e^{−μδ²/3}
Pr[X ≤ (1 − δ)μ] ≤ e^{−μδ²/2}
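The simplified upper-tail bound can be checked against simulation. The sketch below (n, p, δ chosen arbitrarily) estimates Pr[X ≥ (1 + δ)μ] for a sum of i.i.d. Bernoulli variables and compares it with e^{−μδ²/3}.

```python
import math
import random

# Empirical check of the simplified Chernoff bound
#   Pr[X >= (1 + delta) mu] <= e^{-mu delta^2 / 3}  for 0 < delta <= 1,
# with X a sum of n independent Ber(p) variables.
random.seed(2)
n, p, delta, trials = 100, 0.3, 0.5, 20_000
mu = n * p                             # mu = 30

hits = 0
for _ in range(trials):
    x = sum(random.random() < p for _ in range(n))  # one sample of X
    if x >= (1 + delta) * mu:
        hits += 1

empirical = hits / trials
bound = math.exp(-mu * delta**2 / 3)   # e^{-30 * 0.25 / 3} = e^{-2.5}
assert empirical <= bound
```

The empirical tail probability is far below the bound; Chernoff-type bounds trade some slack for a clean exponential form.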
SLIDE 31-34
Max Load
Recall in the max load problem, we throw n balls into n bins
The number of balls in the i-th bin, Xi ∼ Bin(n, 1/n)
Note that E[Xi] = 1; what is the probability that Xi > c·log n/log log n?
SLIDE 35-39
In this case, 1 + δ = c·log n/log log n
Applying the Chernoff bound, we obtain
Pr[Xi ≥ c·log n/log log n] ≤ e^δ/(1 + δ)^{1+δ} ≤ n^{−c+o(1)},
which is tight in order compared to our earlier analytic result.
SLIDE 40-43
The Chernoff bound has a few drawbacks:
- the Xi need to be independent
- each Xi is required to follow the Ber(pi) distribution
We will try to generalize the Chernoff bound to general bounded variables
SLIDE 44-46
Hoeffding Inequality
The Hoeffding inequality generalizes to those Xi with E[Xi] = 0 and ai ≤ Xi ≤ bi:
Pr[∑_{i=1}^n Xi ≥ t] ≤ exp(−2t²/∑_{i=1}^n (bi − ai)²)
SLIDE 47-50
The key property to establish the Hoeffding inequality is an upper bound on the moment generating function
Lemma: Assume X satisfies X ∈ [a, b] and E[X] = 0; then E[e^{tX}] ≤ exp(t²(b − a)²/8)
You can find the proofs of the lemma and the Hoeffding inequality in the book Probability and Computing
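The Hoeffding inequality can also be checked empirically on variables that are bounded but not Bernoulli. The sketch below (parameters arbitrary) sums zero-mean variables uniform on [−1, 1], so ai = −1 and bi = 1.

```python
import math
import random

# Empirical check of the Hoeffding inequality
#   Pr[sum X_i >= t] <= exp(-2 t^2 / sum (b_i - a_i)^2)
# with X_i uniform on [-1, 1]: a_i = -1, b_i = 1, E[X_i] = 0.
random.seed(4)
n, t, trials = 50, 15.0, 20_000

hits = 0
for _ in range(trials):
    s = sum(random.uniform(-1.0, 1.0) for _ in range(n))
    if s >= t:
        hits += 1

empirical = hits / trials
bound = math.exp(-2 * t**2 / (n * 2**2))  # sum (b_i - a_i)^2 = 50 * 4 = 200
assert empirical <= bound
```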
SLIDE 51-54
Multi-Armed Bandit
In the problem of MAB, there are k bandits
- each bandit i has an unknown random reward distribution fi on [0,1] with μi = E[fi]
- each round one can pull an arm i and obtain a reward r ∼ fi
The goal is to identify the best arm via trials
SLIDE 55-59
Regret of MAB
We assume μ1 = max_{1≤i≤k} μi
If the game is played for T rounds, the best reward in expectation is Tμ1
We are often not so lucky to achieve this, so the goal is to find a strategy to minimize the regret
R(T) = Tμ1 − ∑_{t=1}^T μ_{a_t},
where a_t is the arm pulled at round t
SLIDE 60-64
What is a good strategy?
We view R(T) as a function of T and consider T → ∞
If we eventually find the best arm, then R(T) = o(T)
If we fail to find the best arm, we will suffer a regret of Ω(ΔT), where Δ is the gap between the optimal and suboptimal rewards
So we need the failure probability to be O(1/T)
SLIDE 65-67
The Upper Confidence Bound Algorithm
We collect information up to round T:
- ni(T): the number of times that the i-th arm has been pulled
- μ̂i(T): the estimate of the mean μi, which is equal to (∑_{t=1}^T 1[a_t = i]·r(t))/ni(T) if ni(T) ≠ 0, where r(t) is the reward at round t
SLIDE 68-71
Choose the Best Arm So Far?
The most straightforward idea is to choose the arm with the best μ̂i(T)
This strategy might be inferior if we are unlucky and the best arm performs badly in the first few trials
So we have to add some offset term for those arms that are not "well-explored"
SLIDE 72-76
The UCB algorithm chooses the arm with the largest μ̂i(T) + ci(T),
where ci(T) is the confidence term of arm i at round T
Intuitively, ci(T) should be decreasing in ni, so we give more chances to arms that have not been well-tested
Let's find out how to set ci(T)
SLIDE 77-81
We need the following event to happen w.h.p.:
∀2 ≤ i ≤ k, μ̂1(T) + c1(T) > μ̂i(T) + ci(T)
A sufficient condition for this is
μ̂1(T) + c1(T) > μ1 > μi + 2ci(T) > μ̂i(T) + ci(T)
The first and third ">" hold if ∀j, μ̂j(T) − cj(T) < μj < μ̂j(T) + cj(T)
The middle ">" holds if ∀i ≥ 2, ci(T) < (μ1 − μi)/2
This is the trade-off on cj(T): it must be large enough that the confidence interval contains μj, yet smaller than half the gap
SLIDE 82-87
We apply the Hoeffding inequality to bound the probability of
∀j, ∀t ≤ T, μ̂j(t) − cj(t) < μj < μ̂j(t) + cj(t)
Since μ̂j(t) averages nj samples, each contributing a term lying in an interval of length 1/nj,
Pr[|μ̂j(t) − μj| > cj(t)] ≤ 2 exp(−2cj²/(nj·(1/nj)²)) = 2 exp(−2cj²·nj)
So the Hoeffding inequality suggests choosing
cj(T) = Ω(√(log T/nj(T))),
which makes each failure probability polynomially small in T, enough for a union bound over all arms and rounds
SLIDE 88-92
For this choice of ci(T), the condition ci(T) < (μ1 − μi)/2 becomes
√(c·log T/ni(T)) < (μ1 − μi)/2
This means ni(T) = Ω(log T), so we need to try each arm Ω(log T) times "for free"!
Some tedious calculations are required to obtain the final regret bound, which is Θ(log T)
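The algorithm described above can be sketched as follows. This is a minimal sketch, not the lecture's reference implementation: it assumes Bernoulli reward distributions fi, uses the common choice ci(t) = √(2·log t/ni(t)) for the confidence term, and the arm means, horizon T, and seed are illustrative.

```python
import math
import random

# A minimal UCB sketch: maintain n_i(t) and the empirical mean of each arm,
# and pull the arm maximizing mu_hat_i(t) + sqrt(2 log t / n_i(t)).
def ucb(mus, T, seed=0):
    rng = random.Random(seed)
    k = len(mus)
    n = [0] * k        # n_i: number of pulls of arm i so far
    total = [0.0] * k  # sum of rewards obtained from arm i
    pulls = []
    for t in range(1, T + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize n_i > 0
        else:
            # choose the arm with the largest upper confidence bound
            arm = max(range(k), key=lambda i:
                      total[i] / n[i] + math.sqrt(2 * math.log(t) / n[i]))
        reward = 1.0 if rng.random() < mus[arm] else 0.0  # r ~ Ber(mu_arm)
        n[arm] += 1
        total[arm] += reward
        pulls.append(arm)
    return n, pulls

mus = [0.9, 0.6, 0.5]  # arm 0 is the best (these means are hypothetical)
n, pulls = ucb(mus, T=5000)
# The best arm dominates; each suboptimal arm gets only O(log T) pulls.
assert n[0] == max(n)
```

Consistent with the analysis, the suboptimal arms are still pulled Θ(log T) times: that exploration is exactly the "for free" cost that yields the Θ(log T) regret.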