Advanced Algorithms (IV)
Shanghai Jiao Tong University
Chihao Zhang
March 23rd, 2020
Review

We learnt the Markov inequality: for a nonnegative random variable $X$ and any $a > 0$,

$$\Pr[X \ge a] \le \frac{\mathbf{E}[X]}{a}.$$

We can choose an increasing function $f$ so that

$$\Pr[X \ge a] = \Pr[f(X) \ge f(a)] \le \frac{\mathbf{E}[f(X)]}{f(a)}.$$
$f(x) = x^2$ yields Chebyshev's inequality:

$$\Pr[|X - \mathbf{E}[X]| \ge a] \le \frac{\mathbf{Var}[X]}{a^2} = \frac{\mathbf{E}[X^2] - \mathbf{E}[X]^2}{a^2}.$$

What is a good choice of $f$?

- $f$ grows fast
- $\mathbf{E}[f(X)]$ is bounded and easy to calculate
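To make the comparison concrete, here is a small numeric check (my own illustrative example, not from the slides): for $X \sim \mathrm{Bin}(100, 1/2)$ both bounds hold, and Chebyshev is already far sharper than Markov for a deviation well above the mean.

```python
# Compare the Markov and Chebyshev tail bounds with the exact tail of
# X ~ Bin(100, 1/2); the distribution and threshold are arbitrary choices.
from math import comb

n, p = 100, 0.5
mean, var = n * p, n * p * (1 - p)          # E[X] = 50, Var[X] = 25

def tail(a):
    """Exact Pr[X >= a] for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))

a = 75
print(f"exact     Pr[X >= {a}] = {tail(a):.3e}")
print(f"Markov    E[X]/a       = {mean / a:.3e}")
# For a > E[X]: Pr[X >= a] <= Pr[|X - E[X]| >= a - E[X]] <= Var[X]/(a - E[X])^2
print(f"Chebyshev Var/(a-E)^2  = {var / (a - mean)**2:.3e}")
```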
Moment Generating Function

The function $f(x) = e^{tx}$ is a natural choice.

The function $\mathbf{E}[f(X)] = \mathbf{E}[e^{tX}]$ is called the moment generating function.

In some cases, $\mathbf{E}[e^{tX}]$ is easy to calculate…
Chernoff Bound

Assume $X = \sum_{i=1}^n X_i$, where each $X_i \sim \mathrm{Ber}(p_i)$ is an independent Bernoulli variable with mean $p_i$. Then

$$\mathbf{E}[e^{tX}] = \mathbf{E}\left[e^{t\sum_{i=1}^n X_i}\right] = \prod_{i=1}^n \mathbf{E}[e^{tX_i}] = \prod_{i=1}^n \left(p_i e^t + 1 - p_i\right) \le \prod_{i=1}^n e^{p_i(e^t - 1)} = e^{\mathbf{E}[X](e^t - 1)},$$

where the second equality uses independence and the inequality uses $1 + x \le e^x$.
Let $\mu = \mathbf{E}[X] = \sum_{i=1}^n p_i$.

For $t > 0$, we can deduce

$$\Pr[X > (1+\delta)\mu] = \Pr\left[e^{tX} > e^{t(1+\delta)\mu}\right] \le \frac{\mathbf{E}[e^{tX}]}{e^{t(1+\delta)\mu}} \le \frac{e^{(e^t - 1)\mu}}{e^{t(1+\delta)\mu}},$$

where the first inequality is the Markov inequality applied to $e^{tX}$.

In order to obtain a tight bound, we optimize over $t$ to minimize the right-hand side.
Since

$$\frac{e^{(e^t - 1)\mu}}{e^{t(1+\delta)\mu}} = e^{\mu\left(e^t - 1 - t(1+\delta)\right)},$$

and the exponent is minimized where $\frac{d}{dt}\left(e^t - 1 - t(1+\delta)\right) = e^t - (1+\delta) = 0$, we can choose $t = \log(1+\delta) > 0$.

So

$$\Pr[X > (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.$$

We can similarly obtain (using $t < 0$)

$$\Pr[X < (1-\delta)\mu] \le \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.$$
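A quick numeric check (illustrative, not part of the lecture) that $t = \log(1+\delta)$ indeed minimizes the exponent, here for an arbitrary $\delta = 0.5$:

```python
# Grid-search the exponent g(t) = e^t - 1 - t*(1 + delta) from the Chernoff
# derivation and compare the numeric argmin with log(1 + delta).
from math import exp, log

delta = 0.5
g = lambda t: exp(t) - 1 - t * (1 + delta)
ts = [i / 1000 for i in range(1, 2000)]     # t in (0, 2)
t_best = min(ts, key=g)
print("numeric argmin :", round(t_best, 3))
print("log(1 + delta) :", round(log(1 + delta), 3))
```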
To summarize, for $X = \sum_{i=1}^n X_i$, we have

- $\Pr[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}$
- $\Pr[X \le (1-\delta)\mu] \le \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}$

A more useful expression is that for $0 < \delta \le 1$:

- $\Pr[X \ge (1+\delta)\mu] \le e^{-\mu\delta^2/3}$
- $\Pr[X \le (1-\delta)\mu] \le e^{-\mu\delta^2/2}$
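As a sanity check, the following sketch (my own; the parameters are arbitrary) compares the simplified bounds with the exact binomial tail, using that $\mathrm{Bin}(n, p)$ is exactly a sum of $n$ independent $\mathrm{Ber}(p)$ variables:

```python
# Verify the simplified Chernoff bounds against the exact tails of
# X ~ Bin(80, 1/4), a sum of 80 independent Ber(1/4) with mu = 20.
from math import comb, exp

n, p = 80, 0.25
mu = n * p                 # mu = 20
delta = 0.5

def pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

hi, lo = round((1 + delta) * mu), round((1 - delta) * mu)   # 30 and 10
print(f"exact Pr[X >= (1+d)mu]  = {sum(pmf(k) for k in range(hi, n + 1)):.3e}")
print(f"Chernoff exp(-mu d^2/3) = {exp(-mu * delta**2 / 3):.3e}")
print(f"exact Pr[X <= (1-d)mu]  = {sum(pmf(k) for k in range(lo + 1)):.3e}")
print(f"Chernoff exp(-mu d^2/2) = {exp(-mu * delta**2 / 2):.3e}")
```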
Max Load

Recall that in the max load problem, we throw $n$ balls into $n$ bins.

The number of balls in the $i$-th bin is $X_i \sim \mathrm{Bin}\left(n, \frac{1}{n}\right)$.

Note that $\mathbf{E}[X_i] = 1$. What is the probability that $X_i > \frac{c \log n}{\log\log n}$?
In this case, $1 + \delta = \frac{c\log n}{\log\log n}$.

Applying the Chernoff bound, we obtain

$$\Pr\left[X_i \ge \frac{c\log n}{\log\log n}\right] \le \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \le n^{-c + o(1)},$$

which matches, up to the order of growth, our earlier analytic result.
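A quick simulation (illustrative only; it checks the order of growth, not the constant $c$) shows the maximum load staying within a small multiple of $\log n / \log\log n$:

```python
# Throw n balls into n bins uniformly at random and compare the observed
# maximum load with log(n)/log(log(n)).
import random
from math import log

n = 100_000
loads = [0] * n
for _ in range(n):
    loads[random.randrange(n)] += 1

print("observed max load :", max(loads))
print("log n / log log n :", round(log(n) / log(log(n)), 2))
```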
The Chernoff bound has a few drawbacks:

- each $X_i$ needs to be independent
- each $X_i$ is required to follow $\mathrm{Ber}(p_i)$

We will try to generalize the Chernoff bound to overcome these issues.
Hoeffding Inequality

The Hoeffding inequality generalizes to those $X_i$ with $a_i \le X_i \le b_i$ and $\mathbf{E}[X_i] = 0$:

$$\Pr\left[\sum_{i=1}^n X_i \ge t\right] \le \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$
The key property used to establish the Hoeffding inequality is an upper bound on the moment generating function.

Lemma. Assume $X \in [a, b]$ satisfies $\mathbf{E}[X] = 0$. Then

$$\mathbf{E}[e^{tX}] \le \exp\left(\frac{t^2 (b - a)^2}{8}\right).$$

You can find the proofs of the lemma and the Hoeffding inequality in the book Probability and Computing.
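As a usage sketch (assumptions mine: i.i.d. samples in $[0,1]$ and a 5% failure budget), the Hoeffding inequality turns directly into a confidence interval for an empirical mean: centering each sample gives $b_i - a_i = 1$, and a union of the two one-sided bounds gives the two-sided form used below.

```python
# Hoeffding confidence interval for the mean of i.i.d. samples in [0, 1].
from math import sqrt, log
import random

n = 10_000
samples = [random.random() ** 2 for _ in range(n)]   # values in [0, 1]
emp_mean = sum(samples) / n                          # the true mean is 1/3

# Centering gives b_i - a_i = 1, so the two-sided Hoeffding bound reads
# Pr[|empirical mean - true mean| >= eps] <= 2 * exp(-2 * n * eps^2).
alpha = 0.05
eps = sqrt(log(2 / alpha) / (2 * n))                 # solve 2*exp(-2n eps^2) = alpha
print(f"empirical mean          = {emp_mean:.4f}")
print(f"95% Hoeffding interval  = [{emp_mean - eps:.4f}, {emp_mean + eps:.4f}]")
```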
Multi-Armed Bandit

In the problem of MAB, there are $k$ bandits (arms):

- each bandit has an unknown random reward distribution $f_i$ on $[0,1]$ with $\mu_i = \mathbf{E}[f_i]$
- each round one can pull an arm $i$ and obtain a reward $r \sim f_i$

The goal is to identify the best arm via trials.
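A minimal environment sketch (the class name and the choice of Bernoulli rewards are my own; any reward distribution on $[0,1]$ would do):

```python
import random

class BernoulliBandit:
    """k arms; pulling arm i returns a reward drawn from Ber(mu_i) on [0, 1]."""

    def __init__(self, means):
        self.means = means                  # the mu_i, hidden from the player

    def pull(self, i):
        return 1.0 if random.random() < self.means[i] else 0.0

bandit = BernoulliBandit([0.9, 0.8, 0.5])   # three arms; the first is the best
reward = bandit.pull(0)                     # one round: pull arm 0, observe r ~ f_0
```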
Regret of MAB

We assume $\mu_1 = \max_{1 \le i \le k} \mu_i$.

If the game is played for $T$ rounds, the best reward one can obtain is $T\mu_1$ in expectation.

We are often not so lucky to achieve this, so the goal is to find a strategy minimizing the regret

$$R(T) = \underbrace{T\mu_1}_{\text{best reward}} - \sum_{t=1}^{T} \mu_{a_t},$$

where $a_t$ is the arm actually pulled at round $t$.
What is a good strategy?

We view $R(T)$ as a function of $T$ and consider $T \to \infty$.

If we eventually find the best arm, then $R(T) = o(T)$.

If we fail to find the best arm, we will suffer a regret of $\Omega(\Delta T)$, where $\Delta$ is the gap between the optimal and suboptimal rewards.

So we need the failure probability to be $O(1/T)$.
The Upper Confidence Bound Algorithm

We collect information up to round $T$:

- $n_i(T)$: the number of times the $i$-th arm has been pulled
- $\hat{\mu}_i(T)$: an estimate of the mean $\mu_i$, equal to $\frac{\sum_{t=1}^{T} \mathbf{1}[a_t = i] \cdot r(t)}{n_i(T)}$ if $n_i(T) \ne 0$, where $r(t)$ is the reward at the $t$-th round
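One standard way to turn these statistics into a strategy is the UCB1 index $\hat{\mu}_i(t) + \sqrt{2\log t / n_i(t)}$, whose exploration bonus is a Hoeffding-style confidence radius. The slides stop before fixing the exact bonus, so treat the constant 2 and the Bernoulli test environment below as assumptions; this is a sketch, not the lecture's definitive algorithm.

```python
import math, random

def ucb1(pull, k, T):
    """Run UCB1 for T rounds on a k-armed bandit given by pull(i) -> reward in [0, 1]."""
    n = [0] * k        # n_i(t): number of times arm i has been pulled
    mu = [0.0] * k     # hat{mu}_i(t): empirical mean reward of arm i
    for t in range(1, T + 1):
        if t <= k:
            i = t - 1                       # pull each arm once to initialize
        else:
            # Empirical mean plus an exploration bonus that shrinks as
            # arm i is pulled more often (constant 2 assumed in the bonus).
            i = max(range(k), key=lambda j: mu[j] + math.sqrt(2 * math.log(t) / n[j]))
        r = pull(i)
        n[i] += 1
        mu[i] += (r - mu[i]) / n[i]         # incremental mean update
    return n

means = [0.9, 0.8, 0.5]                     # hypothetical arms; arm 0 is best
pull = lambda i: 1.0 if random.random() < means[i] else 0.0
T = 10_000
counts = ucb1(pull, k=len(means), T=T)
# R(T) = T*mu_1 - sum_t mu_{a_t}, computed from the pull counts.
regret = sum(counts[i] * (max(means) - means[i]) for i in range(len(means)))
print("pull counts:", counts)
print("regret R(T):", round(regret, 1))
```

On a run like this, UCB1 concentrates almost all pulls on the best arm, and the regret grows only logarithmically in $T$, consistent with the $O(1/T)$ failure-probability requirement above.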
Choose the Best Arm So Far?