  1. Advanced Algorithms (IV), Shanghai Jiao Tong University. Chihao Zhang, March 23rd, 2020.

  2. Review

  3-4. Review: We learnt the Markov inequality

  Pr[X ≥ a] ≤ E[X]/a.

  We can choose an increasing function f so that

  Pr[X ≥ a] = Pr[f(X) ≥ f(a)] ≤ E[f(X)]/f(a).

  5-8. f(x) = x² yields Chebyshev's inequality:

  Pr[|X − E[X]| ≥ a] ≤ Var[X]/a² = (E[X²] − E[X]²)/a².

  What is a good choice of f? One that grows fast, while E[f(X)] stays bounded and easy to calculate.
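
  A quick numerical sanity check of both tail bounds (a sketch; the Exponential(1) distribution and the sample size are arbitrary choices, not from the slides):

    import random

    # X ~ Exponential(1), so E[X] = 1 and Var[X] = 1.
    random.seed(0)
    samples = [random.expovariate(1.0) for _ in range(100_000)]
    mean, var = 1.0, 1.0

    for a in [2.0, 4.0, 8.0]:
        tail = sum(x >= a for x in samples) / len(samples)  # Pr[X >= a]
        markov = mean / a                                   # E[X]/a
        # Chebyshev: X >= a implies |X - E[X]| >= a - mean (valid since a > mean)
        chebyshev = var / (a - mean) ** 2
        print(f"a={a}: empirical={tail:.4f}, Markov<={markov:.3f}, Chebyshev<={chebyshev:.3f}")

  Both bounds hold but are loose; Chebyshev overtakes Markov once (a − 1)² grows faster than a.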

  9-12. Moment Generating Function: The function f(x) = e^{tx} is a natural choice. The quantity E[f(X)] = E[e^{tX}], viewed as a function of t, is called the moment generating function. In some cases, E[e^{tX}] is easy to calculate…

  13-16. Chernoff Bound: Assume X = ∑_{i=1}^n X_i, where the X_i ∼ Ber(p_i) are independent Bernoulli variables with means p_i. Then

  E[e^{tX}] = E[e^{t ∑_{i=1}^n X_i}] = ∏_{i=1}^n E[e^{t X_i}] = ∏_{i=1}^n (p_i e^t + 1 − p_i) ≤ ∏_{i=1}^n e^{p_i (e^t − 1)} = e^{E[X](e^t − 1)},

  where the second equality uses independence and the inequality uses 1 + x ≤ e^x.
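
  A Monte Carlo check of this chain of (in)equalities (a sketch; the p_i and t values are arbitrary):

    import math, random

    # X = sum of independent Ber(p_i); compare E[e^{tX}] with the
    # closed form prod_i (p_i e^t + 1 - p_i) and the bound e^{mu(e^t - 1)}.
    random.seed(0)
    p = [0.1, 0.3, 0.5, 0.7]
    t = 0.8
    mu = sum(p)

    closed_form = math.prod(pi * math.exp(t) + 1 - pi for pi in p)
    upper_bound = math.exp(mu * (math.exp(t) - 1))

    trials = 200_000
    mc = sum(math.exp(t * sum(random.random() < pi for pi in p))
             for _ in range(trials)) / trials
    print(f"Monte Carlo ~ {mc:.3f}, closed form = {closed_form:.3f}, bound = {upper_bound:.3f}")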

  17-20. Let μ = E[X] = ∑_{i=1}^n p_i. For t > 0, we can deduce

  Pr[X > (1 + δ)μ] = Pr[e^{tX} ≥ e^{t(1+δ)μ}] ≤ E[e^{tX}] / e^{t(1+δ)μ} ≤ e^{(e^t − 1)μ} / e^{t(1+δ)μ},

  where the first inequality is Markov's and the second uses the bound on E[e^{tX}] above. In order to obtain a tight bound, we optimize over t to minimize the right-hand side.

  21-24. Write e^{(e^t − 1)μ} / e^{t(1+δ)μ} = e^{μ(e^t − 1 − t(1+δ))}. Setting the derivative of the exponent to zero gives e^t = 1 + δ, so we choose t = log(1 + δ), which is positive since δ > 0. So

  Pr[X > (1 + δ)μ] ≤ (e^δ / (1 + δ)^{1+δ})^μ.

  We can similarly obtain, using t < 0,

  Pr[X < (1 − δ)μ] ≤ (e^{−δ} / (1 − δ)^{1−δ})^μ.

  25-26. To summarize, for X = ∑_{i=1}^n X_i, we have

  • Pr[X ≥ (1 + δ)μ] ≤ (e^δ / (1 + δ)^{1+δ})^μ
  • Pr[X ≤ (1 − δ)μ] ≤ (e^{−δ} / (1 − δ)^{1−δ})^μ

  A more useful expression is that for 0 < δ ≤ 1,

  • Pr[X ≥ (1 + δ)μ] ≤ e^{−μδ²/3}
  • Pr[X ≤ (1 − δ)μ] ≤ e^{−μδ²/2}
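
  These bounds translate directly into code; a sketch comparing the two upper-tail forms with simulation (n, p, δ are arbitrary choices):

    import math, random

    def chernoff_upper(mu, delta):
        # Pr[X >= (1+delta)mu] <= (e^delta / (1+delta)^(1+delta))^mu
        return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu

    def chernoff_upper_simple(mu, delta):
        # simplified form, valid for 0 < delta <= 1
        return math.exp(-mu * delta ** 2 / 3)

    random.seed(0)
    n, p, delta = 200, 0.5, 0.2
    mu = n * p
    trials = 50_000
    hits = sum(sum(random.random() < p for _ in range(n)) >= (1 + delta) * mu
               for _ in range(trials))
    print(f"empirical={hits / trials:.4f}, "
          f"Chernoff<={chernoff_upper(mu, delta):.4f}, "
          f"simple<={chernoff_upper_simple(mu, delta):.4f}")

  The simplified form is weaker but easier to use in calculations.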

  27-30. Max Load: Recall that in the max load problem, we throw n balls into n bins. The number of balls in the i-th bin is X_i ∼ Bin(n, 1/n). Note that E[X_i] = 1. What is the probability that X_i > c log n / log log n?

  31-34. In this case, 1 + δ = c log n / log log n. Applying the Chernoff bound (with μ = 1), we obtain

  Pr[X_i ≥ c log n / log log n] ≤ e^δ / (1 + δ)^{1+δ} ≤ n^{−c+o(1)},

  which is tight up to the order given by our earlier analysis.
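
  A small simulation makes the log n / log log n scale of the max load visible (a sketch; n and the seed are arbitrary):

    import math, random

    # Throw n balls into n bins uniformly and record the fullest bin.
    random.seed(0)
    n = 100_000
    loads = [0] * n
    for _ in range(n):
        loads[random.randrange(n)] += 1

    print(f"max load = {max(loads)}, log n / log log n = {math.log(n) / math.log(math.log(n)):.2f}")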

  35-37. The Chernoff bound has a few drawbacks:

  • the variables X_i need to be independent;
  • each X_i is required to follow Ber(p_i).

  We will try to generalize the Chernoff bound to overcome these issues.

  38-40. Hoeffding Inequality: The Hoeffding inequality generalizes the bound to variables X_i with a_i ≤ X_i ≤ b_i and E[X_i] = 0:

  Pr[∑_{i=1}^n X_i ≥ t] ≤ exp(−2t² / ∑_{i=1}^n (b_i − a_i)²).
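
  A numerical sanity check (a sketch; uniform [−1, 1] variables, so a_i = −1, b_i = 1 and E[X_i] = 0, with arbitrary n and t):

    import math, random

    random.seed(0)
    n, t, trials = 100, 10.0, 100_000
    bound = math.exp(-2 * t ** 2 / (4 * n))   # sum_i (b_i - a_i)^2 = 4n

    hits = sum(sum(random.uniform(-1, 1) for _ in range(n)) >= t
               for _ in range(trials))
    print(f"empirical={hits / trials:.4f}, Hoeffding<={bound:.4f}")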

  41-43. The key property used to establish the Hoeffding inequality is an upper bound on the moment generating function. Lemma: assume X ∈ [a, b] satisfies E[X] = 0; then E[e^{tX}] ≤ exp(t²(b − a)²/8). You can find the proofs of the lemma and of the Hoeffding inequality in the book Probability and Computing.
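
  As a quick sanity check of the lemma (an added example, not from the slides): take X uniform on {−1, +1}, so a = −1, b = 1 and E[X] = 0. Then E[e^{tX}] = cosh(t) = ∑_k t^{2k}/(2k)! ≤ ∑_k (t²/2)^k/k! = e^{t²/2} = exp(t²(b − a)²/8), term by term, since (2k)! ≥ 2^k · k!.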

  44-47. Multi-Armed Bandit: In the MAB problem, there are k bandits (arms):

  • each bandit i has an unknown reward distribution f_i on [0, 1] with μ_i = E[f_i];
  • each round, one can pull an arm i and obtain a reward r ∼ f_i.

  The goal is to identify the best arm via trials.
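
  A minimal environment sketch for experimenting with this setup (the class name and the Bernoulli choice of f_i are illustrative assumptions, not from the slides):

    import random

    class Bandit:
        """k arms; pulling arm i returns a reward in [0, 1] drawn from f_i."""
        def __init__(self, means):
            self.means = means  # the unknown mu_i

        def pull(self, i):
            # Illustrative f_i: Bernoulli(mu_i), a distribution supported on [0, 1].
            return 1.0 if random.random() < self.means[i] else 0.0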

  48-52. Regret of MAB: We assume μ_1 = max_{1 ≤ i ≤ k} μ_i. If the game is played for T rounds, the best reward one can obtain is Tμ_1 in expectation. We are often not so lucky as to achieve this, so the goal is to find a strategy that minimizes the regret

  R(T) = Tμ_1 − ∑_{t=1}^T μ_{a_t},

  where Tμ_1 is the best expected reward and a_t is the arm actually pulled at round t.
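
  For example (an added illustration), with k = 2, μ_1 = 0.9 and μ_2 = 0.5, a strategy that always pulls arm 2 suffers R(T) = 0.9T − 0.5T = 0.4T, i.e. linear regret.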

  53-57. What is a good strategy? We view R(T) as a function of T and consider T → ∞. If we eventually find the best arm, then R(T) = o(T). If we fail to find the best arm, we suffer a regret of Ω(ΔT), where Δ is the gap between the optimal and suboptimal rewards. So we need the failure probability to be O(1/T), which keeps the expected contribution of that failure to the regret at O(ΔT · 1/T) = O(Δ).

  58-60. The Upper Confidence Bound Algorithm: We collect information up to round T:

  • n_i(T): the number of times the i-th arm has been pulled;
  • μ̂_i(T): an estimate of the mean μ_i, equal to (∑_{t=1}^T 1[a_t = i] · r(t)) / n_i(T) whenever n_i(T) ≠ 0, where r(t) is the reward at round t.
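
  The deck stops before stating the selection rule itself, but with these statistics the standard UCB1 rule of Auer, Cesa-Bianchi and Fischer picks the arm maximizing μ̂_i(t) plus a confidence bonus derived from a Chernoff-Hoeffding bound. A sketch, assuming the usual √(2 ln t / n_i(t)) bonus and the Bandit class above:

    import math

    def ucb1(bandit, k, T):
        counts = [0] * k   # n_i(t)
        sums = [0.0] * k   # total reward collected from arm i
        for t in range(1, T + 1):
            if t <= k:
                i = t - 1  # initialization: pull every arm once
            else:
                # argmax of empirical mean + confidence bonus
                i = max(range(k),
                        key=lambda j: sums[j] / counts[j]
                                      + math.sqrt(2 * math.log(t) / counts[j]))
            counts[i] += 1
            sums[i] += bandit.pull(i)
        return counts

  Running it on Bandit([0.9, 0.5]) for a few thousand rounds should concentrate nearly all pulls on arm 1, consistent with the R(T) = o(T) goal above.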

  61. Choose the Best Arm So Far?
