SLIDE 1

Advanced Algorithms (IV)

Shanghai Jiao Tong University

Chihao Zhang

March 23rd, 2020

SLIDE 4

Review

We learnt the Markov inequality: for a non-negative random variable $X$ and any $a > 0$,

$$\Pr[X \ge a] \le \frac{\mathbf{E}[X]}{a}.$$

We can choose an increasing function $f$ so that

$$\Pr[X \ge a] = \Pr[f(X) \ge f(a)] \le \frac{\mathbf{E}[f(X)]}{f(a)}.$$

SLIDE 9

$f(x) = x^2$ yields Chebyshev's inequality:

$$\Pr\left[\,|X - \mathbf{E}[X]| \ge a\,\right] \le \frac{\mathbf{Var}[X]}{a^2} = \frac{\mathbf{E}[X^2] - \mathbf{E}[X]^2}{a^2}.$$

What is a good choice of $f$?

  • $f$ grows fast
  • $\mathbf{E}[f(X)]$ is bounded and easy to calculate
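
For completeness, Chebyshev's inequality is exactly Markov applied to the non-negative variable $(X - \mathbf{E}[X])^2$ with threshold $a^2$:

$$\Pr\left[\,|X - \mathbf{E}[X]| \ge a\,\right] = \Pr\left[(X - \mathbf{E}[X])^2 \ge a^2\right] \le \frac{\mathbf{E}\left[(X - \mathbf{E}[X])^2\right]}{a^2} = \frac{\mathbf{Var}[X]}{a^2}.$$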

SLIDE 13

Moment Generating Function

The function $f(x) = e^{tx}$ is a natural choice.

The function $\mathbf{E}[f(X)] = \mathbf{E}[e^{tX}]$ is called the moment generating function.

In some cases, $\mathbf{E}[e^{tX}]$ is easy to calculate…
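
The name comes from the Taylor expansion of $e^{tX}$: whenever the exchange of sum and expectation is justified, the $k$-th moment of $X$ appears as the coefficient of $t^k/k!$:

$$\mathbf{E}[e^{tX}] = \mathbf{E}\left[\sum_{k \ge 0} \frac{(tX)^k}{k!}\right] = \sum_{k \ge 0} \frac{t^k}{k!}\,\mathbf{E}[X^k].$$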

SLIDE 17

Chernoff Bound

Assume $X = \sum_{i=1}^{n} X_i$, where each $X_i \sim \mathrm{Ber}(p_i)$ is an independent Bernoulli variable with mean $p_i$. Then

$$\mathbf{E}[e^{tX}] = \mathbf{E}\left[e^{t\sum_{i=1}^{n} X_i}\right] = \prod_{i=1}^{n} \mathbf{E}[e^{tX_i}] = \prod_{i=1}^{n} \left(p_i e^t + 1 - p_i\right) \le \prod_{i=1}^{n} e^{p_i(e^t - 1)} = e^{\mathbf{E}[X](e^t - 1)},$$

where the second equality uses independence and the inequality uses $1 + x \le e^x$.
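
A quick numerical sanity check of this computation (a sketch; the choices of $n$, the $p_i$, and $t$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0.1, 0.9, size=20)   # arbitrary means p_i
t = 0.5                              # any fixed t > 0
mu = p.sum()                         # E[X]

# Monte Carlo estimate of E[e^{tX}] for X = sum of independent Ber(p_i)
samples = rng.random((100_000, p.size)) < p
mgf_est = np.exp(t * samples.sum(axis=1)).mean()

# Exact product formula and the e^{E[X](e^t - 1)} upper bound
mgf_exact = np.prod(p * np.exp(t) + 1 - p)
upper = np.exp(mu * (np.exp(t) - 1))

print(mgf_est, mgf_exact, upper)     # mgf_est ~ mgf_exact <= upper
```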

SLIDE 22

Let $\mu = \mathbf{E}[X] = \sum_{i=1}^{n} p_i$.

For $t > 0$, we can deduce

$$\Pr[X > (1+\delta)\mu] = \Pr\left[e^{tX} \ge e^{t(1+\delta)\mu}\right] \le \frac{\mathbf{E}[e^{tX}]}{e^{t(1+\delta)\mu}} \le \frac{e^{(e^t-1)\mu}}{e^{t(1+\delta)\mu}}.$$

In order to obtain a tight bound, we optimize $t$ to minimize the right-hand side.

SLIDE 27

Since $\frac{e^{(e^t-1)\mu}}{e^{t(1+\delta)\mu}} = e^{\mu(e^t - 1 - t(1+\delta))}$, we can choose $t = \log(1+\delta) > 0$ (this is where the derivative $e^t - (1+\delta)$ of the exponent vanishes).

So

$$\Pr[X > (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.$$

We can similarly obtain (using $t < 0$)

$$\Pr[X < (1-\delta)\mu] \le \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.$$

SLIDE 30

To summarize, for $X = \sum_{i=1}^{n} X_i$, we have

  • $\Pr[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}$
  • $\Pr[X \le (1-\delta)\mu] \le \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}$

A more useful expression is that for $0 < \delta \le 1$,

  • $\Pr[X \ge (1+\delta)\mu] \le e^{-\mu\delta^2/3}$
  • $\Pr[X \le (1-\delta)\mu] \le e^{-\mu\delta^2/2}$
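
To see these bounds in action, here is a small Monte Carlo comparison of the empirical upper tail against the simplified bound (a sketch; $n$, $p$, and $\delta$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, delta = 1000, 0.05, 0.5
mu = n * p

# Empirical upper-tail probability for X = sum of n i.i.d. Ber(p)
X = rng.binomial(n, p, size=200_000)
empirical = np.mean(X >= (1 + delta) * mu)

chernoff = np.exp(-mu * delta**2 / 3)   # simplified bound, valid for 0 < delta <= 1
print(empirical, chernoff)              # the empirical tail stays below the bound
```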

SLIDE 34

Max Load

Recall that in the max load problem, we throw $n$ balls into $n$ bins.

The number of balls in the $i$-th bin is $X_i \sim \mathrm{Bin}\left(n, \frac{1}{n}\right)$.

Note that $\mathbf{E}[X_i] = 1$. What is the probability that $X_i > \frac{c \log n}{\log\log n}$?

SLIDE 39

In this case, $1 + \delta = \frac{c \log n}{\log\log n}$.

Applying the Chernoff bound (with $\mu = 1$), we obtain

$$\Pr\left[X_i \ge \frac{c \log n}{\log\log n}\right] \le \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \le n^{-c+o(1)},$$

which is tight in order compared with our earlier analytic result.
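
A quick experiment makes the $\Theta(\log n / \log\log n)$ scaling visible (a sketch; $n$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Throw n balls into n bins uniformly at random and record the max load
bins = rng.integers(0, n, size=n)
max_load = np.bincount(bins, minlength=n).max()

print(max_load, np.log(n) / np.log(np.log(n)))  # same order: log n / log log n
```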

SLIDE 43

The Chernoff bound has a few drawbacks:

  • each $X_i$ needs to be independent
  • each $X_i$ is required to follow the $\mathrm{Ber}(p_i)$ distribution

We will try to generalize the Chernoff bound to overcome these issues.
SLIDE 46

Hoeffding Inequality

The Hoeffding inequality generalizes to those $X_i$ with $\mathbf{E}[X_i] = 0$ and $a_i \le X_i \le b_i$:

$$\Pr\left[\sum_{i=1}^{n} X_i \ge t\right] \le \exp\left(-\frac{2t^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\right).$$
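
A small numerical illustration (a sketch; centered uniform variables on $[-1,1]$, so $a_i = -1$, $b_i = 1$, and $(b_i - a_i)^2 = 4$):

```python
import numpy as np

rng = np.random.default_rng(3)
n, t = 200, 15.0

# X_i uniform on [-1, 1]: E[X_i] = 0 and (b_i - a_i)^2 = 4
sums = rng.uniform(-1, 1, size=(100_000, n)).sum(axis=1)
empirical = np.mean(sums >= t)

hoeffding = np.exp(-2 * t**2 / (4 * n))
print(empirical, hoeffding)  # the empirical tail stays below the Hoeffding bound
```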

SLIDE 50

The key property used to establish the Hoeffding inequality is an upper bound on the moment generating function.

Lemma. Assume $X$ satisfies $X \in [a, b]$ and $\mathbf{E}[X] = 0$; then

$$\mathbf{E}[e^{tX}] \le \exp\left(\frac{t^2 (b-a)^2}{8}\right).$$

You can find the proofs of the lemma and of the Hoeffding inequality in the book Probability and Computing.

SLIDE 54

Multi-Armed Bandit

In the problem of MAB, there are $k$ bandits:

  • each bandit has an unknown random reward distribution $f_i$ on $[0,1]$ with $\mu_i = \mathbf{E}[f_i]$
  • each round, one can pull an arm $i$ and obtain a reward $r \sim f_i$

The goal is to identify the best arm via trials.
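
A minimal environment sketch for experimenting with the problem (assumptions: Bernoulli rewards as a simple choice of $f_i$ on $[0,1]$; the names BanditEnv and pull are illustrative, not from the slides):

```python
import random

class BanditEnv:
    """k-armed bandit with Bernoulli reward distributions f_i on [0, 1]."""

    def __init__(self, means):
        self.means = means        # the mu_i, unknown to the player

    def pull(self, i):
        """Pull arm i and return a reward r ~ Ber(mu_i)."""
        return 1.0 if random.random() < self.means[i] else 0.0

env = BanditEnv([0.9, 0.6, 0.5])  # arm 0 is the best arm
print(env.pull(0))
```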

SLIDE 59

Regret of MAB

We assume $\mu_1 = \max_{1 \le i \le k} \mu_i$.

If the game is played for $T$ rounds, the best reward one can obtain is $T\mu_1$ in expectation.

We are often not so lucky to achieve this, so the goal is to find a strategy to minimize the regret

$$R(T) = \underbrace{T\mu_1}_{\text{best reward}} - \sum_{t=1}^{T} \mu_{a_t},$$

where $a_t$ is the arm actually pulled at round $t$.
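
A standard equivalent form, not on the slides but useful later: writing $\Delta_i = \mu_1 - \mu_i$ and $n_i(T)$ for the number of times arm $i$ is pulled, grouping the rounds by arm gives

$$R(T) = \sum_{i=1}^{k} \Delta_i \, n_i(T),$$

so bounding the regret reduces to bounding how often each suboptimal arm is pulled.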

SLIDE 64

What is a good strategy?

We view $R(T)$ as a function of $T$ and consider $T \to \infty$.

If we eventually find the best arm, then $R(T) = o(T)$.

If we fail to find the best arm, we will suffer a regret of $\Omega(\Delta T)$, where $\Delta$ is the gap between the optimal and suboptimal rewards.

So we need the failure probability to be $O(1/T)$.

SLIDE 67

The Upper Confidence Bound Algorithm

We collect information up to round $T$:

  • $n_i(T)$: the number of times that the $i$-th arm has been pulled
  • $\hat{\mu}_i(T)$: an estimate of the mean $\mu_i$, equal to $\frac{1}{n_i(T)} \sum_{t=1}^{T} \mathbf{1}[a_t = i] \cdot r(t)$ if $n_i(T) \ne 0$, where $r(t)$ is the reward at the $t$-th round

These statistics can be maintained online, as in the sketch below.
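
A minimal sketch of maintaining these statistics (the array names and the update/mu_hat helpers are illustrative, not from the slides):

```python
import numpy as np

k = 3
n = np.zeros(k, dtype=int)    # n_i(T): number of pulls of each arm so far
total = np.zeros(k)           # cumulative reward of each arm so far

def update(i, r):
    """Record reward r from pulling arm i at the current round."""
    n[i] += 1
    total[i] += r

def mu_hat(i):
    """Empirical mean of arm i; defined only when n_i(T) != 0."""
    return total[i] / n[i]
```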

SLIDE 71

Choose the Best Arm So Far?

The most straightforward idea is to choose the arm with the best $\hat{\mu}_i(T)$.

This strategy might be inferior if we are unlucky and the best arm performs badly in the first few trials.

So we have to add some offset term for those arms that are not "well-explored".

SLIDE 76

The UCB algorithm chooses the arm with the largest

$$\hat{\mu}_i(T) + c_i(T),$$

where $c_i(T)$ is the confidence term of arm $i$ at round $T$.

Intuitively, $c_i(T)$ should be decreasing in $n_i$, so we give more chances to arms that have not been well-tested.

Let's find out how to set $c_i(T)$.

SLIDE 81

We need the following event to happen w.h.p.:

$$\forall\, 2 \le i \le k, \quad \hat{\mu}_1(T) + c_1(T) > \hat{\mu}_i(T) + c_i(T).$$

A sufficient condition for this is

$$\hat{\mu}_1(T) + c_1(T) > \mu_1 > \mu_i + 2c_i(T) > \hat{\mu}_i(T) + c_i(T),$$

which holds whenever both

  • $\forall j, \quad \hat{\mu}_j(T) - c_j(T) < \mu_j < \hat{\mu}_j(T) + c_j(T)$
  • $\forall i \ge 2, \quad c_i(T) < \frac{\mu_1 - \mu_i}{2}$

Note the trade-off on $c_j(T)$: the first condition favors a large confidence term, the second a small one.

SLIDE 87

We apply the Hoeffding inequality to bound the probability of

$$\forall j,\ \forall t \le T, \quad \hat{\mu}_j(t) - c_j(t) < \mu_j < \hat{\mu}_j(t) + c_j(t).$$

Each $\hat{\mu}_j(t)$ is an average of $n_j$ rewards in $[0,1]$, so

$$\Pr\left[\,|\hat{\mu}_j(t) - \mu_j| > c_j(t)\,\right] \le 2\exp\left(-\frac{2c_j^2}{n_j \cdot (1/n_j)^2}\right) = 2\exp\left(-2c_j^2 n_j\right).$$

So the Hoeffding inequality suggests that we choose

$$c_j(T) = \Omega\left(\sqrt{\frac{\log T}{n_j(T)}}\right).$$
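
Concretely, one convenient choice (the constant 2 is a standard option, not fixed by the slides) is $c_j(t) = \sqrt{\frac{2\log T}{n_j(t)}}$, which gives, for each fixed $j$ and $t$,

$$\Pr\left[\,|\hat{\mu}_j(t) - \mu_j| > c_j(t)\,\right] \le 2\exp\left(-2 \cdot \frac{2\log T}{n_j(t)} \cdot n_j(t)\right) = 2T^{-4},$$

and a union bound over all $k$ arms and $T$ rounds leaves a failure probability of at most $2k/T^3$, which is the $O(1/T)$ we asked for.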

SLIDE 92

For this choice of $c_i(T)$, the condition $c_i(T) < \frac{\mu_1 - \mu_i}{2}$ becomes

$$\sqrt{\frac{c \log T}{n_i(T)}} < \frac{\mu_1 - \mu_i}{2}.$$

This means $n_i(T) = \Omega(\log T)$, so we need to try each arm $\Omega(\log T)$ times "for free"!

Some tedious calculations are required to obtain the final regret bound, which is $\Theta(\log T)$.
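
To close the loop, here is a compact UCB simulation (a sketch under the assumptions of Bernoulli rewards and the common $\sqrt{2\log t / n_i}$ confidence term; names and constants are illustrative). Its measured regret grows roughly like $\log T$:

```python
import math
import random

def ucb(means, T):
    """Run UCB for T rounds on Bernoulli arms; return the realized regret R(T)."""
    k = len(means)
    n = [0] * k                 # n_i: number of pulls of arm i
    total = [0.0] * k           # cumulative reward of arm i
    best = max(means)
    regret = 0.0

    for t in range(1, T + 1):
        if t <= k:
            i = t - 1           # pull each arm once to initialize
        else:
            # choose the arm maximizing hat{mu}_i + c_i(t)
            i = max(range(k),
                    key=lambda j: total[j] / n[j] + math.sqrt(2 * math.log(t) / n[j]))
        r = 1.0 if random.random() < means[i] else 0.0
        n[i] += 1
        total[i] += r
        regret += best - means[i]
    return regret

random.seed(0)
for T in (1_000, 10_000, 100_000):
    print(T, round(ucb([0.9, 0.6, 0.5], T), 1))  # regret grows roughly like log T
```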