SLIDE 1

PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits

Arghya Roy Chaudhuri, under the guidance of Prof. Shivaram Kalyanakrishnan

Indian Institute of Technology Bombay, India

1 / 8

SLIDE 2

What Is It All About?

2 / 8

SLIDE 5

What Is It All About?

3 / 8

SLIDE 8

What Is a Multi-Armed Bandit?

[Figure: slot machines (arms) with unknown means; y-axis "Mean (Unknown)" from 0.0 to 1.0; bars at 0.9, 0.5 and 0.2.]

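Bandits: slot machines (arms). Mean of an arm: Pr[Reward = 1]. Here n is the number of arms, ε the error tolerance, δ the allowed failure probability and SC the sample complexity.

  • To identify the best arm: $\mathbb{E}[\mathrm{SC}] = \Omega\left(\frac{n}{\epsilon^2}\log\frac{1}{\delta}\right)$.
  • To identify the best subset of size m: $\mathbb{E}[\mathrm{SC}] = \Omega\left(\frac{n}{\epsilon^2}\log\frac{m}{\delta}\right)$.
  • We need an alternative.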
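To see how these bounds scale, the Python sketch below evaluates their dominant term $\frac{n}{\epsilon^2}\log\frac{m}{\delta}$ (with m = 1 for the best arm). It ignores the constants hidden by Ω(·), so the printed values indicate growth only, not actual sample counts; the function name `lower_bound_term` is hypothetical.

```python
import math


def lower_bound_term(n: int, eps: float, delta: float, m: int = 1) -> float:
    """Dominant term (n / eps^2) * log(m / delta) inside the Omega(.) bounds.

    m = 1 gives the best-arm bound, m > 1 the best-subset bound.
    Constants hidden by Omega(.) are ignored, so only growth rates are meaningful.
    """
    return n / eps**2 * math.log(m / delta)


if __name__ == "__main__":
    eps, delta = 0.1, 0.05
    for n in (10, 1_000, 100_000):
        best_arm = lower_bound_term(n, eps, delta)
        best_subset = lower_bound_term(n, eps, delta, m=10)
        print(f"n = {n:>7}: best arm ~ {best_arm:,.0f}, best 10-subset ~ {best_subset:,.0f}")
```

For fixed ε and δ both terms grow linearly in n, so neither problem stays tractable when the number of arms is very large.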

4 / 8

SLIDE 16

Large Bandit Instances

Difficulty for n ≫ T: $\lim_{n\to\infty}\frac{n}{\epsilon^2}\log\frac{1}{\delta} = \infty$.

Workaround: identifying one arm from the best ρ-fraction of arms is possible.

Redefine the problem as identifying one arm from the best m arms. Defining ρ = m/n generalises the problem. What if n is relatively small?
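A short calculation shows why the ρ-fraction relaxation escapes the dependence on n. Assuming arms can be drawn uniformly at random, each draw lands in the best ρ-fraction with probability ρ, so N draws all miss it with probability $(1-\rho)^N \le e^{-\rho N}$; hence $N = \lceil \ln(1/\delta)/\rho \rceil$ draws contain at least one such arm with probability at least 1 − δ, whatever n is. The Python sketch below (hypothetical helpers `num_draws` and `miss_rate`) computes N and checks the miss rate by simulation. It covers only this selection step and is not the algorithm from the talk.

```python
import math
import random


def num_draws(rho: float, delta: float) -> int:
    """A number of uniform random draws N with (1 - rho)^N <= delta.

    Uses (1 - rho)^N <= exp(-rho * N), so N = ceil(ln(1/delta) / rho) suffices
    (sufficient, not necessarily the smallest such N). Independent of n.
    """
    return math.ceil(math.log(1.0 / delta) / rho)


def miss_rate(rho: float, delta: float, trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P[no draw lands in the best rho-fraction].

    Each drawn arm is represented only by its quantile u ~ Uniform[0, 1);
    it belongs to the best rho-fraction iff u >= 1 - rho.
    """
    rng = random.Random(seed)
    n_draws = num_draws(rho, delta)
    misses = sum(
        all(rng.random() < 1.0 - rho for _ in range(n_draws))
        for _ in range(trials)
    )
    return misses / trials


if __name__ == "__main__":
    rho, delta = 0.05, 0.01
    print(num_draws(rho, delta), miss_rate(rho, delta))
```

Once the N arms are drawn, any PAC best-arm routine run on them pays a cost governed by N (that is, by ρ and δ) rather than by n.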

5 / 8

SLIDE 21

Finite-Armed Bandit Instances

(k, m, n): identify any k distinct arms from the best m arms in a set of n arms.

  • k = 1: any one arm from the best subset of size m.
  • k = m: best subset identification.
  • k = m = 1: best arm identification.

Contributions: LUCB-k-m (fully sequential and adaptive); worst-case upper and lower bounds.
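The slide names LUCB-k-m without giving pseudocode. As a rough illustration only, the Python sketch below implements a simplified LUCB-style loop for the (k, m, n) problem on simulated Bernoulli arms; the function name `lucb_k_m_sketch` is hypothetical and the loop is not claimed to match the talk's algorithm. Assumptions: an arm counts as good if its mean is at least the m-th highest mean minus ε; each round samples the weakest arm of the empirical top k (lowest lower confidence bound) and the strongest arm of the empirical bottom n − m (highest upper confidence bound); the loop stops once these two bounds are within ε. The Hoeffding radius and its constant 2.5 are chosen here so that a two-sided union bound over arms, rounds and pull counts stays below δ.

```python
import math
import random
from typing import List, Sequence


def lucb_k_m_sketch(means: Sequence[float], k: int, m: int, eps: float,
                    delta: float, seed: int = 0,
                    max_samples: int = 10**7) -> List[int]:
    """Simplified LUCB-style loop for the (k, m, n) problem (illustrative only).

    Arms are simulated Bernoulli(means[a]). Returns k distinct arm indices that,
    when all confidence intervals hold, have mean >= (m-th best mean) - eps.
    """
    n = len(means)
    if not 1 <= k <= m < n:
        raise ValueError("sketch assumes 1 <= k <= m < n")
    rng = random.Random(seed)
    pulls = [0] * n
    wins = [0] * n

    def pull(a: int) -> None:
        # Draw one Bernoulli sample from arm a.
        pulls[a] += 1
        wins[a] += int(rng.random() < means[a])

    def radius(a: int, t: int) -> float:
        # Hoeffding radius; the constant 2.5 is an assumption (see lead-in).
        return math.sqrt(math.log(2.5 * n * t**4 / delta) / (2 * pulls[a]))

    for a in range(n):  # initialise with one sample per arm
        pull(a)
    t = n  # total samples drawn so far

    while t < max_samples:
        emp = [wins[a] / pulls[a] for a in range(n)]
        order = sorted(range(n), key=lambda a: emp[a], reverse=True)
        top, bottom = order[:k], order[m:]  # empirical top k, bottom n - m
        weakest = min(top, key=lambda a: emp[a] - radius(a, t))
        rival = max(bottom, key=lambda a: emp[a] + radius(a, t))
        if (emp[rival] + radius(rival, t)) - (emp[weakest] - radius(weakest, t)) < eps:
            return sorted(top)
        pull(weakest)
        pull(rival)
        t += 2
    raise RuntimeError("sample budget exhausted before stopping")


if __name__ == "__main__":
    print(lucb_k_m_sketch([0.9, 0.85, 0.8, 0.3, 0.2, 0.1], k=2, m=3, eps=0.05, delta=0.1))
```

Why a stop yields good arms: if every confidence interval holds, each returned arm a satisfies p_a ≥ LCB(weakest) > UCB(rival) − ε ≥ p_b − ε for every arm b in the empirical bottom n − m. At most n − m arms have mean below the m-th best, so among the n − m + 1 arms formed by a and that bottom set, at least one has mean at least the m-th best; either way a is within ε of it.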

6 / 8

SLIDE 22

Infinite-Armed Bandit Instances

(k, ρ): identify any k distinct arms from the best ρ-fraction of arms.
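For the infinite-armed version, a related count may help. Assuming arms are drawn uniformly at random, the number of drawn arms in the best ρ-fraction is Binomial(n, ρ), so one can ask for the smallest n with $\Pr[\mathrm{Binomial}(n,\rho) \ge k] \ge 1-\delta$; for k = 1 this reduces to $(1-\rho)^n \le \delta$. The Python sketch below finds that n by linear search. The helper names are hypothetical, and the slide does not say whether its method proceeds this way.

```python
import math


def binom_tail_at_least(n: int, k: int, rho: float) -> float:
    """P[Binomial(n, rho) >= k], via the complement of the lower pmf terms."""
    below = sum(math.comb(n, i) * rho**i * (1 - rho) ** (n - i) for i in range(k))
    return 1.0 - below


def draws_for_k_good(k: int, rho: float, delta: float) -> int:
    """Smallest n such that, with probability >= 1 - delta, at least k of n
    uniformly drawn arms lie in the best rho-fraction (0 < rho <= 1)."""
    n = k
    while binom_tail_at_least(n, k, rho) < 1.0 - delta:
        n += 1
    return n


if __name__ == "__main__":
    print(draws_for_k_good(k=2, rho=0.5, delta=0.3))
```

For example, with k = 2, ρ = 0.5 and δ = 0.3 the search stops at n = 5, since P[Binomial(4, 0.5) ≥ 2] = 11/16 < 0.7 while P[Binomial(5, 0.5) ≥ 2] = 26/32 ≥ 0.7.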

7 / 8

SLIDE 24

Thank You!

Poster: #54 Email: arghya@cse.iitb.ac.in

8 / 8