The multi-armed bandit problem (with covariates if we have time)
Vianney Perchet (LPMA, Université Paris Diderot) & Philippe Rigollet (ORFE, Princeton University)
Algorithms and Dynamics for Games and Optimization, October 14–18, 2013


SLIDE 1

The multi-armed bandit problem

(with covariates if we have time)

Vianney Perchet & Philippe Rigollet

LPMA, Université Paris Diderot / ORFE, Princeton University

Algorithms and Dynamics for Games and Optimization, October 14–18, 2013

SLIDE 2

Introduction

Boring and useless definitions:

Bandits: optimization of a noisy function.

– Observations: f(x) + ε_x, where ε_x is a random variable
– Statistics: lack of information (exploration)
– Optimization: maximize f(·) (exploitation)
– Games: cumulative loss/payoff/reward

Covariates: some additional side observations are gathered.

Start "easy": f is maximized over a finite set.

Concrete, simple and understandable examples follow.

SLIDES 3–5

Introduction

Some real-world examples (illustrations only)

SLIDES 6–7

Introduction

Simplified decision problem of Google

Different firms come to Google and offer: "if you put my ad after the keywords 'Flat Rental Paris', every time a customer clicks on it, I'll give you b_i euros." A given ad i has some exogenous but unknown probability p_i of being clicked. Displaying ad i thus gives p_i·b_i to Google, in expectation. Objective of Google: maximize the cumulated payoff as fast as possible.

Difficulties: the expected revenue of an ad i is unknown; p_i cannot be estimated if ad i is not displayed. Take a risk and display new ads (to estimate new, and maybe high, p_i), or play it safe and display the best estimated ad?
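The expected-revenue computation above is simple enough to sketch in code; the bids and click probabilities below are made-up numbers for illustration, not from the talk:

```python
import random

# Hypothetical numbers for the ad-display example: per-click bids b_i (euros)
# and click probabilities p_i (exogenous, unknown to the algorithm).
bids = [0.50, 0.80, 0.30]
click_probs = [0.10, 0.04, 0.20]

# Expected revenue of displaying ad i is p_i * b_i.
expected_revenue = [p * b for p, b in zip(click_probs, bids)]
best_ad = max(range(len(bids)), key=lambda i: expected_revenue[i])

def display(i, rng=random.random):
    """One display of ad i: yields b_i with probability p_i, else 0."""
    return bids[i] if rng() < click_probs[i] else 0.0
```

The dilemma on the slide is exactly that `best_ad` is unknown in practice: `click_probs` can only be estimated by actually displaying the ads.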

SLIDE 8

Static case: Framework | Successive Elimination (SE)

Static bandit – no queries

Structure of a specific instance:

Decision set: {1, …, K} (the set of "arms", i.e., ads). Expected payoff of arm k: f^k ∈ [0, 1]. Best arm: ⋆, with payoff f^⋆. Problem difficulty: ∆_k = f^⋆ − f^k, ∆_min = min_{∆_k>0} ∆_k.

Repeated decision problem. At stage t ∈ N:

– Choose k_t ∈ {1, …, K}, receive Y_t ∈ [0, 1], i.i.d. with expectation f^{k_t}
– Observe only the payoff Y_t (and not f^{k_t}), and move to stage t + 1

Objectives: maximize the cumulative expected payoff, or minimize the regret

    R_T = T·f^⋆ − Σ_{t=1}^T f^{k_t} = Σ_{t=1}^T ∆_{k_t}

i.e., find the best decision as quickly as possible despite the noise.
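As a sanity check on the definition, the regret accounting R_T = Σ_t ∆_{k_t} can be sketched as follows; the arm means are illustrative values, not from the talk:

```python
# Static bandit regret accounting: R_T = T*f_star - sum_t f^{k_t} = sum_t Delta_{k_t}.
f = [0.5, 0.7, 0.4]                    # illustrative expected payoffs f^k
f_star = max(f)                        # best arm's mean f^*
deltas = [f_star - fk for fk in f]     # gaps Delta_k

def regret(pulls):
    """Cumulative (pseudo-)regret of a sequence of pulled arm indices."""
    return sum(deltas[k] for k in pulls)

# Always pulling a suboptimal arm yields linear regret T*Delta_k;
# always pulling the best arm yields zero regret.
T = 100
```

Note this is the expectation-level (pseudo-)regret, computed from the unknown means f^k; a policy only sees the noisy payoffs Y_t.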

SLIDES 9–10

Static Case: UCB

Lower bound for K = 2:  R_T ≥ ✷ log(T∆_min²)/∆_min,  with ∆_min = min_k (f^⋆ − f^k)  (✷ denotes a numerical constant)

Famous algo: Upper Confidence Bound (and its variants):

– Draw each arm 1, …, K once and observe Y¹_1, …, Y^K_K (Round 1)
– After stage t, compute:
    t_k = #{τ ≤ t : k_τ = k}, the number of times arm k was drawn;
    Ȳ^k_t = (1/t_k) Σ_{τ≤t: k_τ=k} Y^k_τ, an estimate of f^k
– Draw the arm  k_{t+1} = argmax_k [ Ȳ^k_t + √(2 log(t)/t_k) ]

Using UCB,  E[R_T] ≤ 8 Σ_k log(T)/∆_k ≤ 8K log(T)/∆_min
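The UCB rule above can be sketched as follows (a minimal implementation with Bernoulli payoffs; the arm means, horizon and seed are illustrative choices, not from the talk):

```python
import math
import random

def ucb(means, T, seed=0):
    """Sketch of UCB: after an initial round over all arms, repeatedly pull
    argmax_k [ mean estimate of arm k + sqrt(2 log(t) / t_k) ].
    `means` are the unknown f^k, used only to draw Bernoulli payoffs."""
    rng = random.Random(seed)
    K = len(means)
    counts = [1] * K                                            # round 1: each arm once
    sums = [float(rng.random() < means[k]) for k in range(K)]   # observed payoffs
    pulls = list(range(K))
    for t in range(K, T):
        index = [sums[k] / counts[k] + math.sqrt(2 * math.log(t) / counts[k])
                 for k in range(K)]
        k = max(range(K), key=lambda i: index[i])
        sums[k] += float(rng.random() < means[k])
        counts[k] += 1
        pulls.append(k)
    return pulls

# With a clear gap, UCB concentrates its pulls on the best arm (index 1 here),
# while still occasionally revisiting the suboptimal one, as the slide remarks.
pulls = ucb([0.3, 0.8], T=2000)
```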

SLIDE 11

Remarks on UCB

Lower bound for K = 2:  R_T ≥ ✷ log(T∆_min²)/∆_min,  ∆_min = min_{∆_k>0} ∆_k

UCB algo:
– Draw each arm 1, …, K once and observe Y¹_1, …, Y^K_K (Round 1)
– Draw the arm  k_{t+1} = argmax_k [ Ȳ^k_t + √(2 log(t)/t_k) ]

UCB upper bound:  E[R_T] ≤ 8 Σ_k log(T)/∆_k ≤ 8K log(T)/∆_min

Remarks:
– Proof based on Hoeffding's inequality;
– Not intuitive: clearly suboptimal arms keep being drawn
– MOSS, a variant of UCB, achieves  E[R_T] ≤ ✷ K log(T∆_min²/K)/∆_min
– Neither log(T) nor K log(T∆_min²/K) is sufficient with covariates.

SLIDE 12

Successive Elimination (SE)

Simple policy based on the intuition: determine the suboptimal arms, and do not play them. Time is divided into rounds n ∈ N:

– After round n: eliminate the arms that are (with high probability) suboptimal, i.e., any arm k such that, for some remaining arm k′,

    Ȳ^k_n + √(2 log(T/n)/n) ≤ Ȳ^{k′}_n − √(2 log(T/n)/n)

– At round n + 1: draw each remaining arm once.

Easy to describe and to understand (but not to analyse for K > 2…), intuitive. Simple confidence term (but requires knowledge of T). (SE) is a variant of Even-Dar et al. ('06) and Auer and Ortner ('10).
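The round structure above can be sketched as follows (a simplified K-armed implementation with Bernoulli payoffs; means, horizon and seed are illustrative, not from the talk):

```python
import math
import random

def successive_elimination(means, T, seed=0):
    """Sketch of (SE): each round, play every active arm once; then drop any
    arm whose upper confidence bound sqrt(2 log(T/n)/n) falls below the best
    arm's lower confidence bound. Knowledge of the horizon T is required."""
    rng = random.Random(seed)
    K = len(means)
    active = set(range(K))
    sums = [0.0] * K
    n = 0                       # completed rounds
    pulls = []
    while len(pulls) + len(active) <= T and len(active) > 1:
        for k in active:        # one round: a draw of every remaining arm
            sums[k] += float(rng.random() < means[k])
            pulls.append(k)
        n += 1
        radius = math.sqrt(2 * math.log(T / n) / n)
        best = max(sums[k] / n for k in active)
        active = {k for k in active if sums[k] / n + radius >= best - radius}
    while len(pulls) < T:       # a single arm survives: keep playing it
        pulls.append(max(active, key=lambda k: sums[k]))
    return pulls

pulls = successive_elimination([0.2, 0.9], T=2000)
```

Once the suboptimal arm is eliminated, it is never drawn again, which is exactly the intuitive behaviour the slide contrasts with UCB.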

SLIDES 13–14

Regret of successive elimination

Theorem [P. and Rigollet ('13)]. Played on K arms, the (SE) policy satisfies

    E[R_T] ≤ ✷ min{ Σ_k log(T∆_k²)/∆_k , √(TK log(K)) }

(Compare with UCB: Σ_k log(T)/∆_k, and MOSS: K log(T∆_min²/K)/∆_min.)

E[R_T] = Σ_k ∆_k·E[n_k], with n_k the number of draws of arm k.

Exact bound:

    E[R_T] ≤ min{ 646 Σ_k (1/∆_k) log(max(T∆_k²/18, e)) , 166 √(TK log(K)) }
SLIDES 15–18

Successive Elimination: Example

Two arms, with means f¹ and f². A round = one draw of each arm.

Round 1: |Ȳ¹_1 − Ȳ²_1| ≤ 2√(2 log(T/1)/1) → no elimination
Round 2: |Ȳ¹_2 − Ȳ²_2| ≤ 2√(2 log(T/2)/2) → no elimination
Round 3: the confidence intervals 2√(2 log(T/3)/3) separate → elimination

SLIDES 19–29

Sketch of proof with K = 2

Basic idea: arm 2 (suboptimal) is eliminated at the first round n such that

    Ȳ²_n + √(2 log(T/n)/n) ≤ Ȳ¹_n − √(2 log(T/n)/n),

which, replacing the estimates by their expectations, happens as soon as

    f¹ − f² = ∆₂ ≥ 2√(2 log(T/n)/n),

i.e., at a round  n₂ ≤ ✷ log(T∆₂²)/∆₂².

What could go wrong:

– Arm 1 eliminated before round n₂ (with proba. ≤ ✷ n₂/T):

    P( ∃ n ≤ n₂ : Ȳ¹_n − Ȳ²_n ≤ −2√(2 log(T/n)/n) ) ≤ ✷ n₂/T

– Arm 2 not eliminated at round n₂ (with proba. ≤ ✷ n₂/T): by definition of n₂, this event is contained in { Ȳ²_{n₂} − Ȳ¹_{n₂} ≥ −2√(2 log(T/n₂)/n₂) } ⊆ { Ȳ²_{n₂} − Ȳ¹_{n₂} ≥ −✷∆₂ }, so its probability is at most

    P( [Ȳ¹_{n₂} − Ȳ²_{n₂}] − ∆₂ ≤ −✷∆₂ ) ≤ exp(−✷ n₂∆₂²) ≤ ✷ n₂/T

Number of draws of arm 2 (each incurs a regret of ∆₂): T if something goes wrong (w.p. ≤ ✷ n₂/T), n₂ otherwise (w.p. ≤ 1):

    E[R_T] ≤ ( n₂ + (✷ n₂/T)·T ) ∆₂ ≤ ✷ n₂∆₂ ≤ ✷ log(T∆₂²)/∆₂

SLIDES 30–33

Dynamic framework: Binned Successive Elimination (BSE) and Adaptively Binned SE (ABSE)

General Model

Covariates: X_t ∈ X = [0, 1]^d, i.i.d. with law µ equivalent to the Lebesgue measure λ: there exist two unknown constants c, C such that c·λ(A) ≤ µ(A) ≤ C·λ(A).
  Example: the requests received by Amazon or Google; X_t is observed before taking a decision at time t ∈ N.

Decisions: k_t ∈ K = {1, …, K}; construction of a policy π.
  Example: choice of the ad to be displayed; the decision k_t is taken after observing X_t. Objective: find the best decision given the request.

Payoffs: Y^k_t ∈ [0, 1] ∼ ν_k(X_t), with E[Y^k | X] = f^k(X).
  Example: proba./reward of a click on ad k as a function of the request. Only Y^{k_t}_t is observed before moving to stage t + 1. Optimization: find the decision k_t that maximizes f^k(X_t).

Objective: regret  R_T := Σ_{t=1}^T [ f^{π⋆(X_t)}(X_t) − f^{k_t}(X_t) ] ≤ o(T)

Optimal policy: π⋆(X) = argmax_k f^k(X), with f^{π⋆(X)}(X) = f^⋆(X). Maximize the cumulated payoffs Σ_{t=1}^T f^{k_t}(X_t), or minimize the regret: find a policy π that asymptotically performs at least as well as π⋆ (on average).

SLIDES 34–35

Regularity assumptions

1. Smoothness of the problem: every f^k is β-Hölder, with β ∈ (0, 1]:

    ∃ L > 0, ∀ x, y ∈ X,  |f^k(x) − f^k(y)| ≤ L‖x − y‖^β

2. Complexity of the problem (α-margin condition): ∃ δ₀ > 0 and C₀ > 0 such that

    P_X( 0 < f^⋆(x) − f^♯(x) < δ ) ≤ C₀ δ^α,  ∀ δ ∈ (0, δ₀)

where f^⋆(x) = max_k f^k(x) is the maximal f^k and f^♯(x) = max{ f^k(x) : f^k(x) < f^⋆(x) } is the second maximum (for K = 2, the gap is |f¹(x) − f²(x)|).

With K > 2: f^⋆ is β-Hölder but f^♯ is not continuous.
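To make the margin condition concrete, here is a worked example with illustrative functions (not from the talk): take f¹(x) = x and f²(x) = 1/2 on X = [0, 1]. Then f^⋆(x) − f^♯(x) = |x − 1/2|, and P_X(0 < |x − 1/2| < δ) = 2δ under the uniform law, so the condition holds with α = 1 (and C₀ = 2). A quick Monte-Carlo check:

```python
import random

def margin_prob(delta, samples=200_000, seed=0):
    """Monte-Carlo estimate of P_X(0 < |f1(X) - f2(X)| < delta) for
    f1(x) = x, f2(x) = 1/2 and X uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = sum(0 < abs(rng.random() - 0.5) < delta for _ in range(samples))
    return hits / samples

# The estimate should be close to 2*delta, consistent with alpha = 1:
est = margin_prob(0.1)
```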

SLIDES 36–41

Regularity: an easy example (α big)

[Figure: the functions f¹(x), f²(x), f³(x) are drawn one by one, together with f^⋆(x) and f^♯(x).]

SLIDES 42–47

Regularity: a hard example (α small)

[Figure: the functions f¹(x), f²(x), f³(x) are drawn one by one, together with f^⋆(x) and f^♯(x).]

SLIDE 48

Conflict between α and β

∃ δ₀, C₀:  P_X( 0 < f^⋆(x) − f^♯(x) < δ ) ≤ C₀ δ^α,  ∀ δ ∈ (0, δ₀)

– First used by Goldenshluger and Zeevi ('08), in the case f¹ = 0; there it was an assumption on the distribution of X only.
– Here the marginal is fixed (uniform); the condition measures the closeness of the functions.

Proposition (conflict α vs. β): αβ > d ⟹ every arm is either always or never optimal.

Smoothness β is known, but complexity α is not known.

SLIDES 49–51

Binned policy

[Figure: f¹(x), f²(x), f³(x) (with f^⋆ and f^♯) drawn over a partition of the covariate space into bins.]

SLIDE 52

Binned policy

– Consider the uniform partition of [0, 1]^d into M^{−d} bins: hypercubes B with side length |B| equal to M.
– Each bin is an independent problem; the exact value of X_t is discarded.
– Average reward of bin B:

    f̄^k_B = (1/P(B)) ∫_B f^k(x) dP(x)   (with P(B) ≃ M^d)

Follow, on each bin, your favourite static policy.

Reduction to M^{−d} static bandit problems with expected rewards (f̄¹_B, …, f̄^K_B). See Rigollet and Zeevi ('10).
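The binned reduction can be sketched for d = 1 (the reward functions are illustrative and the marginal is taken uniform; `bin_average` plays the role of f̄^k_B):

```python
def bin_index(x, M):
    """Index of the bin of side M containing covariate x in [0, 1)."""
    return min(int(x / M), round(1 / M) - 1)

def bin_average(f, b, M, grid=1000):
    """Approximate fbar_B = (1/P(B)) * integral_B f dP for the uniform law,
    via a midpoint rule on the bin [b*M, (b+1)*M]."""
    lo, hi = b * M, (b + 1) * M
    xs = [lo + (hi - lo) * (i + 0.5) / grid for i in range(grid)]
    return sum(f(x) for x in xs) / grid

M = 0.25                       # side length: 4 bins on [0, 1]
f1 = lambda x: x               # illustrative f^1
f2 = lambda x: 0.6             # illustrative f^2
# On each bin, the static problem has expected rewards (fbar1_B, fbar2_B):
averages = [(bin_average(f1, b, M), bin_average(f2, b, M)) for b in range(4)]
```

Each covariate X_t is routed to `bin_index(X_t, M)` and only the static policy of that bin sees the payoff, which is exactly the "exact value of X_t is discarded" step.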

SLIDE 53

Binned Successive Elimination (BSE)

[Figure: f¹(x), f²(x), f³(x) over the binned covariate space.]

SLIDE 54

Binned Successive Elimination (BSE)

Theorem [P. and Rigollet ('11)]. If 0 < α < 1,

    E[R_T(BSE)] ≤ ✷ T ( K log(K)/T )^{β(1+α)/(2β+d)}

with the choice of parameter  M ≃ ( K log(K)/T )^{1/(2β+d)}.

For K = 2 this matches the lower bound: minimax optimal w.r.t. T.

The same bound can be obtained in the full-information setting (Audibert and Tsybakov, '07). No log(T): the difficulty of nonparametric estimation washes away the effects of exploration/exploitation. α < 1: fast rates cannot be attained.
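For intuition about the rate, the exponent of T in the bound T·(K log K/T)^{β(1+α)/(2β+d)} can be tabulated, treating K as a constant; this is just arithmetic on the stated bound:

```python
def bse_rate_exponent(alpha, beta, d):
    """Exponent e such that the BSE bound reads R_T = O(T^e) for fixed K:
    e = 1 - beta*(1+alpha)/(2*beta + d)."""
    return 1 - beta * (1 + alpha) / (2 * beta + d)

# Smoother (beta -> 1) and better-separated (alpha -> 1) problems give a
# smaller exponent; a higher covariate dimension d pushes it toward 1.
examples = {(a, b, d): bse_rate_exponent(a, b, d)
            for a in (0.2, 0.9) for b in (0.5, 1.0) for d in (1, 5)}
```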

SLIDES 55–61

Sketch for K = 2

Decomposition of regret: E[R_T(BSE)] = R_H + R_E, where for each bin B

    ∆_B = sup_{x∈B} [ f^⋆(x) − f^♯(x) ] ≃ (1/P(B)) ∫_B (f^⋆ − f^♯) dP.

Hard bins (∆_B < M^β):

    R_H ≤ M^β·P(Hard)·T ≤ M^β·P( 0 < f^⋆ − f^♯ < M^β )·T ≤ T M^{β(1+α)} ≤ T ( K log(K)/T )^{β(1+α)/(2β+d)}

Easy bins (∆_B ≥ M^β):

    R_E ≤ ✷ Σ_{easy B} log(TM^d ∆_B²)/∆_B

Order the ∆_B as ∆_1 ≤ ∆_2 ≤ … ≤ ∆_{M^{−d}}; then, for all ℓ ∈ {1, …, M^{−d}},

    ℓM^d ≤ P( 0 < f^⋆ − f^♯ < ∆_ℓ ) ≤ ∆_ℓ^α,  i.e.  ∆_ℓ ≥ (ℓM^d)^{1/α},

so that (for α < 1):

    R_E ≤ ✷ Σ_{ℓ=M^{αβ−d}}^{M^{−d}} log( TM^d (ℓM^d)^{2/α} ) / (ℓM^d)^{1/α} ≤ ✷ log( TM^{2β+d} ) M^{−(d+β(1−α))} ≤ ✷ T M^{β(1+α)} ≤ ✷ T ( K log(K)/T )^{β(1+α)/(2β+d)}

For α ≥ 1, additional terms appear: E[R_T] is multiplied by log(T). We always pay the number of bins (which should be large enough for non-smooth functions). Problem: too many bins. Solution: online/adaptive construction of the bins.

SLIDES 62–63

Suboptimality of (BSE) for α ≥ 1

[Figure: f¹(x), f²(x), f³(x) over the uniform partition.]

SLIDE 64

Adaptive BSE (ABSE)

Basic idea: given a bin of size |B| (for K = 2): if f̄¹_B − f̄²_B ≥ ✷|B|^β then f¹ ≥ f² on B.

Adaptively Binned Successive Elimination. Start with B = [0, 1]^d and |B|₀ ≃ ( K log(K)/T )^{1/(2β+d)}:

– Draw samples of the arms (in rounds) when covariates fall in B;
– If  Ȳ^k_n − Ȳ^{k′}_n ≥ ✷√( log(T|B|^d/n)/n ) + ✷|B|^β,  eliminate arm k′;
– Stop after n_B rounds and split B into halves (of size |B|/2), where n_B is defined by

    √( log(T|B|^d/n_B)/n_B ) = |B|^β,  i.e.  n_B ≃ log(T|B|^{2β+d}) / |B|^{2β}

– Repeat the procedure on the halves (until |B| ≤ |B|₀).
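The stopping/splitting schedule above can be sketched as a recursion. This is a simplified sketch with all constants set to 1, d = 1 (two children per split), and illustrative parameters; it is not the authors' code:

```python
import math

def abse_schedule(B_size, T, beta, d, B_min):
    """Sketch of the (ABSE) bin schedule: rounds n_B spent in a bin of side
    |B| before splitting, n_B ~ log(T |B|^(2*beta+d)) / |B|^(2*beta), then
    recursive splitting into two halves down to side |B|_0 = B_min."""
    if B_size <= B_min:
        return []
    n_B = max(1, math.ceil(math.log(max(T * B_size ** (2 * beta + d), math.e))
                           / B_size ** (2 * beta)))
    # spend n_B rounds in this bin, then recurse on the two halves
    return [(B_size, n_B)] + abse_schedule(B_size / 2, T, beta, d, B_min) * 2

schedule = abse_schedule(1.0, T=10_000, beta=1.0, d=1, B_min=0.1)
```

The schedule makes the trade-off visible: smaller bins require many more rounds (n_B grows like |B|^{−2β}), which is why they are only opened where no arm has been eliminated yet.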

SLIDE 65

Regret of (ABSE)

Theorem [P. and Rigollet ('11)]. Fix α > 0 and 0 < β ≤ 1; then (ABSE) has a regret bounded as

    E[R_T(ABSE)] ≤ ✷ T ( K log(K)/T )^{β(1+α)/(2β+d)}

Minimax optimal (Rigollet and Zeevi, 2010; see also Audibert and Tsybakov, 2007). Slivkins (2011, COLT): zooming (abstract setup, complicated algorithm), with no real purpose nor measure attached to the adaptivity of the policy.

SLIDES 66–67

(ABSE) illustrated

[Figure: two arms f¹, f² on [0, 1], with splits at 1/4 and 1/2.]

The tree of bins:
– [0, 1] (1 ∼ 2): keep 1 and 2
– [0, 1/2] (1 ∼ 2): keep 1 and 2
– [1/2, 1] (1 ≥ 2): eliminate 2
– [0, 1/4] (1 ≥ 2): eliminate 2
– [1/4, 1/2] (2 ≥ 1): eliminate 1

At each node, an arm is either eliminated or kept, depending on the observed gap.

SLIDES 68–75

(ABSE) Sketch of proof

• If everything goes right: when a bin B is reached, one has ∆_B ≤ |B|^β (so the regret incurred in B is ≤ n_B|B|^β).

• What could go wrong:

Terminal node:  R_B ≤ n_B|B|^β ≤ log(T|B|^{2β+d})·|B|^{−β}
– Eliminating arm 1, or not eliminating arm 2: same analysis as for (SE);
– this happens with proba. less than ✷ n_B/(T|B|^d);
– the number of times covariates fall in B is less than T|B|^d;
– the regret each time is less than ∆_B ≤ |B|^β.

Non-terminal node:  R_B ≤ n_B|B|^β ≤ log(T|B|^{2β+d})·|B|^{−β}
– Eliminate arm 1, or eliminate arm 2 when f̄²_B ≤ f̄¹_B ≤ f̄²_B + |B|^β;
– for arm 1, same analysis; for arm 2:

    P( ∃ n ≤ n_B : Ȳ¹_n − Ȳ²_n − ∆_B ≥ 2√( log(T|B|^d/n)/n ) + |B|^β − ∆_B ) ≤ ✷ n_B/(T|B|^d)

Summing over the tree: let N_ℓ be the number of bins of size |B| = 2^{−ℓ} (with 2^{−ℓ₀} ≃ |B|₀). Then

    N_ℓ·2^{−ℓd} ≤ P( 0 < f^⋆ − f^♯ < 2^{−ℓβ} ) ≤ 2^{−ℓαβ}

and

    E[R_T] ≤ Σ_B n_B|B|^β ≤ Σ_{ℓ=0}^{ℓ₀} 2^{ℓ(d−αβ)} log( T·2^{−ℓ(2β+d)} ) 2^{ℓβ} ≤ ✷ T ( K log(K)/T )^{β(1+α)/(2β+d)}

SLIDE 76

Conclusion and Remarks

Conclusion

We introduced and analyzed new policies:

– Successive Elimination: an intuitive policy with great potential in the static case;
– Binned SE: its generalization to hard dynamic problems;
– Adaptively Binned SE: generalized again, to handle both easy and hard problems.

– They are all minimax optimal in T;
– Conjecture: also in K, up to the log(K) term.
– They require the knowledge of T (OK) and of β (more arguable).
– The analysis is more intricate when K > 2: the optimal arm can be eliminated more easily, and f^♯ is not continuous.
– Future work: a policy adaptive w.r.t. β.