Bandits in Auctions (& more) Vianney Perchet joint work with P. - - PowerPoint PPT Presentation

Bandits in Auctions (& more)

Vianney Perchet joint work with P. Rigollet (MIT) and J. Weed (MIT)

CEMRACS 2017, July 20, 2017
CMLA, ENS Paris-Saclay & Criteo Research

Motivations & Objectives

Classical Examples of Bandit Problems

– Size of data: n patients, each with some probability of getting cured
– Choose one of two treatments to prescribe (red or blue)
– Patients cured or dead

1) Inference: find the best treatment between the red and blue ones
2) Cumul: save as many patients as possible

Classical Examples of Bandit Problems

– Size of data: n banners, each with some probability of click
– Choose one of two ads to display (red or blue)
– Banner clicked or ignored

1) Inference: find the best ad between the red and blue ones
2) Cumul: get as many clicks as possible

Example of Repeated Auctions

Ad slot sold by lemonde.fr in 2nd-price auctions

  • Several (marketing) companies place bids
  • The highest bidder wins (...), say Criteo, and pays lemonde the 2nd-highest bid (...)
  • Criteo chooses the ad of one of its clients: Microsoft, Cdiscount or Booking
  • Criteo gets paid by the client if the user clicks on the ad

Main problem: repeated auctions with unknown private valuations. Learn the valuations, find which ad to display & good bidding strategies.

Example of Repeated Auctions

Some companies whose cookies can be controlled

Back to Classical Examples of Bandit Problems

– Size of data: n mails, each with some probability of being spam
– Choose one of two actions: spam or ham
– Mail correctly or incorrectly classified

1) Inference: find the best action between the red and blue ones
2) Cumul: make as few errors as possible

Back to Classical Examples of Bandit Problems

– Size of data: n patients, each with some probability of getting cured
– Choose one of two treatments to prescribe (red or blue)
– Patients cured or dead

1) Inference: find the best treatment between the red and blue ones
2) Cumul: save as many patients as possible

Two-Armed Bandit

– Patients arrive and are treated sequentially.
– Save as many as possible.

A bit of theory

Stochastic Multi-Armed Bandit

K-Armed Stochastic Bandit Problems

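– K actions $i \in \{1, \dots, K\}$, outcome $X^i_t \in \mathbb{R}$, (sub-)Gaussian or bounded; here $X^i_1, X^i_2, \dots \sim \mathcal{N}(\mu_i, 1)$ i.i.d.
– Non-anticipative policy: $\pi_t\big(X^{\pi_1}_1, X^{\pi_2}_2, \dots, X^{\pi_{t-1}}_{t-1}\big) \in \{1, \dots, K\}$
– Goal: maximize the expected reward $\sum_{t=1}^T \mathbb{E} X^{\pi_t}_t = \sum_{t=1}^T \mathbb{E}\,\mu_{\pi_t}$
– Performance: cumulative regret
$$R_T = \max_{i \in \{1, \dots, K\}} \sum_{t=1}^T \mu_i - \sum_{t=1}^T \mu_{\pi_t} = \sum_{i} \Delta_i \sum_{t=1}^T \mathbb{1}\{\pi_t = i \neq \star\},$$
with $\Delta_i = \mu_\star - \mu_i$, the “gap” or cost of error $i$.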
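A quick numerical check of the decomposition above, as a sketch: the arm means and the uniformly random policy below are hypothetical, and the code only verifies that $T\mu_\star - \sum_t \mu_{\pi_t}$ coincides with $\sum_i \Delta_i T_i$, where $T_i$ is the number of pulls of arm $i$.

```python
import random


def pseudo_regret(mu, pulls):
    """R_T = T * max_i mu_i - sum_t mu_{pi_t}, as defined on the slide."""
    return len(pulls) * max(mu) - sum(mu[i] for i in pulls)


def regret_by_gaps(mu, pulls):
    """Same quantity written as sum_i Delta_i * T_i(T), with Delta_i = mu_star - mu_i."""
    mu_star = max(mu)
    counts = [0] * len(mu)
    for i in pulls:
        counts[i] += 1
    return sum((mu_star - m) * c for m, c in zip(mu, counts))


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [0.5, 0.3, 0.1]  # hypothetical arm means
    pulls = [rng.randrange(len(mu)) for _ in range(1000)]  # uniformly random policy
    print(pseudo_regret(mu, pulls), regret_by_gaps(mu, pulls))  # equal up to rounding
```

Only the suboptimal arms contribute to the second sum, since $\Delta_\star = 0$.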

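Most Famous Algorithm [Auer, Cesa-Bianchi, Fischer, ’02]

  • UCB: “Upper Confidence Bound”

$$\pi_{t+1} = \arg\max_i \Big\{ \bar X^i_t + \sqrt{\frac{2\log(t)}{T_i(t)}} \Big\}, \quad \text{where } T_i(t) = \sum_{s=1}^t \mathbb{1}\{\pi_s = i\} \text{ and } \bar X^i_t = \frac{1}{T_i(t)} \sum_{s \le t:\, \pi_s = i} X^i_s.$$

Regret: $\mathbb{E} R_T \lesssim \sum_k \frac{\log(T)}{\Delta_k}$

Worst case: $\mathbb{E} R_T \lesssim \sup_\Delta \Big( \frac{K\log(T)}{\Delta} \wedge T\Delta \Big) \asymp \sqrt{KT\log(T)}$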
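A minimal Python sketch of the UCB index above, assuming $\mathcal{N}(\mu_i, 1)$ arms as on the previous slide. Pulling each arm once before using the index (so that $T_i(t) > 0$) is our own initialization choice, and the arm means in the usage example are hypothetical.

```python
import math
import random


def ucb(arms, T, rng):
    """UCB: pull argmax_i mean_i + sqrt(2 log(t) / T_i(t)).

    `arms` is a list of callables rng -> reward (hypothetical reward model).
    Returns the sequence of pulled arms and the pull counts T_i(T).
    """
    K = len(arms)
    counts = [0] * K
    sums = [0.0] * K
    pulls = []
    for t in range(1, T + 1):
        if t <= K:
            i = t - 1  # pull each arm once so that every T_i(t) > 0
        else:
            # rounds 1..t-1 have been played, so the index uses log(t - 1)
            i = max(range(K), key=lambda k: sums[k] / counts[k]
                    + math.sqrt(2 * math.log(t - 1) / counts[k]))
        x = arms[i](rng)
        counts[i] += 1
        sums[i] += x
        pulls.append(i)
    return pulls, counts


if __name__ == "__main__":
    rng = random.Random(1)
    arms = [lambda r: r.gauss(0.5, 1.0), lambda r: r.gauss(0.0, 1.0)]  # N(mu_i, 1) arms
    pulls, counts = ucb(arms, 5000, rng)
    print(counts)  # the better arm 0 should receive most of the pulls
```

With a gap of 0.5, the suboptimal arm is expected to be pulled on the order of $\log(T)/\Delta^2$ times, in line with the regret bound above.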

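Ideas of Proof

$\pi_{t+1} = \arg\max_i \Big\{ \bar X^i_t + \sqrt{\frac{2\log(t)}{T_i(t)}} \Big\}$

  • Two-line proof:
$$\pi_{t+1} = i \neq \star \Longrightarrow \bar X^\star_t + \sqrt{\frac{2\log(t)}{T_\star(t)}} \le \bar X^i_t + \sqrt{\frac{2\log(t)}{T_i(t)}} \;\text{“}\Longrightarrow\text{”}\; \Delta_i \le \sqrt{\frac{2\log(t)}{T_i(t)}} \Longrightarrow T_i(t) \lesssim \frac{\log(t)}{\Delta_i^2}$$
  • Number of mistakes grows as $\log(t)/\Delta_i^2$; each mistake costs $\Delta_i$:
$$\text{Regret at stage } T \lesssim \sum_i \frac{\log(T)}{\Delta_i^2} \times \Delta_i \asymp \sum_i \frac{\log(T)}{\Delta_i}$$
  • The “⟹” actually holds with overwhelming probability
  • “Optimal”: no algorithm always has a regret smaller than $\sum_i \frac{\log(T)}{\Delta_i}$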

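Other Algorithms

  • ETC [Perchet, Rigollet]: pull in round-robin, then eliminate:
$R_T \lesssim \sum_k \frac{\log(T\Delta_k)}{\Delta_k}$, worst case $R_T \lesssim \sqrt{TK\log(K)}$
  • Other algorithms: MOSS [Audibert, Bubeck], variants of UCB:
$R_T \lesssim \frac{K\log(T\Delta_{\min}/K)}{\Delta_{\min}}$, worst case $R_T \lesssim \sqrt{TK}$
  • Infinite number of actions $x \in [0, 1]^d$ with $\Delta(x)$ 1-Lipschitz.
Discretizing with step $\varepsilon$ + UCB gives $R_T \lesssim T\varepsilon + \sqrt{T/\varepsilon} \lesssim T^{2/3}$ (for $d = 1$, with $\varepsilon = T^{-1/3}$)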
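The “pull in round-robin, then eliminate” idea can be sketched as a generic successive-elimination loop. This is our own minimal illustration, not the exact procedure of [Perchet, Rigollet]: the confidence radius $\sqrt{2\log(T)/n}$, the Gaussian arms and the function name are assumptions.

```python
import math
import random


def successive_elimination(arms, T, rng):
    """Round-robin over the active arms, then eliminate every arm whose upper
    confidence bound falls below the best lower confidence bound.

    The radius sqrt(2 log(T) / n) is an assumption made for this sketch.
    Returns the surviving arms and the pull counts.
    """
    K = len(arms)
    active = list(range(K))
    counts = [0] * K
    sums = [0.0] * K
    t = 0
    while t < T:
        for i in active:  # one round-robin pass over the active arms
            if t >= T:
                break
            sums[i] += arms[i](rng)
            counts[i] += 1
            t += 1
        if len(active) > 1 and all(counts[i] > 0 for i in active):
            radius = {i: math.sqrt(2 * math.log(T) / counts[i]) for i in active}
            mean = {i: sums[i] / counts[i] for i in active}
            best_lcb = max(mean[i] - radius[i] for i in active)
            # the arm achieving best_lcb always survives, so `active` never empties
            active = [i for i in active if mean[i] + radius[i] >= best_lcb]
    return active, counts


if __name__ == "__main__":
    rng = random.Random(2)
    arms = [lambda r: r.gauss(0.9, 1.0), lambda r: r.gauss(0.1, 1.0)]  # hypothetical arms
    print(successive_elimination(arms, 5000, rng))
```

Once only one arm survives, every remaining round is spent on it, which is the “commit” part.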

Adversarial Multi-Armed Bandit

K-Armed Adversarial Bandit Problems

  • K actions $i \in [K] = \{1, \dots, K\}$, outcome $X^i_t$ bounded in $[0, 1]$
No assumption on $X^i_1, X^i_2, \dots$
  • Non-anticipative policy: $\pi_t\big(X^{\pi_1}_1, X^{\pi_2}_2, \dots, X^{\pi_{t-1}}_{t-1}\big) \in [K]$
  • Performance: cumulative regret
$$R_T = \max_{i \in [K]} \sum_{t=1}^T X^i_t - \sum_{t=1}^T X^{\pi_t}_t$$
  • Convex optimization of $p \mapsto \mathbb{E}_{i \sim p} \sum_{t=1}^T X^i_t$, from $\Delta([K])$ to $[0, T]$

EXP-algo

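  • Main insight: $\pi_t \sim p_t \in \Delta([K])$, with more weight on the best actions:
$$p^i_t = \frac{e^{\eta \sum_{s=1}^{t-1} X^i_s}}{\sum_{j \in [K]} e^{\eta \sum_{s=1}^{t-1} X^j_s}}, \quad \eta \text{ is a parameter}$$
  • Only $X^{\pi_t}_t$ is observed, not $X_t$. Estimate $X_t$ by $\hat X_t$:
  • $\hat X^i_t = 1 - \frac{1 - X^i_t}{p^i_t}\mathbb{1}\{\pi_t = i\}$, and run EXP on $\hat X_t$
  • $\mathbb{E}\hat X^i_t = 1 - \Big[(1 - p^i_t)\cdot 0 + p^i_t\,\frac{1 - X^i_t}{p^i_t}\Big] = X^i_t$: unbiased estimator
  • $\mathbb{E}\sum_{i \in [K]} p^i_t (\hat X^i_t)^2 \le 1 + \sum_{i \in [K]} p^i_t \Big(\frac{1 - X^i_t}{p^i_t}\Big)^2 p^i_t \le K + 1$: bounded variance
  • Using this estimate, we obtain, for $\eta = \sqrt{\log(K)/((K+1)T)}$,
$$\mathbb{E} R_T \le \frac{\log(K)}{\eta} + \eta(K+1)T \le 3\sqrt{\log(K)KT}$$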
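A sketch of the EXP algorithm run on the estimates $\hat X_t$, in the slide's notation. The tuning $\eta = \sqrt{\log(K)/((K+1)T)}$ minimizes $\log(K)/\eta + \eta(K+1)T$; the reward sequence in the usage example is hypothetical, and the bound above is on the expected regret, so a single run may land on either side of it.

```python
import math
import random


def exp_bandit(rewards, eta, rng):
    """Exponential weights run on the estimates
    hat X^i_t = 1 - (1 - X^i_t) / p^i_t * 1{pi_t = i}.

    rewards[t][i] is X^i_t in [0, 1]; the policy only reads rewards[t][pi_t].
    Returns the total reward collected.
    """
    K = len(rewards[0])
    cum = [0.0] * K  # sum_{s < t} hat X^i_s
    total = 0.0
    for x_t in rewards:
        top = max(cum)  # shift before exponentiating, for numerical stability
        w = [math.exp(eta * (c - top)) for c in cum]
        z = sum(w)
        p = [wi / z for wi in w]
        i = rng.choices(range(K), weights=p)[0]
        x = x_t[i]  # the only observed outcome
        total += x
        for j in range(K):
            cum[j] += 1.0  # hat X^j_t = 1 for every arm not played ...
        cum[i] -= (1.0 - x) / p[i]  # ... and 1 - (1 - x) / p^i_t for the played one
    return total


if __name__ == "__main__":
    T, K = 2000, 2
    rewards = [[1.0, 0.0] for _ in range(T)]  # hypothetical sequence: arm 0 always pays 1
    eta = math.sqrt(math.log(K) / ((K + 1) * T))  # minimizes log(K)/eta + eta(K+1)T
    total = exp_bandit(rewards, eta, random.Random(0))
    print(T - total, "vs expected-regret bound", 3 * math.sqrt(math.log(K) * K * T))
```

Writing the estimate as $1 - (\text{loss})/p^i_t$ keeps it at most 1, which is what gives the bounded second moment used in the analysis.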

Bandits & Repeated Auctions

Back to Repeated Auctions

Ad slot sold by lemonde.fr in 2nd-price auctions

  • Several (marketing) companies place bids
  • The highest bidder wins (...), say Criteo, and pays lemonde the 2nd-highest bid (...)
  • Criteo chooses the ad of one of its clients: Microsoft, Cdiscount or Booking
  • Criteo gets paid by the client if the user clicks on the ad

Main problem: repeated auctions with unknown private valuations. Learn the valuations, find which ad to display & good bidding strategies.

2nd-Price Auctions

  • A good is sold in a second-price auction.
  • Each buyer $i$, with valuation $v(i)$, places a bid $b(i)$
  • The highest bidder wins and pays the second-highest bid
$$b^\sharp = \max_{i \neq \arg\max_j b(j)} b(i) \quad \text{(ties broken arbitrarily)}$$
Truthful auctions
  • Optimal strategy: bid one's own valuation, $b(i) = v(i)$
  • Utility of bidder $i$: $\big(v(i) - b^\sharp\big)\mathbb{1}\{b(i) \ge b^\sharp\}$
  • If $b(i) > v(i)$: can only end up paying too much
  • If $b(i) < v(i)$: might lose the auction
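A small sketch of the second-price rule and of the truthfulness claim: with the (hypothetical) bids of the other buyers fixed, no bid on a grid of $[0, 1]$ gives bidder 0 more utility than bidding its own valuation. Breaking ties in favour of the lowest index is one arbitrary choice among those allowed above.

```python
def second_price(bids):
    """Winner and price of a sealed-bid second-price auction (at least 2 bids).
    Ties are broken in favour of the lowest index: one arbitrary rule."""
    winner = max(range(len(bids)), key=lambda i: bids[i])  # first maximal index
    price = max(b for j, b in enumerate(bids) if j != winner)  # b_sharp
    return winner, price


def utility(i, value, bids):
    """(v(i) - b_sharp) if bidder i wins, 0 otherwise."""
    winner, price = second_price(bids)
    return value - price if winner == i else 0.0


if __name__ == "__main__":
    others = [0.4, 0.5]  # hypothetical bids of the other buyers
    value = 0.6          # hypothetical valuation of bidder 0
    truthful = utility(0, value, [value] + others)
    deviations = [utility(0, value, [k / 100] + others) for k in range(101)]
    print(truthful, max(deviations))  # no deviation beats bidding the true value
```

Overbidding, e.g. bidding 0.95 with value 0.6 against a bid of 0.8, wins the good but yields a negative utility.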

Reserve price

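  • Utility of the bidder with the highest value: $v^\star - b^\sharp$
  • Utility of the seller (value $v_0$): $b^\sharp - v_0$, which can be negative!

Reserve price: a threshold $c$. If $b^\star \ge c$, the good is sold at price $\max\{b^\sharp, c\}$; otherwise it is not sold.

  • Still truthful: $c$ acts as an additional bid
  • The optimal reserve price $c^*$ maximizes $\mathbb{E}\big[(\max\{v^\sharp, c\} - v_0)\mathbb{1}\{v^\star \ge c\}\big]$
  • It depends on the (actually unknown) distributions of the values.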
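The dependence on the value distribution can be illustrated with a Monte Carlo sketch. We assume, for illustration only, two truthful bidders with i.i.d. Uniform[0, 1] values and $v_0 = 0$; for this toy case the optimal reserve is known to be $c^* = 1/2$, with expected revenue $5/12$ against $1/3$ without reserve. In practice the distribution is unknown, which is what motivates learning $c^*$.

```python
import random


def seller_revenue(values, c, v0=0.0):
    """Seller utility with reserve price c, truthful bidders (bids = values):
    sold at max(b_sharp, c) if b_star >= c, otherwise not sold (utility 0)."""
    b_star, b_sharp = sorted(values, reverse=True)[:2]
    if b_star < c:
        return 0.0
    return max(b_sharp, c) - v0


def estimate_revenue(c, n_bidders=2, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[(max(v_sharp, c) - v0) 1{v_star >= c}] when the
    values are i.i.d. Uniform[0, 1] (a toy distribution chosen for illustration)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += seller_revenue([rng.random() for _ in range(n_bidders)], c)
    return total / n_samples


if __name__ == "__main__":
    for c in (0.0, 0.25, 0.5, 0.75):
        print(c, round(estimate_revenue(c), 3))
```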

Main model

  • Learning the optimal reserve price [Cesa-Bianchi, Gentile, Mansour]

From the point of view of a bidder?

  • At round $t = 1, \dots, T$: the bidder bids $b_t \in [0, 1]$; if $b_t > m_t$ (maximum of the other bids & reserve price), it wins the good and observes its value $v_t \in [0, 1]$
  • Total utility: $\sum_{t=1}^T (v_t - m_t)\mathbb{1}\{b_t > m_t\}$
  • Total regret:
$$\max_{b \in [0,1]} \sum_{t=1}^T (v_t - m_t)\mathbb{1}\{b > m_t\} - \sum_{t=1}^T (v_t - m_t)\mathbb{1}\{b_t > m_t\}$$
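The benchmark in the regret is easy to compute in hindsight: a bid $b$ wins exactly the rounds with $m_t < b$, so only the sorted values of $m_t$ matter. The sketch below assumes $0 \le m_t < 1$ (so that a bid just above any $m_t$ is allowed); the data are hypothetical.

```python
def best_fixed_bid_utility(v, m):
    """max over b in [0, 1] of sum_t (v_t - m_t) 1{b > m_t}, assuming 0 <= m_t < 1.

    A bid b wins exactly the rounds with m_t < b, so it is enough to scan the
    rounds by increasing m_t: a bid just above a level wins every round at or
    below that level. b = 0 wins nothing and gives utility 0.
    """
    pairs = sorted(zip(m, v))
    best = running = 0.0
    k = 0
    while k < len(pairs):
        level = pairs[k][0]
        while k < len(pairs) and pairs[k][0] == level:  # equal m_t are won together
            running += pairs[k][1] - pairs[k][0]
            k += 1
        best = max(best, running)
    return best


def regret(v, m, bids):
    """Total regret of a bid sequence against the best fixed bid in hindsight."""
    earned = sum(vt - mt for vt, mt, bt in zip(v, m, bids) if bt > mt)
    return best_fixed_bid_utility(v, m) - earned


if __name__ == "__main__":
    v = [0.9, 0.2, 0.8]  # hypothetical values
    m = [0.5, 0.3, 0.6]  # hypothetical highest competing bids / reserve prices
    print(best_fixed_bid_utility(v, m))  # bidding just above 0.6 wins all three rounds
    print(regret(v, m, [0.0, 0.0, 0.0]), regret(v, m, [1.0, 1.0, 1.0]))
```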

Data Assumptions - Stochastic vs Adversarial

  • Stochastic: $v_t$ i.i.d., $\mathbb{E}[v_t] = v \in [0, 1]$
– $m_t$ stochastic (i.i.d., $\mathbb{E}[m_t] = m$), independent of $v_t$
– $m_t$ adversarial (no assumptions), independent of $v_t$
In both cases, the maximum in the expected regret is attained at $b = v$:
$$\sum_{t=1}^T (v - m_t)\mathbb{1}\{v > m_t\} - \sum_{t=1}^T (v - m_t)\mathbb{1}\{b_t > m_t\}$$
  • Adversarial: no assumptions at all on $v_t$ and $m_t$

Tools that we will use: variants of stochastic & adversarial multi-armed bandit algorithms.

Stochastic Repeated Auctions

Our policy: UCBid

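  • Auctions: infinite action space, but with a special structure.
  • Round $t + 1$ bid:
$$b_{t+1} = \min\Big(\bar v_{\omega_t} + \sqrt{\frac{3\log(t)}{2\omega_t}},\ 1\Big),$$
where $\omega_t$ is the number of auctions won so far and $\bar v_{\omega_t}$ the average of the values observed in those auctions.
  • Our first main result

Theorem (stochastic case): UCBid yields a regret bound of
$$\mathbb{E} R_T \le 3 + \frac{12\log(T)}{\Delta} \wedge 5\sqrt{T\log(T)},$$
where $\Delta$ is such that no bid $m_t$ lies in the interval $(v, v + \Delta)$.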
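A simulation sketch of UCBid in the stochastic case. Assumptions: before the first win the empirical average is undefined, so we bid 1; $v_t \sim U[0, 0.6]$ and $m_t \sim U[0, 1]$ form a hypothetical environment, in which the best fixed bid is $b = v = 0.3$ with expected utility $v^2/2$ per round. Since $v_t$ is independent of everything else, drawing it only when the auction is won does not change the distribution.

```python
import math
import random


def ucbid(T, draw_value, draw_m, rng):
    """UCBid sketch: b_{t+1} = min(mean of observed values + sqrt(3 log(t) / (2 w_t)), 1),
    where w_t is the number of auctions won so far (values are only seen on a win).
    Bidding 1 before the first win is our own assumption (empty average).
    Returns the total utility sum_t (v_t - m_t) 1{b_t > m_t} and the number of wins."""
    wins, value_sum, utility = 0, 0.0, 0.0
    for t in range(1, T + 1):
        if wins == 0:
            bid = 1.0
        else:
            # t - 1 rounds have been played, so the bonus uses log(t - 1)
            bonus = math.sqrt(3 * math.log(t - 1) / (2 * wins))
            bid = min(value_sum / wins + bonus, 1.0)
        m = draw_m(rng)
        if bid > m:  # win: pay m_t and observe v_t
            v = draw_value(rng)
            wins += 1
            value_sum += v
            utility += v - m
    return utility, wins


if __name__ == "__main__":
    T = 20_000
    # hypothetical environment: v_t ~ U[0, 0.6] (mean v = 0.3), m_t ~ U[0, 1), independent
    utility, wins = ucbid(T, lambda r: r.uniform(0.0, 0.6), lambda r: r.random(), random.Random(0))
    oracle = T * 0.3 ** 2 / 2  # expected utility of the best fixed bid b = v
    print(round(utility), round(oracle))
```

The upper confidence bound makes the bidder overbid slightly, which is the optimistic choice: overbidding keeps revealing $v_t$, while underbidding could stop the flow of information.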

Fully stochastic case: UCBid

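  • If $m_t \sim \mu$ satisfies the margin condition with (unknown) parameter $\alpha$:

Definition (margin condition): $\forall u > 0,\ \mu\{(v, v + u)\} \le C u^\alpha$ for some constant $C$. The bigger $\alpha$, the easier the problem.

Theorem (fully stochastic case):
$$\mathbb{E} R_T \le \begin{cases} c_\alpha T^{\frac{1-\alpha}{2}} \log^{\frac{1+\alpha}{2}}(T) & \text{if } \alpha < 1 \\ c_\alpha \log^2(T) & \text{if } \alpha = 1 \\ c_\alpha \log(T) & \text{if } \alpha > 1 \end{cases}$$
  • Almost matching lower bound:
$$\mathbb{E} R_T \ge \begin{cases} c_\alpha T^{\frac{1-\alpha}{2}} & \text{if } \alpha < 1 \\ c_\alpha \log(T) & \text{if } \alpha \ge 1 \end{cases}$$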

Adversarial Repeated Auctions

Our policy: EXPTree

$$\max_{b \in [0,1]} \sum_{t=1}^T (v_t - m_t)\mathbb{1}\{b > m_t\} - \sum_{t=1}^T (v_t - m_t)\mathbb{1}\{b_t > m_t\}$$

  • Main idea: nested partitions $\mathcal{P}_t$ of $[0, 1]$
  • $\mathcal{P}_t = \{[m^{(s)}, m^{(s+1)}),\ s = 0, \dots, t-1\}$
  • If $m_t \in [m^{(s^*)}, m^{(s^*+1)})$: split it into $[m^{(s^*)}, m_t)$ and $[m_t, m^{(s^*+1)})$
  • Weight of interval $I$: $\omega_I = e^{\eta \sum_{s \le t} \hat X^I_s}$, where $\hat X^I_s$ is an unbiased estimate of the value of a bid in $I$ or in a parent of $I$.
  • At round $t + 1$, pick an interval $I_{t+1}$ in $\mathcal{P}_{t+1}$ with probability proportional to $|I_{t+1}|\,\omega_{I_{t+1}}$.
  • Finally, bid $b_{t+1}$ uniformly in $I_{t+1}$.
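A rough sketch of the EXPTree mechanics (partition refinement, inherited weights, sampling proportional to $|I|\,\omega_I$, uniform bid inside the interval). The feedback model and the estimator are our assumptions, not taken from the slides: $m_t$ is revealed every round, $v_t$ only on a win, and intervals above $m_t$ receive $(v_t - m_t)\mathbb{1}\{\text{won}\}/\mathbb{P}(\text{win})$, which is unbiased; $\eta$ and the adversary in `run` are hypothetical.

```python
import bisect
import math
import random


class ExpTreeSketch:
    """Exponential weights over a partition of [0, 1] that is refined at every m_t.

    Assumptions made for this sketch (not taken from the slides): m_t in [0, 1) is
    revealed every round, v_t only when the auction is won, and the estimate for the
    intervals above m_t is (v_t - m_t) * 1{won} / P(win); intervals below get 0.
    """

    def __init__(self, eta, rng):
        self.eta, self.rng = eta, rng
        self.cuts = [0.0, 1.0]  # interval k is [cuts[k], cuts[k + 1])
        self.logw = [0.0]       # eta * (cumulative estimate) for each interval

    def _masses(self):
        # |I| * omega_I, rescaled by the largest weight to avoid overflow
        top = max(self.logw)
        return [(b - a) * math.exp(lw - top)
                for a, b, lw in zip(self.cuts, self.cuts[1:], self.logw)]

    def bid(self):
        k = self.rng.choices(range(len(self.logw)), weights=self._masses())[0]
        return self.rng.uniform(self.cuts[k], self.cuts[k + 1])  # uniform in I

    def update(self, m, won, v=None):
        # split the interval containing m into [left, m) and [m, right);
        # both children inherit the parent's weight
        k = bisect.bisect_right(self.cuts, m) - 1
        if 0.0 < m < 1.0 and self.cuts[k] != m:
            self.cuts.insert(k + 1, m)
            self.logw.insert(k + 1, self.logw[k])
        if not won:
            return  # the estimate is 0 for every interval
        first = bisect.bisect_left(self.cuts, m)  # intervals first, first+1, ... lie above m
        mass = self._masses()
        p_win = sum(mass[first:]) / sum(mass)  # same bid distribution as before the split
        x_hat = (v - m) / p_win
        for j in range(first, len(self.logw)):
            self.logw[j] += self.eta * x_hat


def run(T=1000, eta=0.05, seed=0):
    """Hypothetical adversary: v_t = 0.7 always, m_t ~ U[0, 1)."""
    rng = random.Random(seed)
    learner = ExpTreeSketch(eta, rng)
    utility = 0.0
    for _ in range(T):
        b = learner.bid()
        m, v = rng.random(), 0.7
        won = b > m
        if won:
            utility += v - m
        learner.update(m, won, v if won else None)
    return utility, learner


if __name__ == "__main__":
    utility, learner = run()
    print(round(utility, 1), len(learner.cuts) - 1, "intervals")
```

After splitting at $m_t$, every interval lies entirely above or entirely below $m_t$, so all bids in one interval share the same reward; this is what makes a single weight per interval meaningful.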

Performance of EXPTree

Theorem (upper bound): EXPTree yields a regret bounded as $\mathbb{E} R_T \le 4\sqrt{T\log(1/\Delta^\circ)}$, with $\Delta^\circ$ the width of the interval containing the best fixed bid.

Is the dependency in $\Delta^\circ$ necessary? Yes.

Theorem (lower bound): for any algorithm, there exists a sequence of $m_t$ and $v_t$ such that
$$\mathbb{E} R_T \ge \frac{1}{32}\sqrt{T\,\lfloor \log_2(1/(2\Delta^\circ)) \rfloor}$$

Summary

$$\max_{b \in [0,1]} \sum_{t=1}^T (v_t - m_t)\mathbb{1}\{b > m_t\} - \sum_{t=1}^T (v_t - m_t)\mathbb{1}\{b_t > m_t\}$$

  • $v_t$ stochastic, $m_t$ stochastic: variant of UCB
– $R_T \lesssim T^{\frac{1-\alpha}{2}} \log(T)^{\frac{1+\alpha}{2}}$
– Interpolates between $\log(T)$ regret (easy problem) and $\sqrt{T}$ (hard problem)
  • $v_t$ stochastic, $m_t$ adversarial: variants of UCB
– $R_T \lesssim \min\big\{\sqrt{T\log(T)},\ \log(T)/\Delta\big\}$
– Logarithmic regret, even if part of the data is adversarial!
  • $v_t$ adversarial, $m_t$ adversarial: variant of exponential weights
– $R_T \lesssim \sqrt{T\log(1/\Delta^\circ)}$
– Same rates as with a $\Delta^\circ$-discretization and full information!

Very (quite?) interesting... but useful as it is? Not really. Here is a list of reasons.

On the basic assumptions

  • 1. Stochastic: data are not i.i.d., patients are different

ill-posedness, feature selection/model selection

  • 2. Different timing: several actions for one reward

POMDP, learning the bias/variance trade-off

  • 3. Delays: rewards are not received instantaneously

grouping, evaluations

  • 4. Combinatorial: several decisions at each stage

combinatorial optimization, cascading

  • 5. Non-linearity: concave gains, diminishing returns, etc.

A few announcements

  • Tim Roughgarden (Stanford) is giving a 10h lecture series on

“Data-Driven Optimal Auction Theory”, September 14-21, Polytechnique

  • Criteo is organising

“Machine Learning in the Real World #3”, end of November (21?), Paris

  • For both events (or any other information), do not hesitate to get in touch!

Investigating them (past/present/future)

Patients are different

  • We (implicitly?) assumed that all patients/users are identical
  • Treatment efficacy (probability of click) depends on age, gender...
  • These covariates, or contexts, are observed/known before taking the decision (blue or red pill)

The decision (and the regret...) should ultimately depend on them

General Model of Contextual Bandits

  • Covariates: $\omega_t \in \Omega = [0, 1]^d$, i.i.d. with law $\mu$ (equivalent to the Lebesgue measure $\lambda$)

The cookies of a user, the medical history, etc.

  • Decisions: $\pi_t \in \{1, \dots, K\}$

The decision can (should) depend on the context $\omega_t$

  • Reward: $X^k_t \in [0, 1] \sim \nu_k(\omega_t)$, $\mathbb{E}[X^k \mid \omega] = \mu_k(\omega)$

The expected reward of action $k$ depends on the context $\omega$

  • Objectives: find the best decision given the request

Minimize the regret $R_T := \sum_{t=1}^T \mu_{\pi^\star(\omega_t)}(\omega_t) - \mu_{\pi_t}(\omega_t)$

Regularity assumptions

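  • 1. Smoothness of the problem: every $\mu_k$ is β-Hölder, with $\beta \in (0, 1]$:
$$\exists L > 0,\ \forall \omega, \omega' \in \Omega,\quad \|\mu(\omega) - \mu(\omega')\| \le L\|\omega - \omega'\|^\beta$$
  • 2. Complexity of the problem (α-margin condition): $\exists C_0 > 0$, $\forall \delta > 0$,
$$\mathbb{P}_X\big[0 < |\mu^\star(\omega) - \mu^\sharp(\omega)| < \delta\big] \le C_0\delta^\alpha,$$
where $\mu^\star(\omega) = \max_k \mu_k(\omega)$ is the maximal $\mu_k$ and $\mu^\sharp(\omega) = \max\{\mu_k(\omega) \text{ s.t. } \mu_k(\omega) < \mu^\star(\omega)\}$ is the second max. With $K > 2$: $\mu^\star$ is β-Hölder but $\mu^\sharp$ is not continuous.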

Regularity: an easy example (α big)

[Figure: µ1(ω), µ2(ω), µ3(ω), their maximum µ⋆(ω) and second maximum µ♯(ω)]

Regularity: a hard example (α small)

[Figure: µ1(ω), µ2(ω), µ3(ω), their maximum µ⋆(ω) and second maximum µ♯(ω)]

Binned policy

[Figure: binned policy on µ1(ω), µ2(ω), µ3(ω), µ⋆(ω), µ♯(ω)]

Binned Successive Elimination (BSE)

Theorem [P. and Rigollet (’13)]: if $\alpha < 1$,
$$\mathbb{E}[R_T(\text{BSE})] \lesssim T\Big(\frac{K\log(K)}{T}\Big)^{\frac{\beta(1+\alpha)}{2\beta+d}}, \quad \text{bin side } \Big(\frac{K\log(K)}{T}\Big)^{\frac{1}{2\beta+d}}.$$
For $K = 2$, this matches the lower bound: minimax optimal w.r.t. $T$.

  • Same bound as with full monitoring [Audibert and Tsybakov, ’07]
  • No $\log(T)$: the difficulty of nonparametric estimation washes away the effects of exploration/exploitation.
  • $\alpha < 1$: cannot attain fast rates for easy problems.
  • Adaptive partitioning!
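A sketch of a binned policy in dimension $d = 1$: the context space is cut into bins of side $(K\log(K)/T)^{1/(2\beta+d)}$, as in the theorem, and each bin runs its own round-robin successive elimination. The confidence radius, the Bernoulli rewards and the toy means $\mu_1(\omega) = \omega$, $\mu_2(\omega) = 1 - \omega$ are assumptions, and this is not claimed to be the exact BSE of [P. and Rigollet].

```python
import math
import random


def bin_count(T, K, beta, d):
    """Number of bins per axis from the bin side (K log K / T)^(1 / (2 beta + d)); needs K >= 2."""
    side = (K * math.log(K) / T) ** (1.0 / (2 * beta + d))
    return max(1, round(1.0 / side))


def binned_successive_elimination(T, K, n_bins, mean_fn, rng):
    """Sketch for d = 1: split the context space [0, 1] into n_bins equal bins and run an
    independent successive-elimination instance (round-robin over surviving arms) in
    each bin. mean_fn(k, w) is a hypothetical mean reward of arm k at context w, with
    Bernoulli rewards. Returns the pseudo-regret sum_t mu_star(w_t) - mu_{pi_t}(w_t)."""
    active = [list(range(K)) for _ in range(n_bins)]
    counts = [[0] * K for _ in range(n_bins)]
    sums = [[0.0] * K for _ in range(n_bins)]
    regret = 0.0
    for _ in range(T):
        w = rng.random()                           # context omega_t ~ U[0, 1]
        j = min(int(w * n_bins), n_bins - 1)       # its bin
        arms = active[j]
        k = min(arms, key=lambda a: counts[j][a])  # round-robin: least-pulled survivor
        x = 1.0 if rng.random() < mean_fn(k, w) else 0.0
        counts[j][k] += 1
        sums[j][k] += x
        regret += max(mean_fn(a, w) for a in range(K)) - mean_fn(k, w)
        n = min(counts[j][a] for a in arms)
        if len(arms) > 1 and n > 0:
            radius = math.sqrt(2 * math.log(T) / n)  # assumed confidence radius
            means = {a: sums[j][a] / counts[j][a] for a in arms}
            best = max(means.values())
            active[j] = [a for a in arms if means[a] >= best - 2 * radius]
    return regret


if __name__ == "__main__":
    def mean_fn(k, w):  # 1-Lipschitz (beta = 1) toy means, crossing at w = 1/2
        return w if k == 0 else 1.0 - w

    T, K = 20_000, 2
    n_bins = bin_count(T, K, beta=1.0, d=1)
    print(n_bins, round(binned_successive_elimination(T, K, n_bins, mean_fn, random.Random(0))))
```

For comparison, choosing arms uniformly at random has expected regret $T/4$ on this toy problem; bins near the crossing point $\omega = 1/2$ never eliminate an arm, which is where the margin parameter $\alpha$ enters.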

Suboptimality of (BSE) for α ≥ 1

[Figure: µ1(ω), µ2(ω), µ3(ω)]

Adaptive BSE (ABSE)

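Theorem [P. and Rigollet (’13)]: for all $\alpha$,
$$\mathbb{E}[R_T(\text{ABSE})] \lesssim T\Big(\frac{K\log(K)}{T}\Big)^{\frac{\beta(1+\alpha)}{2\beta+d}}.$$
For $K = 2$, this matches the lower bound: minimax optimal w.r.t. $T$.

  • Same bound as (BSE), but now also valid for easy problems ($\alpha \ge 1$).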

This is not the solution

  • 1. Dimension-dependent bound: $T^{1 - \frac{\beta}{2\beta+d}}$

$d = +\infty$ and $\beta = 0$: lots of contexts, no regularity. Online selection of models? Ill-posed problem: $\mu(\cdot)$ is not β-Hölder. Estimation/approximation errors: Performance = Approx. Error + Regret$(\beta, d, T)$

  • 2. Non-stationarity of arms: values are not i.i.d., they evolve with time.
  • E.g., ads for movies.

Cumulative objectives are clearly not the solution. Discount? How, why, at which speed?

  • 3. Non-stationarity of sets of arms:

Arms arrive and disappear. How to incorporate a new arm? With which index?

This was really not the solution

  • 1. Non-stationarity of sets of arms:

Arms arrive and disappear. How to incorporate a new arm? With which index?

  • 2. Contexts (covariates) are not in $\mathbb{R}^d$

Rather descriptions, texts, ids, images... How to embed them? The training set is influenced by the algorithms...

Different Timing

Example of Repeated Auctions

Ad slot sold by lemonde.fr in 2nd-price auctions

  • Several (marketing) companies place bids
  • The highest bidder wins (...), say Criteo, and pays lemonde the 2nd-highest bid (...)
  • Criteo chooses the ad of one of its clients: Microsoft, Cdiscount or Booking
  • Criteo gets paid by the client if the user clicks on the ad

Main problem: repeated auctions with unknown private valuations. Learn the valuations, find which ad to display & good bidding strategies.

Repeated auctions

  • 1. Can be modeled as a bandit problem with extra structure
  • 2. Actually, Criteo (Google, Facebook) is paid if the user buys something after the click

Several “costly” auctions are needed to seal a deal. Lost auctions can also help to seal the deal (the competitor displays an ad for free). Optimal strategy in repeated auctions: can it be learned? (POMDP?) Reward timing is per user; decision timing is per opportunity.

Other examples - repeated A/B tests

  • Companies test new technologies (algorithms, hardware, etc.) before putting them into production

Sequences of A/B tests. Timing of decisions: each day, continue, stop or validate the current A/B test. Timing of rewards: total improvement from the implemented technologies.

  • The longer the A/B tests, the more confident the decisions (reduced variance)

but the fewer technologies get implemented. Online trade-off between risk and performance.

Delays

Rewards are not observed immediately

  • Clinical trials: have to wait 6 months to see results.

A trial lasts 3 years: 6 phases. Regret is still $\sqrt{T}$.

  • Marketing (ad displays): we only see whether users buy

No feedback means either no sale (ever) or no sale yet. Build estimators with censored/missing data. Feasible with i.i.d. data... but they are not!

Combinatorial Structure

Large Decision spaces

  • Choose not to display 1 ad, but 4, 6, 10...
  • Paid if there is a sale after the click (even if unrelated)

Lots of correlations (between products, positions, colors/style of banner, time, etc.). Some products are seen, others are not (carousels...).

  • Too many possibilities with (almost) equal performance

Compete with the best: $R_T \le \sqrt{KT}$; but compete with (at least) the top 5%: $R_T \le \sqrt{\log(K)\,\frac{1}{5\%}\,T}$??

Bandit theory is quite neat. To be “applied”, or relevant, it needs LOTS of work. Anybody is welcome to join & collaborate!

Model selection, feature extraction, missing data, censored data, combinatorial optimization, new estimation techniques...
