SLIDE 1


Efficient tracking of a growing number of experts

Jaouad Mourtada & Odalric-ambrym Maillard

CMAP, École Polytechnique & Sequel, INRIA Lille – Nord Europe

ALT 2017, Kyoto University

Jaouad Mourtada & Odalric-ambrym Maillard Efficient tracking of a growing number of experts

SLIDE 2

1. Setting
2. Growing experts in the specialist setting
3. Growing experts and sequences of experts

SLIDE 3

Prediction with expert advice

Well-studied, standard framework for online learning (see [Cesa-Bianchi and Lugosi, 2006]).
Aim: combine the forecasts of several experts ⟹ predict almost as well as the best of them.
Adversarial / worst-case setting (no stochasticity assumption on the signal).

SLIDE 4

Formal setting

X prediction space, Y signal space, ℓ : X × Y → R loss function. Experts i = 1, . . . , M.

Prediction with expert advice: at each time step t = 1, 2, . . .

1. Experts i = 1, . . . , M output predictions x_{i,t} ∈ X
2. The forecaster predicts x_t ∈ X
3. The environment chooses a signal value y_t ∈ Y
4. Experts i = 1, . . . , M incur losses ℓ_{i,t} := ℓ(x_{i,t}, y_t); the forecaster incurs loss ℓ_t := ℓ(x_t, y_t)

SLIDE 5

Formal setting

Prediction with expert advice: at each time step t = 1, 2, . . .

1. Experts i = 1, . . . , M output predictions x_{i,t} ∈ X
2. The forecaster predicts x_t ∈ X
3. The environment chooses a signal value y_t ∈ Y
4. Experts i = 1, . . . , M incur losses ℓ_{i,t} := ℓ(x_{i,t}, y_t); the forecaster incurs loss ℓ_t := ℓ(x_t, y_t)

Goal: a strategy for the forecaster with controlled worst-case regret

R_{i,T} = L_T − L_{i,T} = Σ_{t=1}^T ℓ_t − Σ_{t=1}^T ℓ_{i,t}

SLIDE 6

Assumption on the loss function

Assumption (η-exp-concavity). The loss function ℓ is η-exp-concave for some η > 0, i.e. for every y ∈ Y, the function exp(−η ℓ(·, y)) : X → R₊ is concave.

Important examples:
- Logarithmic, or self-information, loss: X = P(Y), ℓ(p, y) = − log p({y})
- Square loss on a bounded domain: X = Y = [a, b], ℓ(x, y) = (x − y)², with η = 1/(2(b − a)²)
- NOT the absolute loss ℓ(x, y) = |x − y| on [0, 1]²
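These claims are easy to sanity-check numerically. The sketch below (the helper `exp_concavity_holds`, the grid, and the tolerance are ours, not from the talk) tests concavity of x ↦ exp(−η ℓ(x, y)) via second differences:

```python
import numpy as np

def exp_concavity_holds(loss, eta, xs, y, tol=1e-6):
    """Numerically check concavity of x -> exp(-eta * loss(x, y)) via
    second differences on the grid xs (all negative up to tolerance)."""
    f = np.exp(-eta * loss(xs, y))
    second_diff = f[2:] - 2.0 * f[1:-1] + f[:-2]
    return bool(np.all(second_diff <= tol))

a, b = 0.0, 1.0
eta = 1.0 / (2 * (b - a) ** 2)            # eta = 1/2 on the unit interval
square = lambda x, y: (x - y) ** 2
absolute = lambda x, y: np.abs(x - y)
xs = np.linspace(a, b, 200)

print(exp_concavity_holds(square, eta, xs, y=0.3))    # True
print(exp_concavity_holds(absolute, eta, xs, y=0.3))  # False
```

The absolute loss fails because exp(−η|x − y|) is convex away from the kink at x = y.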

SLIDE 7

The exponential weights algorithm

x_{i,t}: prediction of expert i at time t.

Exponential weights / Hedge algorithm:

x_t = Σ_{i=1}^M v_{i,t} x_{i,t},   where   v_{i,t} = π_i e^{−η L_{i,t−1}} / Σ_{j=1}^M π_j e^{−η L_{j,t−1}},

with π = (π_i)_{1≤i≤M} a prior probability distribution on the experts.

Equivalently: start with v_1 = π. At the end of each round t ≥ 1, after predicting and seeing the losses ℓ_{i,t}, update v_{t+1} to the posterior distribution v_t^m:

v_{i,t+1} = v_{i,t}^m = v_{i,t} e^{−η ℓ_{i,t}} / Σ_{j=1}^M v_{j,t} e^{−η ℓ_{j,t}}
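The update above translates directly into code. A minimal sketch in Python (the function name and array layout are ours):

```python
import numpy as np

def hedge_weights(prior, losses, eta):
    """Exponential weights: return the weight vectors v_t used at each
    round, computed from the cumulative losses as on the slide."""
    cum = np.zeros(len(prior))            # cumulative losses L_{i,t-1}
    out = []
    for loss_t in losses:
        v = prior * np.exp(-eta * cum)    # v_{i,t} ∝ pi_i e^{-eta L_{i,t-1}}
        v /= v.sum()
        out.append(v)
        cum += loss_t
    return np.array(out)
```

The weights concentrate exponentially fast on the expert with the smallest cumulative loss.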

SLIDE 8

Regret of the Hedge algorithm

Proposition (Vovk; Littlestone & Warmuth). If ℓ is η-exp-concave, the exponential weights algorithm with prior π achieves the regret bound:

∀i = 1, . . . , M,   L_T − L_{i,T} ≤ (1/η) log(1/π_i).   (1)

In particular, if π = (1/M, . . . , 1/M) is uniform,

L_T ≤ min_{1≤i≤M} L_{i,T} + (1/η) log M.   (2)
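Bound (2) can be checked numerically with the square loss on [0, 1] (so η = 1/2); the synthetic setup below is ours:

```python
import numpy as np

rng = np.random.default_rng(0)
T, M, eta = 500, 8, 0.5               # square loss on [0,1]: eta = 1/(2(b-a)^2)

preds = rng.random((T, M))            # expert predictions x_{i,t}
ys = rng.random(T)                    # signal values y_t

v = np.full(M, 1.0 / M)               # uniform prior
L_forecaster, L_experts = 0.0, np.zeros(M)
for t in range(T):
    x_t = v @ preds[t]                # exponentially weighted average
    L_forecaster += (x_t - ys[t]) ** 2
    losses = (preds[t] - ys[t]) ** 2
    L_experts += losses
    v = v * np.exp(-eta * losses)     # multiplicative posterior update
    v /= v.sum()

# regret against the best expert stays below (1/eta) log M
assert L_forecaster <= L_experts.min() + np.log(M) / eta
```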

SLIDE 9

Sequentially incoming forecasters

What if new experts (algorithms, methods, new data/variables, . . . ) become available over time? How to incorporate them, with formal regret guarantees?

Proposed setting: growing set of experts. M_t increases over time and is unknown in advance; at time t, new experts i = M_{t−1} + 1, . . . , M_t start issuing predictions.

[Timeline figure: experts 1 and 2 enter at τ1 = τ2 = 1, expert 3 at τ3 = 2, experts 4 and 5 at τ4 = τ5 = 4.]

SLIDE 10

Objective

Design forecasting strategies for the “growing number of experts” setting, with emphasis on:
- computationally inexpensive strategies: ideal complexity O(M_t) at step t
- anytime strategies: no fixed time horizon T, no a priori knowledge of M_t, no free parameters to tune
- regret bounds against several classes of competitors, adaptive to the parameters of the comparison class

SLIDE 11

1. Setting
2. Growing experts in the specialist setting
3. Growing experts and sequences of experts

SLIDE 12

Growing experts

Recall the framework: at time t, experts i = 1, . . . , M_t issue predictions; i.e. at time t, m_t := M_t − M_{t−1} new experts i = M_{t−1} + 1, . . . , M_t enter.

τ_i := inf{t ≥ 1 : i ≤ M_t} is the entry time of expert i.

First notion of regret, against constant experts: for each i,

R_{i,T} = Σ_{t=τ_i}^T (ℓ_t − ℓ_{i,t})   →   “specialist trick”

SLIDE 13

[Timeline figure: entry times τ1 = τ2 = 1, τ3 = 2, τ4 = τ5 = 4; a single comparison expert is highlighted over the whole horizon.]

SLIDE 14

The specialist setting

Introduced by [Freund et al., 1997]. Specialists i = 1, . . . , M; at each time step t, only a subset A_t ⊂ {1, . . . , M} of active specialists output a prediction x_{i,t}.

Goal: minimize the “regret” with respect to each specialist i,

R_{i,T} = Σ_{t≤T : i∈A_t} (ℓ_t − ℓ_{i,t})

SLIDE 15

The “specialist trick” [Chernov and Vovk, 2009]

A general method to turn an “expert” algorithm into a “specialist” algorithm. Idea: “complete” the specialists’ predictions by making each inactive specialist i ∉ A_t predict the same as the forecaster: x_{i,t} := x_t.

SLIDE 16

The “specialist trick” [Chernov and Vovk, 2009]

A general method to turn an “expert” algorithm into a “specialist” algorithm. Idea: “complete” the specialists’ predictions by making each inactive specialist i ∉ A_t predict the same as the forecaster: x_{i,t} := x_t.

Circular? x_t = Σ_{i=1}^M v_{i,t} x_{i,t} . . .

SLIDE 17

The “specialist trick” [Chernov and Vovk, 2009]

A general method to turn an “expert” algorithm into a “specialist” algorithm. Idea: “complete” the specialists’ predictions by making each inactive specialist i ∉ A_t predict the same as the forecaster: x_{i,t} := x_t.

Circular? x_t = Σ_{i=1}^M v_{i,t} x_{i,t} . . .

(Unique) solution: for i ∉ A_t, define

x_{i,t} := Σ_{j∈A_t} v_{j,t} x_{j,t} / Σ_{j∈A_t} v_{j,t}

⟹ x_t = Σ_{j∈A_t} v_{j,t} x_{j,t} / Σ_{j∈A_t} v_{j,t} = x_{i,t} for each i ∉ A_t

SLIDE 18

The “specialist trick” [Chernov and Vovk, 2009]

A general method to turn an “expert” algorithm into a “specialist” algorithm. Idea: “complete” the specialists’ predictions by making each inactive specialist i ∉ A_t predict the same as the forecaster: x_{i,t} := x_t.

(Unique) solution: for i ∉ A_t, define x_{i,t} := Σ_{j∈A_t} v_{j,t} x_{j,t} / Σ_{j∈A_t} v_{j,t}.

By construction, Σ_{t=1}^T (ℓ_t − ℓ_{i,t}) = Σ_{t≤T : i∈A_t} (ℓ_t − ℓ_{i,t}), since each completed round contributes ℓ_t − ℓ_t = 0. The Hedge regret bound then yields, for SpecialistHedge with prior π:

∀i = 1, . . . , M,   Σ_{t≤T : i∈A_t} (ℓ_t − ℓ_{i,t}) ≤ (1/η) log(1/π_i).
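On top of a Hedge update, the completion step is only a few lines. A sketch of one SpecialistHedge round (the interface and names are ours):

```python
import numpy as np

def specialist_hedge_step(v, active_preds, active, y, eta, loss):
    """One SpecialistHedge round: inactive specialists are 'completed'
    to predict like the forecaster, then all weights get the usual
    multiplicative update.  v: current weights (sums to 1);
    active_preds: predictions of the active specialists, in order;
    active: boolean mask over all specialists."""
    # forecaster's prediction = weighted mean over ACTIVE specialists only
    x_t = v[active] @ active_preds / v[active].sum()
    full_preds = np.empty_like(v, dtype=float)
    full_preds[:] = x_t                   # inactive specialists copy x_t
    full_preds[active] = active_preds
    losses = loss(full_preds, y)          # inactive ones incur loss l_t
    v_new = v * np.exp(-eta * losses)
    return x_t, v_new / v_new.sum()
```

Because inactive specialists incur exactly the forecaster's loss ℓ_t, their weights are all multiplied by the same factor, so the completion is harmless for them up to normalization; this is what makes the telescoping argument above go through.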

SLIDE 19

Growing experts and specialists

Specialists can abstain from predicting ⟹ they can handle experts who have not entered yet. Growing experts can be viewed as specialists with A_t = {1, . . . , M_t}; SpecialistHedge then gives a regret bound on R_{i,T}. But exactly which total set of specialists should we use?

SLIDE 20

Which total set of specialists ?

Naive choice: both T and M_T known in advance ⟹ set of specialists {1, . . . , M_T}.

With a prior π = (π_1, . . . , π_{M_T}), SpecialistHedge yields the regret bound R_{i,T} ≤ (1/η) log(1/π_i) for i = 1, . . . , M_T (e.g. (1/η) log M_T for the uniform prior).

Problem: not anytime, and requires knowledge of M_T.

SLIDE 21

Which total set of specialists ?

Naive choice: both T and M_T known in advance ⟹ set of specialists {1, . . . , M_T}.

Better choice: set of specialists N*, with as prior a probability distribution π = (π_1, π_2, . . . ) on N*. Keeping track of e^{−η L_t}, we only need to maintain the weights of entered experts.

This yields the anytime algorithm GrowingHedge, with O(M_t) per-round complexity and regret bound R_{i,T} ≤ (1/η) log(1/π_i) for all i and all T.

SLIDE 22

Which total set of specialists ?

Naive choice: both T and M_T known in advance ⟹ set of specialists {1, . . . , M_T}.
Better choice: set of specialists N*, GrowingHedge with a normalized prior π.
Slightly better: GrowingHedge with an unnormalized prior π.

Observation: for all T ≥ 1, GrowingHedge coincides up to time T with SpecialistHedge on {1, . . . , M_T} with prior (π_1/Π_{M_T}, . . . , π_{M_T}/Π_{M_T}), where Π_{M_T} := Σ_{i=1}^{M_T} π_i.

This remains true even for an arbitrary (possibly non-summable) prior π ∈ (R*₊)^{N*}: more flexibility and simpler bounds.

SLIDE 23

GrowingHedge algorithm

GrowingHedge. Set w_{i,1} = π_i for i = 1, . . . , M_1. For t = 1, 2, . . . :

- Given the predictions x_{i,t} of experts 1 ≤ i ≤ M_t, predict

  x_t = Σ_{i=1}^{M_t} w_{i,t} x_{i,t} / Σ_{i=1}^{M_t} w_{i,t}

- Update the weights: w_{i,t+1} = w_{i,t} e^{−η ℓ_{i,t}} for i = 1, . . . , M_t, and introduce w_{i,t+1} = π_i e^{−η L_t} for the new experts i = M_t + 1, . . . , M_{t+1}.

An anytime, efficient algorithm, agnostic to M_t; π_i is only used from time τ_i on.
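A direct implementation of the algorithm above (the data layout and function signature are ours; `prior(i)` returns π_i for the 0-indexed expert i):

```python
import numpy as np

def growing_hedge(expert_preds, ys, prior, eta, loss):
    """GrowingHedge: expert_preds[t] holds the M_t predictions available
    at round t (M_t non-decreasing); entering experts get the weight
    pi_i * exp(-eta * L_t) where L_t is the forecaster's cumulative loss."""
    w = np.array([prior(i) for i in range(len(expert_preds[0]))], dtype=float)
    L_fc = 0.0
    forecasts = []
    for preds, y in zip(expert_preds, ys):
        preds = np.asarray(preds, dtype=float)
        m_new = len(preds) - len(w)
        if m_new > 0:                     # new experts enter this round
            entering = [prior(len(w) + k) * np.exp(-eta * L_fc)
                        for k in range(m_new)]
            w = np.concatenate([w, np.array(entering)])
        x_t = w @ preds / w.sum()         # weighted-average prediction
        forecasts.append(x_t)
        L_fc += loss(x_t, y)
        w = w * np.exp(-eta * loss(preds, y))
    return forecasts
```

Only the weights of entered experts are ever stored, giving the O(M_t) per-round complexity claimed on the slide.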

SLIDE 24

GrowingHedge: regret bound

Proposition. With an arbitrary prior π, GrowingHedge achieves the regret bound: for all T ≥ 1 and i = 1, . . . , M_T,

Σ_{t=τ_i}^T (ℓ_t − ℓ_{i,t}) ≤ (1/η) log( (1/π_i) Σ_{j=1}^{M_T} π_j )

- Prior π_i = 1 gives R_{i,T} ≤ (1/η) log M_T (but now anytime).
- Prior π_i = 1/(τ_i m_{τ_i}): depends on the entry time τ_i and on the number m_{τ_i} of new experts at that time, both revealed at step t = τ_i. Regret bound:

R_{i,T} ≤ (1/η) log m_{τ_i} + (1/η) log τ_i + (1/η) log(1 + log T).

SLIDE 25

Summary

- Regret against constant experts: naturally handled by the specialist setting.
- Small subtlety in the choice of the set of specialists, plus an extension to unnormalized priors (more general/flexible strategies with a unified analysis).
- Using Hedge as the base algorithm ⟹ simple and efficient: only maintain weights for entered experts.
- But somewhat limited: does not work as seamlessly for more complex base algorithms / comparison classes.

SLIDE 26

1. Setting
2. Growing experts in the specialist setting
3. Growing experts and sequences of experts

SLIDE 27

Different perspective on growing experts

“Specialist” or “abstention” trick ⟹ GrowingHedge. Controls the regret w.r.t. constant experts:

R_{i,T} = Σ_{t=τ_i}^T (ℓ_t − ℓ_{i,t})

[Timeline figure: entry times τ1 = τ2 = 1, τ3 = 2, τ4 = τ5 = 4; the comparison expert is constant over time.]

SLIDE 28

Different perspective on growing experts

“Specialist” or “abstention” trick ⟹ GrowingHedge. Controls the regret w.r.t. constant experts. By summing, this implies a regret bound against sequences of “fresh” experts (i_1, . . . , i_T), i.e. sequences that only switch to newly entered experts.

[Timeline figure: the comparison sequence switches only to freshly entered experts.]

SLIDE 29

Different perspective on growing experts

“Specialist” or “abstention” trick ⟹ GrowingHedge. Controls the regret w.r.t. constant experts, and implies a regret bound against sequences of “fresh” experts (i_1, . . . , i_T), i.e. sequences that only switch to new experts.

What about other comparison sequences (i_1, . . . , i_T)?

[Timeline figure: a comparison sequence that also switches back to incumbent experts.]

SLIDE 30

Comparing to sequences of experts (fixed M)

Tracking the best expert [Herbster and Warmuth, 1998]: comparing to sequences of experts (i_1, . . . , i_T) with k ≪ T shifts,

R_T(i_1, . . . , i_T) := Σ_{1≤t≤T} (ℓ_t − ℓ_{i_t,t})

Inefficient solution: aggregate over the M^T sequences of experts ⟹ oracle regret bound of ≈ (1/η)(k + 1) log M + (1/η) k log(T/k).

The efficient Fixed Share algorithm [Herbster and Warmuth, 1998] ⟹ optimal regret bound with O(M) per-round complexity. It can be seen as aggregation of sequences under a Markov chain prior [Vovk, 1999].

SLIDE 31

Aggregating sequences of experts

Key fact: when the prior π on sequences of experts is Markovian with transition probabilities θ_t(i_t | i_{t−1}),

π(i_1, . . . , i_T) = θ_1(i_1) θ_2(i_2 | i_1) · · · θ_T(i_T | i_{T−1}),

Hedge collapses to the efficient algorithm MarkovHedge, with update

v_{i,t+1} = Σ_{j=1}^M θ_{t+1}(i | j) v_{j,t}^m
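One MarkovHedge round is thus a multiplicative update followed by a mix through the transition kernel. A sketch (names are ours); Fixed Share corresponds to a transition that keeps mass 1 − α in place and, in one common variant, shares α uniformly over all experts:

```python
import numpy as np

def markov_hedge_update(v, losses, transition, eta):
    """One MarkovHedge round: multiplicative (loss) update to the
    posterior v^m, then mixing through the Markov transition matrix.
    transition[i, j] = theta_{t+1}(i | j), columns summing to 1."""
    vm = v * np.exp(-eta * losses)
    vm /= vm.sum()                      # posterior v^m_t
    return transition @ vm              # v_{i,t+1} = sum_j theta(i|j) vm_j

# Fixed Share (uniform-sharing variant, hypothetical switching rate a):
M, a = 4, 0.1
fixed_share = (1 - a) * np.eye(M) + (a / M) * np.ones((M, M))
```

With `fixed_share`, every expert retains a floor of weight a/M, which is what allows the comparator to switch experts at bounded regret cost.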

SLIDE 32

The “muting trick”

To transport this to the “growing experts” setting, we need x_t = Σ_i v_{i,t} x_{i,t} to be well-defined.

Trick: since θ_t can be chosen at time t, take θ_t to transition only to entered experts. The prior π (defined recursively) then puts all its mass on admissible sequences of experts, with i_t ≤ M_t for all t. This amounts to setting v_{i,t} = 0 (“muting”) for experts i that have not entered yet.

“Dual” to the “specialist trick”, but more versatile.
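One way to realize the muting step in code (a sketch under our own interface, not the talk's): restrict a transition matrix over all potential experts to the M_t entered ones, renormalizing so that no probability mass flows to experts that have not entered yet:

```python
import numpy as np

def mute_transition(theta, M_t):
    """Muting-trick sketch: keep only the block of the transition matrix
    over the M_t entered experts and renormalize its columns, so theta
    never transitions to a not-yet-entered expert.
    (theta[i, j] = theta_t(i | j); interface is ours.)"""
    th = np.array(theta, dtype=float)[:M_t, :M_t]
    return th / th.sum(axis=0, keepdims=True)
```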

SLIDE 33

FreshMarkovHedge

Only switch to new experts (all mass on sequences of fresh experts). This turns out to be equivalent to GrowingHedge under an unnormalized prior.

Using proper transition probabilities, the regret is e.g. of the form

(1/η) [ (k + 1) log max_{1≤t≤T} m_t + (k + 1) log T ]

for sequences of fresh experts with k shifts.

SLIDE 34

GrowingMarkovHedge

Transitions to both new and incumbent experts. Again anytime, with O(M_t) per-round complexity. With a proper choice of transition probabilities, the regret is

(1/η) [ (k + 1) log max_{1≤t≤T} m_t + (k + k₁ + 2) log T ]

w.r.t. sequences with k switches, among which k₁ to incumbent experts, for all k and k₁ (i.e. adaptive to k and k₁).

SLIDE 35

Tracking a small pool of good experts

GrowingMarkovHedge covers all admissible sequences, with an essentially optimal regret bound. But the bound can be quite large: of order log M_t at each switch. Can we do better for some sequences?

Tracking a small subset of good experts (Freund; [Bousquet and Warmuth, 2002]): n ≪ M good experts, sparse sequences with k ≪ T shifts among these n experts. Particularly important for a growing number of experts, as M_T → ∞.

SLIDE 36

Tracking a small pool of good experts

Tracking a small subset of good experts (Freund; [Bousquet and Warmuth, 2002]): n ≪ M good experts, sparse sequences with k ≪ T shifts among these n experts.

[Timeline figure: the comparison sequence moves among a small pool of experts.]

Particularly important for a growing number of experts, as M_T → ∞.

SLIDE 37

The “sparse” case: fixed M

The ad-hoc Mixing Past Posteriors (MPP) algorithm [Bousquet and Warmuth, 2002] achieves (up to some tuning) a regret bound of

≈ n log(M/n) + k log n + 2k log T

(log n regret per switch instead of log M). Interpreted by [Koolen et al., 2012] as an aggregation of a structured class of specialists, with a new algorithm (no tuning).

SLIDE 38

Small pool of experts in the growing experts setting

Slight reformulation of [Koolen et al., 2012]’s algorithm: aggregation of sequences of specialists (i, a), with i an expert and a ∈ {0, 1}; the specialist (i, 0) is always inactive, (i, 1) always active. This extra flexibility is necessary to extend to the “growing” setting.

Markov prior: transitions only occur between (i, 0) and (i, 1) (both ways), plus the “muting trick”: zero mass on (i, 1) as long as i > M_t. This combines the “specialist” and “sequences of experts” viewpoints.

GrowingSleepingHedge: anytime and efficient, with a regret bound (up to time T, for k shifts among n base experts) of

≈ (1/η) [ n log( (max_{1≤t≤T} m_t) / n ) + 2k log T ]
SLIDE 39

Conclusion

- Specialist setting/trick: the most natural approach, but somewhat less appealing/seamless beyond constant experts.
- Sequences of experts: a more flexible approach (recovers GrowingHedge as a particular case). Generic algorithms (esp. efficient aggregation of structured classes of experts), with the “growing” structure encoded in the prior, which can be done on the fly.
- Leads to efficient and simple anytime algorithms with adaptive regret bounds for various comparison classes, and conceptually transparent proofs.

SLIDE 40

Thank you!

SLIDE 41

References I

Bousquet, O. and Warmuth, M. K. (2002). Tracking a small set of experts by mixing past posteriors. Journal of Machine Learning Research, 3:363–396.

Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press.

Chernov, A. and Vovk, V. (2009). Prediction with expert evaluators’ advice. In Algorithmic Learning Theory (ALT), pages 8–22.

SLIDE 42

References II

Freund, Y., Schapire, R. E., Singer, Y., and Warmuth, M. K. (1997). Using and combining predictors that specialize. In ACM Symposium on Theory of Computing (STOC), pages 334–343.

Herbster, M. and Warmuth, M. K. (1998). Tracking the best expert. Machine Learning, 32(2):151–178.

Koolen, W. M., Adamskiy, D., and Warmuth, M. K. (2012). Putting Bayes to sleep. In Advances in Neural Information Processing Systems (NIPS), pages 135–143.

SLIDE 43

References III

Vovk, V. (1999). Derandomizing stochastic prediction strategies. Machine Learning, 35(3):247–282.
