  1. Following the Flattened Leader
     Wojciech Kotłowski¹, Peter Grünwald¹, Steven de Rooij²
     ¹ National Research Institute for Mathematics and Computer Science (CWI), The Netherlands
     ² University of Cambridge
     COLT 2010
  2. Outline
     1. Sequential prediction with log-loss; the set of experts is an exponential family.
     2. Prediction strategies:
        Bayes strategy: achieves optimal regret, but is usually hard to calculate.
        "Follow the leader" strategy: simple to compute/update, but suboptimal.
     3. Our contribution, the "follow the flattened leader" strategy: a slight modification
        of "follow the leader" that achieves the performance of Bayes while retaining the
        simplicity of ML.
     4. Applications: prediction, coding, model selection.

  3. Sequential Prediction
     Family of distributions (model): $\mathcal{M} = \{ P_\mu \mid \mu \in \Theta \}$.
     Sequence of outcomes $x_1, x_2, \ldots \in \mathcal{X}^\infty$, revealed one by one.
     In each iteration, after observing $x^n = x_1, x_2, \ldots, x_n$, predict $x_{n+1}$ by
     assigning a distribution $P(\cdot \mid x^n)$.
     After $x_{n+1}$ is revealed, incur log-loss $-\log P(x_{n+1} \mid x^n)$.
     Regret w.r.t. the best "expert" from $\mathcal{M}$:
       $R(P, x^n) = \sum_{i=1}^{n} -\log P(x_i \mid x^{i-1}) - \inf_{\mu \in \Theta} \sum_{i=1}^{n} -\log P_\mu(x_i \mid x^{i-1})$.
     Process generating the outcomes:
       adversarial: only boundedness assumptions on $x^n$;
       stochastic: $X_1, X_2, \ldots$ i.i.d. $\sim P^*$, possibly $P^* \notin \mathcal{M}$;
       then $R(P, X^n)$ is a random variable.
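
These definitions translate directly into code. Below is a minimal Python sketch (the `predict` interface, which maps an observed prefix to a pmf for the next outcome, is our own choice, not from the talk) that computes the regret for the Bernoulli model used in the example on the next slide.

```python
import math

def cumulative_log_loss(predict, xs):
    """Total log-loss of a sequential strategy: sum of -log P(x_{i+1} | x^i).
    `predict` maps an observed prefix to a pmf over the next outcome
    (a hypothetical interface chosen for this sketch)."""
    return sum(-math.log(predict(xs[:i])(xs[i])) for i in range(len(xs)))

def best_bernoulli_loss(xs):
    """Log-loss of the best Bernoulli expert in hindsight, P_{mu_hat}
    with mu_hat = (#1s)/n: the inf term in the regret definition."""
    n, ones = len(xs), sum(xs)
    mu = ones / n
    # Convention 0 * log 0 = 0 handles mu in {0, 1}.
    return -sum(c * math.log(p) for c, p in [(ones, mu), (n - ones, 1 - mu)] if c > 0)

def regret(predict, xs):
    """R(P, x^n): the strategy's cumulative log-loss minus the best expert's."""
    return cumulative_log_loss(predict, xs) - best_bernoulli_loss(xs)
```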

  4. Sequential Prediction: Example
     $\mathcal{M} = \{ P_\mu \mid \mu \in [0, 1] \}$, $P_\mu$ Bernoulli. $x^n = 1010110110$.
     Best expert in $\mathcal{M}$: $P_{\hat\mu_n}$, with $\hat\mu_n = \frac{\#1}{n}$ ($= \frac{3}{5}$ here).
     "Follow the leader" prediction strategy: $P(\cdot \mid x^i) = P_{\hat\mu_i}(\cdot)$.
       Problem: $\hat\mu_0$ is undefined, and predictions such as $P(x_2 \mid x^1) = 0$ can occur.
     Smoothed version: $P(\cdot \mid x^i) = P_{\hat\mu^\circ_i}(\cdot)$ with
       $\hat\mu^\circ_i = \frac{\#1 + 1}{i + 2}$ (Laplace's rule of succession).
     Here $\hat\mu^\circ_i$: $\frac{1}{2}, \frac{2}{3}, \frac{1}{2}, \frac{3}{5}, \frac{1}{2}, \frac{4}{7}, \frac{5}{8}, \frac{5}{9}, \frac{3}{5}, \frac{7}{11}, \frac{7}{12}$.
     If $x^\infty$ is such that for large $n$, $\hat\mu_n$ stays bounded away from $\{0, 1\}$:
       $R(P, x^n) = \frac{1}{2} \log n + O(1)$.
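
The $\hat\mu^\circ_i$ sequence above can be checked directly; this small sketch (our own, using exact fractions) reproduces it.

```python
from fractions import Fraction

# The example sequence x^n = 1010110110 from this slide.
xs = [1, 0, 1, 0, 1, 1, 0, 1, 1, 0]

def laplace(prefix):
    """Laplace's rule of succession: mu_i = (#1s + 1) / (i + 2).
    Returns the Bernoulli pmf P_{mu_i} for the next outcome."""
    mu = Fraction(sum(prefix) + 1, len(prefix) + 2)
    return lambda x: mu if x == 1 else 1 - mu

# Prints: 1/2 2/3 1/2 3/5 1/2 4/7 5/8 5/9 3/5 7/11 7/12 (as on the slide).
print(*[laplace(xs[:i])(1) for i in range(len(xs) + 1)])
```

Passing `laplace` and `xs` to the `regret` helper from the earlier sketch gives the realized regret of this smoothed "follow the leader" strategy on the example sequence.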

  5. Problem Statement
     $\mathcal{M} = \{ P_\mu \mid \mu \in \Theta \}$ is a $k$-parameter exponential family:
     Bernoulli, Gaussian, Poisson, gamma, beta, geometric, $\chi^2$, ...
     Mean-value parametrization: $\mu = E[X]$.
     Bayes strategy:
       $P_{\text{bayes}}(x_{n+1} \mid x^n) = \int_\Theta P_\mu(x_{n+1}) \, d\pi(\mu \mid x^n)$
       $P_{\text{bayes}}(\cdot \mid x^n) \notin \mathcal{M}$ (strategy outside the model).
       $R(P_{\text{bayes}}, x^n) = \frac{k}{2} \log n + O(1)$ (asymptotically optimal).
     Plug-in strategy:
       $P_{\text{plug-in}}(x_{n+1} \mid x^n) = P_{\bar\mu(x^n)}(x_{n+1})$, with $\bar\mu \colon \mathcal{X}^\infty \to \Theta$.
       $P_{\text{plug-in}}(\cdot \mid x^n) \in \mathcal{M}$ (in-model strategy).
       ML plug-in strategy ("follow the leader") if $\bar\mu(x^n) = \hat\mu^\circ_n$, where
       $\hat\mu^\circ_n = \frac{n_0 x_0 + \sum_{i=1}^{n} x_i}{n_0 + n}$ (smoothed ML estimator).
       $R(P_{\text{plug-in}}, x^n) \geq c \, \frac{k}{2} \log n + O(1)$; in the worst case $c \gg 1$.
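
A minimal sketch of the smoothed ML estimator (our own helper; rows of `xs` are outcomes, so it also covers vector-valued mean parameters):

```python
import numpy as np

def smoothed_ml(xs, x0, n0):
    """Smoothed ML estimator mu_n = (n0*x0 + sum_i x_i) / (n0 + n):
    the plain ML estimate pulled toward a 'virtual' prior observation x0
    of weight n0, so mu_0 = x0 is defined and boundary estimates are avoided."""
    xs = np.asarray(xs, dtype=float)
    return (n0 * x0 + xs.sum(axis=0)) / (n0 + len(xs))

# Laplace's rule of succession is the Bernoulli special case x0 = 1/2, n0 = 2:
# smoothed_ml([1, 0, 1], x0=0.5, n0=2) -> (1.0 + 2.0) / 5 = 0.6 = 3/5.
```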

  6. Contribution
     Bayes strategy (outside the model): asymptotically optimal regret
       $\frac{k}{2} \log n + O(1)$, but usually hard to calculate.
     Plug-in strategy, incl. ML (in the model): simple to compute/update,
       but suboptimal regret $c \, \frac{k}{2} \log n + O(1)$.
     "Follow the Flattened Leader": a slight modification ("flattening") of the
       ML plug-in strategy, "almost" in the model, that achieves the optimal
       regret of Bayes while retaining the simplicity of ML.
  7. Motivating Example: Why Is Bayes Better than ML?
     $\mathcal{M} = \{ \mathcal{N}(\mu, 1) : \mu \in \mathbb{R} \}$.
     ML strategy prediction: $\mathcal{N}(\hat\mu^\circ_n, 1)$.
     Bayes strategy prediction: $\mathcal{N}(\hat\mu^\circ_n, 1 + \frac{1}{n+1})$.
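
The two strategies differ only in the predictive variance. Here is a sketch under an assumed standard-normal prior, under which the Bayes predictive takes exactly the form above with $\hat\mu^\circ_n = \sum_i x_i / (n+1)$, i.e. $x_0 = 0$, $n_0 = 1$:

```python
import math

def ml_prediction(prefix):
    """'Follow the leader': plug in the smoothed ML mean (x0 = 0, n0 = 1,
    an assumption of this sketch) and keep the model variance 1."""
    n = len(prefix)
    return sum(prefix) / (n + 1), 1.0

def bayes_prediction(prefix):
    """Bayes with a N(0, 1) prior: the same mean, but predictive variance
    1 + 1/(n+1), inflated by the remaining uncertainty about mu."""
    n = len(prefix)
    return sum(prefix) / (n + 1), 1.0 + 1.0 / (n + 1)

def log_loss(x, mean, var):
    """-log density of N(mean, var) at x."""
    return 0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
```

Because the Bayes density is slightly flatter, its log-loss degrades more gracefully when $x_{n+1}$ falls far from $\hat\mu^\circ_n$; plausibly this is the effect that the "flattening" modification reproduces while staying "almost" inside the model.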
