Online Learning and Online Convex Optimization
Nicolò Cesa-Bianchi, Università degli Studi di Milano


  1. Online Learning and Online Convex Optimization. Nicolò Cesa-Bianchi, Università degli Studi di Milano. N. Cesa-Bianchi (UNIMI), Online Learning, slide 1 / 49

  2. Summary: 1. My beautiful regret; 2. A supposedly fun game I'll play again; 3. The joy of convex


  4. Machine learning. Classification / regression tasks. Predictive models h map data instances X to labels Y (e.g., a binary classifier). Training data S_T = ((X_1, Y_1), ..., (X_T, Y_T)) (e.g., email messages with spam vs. nonspam annotations). A learning algorithm A (e.g., a Support Vector Machine) maps the training data S_T to a model h = A(S_T). Evaluate the risk of the trained model h with respect to a given loss function.


  6. Two notions of risk. View the data as a statistical sample: the statistical risk is E[\ell(A(S_T), (X, Y))], where A(S_T) is the trained model and (X, Y) the test example; the training set S_T = ((X_1, Y_1), ..., (X_T, Y_T)) and the test example (X, Y) are drawn i.i.d. from the same unknown and fixed distribution. View the data as an arbitrary sequence: the sequential risk is \sum_{t=1}^T \ell(A(S_{t-1}), (X_t, Y_t)), where A(S_{t-1}) is the trained model and (X_t, Y_t) the test example, using a sequence of models trained on growing prefixes S_t = ((X_1, Y_1), ..., (X_t, Y_t)) of the data sequence.


  8. Regrets, I had a few. A learning algorithm A maps datasets to models in a given class H. Variance error in statistical learning: E[\ell(A(S_T), (X, Y))] - \inf_{h \in H} E[\ell(h, (X, Y))], comparing to the expected loss of the best model in the class. Regret in online learning: \sum_{t=1}^T \ell(A(S_{t-1}), (X_t, Y_t)) - \inf_{h \in H} \sum_{t=1}^T \ell(h, (X_t, Y_t)), comparing to the cumulative loss of the best model in the class.


  11. Incremental model update. A natural blueprint for online learning algorithms. For t = 1, 2, ...: (1) apply the current model h_{t-1} to the next data element (X_t, Y_t); (2) update the current model, h_{t-1} -> h_t \in H (local optimization). Goal: control the regret \sum_{t=1}^T \ell(h_{t-1}, (X_t, Y_t)) - \inf_{h \in H} \sum_{t=1}^T \ell(h, (X_t, Y_t)). View this as a repeated game between a player generating predictors h_t \in H and an opponent generating data (X_t, Y_t).
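The blueprint above can be sketched in code. This is an illustrative example only: the linear model class, the squared loss, and the online gradient-descent update are assumptions made here for concreteness, not choices taken from the slides.

```python
# Minimal sketch of the incremental-update blueprint: a linear model
# trained online with gradient descent on the squared loss.

def online_loop(stream, d, lr=0.1):
    """stream yields (x, y) pairs; x is a length-d list of floats, y a float.

    Returns the final model and the cumulative (sequential) loss.
    """
    w = [0.0] * d                 # current model h_{t-1}
    cum_loss = 0.0
    for x, y in stream:
        pred = sum(wi * xi for wi, xi in zip(w, x))   # (1) apply h_{t-1} to (X_t, Y_t)
        cum_loss += (pred - y) ** 2                   # suffer the loss
        grad = 2.0 * (pred - y)                       # (2) local optimization step
        w = [wi - lr * grad * xi for wi, xi in zip(w, x)]   # h_{t-1} -> h_t
    return w, cum_loss
```

On a stream that repeats the single example x = [1.0], y = 1.0, the model converges geometrically toward w = [1.0], so the cumulative loss stays bounded even as the stream grows.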

  12. Summary: 1. My beautiful regret; 2. A supposedly fun game I'll play again; 3. The joy of convex

  13. Theory of repeated games. James Hannan (1922-2010) and David Blackwell (1919-2010). Learning to play a game (1956): play a game repeatedly against a possibly suboptimal opponent.

  14. Zero-sum 2-person games played more than once. A known N x M loss matrix: the row player (the player) has N actions, the column player (the opponent) has M actions, and entry \ell(i, j) is the loss of playing row i against column j. For each game round t = 1, 2, ...: the player chooses action i_t and the opponent chooses action y_t; the player suffers loss \ell(i_t, y_t) (which equals the gain of the opponent). The player can learn from the opponent's history of past choices y_1, ..., y_{t-1}.

  15. Prediction with expert advice. Volodya Vovk and Manfred Warmuth. The opponent's moves y_1, y_2, ... define a sequential prediction problem with a time-varying loss function \ell(i_t, y_t) = \ell_t(i_t): a table of losses \ell_t(i), for actions i = 1, ..., N and rounds t = 1, 2, ..., replaces the fixed loss matrix.


  18. Playing the experts game. A sequential decision problem with N actions and an unknown deterministic assignment of losses to actions: \ell_t = (\ell_t(1), ..., \ell_t(N)) \in [0, 1]^N for t = 1, 2, ... For t = 1, 2, ...: (1) the player picks an action I_t (possibly using randomization) and incurs loss \ell_t(I_t); (2) the player gets full feedback information: all of \ell_t(1), ..., \ell_t(N) are revealed (e.g., a round revealing losses .7, .3, .2, .4, .1, .6, .7, .4, .9 for N = 9 actions).


  20. Regret analysis. Regret: R_T := E[\sum_{t=1}^T \ell_t(I_t)] - \min_{i=1,...,N} \sum_{t=1}^T \ell_t(i); we want R_T = o(T). Lower bound using random losses [Experts' paper, 1997]: replace \ell_t(i) with L_t(i) \in \{0, 1\}, independent random coin flips. For any player strategy, E[\sum_{t=1}^T L_t(I_t)] = T/2. Then the expected regret is E[\max_{i=1,...,N} \sum_{t=1}^T (1/2 - L_t(i))] = (1 - o(1)) \sqrt{(T \ln N)/2} as N, T -> \infty.

  21. Exponentially weighted forecaster (Hedge). At time t, pick action I_t = i with probability proportional to \exp(-\eta \sum_{s=1}^{t-1} \ell_s(i)); the sum in the exponent is the total loss of action i up to now. Regret bound [Experts' paper, 1997]: if \eta = \sqrt{(8 \ln N)/T} then R_T \le \sqrt{(T \ln N)/2}, matching the lower bound including constants. The dynamic choice \eta_t = \sqrt{(8 \ln N)/t} only loses small constants.
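A minimal sketch of the Hedge forecaster, assuming loss vectors with entries in [0, 1] delivered as a list (the function name and interface are illustrative, not from the slides):

```python
import math

def hedge(losses, eta):
    """Run Hedge on a sequence of loss vectors with entries in [0, 1].

    Returns (player's total expected loss, total loss of the best action).
    """
    n = len(losses[0])
    cum = [0.0] * n                               # cumulative loss per action
    total = 0.0
    for loss_t in losses:
        w = [math.exp(-eta * c) for c in cum]     # weight exp(-eta * past loss)
        z = sum(w)
        p = [wi / z for wi in w]                  # P(I_t = i), proportional to w
        total += sum(pi * li for pi, li in zip(p, loss_t))   # expected loss this round
        cum = [c + li for c, li in zip(cum, loss_t)]
    return total, min(cum)
```

With eta = sqrt(8 ln N / T), the expected regret (total minus the best action's cumulative loss) stays below sqrt((T ln N)/2), as in the bound above.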


  25. The nonstochastic bandit problem. For t = 1, 2, ...: (1) the player picks an action I_t (possibly using randomization) and incurs loss \ell_t(I_t); (2) the player gets partial information: only \ell_t(I_t) is revealed. The player is still competing against the best offline action: R_T = E[\sum_{t=1}^T \ell_t(I_t)] - \min_{i=1,...,N} \sum_{t=1}^T \ell_t(i).

  26. The Exp3 algorithm [Auer et al., 2002]. Hedge with estimated losses: P_t(I_t = i) \propto \exp(-\eta \sum_{s=1}^{t-1} \hat{\ell}_s(i)) for i = 1, ..., N, where \hat{\ell}_t(i) = \ell_t(i) / P_t(I_t = i) if I_t = i (the observed loss, importance-weighted) and \hat{\ell}_t(i) = 0 otherwise. Only one component of \hat{\ell}_t is non-zero.
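The estimated-loss trick can be sketched as below. Caveats: this is a simplified loss-based variant (the original Exp3 of Auer et al. also mixes in uniform exploration), and the callback interface is an assumption made here, not part of the slides.

```python
import math
import random

def exp3(observe_loss, n, T, eta, seed=0):
    """Sketch of Exp3: Hedge run on importance-weighted loss estimates.

    observe_loss(t, i) returns the loss in [0, 1] of the single action i
    chosen at round t; no other losses are revealed.
    Returns the player's total incurred loss.
    """
    rng = random.Random(seed)
    est = [0.0] * n                   # cumulative estimated losses per action
    total = 0.0
    for t in range(T):
        w = [math.exp(-eta * e) for e in est]
        z = sum(w)
        p = [wi / z for wi in w]                    # P_t(I_t = i)
        i = rng.choices(range(n), weights=p)[0]     # draw I_t from P_t
        loss = observe_loss(t, i)                   # only ell_t(I_t) is observed
        total += loss
        est[i] += loss / p[i]   # hat-ell_t(i) = ell_t(i)/P_t(I_t = i); others stay 0
    return total
```

Dividing the observed loss by the probability of having played the action makes the estimate unbiased: E[hat-ell_t(i)] = ell_t(i) for every action, even though only one loss is seen per round.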
