Online Algorithms: Learning & Optimization with No Regret


SLIDE 1

Online Algorithms: Learning & Optimization with No Regret.

CS/CNS/EE 253, Daniel Golovin

SLIDE 2

The Setup

Optimization:

  • Model the problem (objective, constraints)
  • Pick best decision from a feasible set.

Learning:

  • Model the problem (objective, hypothesis class)
  • Pick best hypothesis from a feasible set.


SLIDE 3

Online Learning/Optimization

In each round t: choose an action $x_t \in X$, then receive reward $f_t(x_t)$ and feedback, where $f_t : X \to [0, 1]$ (a code sketch follows below).

  • Same feasible set X in each round t
  • Different reward models:
    – Stochastic; Arbitrary but Oblivious; Adaptive and Arbitrary
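
To make the round-by-round interaction concrete, here is a minimal Python sketch of the protocol; `learner` and `reward_oracle` are hypothetical stand-ins, not objects defined in the lecture.

```python
# A minimal sketch of the online protocol; `learner` and `reward_oracle`
# are hypothetical stand-ins (any objects with these methods/signatures).
def play(learner, reward_oracle, X, T):
    """Run T rounds: pick x_t in X, then observe f_t and collect f_t(x_t)."""
    total = 0.0
    for t in range(T):
        x_t = learner.choose(X)   # choose an action x_t in X
        f_t = reward_oracle(t)    # environment reveals f_t : X -> [0, 1]
        total += f_t(x_t)         # reward for this round
        learner.update(f_t)       # full-information feedback
    return total
```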

SLIDE 4

Concrete Example: Commuting

Pick a path $x_t$ from home to school. Pay cost $f_t(x_t) := \sum_{e \in x_t} c_t(e)$. Then see all edge costs for that round.

Dealing with Limited Feedback: later in the course.

SLIDE 5

Other Applications

  • Sequential decision problems
  • Streaming algorithms for optimization/learning with large data sets
  • Combining weak learners into strong ones (“boosting”)
  • Fast approximate solvers for certain classes of convex programs
  • Playing repeated games

SLIDE 6

Binary prediction with a perfect expert

  • n hypotheses (“experts”) $h_1, h_2, \ldots, h_n$
  • Guaranteed that some hypothesis is perfect.
  • Each round, get a data point $p_t$ and classifications $h_i(p_t) \in \{0, 1\}$
  • Output binary prediction $x_t$, observe the correct label
  • Minimize # mistakes

Any suggestions?

SLIDE 7

A Weighted Majority Algorithm

  • Each expert “votes” for its classification.
  • Only votes from experts who have never been wrong are counted.
  • Go with the majority.

Claim: # mistakes $M \le \log_2(n)$.

Analysis: Let $w_{i,t} = \mathbb{I}(h_i \text{ correct on first } t \text{ rounds})$ and $W_t = \sum_i w_{i,t}$. Then $W_0 = n$ and $W_T \ge 1$ (the perfect expert survives). A mistake on round t implies $W_{t+1} \le W_t/2$, so $1 \le W_T \le W_0/2^M = n/2^M$.
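
As an illustration, here is a minimal Python sketch of this majority-vote-over-consistent-experts scheme (the function names and data layout are my own, not from the slides):

```python
# Minimal sketch of majority voting over never-wrong experts.
# `predictions[i]` is expert i's 0/1 prediction this round.
def predict(alive, predictions):
    """Majority vote among experts that have never been wrong."""
    ones = sum(predictions[i] for i in alive)
    return 1 if 2 * ones >= len(alive) else 0

def update(alive, predictions, label):
    """Drop every surviving expert that just erred."""
    return {i for i in alive if predictions[i] == label}

# Usage: start with alive = set(range(n)); each round call predict(...), then
# update(...). Each mistake at least halves the surviving set, so with a
# perfect expert we make at most log2(n) mistakes.
```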

SLIDE 8

Weighted Majority  [Littlestone & Warmuth '89]

  • Each expert i has a weight w(i) and “votes” for its classification in {−1, 1}.
  • Go with the weighted majority: predict $\mathrm{sign}\left(\sum_i w_i x_i\right)$.
  • Halve the weights of wrong experts.
  • Let m = # mistakes of the best expert. How many mistakes M do we make? What if there's no perfect expert?

Analysis: The weights are $w_{i,t} = (1/2)^{\#\text{ mistakes by } i \text{ on first } t \text{ rounds}}$. Let $W_t := \sum_i w_{i,t}$. Note $W_0 = n$ and $W_T \ge (1/2)^m$. A mistake on round t implies $W_{t+1} \le \frac{3}{4} W_t$, so $(1/2)^m \le W_T \le W_0\,(3/4)^M = n \cdot (3/4)^M$. Thus $(4/3)^M \le n \cdot 2^m$ and $M \le 2.41\,(m + \log_2(n))$.
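
And a minimal Python sketch of the deterministic Weighted Majority rule above, assuming predictions and labels in {−1, +1} (names are illustrative):

```python
# Minimal sketch of deterministic Weighted Majority: predict the sign of the
# weighted vote, then halve the weight of every wrong expert.
def wm_predict(weights, predictions):
    """Predict sign(sum_i w_i x_i); ties go to +1."""
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score >= 0 else -1

def wm_update(weights, predictions, label):
    """Halve the weight of every expert that predicted incorrectly."""
    return [w / 2 if p != label else w for w, p in zip(weights, predictions)]

# Usage: weights = [1.0] * n; each round call wm_predict, observe the label,
# then wm_update. Guarantee: M <= 2.41 * (m + log2(n)).
```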

SLIDE 9

Can we do better?

$M \le 2.41\,(m + \log_2(n))$

  • No deterministic algorithm can get M < 2m. Example: with two experts $e_1 \equiv -1$ and $e_2 \equiv 1$, an adversary can always pick the label opposite to the algorithm's deterministic prediction, so the algorithm errs every round while the better expert errs at most half the time.
  • What if there are more than 2 choices?

SLIDE 10

Regret

  • Notation: Define loss or cost functions $c_t$ and define the regret of $x_1, x_2, \ldots, x_T$ as
$$R_T = \sum_{t=1}^{T} c_t(x_t) - \sum_{t=1}^{T} c_t(x^*), \quad \text{where } x^* = \arg\min_{x \in X} \sum_{t=1}^{T} c_t(x).$$
A sequence has “no regret” if $R_T = o(T)$.

  • Questions:
    – How can we improve Weighted Majority?
    – What is the lowest regret we can hope for?

“Maybe all one can do is hope to end up with the right regrets.” – Arthur Miller
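
As a small sanity check of the definition, here is how one might compute $R_T$ given the full cost table; this helper is hypothetical, purely for illustration:

```python
# Hypothetical helper (not from the lecture): compute R_T from a full cost
# table. costs[t][x] = c_t(x); plays[t] = the action x_t chosen in round t.
def regret(costs, plays):
    alg_cost = sum(costs[t][x] for t, x in enumerate(plays))
    best_fixed = min(sum(row[x] for row in costs) for x in range(len(costs[0])))
    return alg_cost - best_fixed
```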

SLIDE 11

The Hedge/WMR Algorithm*  [Freund & Schapire '97]

Hedge(ε):
  Initialize $w_{i,0} = 1$ for all i.
  In each round t:
    Let $p_t(i) := w_{i,t} / \sum_j w_{j,t}$.
    Choose expert $e_t$ from the categorical distribution $p_t$.
    Select $x_t = x(e_t, t)$, the advice/prediction of $e_t$.
    For each i, set $w_{i,t+1} = w_{i,t}\,(1-\varepsilon)^{c_t(x(e_i, t))}$.

  • How does this compare to WM?

* Pedantic note: Hedge is often called “Randomized Weighted Majority” and abbreviated “WMR”, though WMR was published in the context of binary classification, unlike Hedge.

SLIDE 12

The Hedge/WMR Algorithm

The same pseudocode as above, annotated: the expert $e_t$ is drawn at random from $p_t$, and each expert's influence shrinks exponentially with its cumulative loss.

Intuitively: either we do well on a round, or the total weight drops, and the total weight can't drop too much unless every expert is lousy.
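
Here is a compact, runnable Python sketch of Hedge(ε) as given in the pseudocode above; the per-round cost-vector interface is an assumption for illustration:

```python
import random

# A minimal sketch of Hedge(eps) following the pseudocode above. Each round,
# `costs[i]` is c_t(x(e_i, t)) in [0, 1]; this interface is an assumption.
class Hedge:
    def __init__(self, n, eps):
        self.w = [1.0] * n           # w_{i,0} = 1 for all i
        self.eps = eps

    def choose(self):
        """Sample expert e_t from the categorical distribution p_t."""
        total = sum(self.w)
        probs = [w / total for w in self.w]
        return random.choices(range(len(self.w)), weights=probs)[0]

    def update(self, costs):
        """Multiplicative update: w_{i,t+1} = w_{i,t} * (1 - eps)^{c_t(i)}."""
        self.w = [w * (1.0 - self.eps) ** c for w, c in zip(self.w, costs)]
```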

SLIDE 13

Hedge Performance

Theorem: Let $x_1, x_2, \ldots$ be the choices of Hedge(ε). Then
$$E\left[\sum_{t=1}^{T} c_t(x_t)\right] \le \left(\frac{1}{1-\varepsilon}\right)\mathrm{OPT}_T + \frac{\ln(n)}{\varepsilon},$$
where $\mathrm{OPT}_T := \min_i \sum_{t=1}^{T} c_t(x(e_i, t))$.

If $\varepsilon = \Theta\left(\sqrt{\ln(n)/\mathrm{OPT}}\right)$, the regret is $\Theta\left(\sqrt{\mathrm{OPT}\ln(n)}\right)$.
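
To see where this rate comes from, here is a short worked tuning of $\varepsilon$ (a sketch assuming $\varepsilon \le 1/2$, so that $1/(1-\varepsilon) \le 1 + 2\varepsilon$; constants are not optimized):

$$E\left[\sum_{t=1}^{T} c_t(x_t)\right] - \mathrm{OPT} \;\le\; \frac{\mathrm{OPT}}{1-\varepsilon} + \frac{\ln(n)}{\varepsilon} - \mathrm{OPT} \;\le\; 2\varepsilon\,\mathrm{OPT} + \frac{\ln(n)}{\varepsilon}.$$

Choosing $\varepsilon = \sqrt{\ln(n)/(2\,\mathrm{OPT})}$ balances the two terms, giving regret at most $2\sqrt{2\,\mathrm{OPT}\ln(n)} = \Theta\left(\sqrt{\mathrm{OPT}\ln(n)}\right)$.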

SLIDE 14

Hedge Analysis

Intuitively: Either we do well on a round, or total weight drops, and total weight can't drop too much unless every expert is lousy.

Let $W_t := \sum_i w_{i,t}$ and write $x_{i,t} := x(e_i, t)$. Then $W_0 = n$ and $W_{T+1} \ge (1-\varepsilon)^{\mathrm{OPT}}$ (the best expert's weight alone is $(1-\varepsilon)^{\mathrm{OPT}}$), and in each round:

$$\begin{aligned}
W_{t+1} &= \textstyle\sum_i w_{i,t}\,(1-\varepsilon)^{c_t(x_{i,t})} && (1) \\
&= \textstyle\sum_i W_t\, p_t(i)\,(1-\varepsilon)^{c_t(x_{i,t})} && (2)\quad \text{[def. of } p_t(i)\text{]} \\
&\le \textstyle\sum_i W_t\, p_t(i)\,\big(1 - \varepsilon \cdot c_t(x_{i,t})\big) && (3)\quad \text{[Bernoulli's ineq.: } (1+x)^r \le 1 + rx \text{ for } x > -1,\ r \in (0, 1)\text{]} \\
&= W_t\,\big(1 - \varepsilon \cdot E[c_t(x_t)]\big) && (4) \\
&\le W_t \cdot \exp\big(-\varepsilon \cdot E[c_t(x_t)]\big) && (5)\quad [1 - x \le e^{-x}]
\end{aligned}$$

SLIDE 15

Hedge Analysis

Multiplying inequality (5) over rounds $t = 1, \ldots, T$:
$$W_{T+1}/W_0 \le \exp\left(-\varepsilon \sum_{t=1}^{T} E[c_t(x_t)]\right), \quad \text{i.e.,} \quad W_0/W_{T+1} \ge \exp\left(\varepsilon \sum_{t=1}^{T} E[c_t(x_t)]\right).$$

Recall $W_0 = n$ and $W_{T+1} \ge (1-\varepsilon)^{\mathrm{OPT}}$. Taking logarithms and rearranging:
$$E\left[\sum_{t=1}^{T} c_t(x_t)\right] \le \frac{1}{\varepsilon}\ln\left(\frac{W_0}{W_{T+1}}\right) \le \frac{\ln(n)}{\varepsilon} - \frac{\mathrm{OPT} \cdot \ln(1-\varepsilon)}{\varepsilon} \le \frac{\ln(n)}{\varepsilon} + \frac{\mathrm{OPT}}{1-\varepsilon},$$
using $-\ln(1-\varepsilon) \le \varepsilon/(1-\varepsilon)$ in the last step.

SLIDE 16

Lower Bound

If $\varepsilon = \Theta\left(\sqrt{\ln(n)/\mathrm{OPT}}\right)$, the regret is $\Theta\left(\sqrt{\mathrm{OPT}\ln(n)}\right)$. Can we do better?

Let $c_t(x) \sim \mathrm{Bernoulli}(1/2)$ independently for all x and t, and let $Z_i := \sum_{t=1}^{T} c_t(x(e_i, t))$. Then $Z_i \sim \mathrm{Bin}(T, 1/2)$ is roughly normally distributed with $\mu = T/2$ and $\sigma = \frac{1}{2}\sqrt{T}$, and $P[Z_i \le \mu - k\sigma] = \exp\left(-\Theta(k^2)\right)$. Since the costs are independent of the actions, any algorithm gets expected cost about $\mu = T/2$, but the best of the n experts is likely to get $\mu - \Theta\left(\sqrt{T\ln(n)}\right) = \mu - \Theta\left(\sqrt{\mathrm{OPT}\ln(n)}\right)$.
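
A quick simulation (my own illustrative script, not from the slides) shows this gap numerically:

```python
import math
import random

# Illustrative check of the lower-bound argument: with i.i.d. c_t(x) ~
# Bernoulli(1/2) costs, any algorithm averages T/2, while the best of
# n experts sits about sqrt(T * ln(n) / 2) below T/2.
T, n = 10_000, 100
expert_costs = [sum(random.randint(0, 1) for _ in range(T)) for _ in range(n)]
print("mean cost    :", T / 2)
print("best expert  :", min(expert_costs))
print("predicted gap:", round(math.sqrt(T * math.log(n) / 2)))
```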

SLIDE 17

What have we shown?

  • Simple algorithm that learns to do nearly as well as the best fixed choice.
  • Hedge can exploit any pattern that the best choice does.
  • Works for adaptive adversaries.
  • Suitable for playing repeated games; related ideas appear in the Algorithmic Game Theory literature.

SLIDE 18

Related Questions

  • Optimize and get no-regret against richer classes of strategies/experts:
    – All distributions over experts
    – All sequences of experts that have K transitions [Auer et al '02]
    – Various classes of functions of input features [Blum & Mansour '05]
      • E.g., consider the time of day when choosing a driving route.
    – Arbitrary convex sets of experts, metric spaces of experts, etc., with linear, convex, or Lipschitz costs [Zinkevich '03, Kleinberg et al '08]
    – All policies of a K-state, initially unknown Markov Decision Process that models the world [Auer et al '08]
    – Arbitrary sets of strategies in $\mathbb{R}^n$ with linear costs that we can optimize offline [Hannan '57, Kalai & Vempala '02]

SLIDE 19

Related Questions

  • Other notions of regret (see, e.g., [Blum & Mansour '05])
  • Time selection functions:
    – Get low regret on Mondays, rainy days, etc.
  • Sleeping experts:
    – If the rule “if(P) then predict Q” is right 90% of the time it applies, be right 89% of the time P applies.
  • Internal regret & swap regret:
    – If you played x1, ..., xT, then have no regret against g(x1), ..., g(xT) for every g: X → X.

SLIDE 20

Sleeping Experts

  • If the rule “if(P) then predict Q” is right 90% of the time it applies, be right 89% of the time P applies. Get this for every rule simultaneously.
  • Idea: Generate lots of hypotheses that “specialize” on certain inputs, some good, some lousy, and combine them into a great classifier.
    – E.g., if (“physics” in D) then classify D as “science”.
  • Many applications: document classification, spam filtering, adaptive UIs, ...
  • Predicates can overlap.

[Freund et al '97, Blum '97, Blum & Mansour '05]

SLIDE 21

Sleeping Experts

  • Predicates can overlap.
  • E.g., predict college major given the classes C you're enrolled in:
    – if(ML-101, CS-201 in C) then CS
    – if(ML-101, Stats-201 in C) then Stats
  • What do we predict for students enrolled in ML-101, CS-201, and Stats-201?

SLIDE 22

Sleeping Experts

SleepingExperts(β, E, F)  [Algorithm from Blum & Mansour '05]
Input: β ∈ (0, 1), experts E, time-selection functions F.
Initialize $w^0_{e,f} = 1$ for all $e \in E$, $f \in F$.
In each round t:
  Let $w^t_e = \sum_f f(t)\, w^t_{e,f}$.
  Let $W^t = \sum_e w^t_e$.
  Let $p^t_e = w^t_e / W^t$.
  Choose expert $e_t$ from the categorical distribution $p^t$.
  Select $x_t = x(e_t, t)$, the advice/prediction of $e_t$.
  For each $e \in E$, $f \in F$: set $w^{t+1}_{e,f} = w^t_{e,f} \cdot \beta^{\,f(t)\,(c_t(e) - \beta\, E[c_t(e_t)])}$.
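
A minimal Python sketch of this algorithm, following the pseudocode above (the representation of experts as indices and time-selection functions as callables is an assumption):

```python
import random

# Minimal sketch of SleepingExperts(beta, E, F) per the pseudocode above.
# Experts are indices 0..n-1; `selectors` is a list of functions f(t) -> [0, 1].
class SleepingExperts:
    def __init__(self, beta, n_experts, selectors):
        self.beta = beta
        self.selectors = selectors
        self.w = [[1.0] * len(selectors) for _ in range(n_experts)]  # w^0_{e,f} = 1

    def choose(self, t):
        """Sample e_t from p^t_e = w^t_e / W^t, where w^t_e = sum_f f(t) w^t_{e,f}."""
        w_e = [sum(f(t) * w_ef for f, w_ef in zip(self.selectors, row))
               for row in self.w]
        W = sum(w_e)
        self.p = [x / W for x in w_e]    # assumes some f(t) > 0 this round
        return random.choices(range(len(self.w)), weights=self.p)[0]

    def update(self, t, costs):
        """w^{t+1}_{e,f} = w^t_{e,f} * beta^{f(t) * (c_t(e) - beta * E[c_t(e_t)])}."""
        exp_cost = sum(p * c for p, c in zip(self.p, costs))  # E[c_t(e_t)]
        for e, row in enumerate(self.w):
            for j, f in enumerate(self.selectors):
                row[j] *= self.beta ** (f(t) * (costs[e] - self.beta * exp_cost))
```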

SLIDE 23

Sleeping Experts

[Algorithm from Blum & Mansour '05]

The update $w^{t+1}_{e,f} = w^t_{e,f} \cdot \beta^{\,f(t)\,(c_t(e) - \beta\, E[c_t(e_t)])}$ ensures the total sum of weights can never increase:
$$\sum_{e,f} w^t_{e,f} \le nm \quad \text{for all } t.$$
In particular, for each pair (e, f),
$$w^T_{e,f} = \prod_{t \ge 0} \beta^{\,f(t)\,(c_t(e) - \beta\, E[c_t(e_t)])} = \beta^{\,\sum_{t \ge 0} f(t)\,(c_t(e) - \beta\, E[c_t(e_t)])} \le nm.$$

SLIDE 24

Sleeping Experts Performance

Let $n = |E|$ and $m = |F|$, and fix $T \in \mathbb{N}$. Let $C(e, f) := \sum_{t=1}^{T} f(t) \cdot c_t(e)$ and $C_{\mathrm{alg}}(f) := \sum_{t=1}^{T} f(t) \cdot c_t(e_t)$. Then for all $e \in E$, $f \in F$:
$$E[C_{\mathrm{alg}}(f)] \le \frac{1}{\beta}\left(C(e, f) + \log_{1/\beta}(nm)\right).$$
If $\beta = 1 - \varepsilon$ is close to 1,
$$E[C_{\mathrm{alg}}(f)] = (1 + \Theta(\varepsilon))\, C(e, f) + \Theta\left(\frac{\log_2(nm)}{\varepsilon}\right).$$
Optimizing $\varepsilon$ yields a regret bound of $O\left(\sqrt{C(e, f)\log(nm)} + \log(nm)\right)$.