Online Learning Tomaso Poggio and Lorenzo Rosasco 9.520 Class 15 - PowerPoint PPT Presentation

  1. Online Learning. Tomaso Poggio and Lorenzo Rosasco. 9.520 Class 15, March 30 2011.

  2. About this class. Goal: to introduce theory and algorithms for online learning.

  3. Plan: different views on online learning; from batch to online least squares; other loss functions; theory.

  4. (Batch) Learning Algorithms. A learning algorithm $A$ is a map from the data space into the hypothesis space, $f_S = A(S)$, where $S = S_n = (x_0, y_0), \ldots, (x_{n-1}, y_{n-1})$. We typically assume that $A$ is deterministic and that $A$ does not depend on the ordering of the points in the training set. Notation: note the weird numbering of the training set, which starts at $0$!

  5. Online Learning Algorithms. The pure online learning approach is $O(1)$ in time and memory with respect to the data: let $f_1 = \mathrm{init}$; for $n = 1, \ldots$, set $f_{n+1} = A(f_n, (x_n, y_n))$. The algorithm works sequentially and has a recursive definition.
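The structure of this protocol is easy to express in code. Below is a minimal Python sketch; the names (online_learn, update, stream) are illustrative and not from the slides: update plays the role of the map $A$, init of $f_1$, and stream yields the samples one at a time.

```python
def online_learn(update, init, stream):
    """Run the recursion f_{n+1} = A(f_n, (x_n, y_n)) over a stream of samples."""
    f = init
    for x, y in stream:
        # Only the current estimator and the current sample are used,
        # so time and memory per step are O(1) with respect to the data.
        f = update(f, x, y)
    return f
```

The variant on the next slide would additionally pass the stored past data $S_n$ to update, which is why it is no longer $O(1)$ in memory.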

  6. Online Learning Algorithms (cont.) A related approach (similar to transductive learning) is typically $O(1)$ in time but not in memory with respect to the data: let $f_1 = \mathrm{init}$; for $n = 1, \ldots$, set $f_{n+1} = A(f_n, S_n, (x_n, y_n))$. Also in this case the algorithm works sequentially and has a recursive definition, but it requires storing the past data $S_n$.

  7. Why Online Learning? Different motivations/perspectives that often correspond to different theoretical frameworks: biological plausibility; stochastic approximation; incremental optimization; non-i.i.d. data and the game-theoretic view.

  8. Online Learning and Stochastic Approximation. Our goal is to minimize the expected risk $I[f] = \mathbb{E}_{(x,y)}[V(f(x), y)] = \int V(f(x), y)\, d\mu(x, y)$ over the hypothesis space $\mathcal{H}$, but the data distribution is not known. The idea is to use the samples to build an approximate solution and to update such a solution as we get more data.

  10. Online Learning and Stochastic Approximation (cont.) More precisely, if we are given samples $(x_i, y_i)_i$ in a sequential fashion, and at the $n$-th step we have an approximation $G(f, (x_n, y_n))$ of the gradient of $I[f]$, then we can define a recursion by: let $f_1 = \mathrm{init}$; for $n = 1, \ldots$, set $f_{n+1} = f_n + \gamma_n G(f_n, (x_n, y_n))$.
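As an illustration, the sketch below instantiates this recursion for a linear model $f(x) = w^T x$ and the square loss, so that $G$ is proportional to the negative pointwise gradient; the step-size schedule $\gamma_n = 1/n$ and the function name are assumptions made for the example, not choices from the slides.

```python
import numpy as np

def sgd_square_loss(samples, dim, gamma=lambda n: 1.0 / n):
    """Stochastic approximation: f_{n+1} = f_n + gamma_n G(f_n, (x_n, y_n))."""
    w = np.zeros(dim)                       # f_1 = init
    for n, (x, y) in enumerate(samples, start=1):
        g = (y - x @ w) * x                 # G: descent direction of the square loss at (x_n, y_n)
        w = w + gamma(n) * g                # move along the stochastic descent direction
    return w
```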

  12. Incremental Optimization. Here our goal is to solve empirical risk minimization, $I_S[f]$, or regularized empirical risk minimization, $I_S^{\lambda}[f] = I_S[f] + \lambda \|f\|^2$, over the hypothesis space $\mathcal{H}$, when the number of points is so big (say $n = 10^8$ to $10^9$) that standard solvers would not be feasible. Memory is the main constraint here.

  13. Incremental Optimization (cont.) In this case we can consider: let $f_1 = \mathrm{init}$; for $t = 1, \ldots$, set $f_{t+1} = f_t + \gamma_t G(f_t, (x_{n_t}, y_{n_t}))$, where $G(f_t, (x_{n_t}, y_{n_t}))$ is a pointwise estimate of the gradient of $I_S$ or $I_S^{\lambda}$. Epochs: note that in this case the iteration index is decoupled from the index of the training set points, and we can look at the data more than once, that is, consider different epochs.
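One way such an incremental scheme might look for a linear model with square loss and a ridge term is sketched below; the regularization parameter, number of epochs, step sizes, and random sweep order are all illustrative assumptions.

```python
import numpy as np

def incremental_erm(X, Y, lam=0.1, epochs=5, gamma=lambda t: 1.0 / t):
    """Incremental minimization of I_S^lambda with multiple passes (epochs) over the data."""
    n, d = X.shape
    w = np.zeros(d)
    t = 1
    for _ in range(epochs):                          # iteration index t is decoupled from the sample index
        for i in np.random.permutation(n):           # choose the sample (x_{n_t}, y_{n_t}) for this step
            g = (Y[i] - X[i] @ w) * X[i] - lam * w   # pointwise estimate of the descent direction of I_S^lambda
            w = w + gamma(t) * g
            t += 1
    return w
```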

  14. Non i.i.d. data, game theoretic view. If the data are not i.i.d., we can consider a setting where the data form a finite sequence that is disclosed to us in a sequential (possibly adversarial) fashion. Then we can see learning as a two-player game: at each step nature chooses a sample $(x_n, y_n)$, and at each step the learner chooses an estimator $f_n$. The goal of the learner is to perform as well as if it could view the whole sequence.

  16. Plan: different views on online learning; from batch to online least squares; other loss functions; theory.

  17. Recalling Least Squares. We start by considering a linear kernel, so that $I_S[f] = \frac{1}{n} \sum_{i=0}^{n-1} (y_i - x_i^T w)^2 = \frac{1}{n}\|Y - Xw\|^2$. Remember that in this case $w_n = (X^T X)^{-1} X^T Y = C_n^{-1} \sum_{i=0}^{n-1} x_i y_i$, with $C_n = X^T X$. (Note that if we regularize we have $(C_n + \lambda I)^{-1}$ in place of $C_n^{-1}$.) Notation: note the weird numbering of the training set, starting at $0$!
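For reference, the batch solution on a given sample can be computed directly as below; the small optional ridge parameter lam corresponds to the regularized case mentioned in parentheses, and the function name is an assumption.

```python
import numpy as np

def batch_least_squares(X, Y, lam=0.0):
    """Batch solution w_n = (C_n + lam I)^{-1} X^T Y, with C_n = X^T X."""
    d = X.shape[1]
    C = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(C, X.T @ Y)   # solve rather than invert, for numerical stability
```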

  19. A Recursive Least Squares Algorithm. Then we can consider $w_{n+1} = w_n + C_{n+1}^{-1} x_n [y_n - x_n^T w_n]$. Proof: $w_n = C_n^{-1}\left(\sum_{i=0}^{n-1} x_i y_i\right)$ and $w_{n+1} = C_{n+1}^{-1}\left(\sum_{i=0}^{n-1} x_i y_i + x_n y_n\right)$, so $w_{n+1} - w_n = C_{n+1}^{-1}(x_n y_n) + C_{n+1}^{-1}(C_n - C_{n+1}) C_n^{-1} \sum_{i=0}^{n-1} x_i y_i$, where $C_{n+1} - C_n = x_n x_n^T$.
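A direct, if wasteful, implementation of this recursion solves a linear system with $C_{n+1}$ at every step, as in the sketch below; initializing $C$ as a small multiple of the identity is an assumption made here so the first solves are well posed (it amounts to a small ridge term).

```python
import numpy as np

def recursive_ls_naive(samples, dim, eps=1e-6):
    """w_{n+1} = w_n + C_{n+1}^{-1} x_n [y_n - x_n^T w_n], re-solving with C_{n+1} at each step."""
    w = np.zeros(dim)
    C = eps * np.eye(dim)                            # small ridge so the solves are well posed (assumption)
    for x, y in samples:
        C = C + np.outer(x, x)                       # C_{n+1} = C_n + x_n x_n^T
        w = w + np.linalg.solve(C, x) * (y - x @ w)  # O(d^3) per step
    return w
```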

  23. A Recursive Least Squares Algorithm (cont.) We derived the algorithm $w_{n+1} = w_n + C_{n+1}^{-1} x_n [y_n - x_n^T w_n]$. The above approach is recursive; requires storing all the data; and requires inverting a matrix $C_i$ at each step.

  24. A Recursive Least Squares Algorithm (cont.) The following matrix equality allows us to alleviate the computational burden. Matrix Inversion Lemma: $[A + BCD]^{-1} = A^{-1} - A^{-1}B[DA^{-1}B + C^{-1}]^{-1}DA^{-1}$. Then $C_{n+1}^{-1} = C_n^{-1} - \dfrac{C_n^{-1} x_n x_n^T C_n^{-1}}{1 + x_n^T C_n^{-1} x_n}$.
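In code, this rank-one update of the inverse (the Sherman-Morrison special case of the lemma) might read as follows; update_inverse is a hypothetical helper name.

```python
import numpy as np

def update_inverse(C_inv, x):
    """Return C_{n+1}^{-1} = C_n^{-1} - C_n^{-1} x x^T C_n^{-1} / (1 + x^T C_n^{-1} x)."""
    Cx = C_inv @ x                                   # C_n^{-1} x_n
    return C_inv - np.outer(Cx, Cx) / (1.0 + x @ Cx)
```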

  26. A Recursive Least Squares Algorithm (cont.) Moreover, $C_{n+1}^{-1} x_n = C_n^{-1} x_n - \dfrac{C_n^{-1} x_n x_n^T C_n^{-1} x_n}{1 + x_n^T C_n^{-1} x_n} = \dfrac{C_n^{-1} x_n}{1 + x_n^T C_n^{-1} x_n}$, so we can derive the algorithm $w_{n+1} = w_n + \dfrac{C_n^{-1} x_n}{1 + x_n^T C_n^{-1} x_n}\,[y_n - x_n^T w_n]$. Since the above iteration is equivalent to empirical risk minimization (ERM), the conditions ensuring its convergence, as $n \to \infty$, are the same as those for ERM.
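Putting the two updates together gives a recursive least-squares routine that propagates only $C_n^{-1}$, never inverts a matrix, and costs $O(d^2)$ per step. The sketch below is one possible implementation; the initialization $C_0^{-1} = (1/\varepsilon) I$ is an assumption corresponding to a small ridge term.

```python
import numpy as np

def recursive_least_squares(samples, dim, eps=1e-6):
    """w_{n+1} = w_n + C_n^{-1} x_n (y_n - x_n^T w_n) / (1 + x_n^T C_n^{-1} x_n)."""
    w = np.zeros(dim)
    C_inv = (1.0 / eps) * np.eye(dim)             # C_0^{-1}, i.e. a small ridge term (assumption)
    for x, y in samples:
        Cx = C_inv @ x                            # C_n^{-1} x_n
        denom = 1.0 + x @ Cx                      # 1 + x_n^T C_n^{-1} x_n
        w = w + (Cx / denom) * (y - x @ w)        # update w using the old inverse C_n^{-1}
        C_inv = C_inv - np.outer(Cx, Cx) / denom  # Sherman-Morrison update to C_{n+1}^{-1}
    return w
```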
