On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions


  1. On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions
     Purushottam Kar∗, Bharath Sriperumbudur†, Prateek Jain§ and Harish Karnick∗
     ∗ Indian Institute of Technology Kanpur
     † Center for Mathematical Sciences, University of Cambridge
     § Microsoft Research India
     International Conference on Machine Learning 2013

  2. Pointwise Loss Functions
     Loss functions for classification and regression, $\ell : \mathcal{H} \times \mathcal{Z} \to \mathbb{R}$, look at only one point $z = (x, y)$ at a time.
     Examples:
     • Hinge loss: $\ell(h, z) = [1 - y \cdot h(x)]_+$
     • $\epsilon$-insensitive loss: $\ell(h, z) = [|y - h(x)| - \epsilon]_+$
     • Logistic loss: $\ell(h, z) = \ln(1 + \exp(-y \cdot h(x)))$
     ICML 2013 Online Learning for Pairwise Loss Functions Introduction 2/11
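
The three pointwise losses can be written out in a few lines. This is a minimal sketch, assuming the hypothesis output $h(x)$ is already available as a real-valued `score` and that classification labels lie in {-1, +1}; the function names and the default `eps` are illustrative choices, not from the slides. The logistic loss uses the standard form $\ln(1 + \exp(-y \cdot h(x)))$.

```python
import math


def hinge_loss(score: float, y: int) -> float:
    """Hinge loss [1 - y * h(x)]_+ for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)


def eps_insensitive_loss(score: float, y: float, eps: float = 0.1) -> float:
    """Epsilon-insensitive loss [|y - h(x)| - eps]_+ for regression."""
    return max(0.0, abs(y - score) - eps)


def logistic_loss(score: float, y: int) -> float:
    """Logistic loss ln(1 + exp(-y * h(x))) for a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * score))
```

Each function takes a single labeled point, which is what makes these losses pointwise.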

  4. Metric Learning for Classification
     [Figure: learned metric]
     The metric needs to be penalized for bringing blue and red points together.
     • The loss function needs to consider two data points at a time
       ◦ in other words, a pairwise loss function
     • Example: $\ell(d_M, z_1, z_2) = \phi\left(y_1 y_2 \left(1 - d_M^2(x_1, x_2)\right)\right)$, where $\phi$ is the hinge loss function
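
The pairwise example can be made concrete as follows. This is a sketch under stated assumptions: $d_M^2(x_1, x_2) = (x_1 - x_2)^\top M (x_1 - x_2)$ is the squared Mahalanobis distance, $\phi(a) = [1 - a]_+$ is the hinge function, labels lie in {-1, +1}, and `M` is passed as a list of rows. The function names are hypothetical.

```python
def mahalanobis_sq(M, x1, x2):
    """Squared Mahalanobis distance (x1 - x2)^T M (x1 - x2)."""
    diff = [a - b for a, b in zip(x1, x2)]
    M_diff = [sum(row[j] * diff[j] for j in range(len(diff))) for row in M]
    return sum(d * m for d, m in zip(diff, M_diff))


def pairwise_hinge_loss(M, z1, z2):
    """Pairwise loss phi(y1 * y2 * (1 - d_M^2(x1, x2))) with phi(a) = [1 - a]_+."""
    (x1, y1), (x2, y2) = z1, z2
    margin = y1 * y2 * (1.0 - mahalanobis_sq(M, x1, x2))
    return max(0.0, 1.0 - margin)


identity = [[1.0, 0.0], [0.0, 1.0]]
# Two coincident points with different labels are penalized.
print(pairwise_hinge_loss(identity, ((0.0, 0.0), 1), ((0.0, 0.0), -1)))  # 2.0
# Two far-apart points with different labels incur no loss.
print(pairwise_hinge_loss(identity, ((0.0, 0.0), 1), ((2.0, 0.0), -1)))  # 0.0
```

The loss takes two labeled points, so it cannot be evaluated on a single example from the stream.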

  5. Learning with Pairwise Loss Functions
     $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
     Examples:
     • Mahalanobis metric learning
     • Bipartite ranking / maximizing area under the ROC curve
     • Preference learning
     • Two-stage multiple kernel learning
     • Similarity (indefinite kernel) learning

  6. Learning with Pairwise Loss Functions
     $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
     Online learning for pairwise loss functions?
     • Algorithmic challenges
       ◦ Attempts to reduce to pointwise learning
       ◦ Treat pairs $(z_i, z_j)$ as elements of a superdomain $\tilde{\mathcal{Z}} = \mathcal{Z} \times \mathcal{Z}$?
     • Problem: one does not receive pairs in the data stream!
     • Solution: an online learning model for pairwise loss functions

  8. Online Learning Model for Pairwise Loss Functions
     $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
     [Figure: adversary and learner]
     • At each time $t$, the adversary gives us a single data point $z_t = (x_t, y_t)$
     • Loss $\ell_t$ on hypothesis $h_{t-1}$ is calculated by pairing $z_t$ with past points
     Buffer $B$: [ $z_0$ $z_1$ $z_2$ $z_3$ ... ]
     • Pair up with all previous points: $(z_t, z_1), (z_t, z_2), \ldots, (z_t, z_{t-1})$
     • Incur loss $\hat{L}^{\infty}_t(h_{t-1}) = \frac{1}{t-1}\left(\ell(h_{t-1}, z_t, z_1) + \ell(h_{t-1}, z_t, z_2) + \ldots + \ell(h_{t-1}, z_t, z_{t-1})\right)$
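
The all-pairs loss is simply the average of the pairwise loss between the new point and every earlier point. A minimal sketch, assuming the pairwise loss is supplied as a function `loss(h, z, z_prev)`; the toy loss `scalar_metric_loss` (a hinge loss with a scalar metric weight `h`) is a hypothetical stand-in used only to make the example runnable.

```python
def scalar_metric_loss(h, z1, z2):
    """Toy pairwise hinge loss with a scalar metric weight h >= 0 (an assumption)."""
    (x1, y1), (x2, y2) = z1, z2
    margin = y1 * y2 * (1.0 - h * (x1 - x2) ** 2)
    return max(0.0, 1.0 - margin)


def all_pairs_loss(loss, h, z_t, past):
    """Average pairwise loss of h on (z_t, z) over all earlier points z."""
    if not past:
        return 0.0  # convention chosen here: nothing to pair with yet
    return sum(loss(h, z_t, z) for z in past) / len(past)


past = [(1.0, 1), (2.0, -1)]  # earlier points as (x, y)
z_t = (0.0, 1)                # new point
print(all_pairs_loss(scalar_metric_loss, 1.0, z_t, past))  # 0.5
```

Evaluating this loss at time $t$ touches all $t - 1$ earlier points, so both memory and per-step cost grow with the length of the stream.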

  9. Online Learning Model for Pairwise Loss Functions
     $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
     [Figure: adversary and learner]
     • At each time $t$, the adversary gives us a single data point $z_t = (x_t, y_t)$
     • Loss $\ell_t$ on hypothesis $h_{t-1}$ is calculated by pairing $z_t$ with (some) past points
     Finite buffer $B$: [ ]
     • Capacity to store $s$ data items at a time

  10. Online Learning Model for Pairwise Loss Functions
     $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
     [Figure: adversary and learner]
     • At each time $t$, the adversary gives us a single data point $z_t = (x_t, y_t)$
     • Loss $\ell_t$ on hypothesis $h_{t-1}$ is calculated by pairing $z_t$ with (some) past points
     Finite buffer $B$: [ $z_{i_0}$ $z_{i_1}$ $z_{i_2}$ $z_{i_3}$ $z_{i_4}$ $z_{i_5}$ ]
     • Can pair up only with buffer points: $(z_t, z_{i_1}), (z_t, z_{i_2}), \ldots, (z_t, z_{i_5})$
     • Incur loss $\hat{L}^{\mathrm{buf}}_t(h_{t-1}) = \frac{1}{s}\left(\ell(h_{t-1}, z_t, z_{i_1}) + \ell(h_{t-1}, z_t, z_{i_2}) + \ldots + \ell(h_{t-1}, z_t, z_{i_s})\right)$
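
With a finite buffer, the same average is taken only over the stored points. A minimal sketch, assuming the buffer is a plain list holding at most $s$ points and the pairwise loss is passed in as `loss(h, z, z_b)`; the absolute-difference loss in the usage lines is a hypothetical toy.

```python
def buffer_loss(loss, h, z_t, buffer):
    """Average pairwise loss of h on (z_t, z_b) over the points z_b in the buffer.

    Once the buffer is full, len(buffer) equals the capacity s, which gives
    the 1/s normalization of the finite-buffer loss.
    """
    if not buffer:
        return 0.0  # convention chosen here: empty buffer means no pairs
    return sum(loss(h, z_t, z_b) for z_b in buffer) / len(buffer)


# Hypothetical toy loss on scalar points: |h * (z - z_b)|.
toy_loss = lambda h, z, z_b: abs(h * (z - z_b))
print(buffer_loss(toy_loss, 2.0, 2.0, [1.0, 3.0]))  # 2.0
```

The per-step cost is now bounded by $s$ regardless of how long the stream runs; which past points end up in the buffer is decided by the buffer update policy.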

  12. Online Learning Model for Pairwise Loss Functions
     $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
     [Figure: adversary and learner]
     Regret bounds in this model:
     • How well are we able to do on all possible pairs?
       ◦ All-pairs regret bound: $\frac{1}{n-1}\sum_{t=2}^{n} \hat{L}^{\infty}_t(h_{t-1}) \le \inf_{h \in \mathcal{H}} \frac{1}{n-1}\sum_{t=2}^{n} \hat{L}^{\infty}_t(h) + R^{\infty}_n$
     • How well are we able to do on pairs that we have seen?
       ◦ Finite-buffer regret bound: $\frac{1}{n-1}\sum_{t=2}^{n} \hat{L}^{\mathrm{buf}}_t(h_{t-1}) \le \inf_{h \in \mathcal{H}} \frac{1}{n-1}\sum_{t=2}^{n} \hat{L}^{\mathrm{buf}}_t(h) + R^{\mathrm{buf}}_n$
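
For intuition, the all-pairs regret of a run can be measured empirically. This is a sketch under stated assumptions: the stream $z_1, \ldots, z_n$ is stored as `stream[0..n-1]`, the learner's hypotheses are stored as `hyps[t] = h_t`, the loss at time $t$ is charged to $h_{t-1}$, and the infimum over $\mathcal{H}$ is approximated by a minimum over a finite list `candidates`. All names are hypothetical.

```python
def empirical_all_pairs_regret(loss, hyps, stream, candidates):
    """Learner's average all-pairs loss minus that of the best fixed candidate."""
    n = len(stream)
    if n < 2:
        raise ValueError("need at least two points to form a pair")

    def average_loss(hypothesis_at):
        total = 0.0
        for t in range(2, n + 1):          # time steps t = 2, ..., n
            z_t = stream[t - 1]            # z_t in 0-based storage
            past = stream[: t - 1]         # z_1, ..., z_{t-1}
            h = hypothesis_at(t)
            total += sum(loss(h, z_t, z) for z in past) / len(past)
        return total / (n - 1)

    learner = average_loss(lambda t: hyps[t - 1])       # loss at t uses h_{t-1}
    best_fixed = min(average_loss(lambda t, h=h: h) for h in candidates)
    return learner - best_fixed


# Hypothetical toy loss on scalar points: (h - z * z')^2.
toy_loss = lambda h, z, z_prev: (h - z * z_prev) ** 2
print(empirical_all_pairs_regret(toy_loss, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0]))  # 1.0
```

Replacing `past` with the buffer contents at each step would give the finite-buffer counterpart.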

  13. Learning with Pairwise Loss Functions
     $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
     Offline learning for pairwise loss functions?
     • Online techniques are used for several batch applications
       ◦ PEGASOS, LASVM, ...
       ◦ Even more important for pairwise loss functions
     • Expensive latency costs in sampling i.i.d. pairs from disk

  14. Learning with Pairwise Loss Functions
     $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
     Offline learning for pairwise loss functions?
     • Problem: generalization bounds for online algorithms
       ◦ The online learning process generates a hypothesis $\bar{h}$
       ◦ Generalization performance: $L(h) := \mathbb{E}_{z_1, z_2}\left[\ell(h, z_1, z_2)\right]$
       ◦ Wish to bound the excess risk: $\mathcal{E}_n = L(\bar{h}) - \inf_{h \in \mathcal{H}} L(h)$
     • Solution: online-to-batch (OTB) conversion bounds
       ◦ Bound $\mathcal{E}_n$ for the learned predictor in terms of $R^{\mathrm{buf}}_n$ or $R^{\infty}_n$
       ◦ Problem (for later): existing OTB techniques don't work here
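
The two quantities in this setup can be sketched in code. Assumptions, none of which are fixed by this slide: hypotheses are vectors stored as lists of floats, $\bar{h}$ is taken to be the plain average of the online iterates, and $L(h)$ is estimated on a held-out i.i.d. sample by averaging the loss over all unordered pairs. The function names are hypothetical.

```python
from itertools import combinations


def average_hypothesis(hyps):
    """Coordinate-wise average of a list of weight vectors."""
    n, dim = len(hyps), len(hyps[0])
    return [sum(h[i] for h in hyps) / n for i in range(dim)]


def estimate_pairwise_risk(loss, h, sample):
    """Estimate L(h) = E[loss(h, z1, z2)] by averaging over all pairs in sample."""
    pairs = list(combinations(sample, 2))
    if not pairs:
        raise ValueError("need at least two points to form a pair")
    return sum(loss(h, z1, z2) for z1, z2 in pairs) / len(pairs)


print(average_hypothesis([[0.0, 2.0], [2.0, 4.0]]))  # [1.0, 3.0]
```

An estimate of this kind is only a diagnostic; the conversion bounds themselves relate $\mathcal{E}_n$ to the regret without needing held-out data.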

  16. Learning with Pairwise Loss Functions
     $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
     • Online AUC Maximization [Zhao et al, ICML 2011]
       ◦ Uses the classical stream sampling algorithm RS
       ◦ All-pairs regret bound needs fixing
       ◦ Finite-buffer regret bound holds (implicit)
     • OLP: Online Learning for PLF (pairwise loss functions) [This work]
       ◦ Uses a novel stream sampling algorithm RS-x
       ◦ Guaranteed sublinear regret w.r.t. all-pairs
       ◦ Finite-buffer regret bound holds
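
For reference, classical reservoir sampling (the RS policy named above) can be sketched as below. This is the textbook algorithm, not RS-x, whose details are not given on these slides. The function name and the `rng` parameter are hypothetical; `t` is the 1-based index of the incoming point and `s` is the buffer capacity.

```python
import random


def reservoir_update(buffer, z_t, t, s, rng=random):
    """Classical reservoir sampling step for the t-th stream item (t is 1-based).

    Keeps the buffer a uniform sample without replacement of size s
    from the first t items of the stream.
    """
    if len(buffer) < s:
        buffer.append(z_t)               # fill phase: always keep the point
    elif rng.random() < s / t:
        buffer[rng.randrange(s)] = z_t   # evict a uniformly random slot


rng = random.Random(0)
buf = []
for t, z in enumerate(range(100), start=1):
    reservoir_update(buf, z, t, s=5, rng=rng)
print(len(buf))  # 5
```

Each stream item enters the buffer at most once, so the buffer never holds repeated copies of the same item.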

  18. Learning with Pairwise Loss Functions
     $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
     • OTB conversion bounds for PLF [Wang et al, COLT 2012]
       ◦ Work only w.r.t. all-pairs regret bounds
       ◦ Unable to handle [Zhao et al, ICML 2011]
       ◦ Bounds depend linearly on input dimension
       ◦ Don't handle sparse learning formulations
       ◦ Basic rates of convergence
     • OTB conversion bounds for PLF [This work]
       ◦ Work with all-pairs and finite-buffer regret
       ◦ Able to handle [Zhao et al, ICML 2011]
       ◦ Bounds independent of input dimension
       ◦ Handle sparse learning formulations
       ◦ Fast rates for strongly convex pairwise loss functions

  19. Online Learning with Pairwise Loss Functions
     $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
     [Figure: adversary and learner]
     Learning algorithm:
     • Hypothesis update
     • Buffer update
       ◦ Guarantees
     Regret bounds:
     • Finite-buffer regret
     • All-pairs regret

  20. Online Learning with Pairwise Loss Functions
     $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
     [Figure: adversary and learner]
     Learning algorithm: OLP (Online Learning for Pairwise Loss Functions)
     1. Start off with $h_0 = 0$ and an empty buffer $B$
     At each time step $t = 1 \ldots n$:
     2. Receive new training point $z_t$
     3. Construct loss function $\ell_t = \hat{L}^{\mathrm{buf}}_t$
     4. $h_t \leftarrow \Pi_{\Omega}\left[h_{t-1} - \frac{\eta}{\sqrt{t}} \nabla_h \ell_t(h_{t-1})\right]$
     5. Update buffer $B$ with $z_t$
     6. Return $\bar{h} = \frac{1}{n}\sum_{t=0}^{n-1} h_t$
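
The OLP steps can be sketched end to end on a toy problem. Everything specific here is an assumption made for illustration: a linear hypothesis `w`, a squared pairwise loss $((y - y') - w \cdot (x - x'))^2$, an L2 ball of radius `radius` as the set $\Omega$, and classical reservoir sampling as a stand-in for the buffer update (the slides use RS-x, which is not specified here). The names `olp`, `project_l2` and `grad_buffer_loss` are hypothetical.

```python
import math
import random


def project_l2(w, radius):
    """Euclidean projection onto the L2 ball of the given radius (assumed Omega)."""
    norm = math.sqrt(sum(v * v for v in w))
    return w if norm <= radius else [v * radius / norm for v in w]


def grad_buffer_loss(w, z_t, buffer):
    """Gradient in w of the buffer-averaged squared pairwise loss (toy loss)."""
    x_t, y_t = z_t
    grad = [0.0] * len(w)
    for x_b, y_b in buffer:
        dx = [a - b for a, b in zip(x_t, x_b)]
        resid = (y_t - y_b) - sum(wi * di for wi, di in zip(w, dx))
        for i, di in enumerate(dx):
            grad[i] += -2.0 * resid * di / len(buffer)
    return grad


def olp(stream, dim, s=10, eta=0.1, radius=10.0, seed=0):
    """Sketch of the OLP loop; returns the average of h_0, ..., h_{n-1}."""
    if not stream:
        raise ValueError("stream must be non-empty")
    rng = random.Random(seed)
    w = [0.0] * dim                  # step 1: h_0 = 0
    buffer = []                      # step 1: empty buffer
    iterates = [w]
    for t, z_t in enumerate(stream, start=1):   # step 2: receive z_t
        if buffer:                   # steps 3-4: buffer loss, projected step
            grad = grad_buffer_loss(w, z_t, buffer)
            step = eta / math.sqrt(t)
            w = project_l2([wi - step * gi for wi, gi in zip(w, grad)], radius)
        # Step 5: buffer update. Classical reservoir sampling is used here as
        # a stand-in; it is NOT the RS-x policy from the slides.
        if len(buffer) < s:
            buffer.append(z_t)
        elif rng.random() < s / t:
            buffer[rng.randrange(s)] = z_t
        iterates.append(w)
    n = len(stream)
    # Step 6: average h_0, ..., h_{n-1}.
    return [sum(h[i] for h in iterates[:n]) / n for i in range(dim)]
```

A quick way to try it is a noiseless stream such as `[([x], 2 * x) for x in xs]`, where the averaged weight should move from 0 toward 2 as the stream grows. The averaging in step 6 is what the online-to-batch bounds are stated for, which is why the sketch returns $\bar{h}$ and not the last iterate.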

  21. Online Learning with Pairwise Loss Functions
     $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
     [Figure: adversary and learner; incoming point $z_0$ and empty buffer [ ]]
     Learning algorithm: RS-x (Reservoir Sampling with Replacement)
