Parameter-Free Convex Learning through Coin Betting


  1. Parameter-Free Convex Learning through Coin Betting. Francesco Orabona and Dávid Pál, Yahoo Research, NY

  2. Are You Still Tuning/Learning/Adapting Hyperparameters? Standard machine learning procedures. Regularized empirical risk minimization: $\arg\min_{w \in \mathbb{R}^d} \frac{\lambda}{2}\|w\|^2 + \sum_{i=1}^{N} f(w, x_i, y_i)$, where $f$ is convex in $w$.

  3. Are You Still Tuning/Learning/Adapting Hyperparameters? Standard machine learning procedures. Regularized empirical risk minimization: $\arg\min_{w \in \mathbb{R}^d} \frac{\lambda}{2}\|w\|^2 + \sum_{i=1}^{N} f(w, x_i, y_i)$, where $f$ is convex in $w$. ■ How do you choose the regularizer weight $\lambda$?

  4. Are You Still Tuning/Learning/Adapting Hyperparameters? Standard machine learning procedures. Stochastic approximation: $w_t = w_{t-1} - \eta_t \nabla f(w_{t-1}, x_t, y_t)$, where $f$ is convex in $w$.

  5. Are You Still Tuning/Learning/Adapting Hyperparameters? Standard machine learning procedures. Stochastic approximation: $w_t = w_{t-1} - \eta_t \nabla f(w_{t-1}, x_t, y_t)$, where $f$ is convex in $w$. ■ How do you choose the learning rate $\eta_t$?

  6. Wasn’t machine learning about learning automatically from data? ■ There is a 7-year history of parameter-free algorithms, which have neither learning rates nor regularizers to tune. ■ But they were very unintuitive and complex.

  7. One Coin to Rule Them All: online learning is equivalent to coin betting. Coin-betting algorithms give rise to optimal, parameter-free learning algorithms.

  8. Simple Algorithm & Good Results ■ Parameter-free ■ Extremely simple algorithm ■ Same complexity as SGD ■ Kernelizable [Figure: cpusmall dataset, absolute loss. Test loss vs. SGD learning rate (log scale), comparing SGD with the KT-based algorithm.] See how at the poster!
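To make the question on slides 2 and 3 concrete, here is a minimal sketch assuming the squared loss f(w, x, y) = ½(wx - y)² and a single scalar weight (d = 1); the function name `ridge_1d` and the toy data are hypothetical. In this special case the regularized minimizer has a closed form, and it moves visibly as λ changes.

```python
from typing import Sequence


def ridge_1d(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Minimize lam/2 * w**2 + sum_i 0.5 * (w * x_i - y_i)**2 over a scalar w.

    Setting the derivative lam * w + sum_i (w * x_i - y_i) * x_i to zero
    gives w = sum_i x_i * y_i / (lam + sum_i x_i**2).
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (lam + sxx)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # toy data with y = 2x
for lam in (0.0, 1.0, 14.0, 100.0):
    print(lam, ridge_1d(xs, ys, lam))
```

On this data the solution shrinks from 2 toward 0 as λ grows, so the "right" λ depends on the data and the goal, which is exactly what has to be tuned.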
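The update on slides 4 and 5 can be sketched as one pass of SGD. This assumes the squared loss f(w, x, y) = ½(wx - y)² with a scalar w, and the schedule η_t = η₀/√t is an illustrative choice; the name `sgd` is hypothetical. The same data stream gives very different answers depending on η₀, and a constant step that is too large makes the iterates diverge.

```python
import math
from typing import Callable, Sequence, Tuple


def sgd(data: Sequence[Tuple[float, float]],
        eta: Callable[[int], float],
        w0: float = 0.0) -> float:
    """One pass of w_t = w_{t-1} - eta_t * grad f(w_{t-1}, x_t, y_t),
    assuming the squared loss f(w, x, y) = 0.5 * (w * x - y)**2."""
    w = w0
    for t, (x, y) in enumerate(data, start=1):
        grad = (w * x - y) * x  # derivative of f with respect to w at w_{t-1}
        w -= eta(t) * grad
    return w


# Same stream, two choices of eta_t = eta0 / sqrt(t); every example agrees on w = 2.
stream = [(1.0, 2.0)] * 50
for eta0 in (0.01, 0.5):
    print(eta0, sgd(stream, lambda t: eta0 / math.sqrt(t)))

# A constant step that is too large for this input scale blows up.
print(sgd([(3.0, 6.0)] * 5, lambda t: 1.0))
```

With η₀ = 0.01 the iterate is still far from 2 after 50 steps, while η₀ = 0.5 gets very close; neither choice is known in advance without tuning.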
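Slide 7 rests on a coin-betting game. A minimal sketch, assuming the Krichevsky-Trofimov (KT) betting fraction that the "KT-based" label on slide 8 alludes to: a gambler starts with initial wealth ε (set to 1 here), and on round t bets the signed amount w_t = β_t · Wealth_{t-1} with β_t = (c_1 + ... + c_{t-1})/t, where the coin outcome c_t lies in [-1, 1]; the wealth then becomes Wealth_{t-1} + c_t · w_t. Since |β_t| < 1, the gambler never goes broke. Reading the bet w_t as a prediction and feeding in c_t = -g_t (a negative gradient with |g_t| ≤ 1) turns the bettor into a one-dimensional online learner; that mapping is my reading of the slide's claim, and the class name `KTBettor` is hypothetical.

```python
class KTBettor:
    """Hypothetical Krichevsky-Trofimov (KT) coin bettor.

    Coin outcomes must lie in [-1, 1]. The bet on round t is
    w_t = beta_t * wealth, with beta_t = (c_1 + ... + c_{t-1}) / t.
    """

    def __init__(self, initial_wealth: float = 1.0) -> None:
        if initial_wealth <= 0:
            raise ValueError("initial wealth must be positive")
        self.wealth = initial_wealth
        self.sum_coins = 0.0  # c_1 + ... + c_{t-1}
        self.rounds = 0       # number of finished rounds, i.e. t - 1

    def bet(self) -> float:
        # |beta| <= (t - 1) / t < 1, so the wealth can never reach zero.
        beta = self.sum_coins / (self.rounds + 1)
        return beta * self.wealth

    def observe(self, coin: float) -> None:
        if not -1.0 <= coin <= 1.0:
            raise ValueError("coin outcome must be in [-1, 1]")
        self.wealth += coin * self.bet()  # win or lose the signed bet
        self.sum_coins += coin
        self.rounds += 1


gambler = KTBettor()
for coin in [1.0, 1.0, 1.0, 1.0]:  # four heads in a row
    gambler.observe(coin)
print(gambler.wealth)

mixed = KTBettor()
for coin in [1.0, -1.0] * 50:      # alternating heads and tails
    mixed.observe(coin)
print(mixed.wealth)
```

On a streak of heads the bets grow with the wealth (1, 1.5, 2.5, 4.375), while on an alternating sequence the bettor loses slowly but stays solvent. No step size appears anywhere.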
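The properties listed on slide 8 (parameter-free, same per-step cost as SGD) can be illustrated with a minimal sketch. It assumes a vector form of the KT bettor, w_t = β_t · Wealth_{t-1} with β_t = (c_1 + ... + c_{t-1})/t, c_i = -g_i and Wealth = ε + Σ⟨c_i, w_i⟩, applied to subgradients of Euclidean norm at most 1, and it returns the average iterate (a standard online-to-batch conversion). The names `kt_minimize` and `dist_subgrad`, the initial wealth of 1, and the toy objective f(w) = ||w - u|| (a multi-dimensional cousin of the absolute loss) are all assumptions; this does not reproduce the cpusmall experiment.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def kt_minimize(subgrad: Callable[[Vector], Vector], dim: int,
                rounds: int, initial_wealth: float = 1.0) -> Vector:
    """Sketch of a KT-based, learning-rate-free subgradient method.

    Assumes every subgradient has Euclidean norm <= 1. Returns the running
    average of the iterates w_1, ..., w_T.
    """
    sum_coins = [0.0] * dim   # c_1 + ... + c_{t-1}, with c_i = -g_i
    wealth = initial_wealth   # epsilon + sum_i <c_i, w_i>, always positive
    avg = [0.0] * dim
    for t in range(1, rounds + 1):
        w = [s / t * wealth for s in sum_coins]  # bet: beta_t * wealth
        c = [-gi for gi in subgrad(w)]           # coin outcome = negative subgradient
        wealth += sum(ci * wi for ci, wi in zip(c, w))
        sum_coins = [s + ci for s, ci in zip(sum_coins, c)]
        avg = [a + (wi - a) / t for a, wi in zip(avg, w)]  # running mean, O(d)
    return avg


def dist_subgrad(u: Sequence[float]) -> Callable[[Vector], Vector]:
    """Subgradient of f(w) = ||w - u||_2; its norm is at most 1."""
    def g(w: Vector) -> Vector:
        diff = [wi - ui for wi, ui in zip(w, u)]
        norm = math.sqrt(sum(d * d for d in diff))
        return [0.0] * len(diff) if norm == 0.0 else [d / norm for d in diff]
    return g


u = [3.0, -1.0]  # hypothetical minimizer, unknown to the algorithm
w_bar = kt_minimize(dist_subgrad(u), dim=2, rounds=20000)
print(w_bar)
```

Each round does a constant number of passes over the d coordinates, so the cost per step is of the same order as an SGD step, and nothing in the update plays the role of η_t.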
