
IAML: Regularization and Ridge Regression

Nigel Goddard School of Informatics Semester 1

1 / 12

Regularization

◮ Regularization is a general approach to add a “complexity parameter” to a learning algorithm. Requires that the model parameters be continuous. (i.e., regression OK, decision trees not.)

◮ If we penalize polynomials that have large values for their coefficients we will get less wiggly solutions:

  Ẽ(w) = |y − Φw|² + λ|w|²

◮ Solution is

  ŵ = (ΦᵀΦ + λI)⁻¹Φᵀy

◮ This is known as ridge regression.

◮ Rather than using a discrete control parameter like M (model order) we can use a continuous parameter λ.

◮ Caution: Don’t shrink the bias term! (The one that corresponds to the all-ones feature.)

2 / 12
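The closed-form solution above can be sketched in NumPy. This is an illustrative sketch, not from the slides: the helper name `ridge_fit` and the convention that column 0 of Φ is the all-ones bias feature are assumptions.

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Closed-form ridge regression: w = (Phi^T Phi + lam I)^{-1} Phi^T y.

    Assumes column 0 of Phi is the all-ones bias feature; its penalty
    entry is zeroed so the bias term is not shrunk.
    """
    I = np.eye(Phi.shape[1])
    I[0, 0] = 0.0                      # don't shrink the bias term
    return np.linalg.solve(Phi.T @ Phi + lam * I, Phi.T @ y)

# Noise-free example: y = 2 + 3x is recovered almost exactly for small lam
x = np.linspace(0.0, 1.0, 20)
y = 2.0 + 3.0 * x
Phi = np.column_stack([np.ones_like(x), x])
w = ridge_fit(Phi, y, lam=1e-6)
```

Using `np.linalg.solve` on the normal equations avoids forming the explicit inverse, which is both cheaper and numerically safer.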

Regularized Loss Function

◮ The overall cost function is the sum of two parabolic bowls. The sum is also a parabolic bowl.

◮ The combined minimum lies on the line between the minimum of the squared error and the origin.

◮ The regularizer just shrinks the weights.

Credit: Geoff Hinton 3 / 12
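The shrinkage claim can be checked numerically. This is a sketch on synthetic data (the data and the weight values are assumptions, not from the slides): as λ grows, the norm of the ridge solution falls monotonically toward the origin.

```python
import numpy as np

# Synthetic regression problem (illustrative data, not from the slides)
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 3))
y = Phi @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

def ridge(Phi, y, lam):
    """Ridge solution (Phi^T Phi + lam I)^{-1} Phi^T y."""
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

# Weight norms shrink monotonically toward the origin as lambda grows
norms = [np.linalg.norm(ridge(Phi, y, lam)) for lam in [0.0, 1.0, 10.0, 100.0]]
```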

The effect of regularization for M = 9

[Figure: M = 9 polynomial fits (t vs. x) for ln λ = −18 and ln λ = 0. Figure credit: Chris Bishop, PRML] 4 / 12


M = 9

[Figure: E_RMS on training and test sets vs. ln λ for M = 9.]

Chris Bishop, PRML 5 / 12

For standard linear regression, we had

◮ Define the task: regression

◮ Decide on the model structure: linear regression model

◮ Decide on the score function: squared error (likelihood)

◮ Decide on optimization/search method to optimize the score function: calculus (analytic solution)

6 / 12

But with ridge regression we have

◮ Define the task: regression

◮ Decide on the model structure: linear regression model

◮ Decide on the score function: squared error with quadratic regularization

◮ Decide on optimization/search method to optimize the score function: calculus (analytic solution)

Notice how you can train the same model structure with different score functions. This is the first time we have seen this. This is important.

7 / 12

A Control-Parameter-Setting Procedure

◮ Regularization was a way of adding a “capacity control” parameter.

◮ But how do we set the value? e.g., the regularization parameter λ.

◮ Won’t work to do it on the training set (why not?)

◮ Two choices to consider:
  ◮ Validation set
  ◮ Cross-validation

8 / 12


Using a validation set

◮ Split the labelled data into a training set, validation set, and a test set.

◮ Training set: use for training.

◮ Validation set: tune the “control parameters” according to performance on the validation set.

◮ Test set: check how the final model performs.

◮ No right answers, but for example, could choose 60% training, 20% validation, 20% test.

9 / 12
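A minimal 60/20/20 split might look like this. Illustrative only: the helper name `split_data` is an assumption; the fractions are the example from the slide.

```python
import numpy as np

def split_data(X, y, seed=0, frac=(0.6, 0.2, 0.2)):
    """Shuffle and split into train/validation/test (60/20/20 by default)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(frac[0] * len(X))
    n_val = int(frac[1] * len(X))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

X = np.arange(100).reshape(-1, 1)
y = np.arange(100, dtype=float)
train, val, test = split_data(X, y)
```

Shuffling before splitting matters: if the data file is ordered (e.g., by label or by time), a contiguous split gives unrepresentative subsets.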

Example of using a validation set

Consider polynomial regression:

1. For each m = 1, 2, . . . , M (you choose M in advance):
2. Train the polynomial regression using φ(x) = (1, x, x², . . . , xᵐ)ᵀ on the training set (e.g., by minimizing squared error). This produces a predictor f_m(x).
3. Measure the error of f_m on the validation set.
4. End for.
5. Choose the f_m with the best validation error.
6. Measure the error of f_m on the test set to see how well you should expect it to perform.

10 / 12
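The loop above could be sketched as follows. The setup is hypothetical (the sin target, noise level, and data sizes are assumptions, not from the slides); the structure follows the numbered steps.

```python
import numpy as np

def poly_features(x, m):
    """phi(x) = (1, x, x^2, ..., x^m)^T, one row per input point."""
    return np.vander(x, m + 1, increasing=True)

# Synthetic data: noisy sin curve (an illustrative assumption)
rng = np.random.default_rng(1)
x_train = rng.uniform(-1, 1, 30)
y_train = np.sin(np.pi * x_train) + 0.1 * rng.normal(size=30)
x_val = rng.uniform(-1, 1, 30)
y_val = np.sin(np.pi * x_val) + 0.1 * rng.normal(size=30)

best_m, best_err = None, np.inf
for m in range(1, 10):                         # m = 1, ..., M with M = 9
    # Train by least squares on the training set -> predictor f_m
    w, *_ = np.linalg.lstsq(poly_features(x_train, m), y_train, rcond=None)
    # Measure f_m's squared error on the validation set
    err = np.mean((poly_features(x_val, m) @ w - y_val) ** 2)
    if err < best_err:
        best_m, best_err = m, err
```

The chosen f_m would then be scored once on the held-out test set (step 6) to get an unbiased estimate of its error.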

Continuous Control Parameters

◮ For a discrete control parameter like polynomial order m we could simply search all values.

◮ What about a quadratic regularization parameter λ? What do we do then?

11 / 12

Continuous Control Parameters

◮ For a discrete control parameter like polynomial order m we could simply search all values.

◮ What about a quadratic regularization parameter λ? What do we do then?

◮ Pick a grid of values to search. In practice you want the grid to vary geometrically for this sort of parameter, e.g., try λ ∈ {0.01, 0.1, 0.5, 1.0, 5.0, 10.0}. Don’t bother trying 2.0, 3.0, 7.0.

12 / 12
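A geometric grid search for λ on a validation set can be sketched like so. Illustrative only: `np.logspace` generates the constant-ratio grid, and the synthetic train/validation data is an assumption.

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Closed-form ridge solution (Phi^T Phi + lam I)^{-1} Phi^T y."""
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

# Geometric grid: successive values differ by a constant factor (here 10x)
lambdas = np.logspace(-4, 2, 7)          # 1e-4, 1e-3, ..., 1e2

# Synthetic train/validation data (an assumption, for illustration only)
rng = np.random.default_rng(2)
w_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
Phi_train = rng.normal(size=(40, 5))
y_train = Phi_train @ w_true + 0.5 * rng.normal(size=40)
Phi_val = rng.normal(size=(40, 5))
y_val = Phi_val @ w_true + 0.5 * rng.normal(size=40)

# Evaluate each lambda on the validation set and keep the best
val_err = [np.mean((Phi_val @ ridge_fit(Phi_train, y_train, lam) - y_val) ** 2)
           for lam in lambdas]
best_lam = lambdas[int(np.argmin(val_err))]
```

A geometric grid is the right shape here because λ acts multiplicatively: the difference between 0.01 and 0.1 matters far more than the difference between 7.0 and 8.0.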