  1. CS/ECE/ISyE 524 Introduction to Optimization, Spring 2017–18
     10. Regularization
     • More on tradeoffs
     • Regularization
     • Effect of using different norms
     • Example: hovercraft revisited
     Laurent Lessard (www.laurentlessard.com)

  2. Review of tradeoffs
     Recap of tradeoffs:
     • We want to make both J₁(x) and J₂(x) small subject to constraints.
     • Choose a parameter λ > 0 and solve
         minimize (over x):  J₁(x) + λ J₂(x)
         subject to:         constraints
     • Each λ > 0 yields a solution x̂_λ.
     • We can visualize the tradeoff by plotting J₂(x̂_λ) vs J₁(x̂_λ). This is called the Pareto curve.
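
     For concreteness, here is a minimal Julia sketch of tracing a Pareto curve in the common case J₁(x) = ‖Ax − b‖² and J₂(x) = ‖x‖². The particular costs, the random data, and the λ grid are illustrative assumptions, not part of the slides; each λ is solved in closed form and the resulting (J₁, J₂) pairs trace the curve.

```julia
using LinearAlgebra

# Illustrative data (not from the slides): a random least-squares instance.
A = randn(30, 10)
b = randn(30)

J1(x) = norm(A*x - b)^2      # first cost
J2(x) = norm(x)^2            # second cost

# Sweep the tradeoff parameter λ and record one Pareto point per value.
λs = 10.0 .^ range(-4, 4, length=50)
pareto = map(λs) do λ
    x̂ = (A'*A + λ*I) \ (A'*b)    # minimizer of J1(x) + λ·J2(x)
    (J1(x̂), J2(x̂))
end
# Plotting J2 against J1 over these points gives the Pareto curve.
```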

  3. Multi-objective tradeoff
     • A similar procedure works if we have more than two costs we'd like to make small, e.g. J₁, J₂, J₃.
     • Choose parameters λ > 0 and μ > 0, then solve:
         minimize (over x):  J₁(x) + λ J₂(x) + μ J₃(x)
         subject to:         constraints
     • Each pair λ > 0, μ > 0 yields a solution x̂_{λ,μ}.
     • We can visualize the tradeoff by plotting J₃(x̂_{λ,μ}) vs J₂(x̂_{λ,μ}) vs J₁(x̂_{λ,μ}) on a 3D plot. You then obtain a Pareto surface.

  4. Minimum-norm as a regularization
     • When Ax = b is underdetermined (A is wide), we can resolve the ambiguity by adding a cost function, e.g. the min-norm least-squares problem:
         minimize (over x):  ‖x‖²
         subject to:         Ax = b
     • Alternative approach: express it as a tradeoff!
         minimize (over x):  ‖Ax − b‖² + λ‖x‖²
       Tradeoffs of this type are called regularization, and λ is called the regularization parameter or regularization weight.
     • If we let λ → ∞, we just obtain x̂ = 0.
     • If we let λ → 0, we obtain the minimum-norm solution!

  5. Proof of minimum-norm equivalence
         minimize (over x):  ‖Ax − b‖² + λ‖x‖²
     This is equivalent to the least-squares problem
         minimize (over x):  ‖ [A; √λ·I] x − [b; 0] ‖²
     where [ · ; · ] denotes vertical stacking. The solution is found via the pseudoinverse (for a tall matrix):
         x̂ = ( [A; √λ·I]ᵀ [A; √λ·I] )⁻¹ [A; √λ·I]ᵀ [b; 0]
            = (AᵀA + λI)⁻¹ Aᵀ b
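
     As a quick sanity check, the stacked formulation and the closed-form solution can be compared numerically. This is a minimal sketch assuming random data; it is not part of the original slides.

```julia
using LinearAlgebra

A = randn(8, 20)          # wide matrix (illustrative)
b = randn(8)
λ = 0.1

# Stacked least-squares formulation: minimize ‖[A; √λ·I]x − [b; 0]‖²
A_stack = [A; sqrt(λ) * Matrix(I, 20, 20)]
b_stack = [b; zeros(20)]
x_stacked = A_stack \ b_stack        # tall, full column rank ⇒ unique LS solution

# Closed-form regularized solution
x_ridge = (A'*A + λ*I) \ (A'*b)

@assert isapprox(x_stacked, x_ridge; rtol=1e-8)
```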

  6. Proof of minimum-norm equivalence
     The solution of the 2-norm regularization is:
         x̂ = (AᵀA + λI)⁻¹ Aᵀ b
     • We can't simply set λ → 0, because A is wide and therefore AᵀA will not be invertible.
     • Instead, use the fact that AᵀAAᵀ + λAᵀ can be factored two ways:
         (AᵀA + λI) Aᵀ = AᵀAAᵀ + λAᵀ = Aᵀ (AAᵀ + λI)
     • Since (AᵀA + λI) Aᵀ = Aᵀ (AAᵀ + λI), it follows that
         (AᵀA + λI)⁻¹ Aᵀ = Aᵀ (AAᵀ + λI)⁻¹

  7. Proof of minimum-norm equivalence
     The solution of the 2-norm regularization is:
         x̂ = (AᵀA + λI)⁻¹ Aᵀ b
     which is also equal to:
         x̂ = Aᵀ (AAᵀ + λI)⁻¹ b
     • Since AAᵀ is invertible, we can take the limit λ → 0 by just setting λ = 0.
     • In the limit: x̂ = Aᵀ (AAᵀ)⁻¹ b. This is exactly the solution to the minimum-norm least-squares problem we found before!
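
     The factorization identity and the λ → 0 limit are easy to confirm numerically. A minimal sketch, assuming a random wide matrix with full row rank (illustrative data, not from the slides):

```julia
using LinearAlgebra

A = randn(5, 12)          # wide, full row rank with probability 1
b = randn(5)
λ = 1e-3

# Identity: (AᵀA + λI)⁻¹Aᵀb = Aᵀ(AAᵀ + λI)⁻¹b
lhs = (A'*A + λ*I) \ (A'*b)
rhs = A' * ((A*A' + λ*I) \ b)
@assert isapprox(lhs, rhs; rtol=1e-6)

# Setting λ = 0 in the second form gives the minimum-norm solution Aᵀ(AAᵀ)⁻¹b,
# which matches the pseudoinverse solution.
x_minnorm = A' * ((A*A') \ b)
@assert isapprox(x_minnorm, pinv(A) * b; rtol=1e-8)
```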

  8. Tradeoff visualization
         minimize (over x):  ‖Ax − b‖² + λ‖x‖²
     [Plot: the Pareto curve with ‖Ax − b‖² on the horizontal axis and ‖x‖² on the vertical axis. As λ → 0 the curve approaches the point (0, ‖A†b‖²); as λ → ∞ it approaches (‖b‖², 0).]

  9. Regularization
     Regularization: an additional penalty term added to the cost function to encourage a solution with desirable properties.
     Regularized least squares:
         minimize (over x):  ‖Ax − b‖² + λ R(x)
     • R(x) is the regularizer (penalty function)
     • λ is the regularization parameter
     • The model has different names depending on R(x).

  10. Regularization
          minimize (over x):  ‖Ax − b‖² + λ R(x)
      1. If R(x) = ‖x‖² = x₁² + x₂² + ⋯ + xₙ², it is called L2 regularization, Tikhonov regularization, or ridge regression, depending on the application. It has the effect of smoothing the solution.
      2. If R(x) = ‖x‖₁ = |x₁| + |x₂| + ⋯ + |xₙ|, it is called L1 regularization or LASSO. It has the effect of sparsifying the solution (x̂ will have few nonzero entries).
      3. If R(x) = ‖x‖∞ = max{|x₁|, |x₂|, …, |xₙ|}, it is called L∞ regularization, and it has the effect of equalizing the solution (makes most components equal).
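
      Since ‖x‖₁ and ‖x‖∞ are not differentiable, they are usually handled with auxiliary variables. Below is a minimal JuMP sketch of the LASSO variant; the solver choice, random data, and variable names are illustrative assumptions (any QP-capable solver would do). The L∞ version is analogous: use a single scalar variable s with the constraints x .<= s and -s .<= x and add λ*s to the objective.

```julia
using JuMP, Ipopt, LinearAlgebra

A = randn(30, 10)          # illustrative data
b = randn(30)
λ = 0.5
n = size(A, 2)

model = Model(Ipopt.Optimizer)
@variable(model, x[1:n])
@variable(model, t[1:n] >= 0)            # t[i] equals |x[i]| at the optimum
@constraint(model,  x .<= t)
@constraint(model, -t .<= x)
@objective(model, Min, sum((A*x .- b).^2) + λ * sum(t))   # ‖Ax − b‖² + λ‖x‖₁
optimize!(model)
x̂ = value.(x)
```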

  11. Norm balls
      For a norm ‖·‖_p, the norm ball of radius r is the set Bᵣ = { x ∈ Rⁿ : ‖x‖_p ≤ r }.
      [Plots: the three unit balls in R².]
      • ‖x‖₂ ≤ 1, i.e. x² + y² ≤ 1 (a disk)
      • ‖x‖₁ ≤ 1, i.e. |x| + |y| ≤ 1 (a diamond)
      • ‖x‖∞ ≤ 1, i.e. max{|x|, |y|} ≤ 1 (a square)
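
      In Julia, these norms are computed directly with LinearAlgebra's `norm` (a tiny illustrative check, not from the slides):

```julia
using LinearAlgebra

x = [3.0, -4.0]
norm(x, 2)      # 5.0  (Euclidean norm)
norm(x, 1)      # 7.0  (sum of absolute values)
norm(x, Inf)    # 4.0  (largest absolute value)
```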

  12. Simple example
      Consider the minimum-norm problem for different norms:
          minimize (over x):  ‖x‖_p
          subject to:         Ax = b
      [Plot: the solution line of Ax = b touched by the smallest L2 ball.]
      • The set of solutions to Ax = b is an affine subspace.
      • The solution is the point of that subspace lying in the smallest norm ball.
      • For p = 2, this occurs at the perpendicular (shortest Euclidean) distance from the origin.

  13. Simple example
      [Plots: the same line touched by the smallest L1 ball and by the smallest L∞ ball.]
      • For p = 1, the touching point occurs on one of the coordinate axes: sparsifying behavior.
      • For p = ∞, the touching point occurs where the coordinates have equal values: equalizing behavior.
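
      The p = 1 and p = ∞ cases can both be written as linear programs. Here is a minimal sketch of the p = 1 case in JuMP; the 1×2 system, the solver (HiGHS), and the variable names are illustrative assumptions, not taken from the slides.

```julia
using JuMP, HiGHS, LinearAlgebra

# Illustrative 1×2 system: the line x₁ + 3x₂ = 3.
A = [1.0 3.0]
b = [3.0]
n = size(A, 2)

# minimize ‖x‖₁ subject to Ax = b, written as an LP with |xᵢ| ≤ tᵢ
model = Model(HiGHS.Optimizer)
@variable(model, x[1:n])
@variable(model, t[1:n] >= 0)
@constraint(model, A * x .== b)
@constraint(model,  x .<= t)
@constraint(model, -t .<= x)
@objective(model, Min, sum(t))
optimize!(model)

value.(x)               # ≈ [0.0, 1.0]: the sparse solution, on a coordinate axis
A' * ((A * A') \ b)     # ≈ [0.3, 0.9]: the p = 2 (perpendicular) solution, for comparison
```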

  14. Another simple example
      Suppose we have data points {y₁, …, yₘ} ⊂ R, sorted so that y₁ ≤ ⋯ ≤ yₘ, and we would like to find the best single-number estimator x of the data according to different norms:
          minimize (over x):  ‖ [y₁; …; yₘ] − [x; …; x] ‖_p
      • p = 2: x̂ = (y₁ + ⋯ + yₘ)/m. This is the mean of the data.
      • p = 1: x̂ = y_{⌈m/2⌉}. This is the median of the data.
      • p = ∞: x̂ = (y₁ + yₘ)/2. This is the mid-range of the data.
      Julia demo: Data Norm.ipynb
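
      A quick numerical confirmation using the known closed forms (illustrative data; the referenced Data Norm.ipynb presumably solves the same problem with an optimization model instead):

```julia
using Statistics

y = sort([2.0, 9.0, 4.0, 7.0, 1.0])   # y₁ ≤ ⋯ ≤ yₘ with m = 5

mean(y)                 # p = 2 estimator: 4.6
median(y)               # p = 1 estimator: 4.0 (the middle element)
(y[1] + y[end]) / 2     # p = ∞ estimator (mid-range): 5.0
```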

  15. Example: hovercraft revisited
      One-dimensional version of the hovercraft problem:
      • Start at x₁ = 0 with v₁ = 0 (at rest at position zero).
      • Finish at x₅₀ = 100 with v₅₀ = 0 (at rest at position 100).
      • Same simple dynamics as before, for t = 1, 2, …, 49:
          x_{t+1} = x_t + v_t
          v_{t+1} = v_t + u_t
      • Decide thruster inputs u₁, u₂, …, u₄₉.
      • This time: minimize ‖u‖_p.

  16. Example: hovercraft revisited
          minimize (over x_t, v_t, u_t):  ‖u‖_p
          subject to:  x_{t+1} = x_t + v_t   for t = 1, …, 49
                       v_{t+1} = v_t + u_t   for t = 1, …, 49
                       x₁ = 0,  x₅₀ = 100
                       v₁ = 0,  v₅₀ = 0
      • This model has about 150 variables (50 positions, 50 velocities, and 49 inputs), but it is very easy to understand.
      • We can simplify the model considerably...
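
      A minimal JuMP sketch of this full model for p = 2 (minimizing ‖u‖₂² keeps the problem a quadratic program). The solver choice is an illustrative assumption, and the referenced Hover 1D.ipynb demo may be organized differently.

```julia
using JuMP, Ipopt

T = 50
model = Model(Ipopt.Optimizer)
@variable(model, x[1:T])          # positions
@variable(model, v[1:T])          # velocities
@variable(model, u[1:T-1])        # thruster inputs

@constraint(model, [t = 1:T-1], x[t+1] == x[t] + v[t])   # dynamics
@constraint(model, [t = 1:T-1], v[t+1] == v[t] + u[t])
@constraint(model, x[1] == 0)
@constraint(model, x[T] == 100)
@constraint(model, v[1] == 0)
@constraint(model, v[T] == 0)

@objective(model, Min, sum(u .^ 2))     # ‖u‖₂² (the smooth case)
optimize!(model)
u_smooth = value.(u)
```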

  17. Model simplification
      The dynamics, for t = 1, 2, …, 49:
          x_{t+1} = x_t + v_t
          v_{t+1} = v_t + u_t
      Unrolling the velocity recursion:
          v₅₀ = v₄₉ + u₄₉
              = v₄₈ + u₄₈ + u₄₉
              = ⋯
              = v₁ + (u₁ + u₂ + ⋯ + u₄₉)

  18. Model simplification
      The dynamics, for t = 1, 2, …, 49:
          x_{t+1} = x_t + v_t
          v_{t+1} = v_t + u_t
      Unrolling the position recursion:
          x₅₀ = x₄₉ + v₄₉
              = x₄₈ + 2v₄₈ + u₄₈
              = x₄₇ + 3v₄₇ + 2u₄₇ + u₄₈
              = ⋯
              = x₁ + 49v₁ + (48u₁ + 47u₂ + ⋯ + 2u₄₇ + u₄₈)
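
      These closed-form expressions are easy to verify by simulating the dynamics forward for an arbitrary input sequence. A small illustrative check, not part of the slides (it assumes x₁ = v₁ = 0 as in the problem):

```julia
# Simulate the dynamics forward and compare with the unrolled formulas.
function simulate(u)
    x, v = 0.0, 0.0                # x₁ = 0, v₁ = 0
    for t in eachindex(u)
        x, v = x + v, v + u[t]     # x_{t+1} = x_t + v_t,  v_{t+1} = v_t + u_t
    end
    return x, v                    # (x₅₀, v₅₀) when length(u) == 49
end

u = randn(49)
x50, v50 = simulate(u)
@assert v50 ≈ sum(u)                                 # v₅₀ = v₁ + (u₁ + ⋯ + u₄₉)
@assert x50 ≈ sum((49 - t) * u[t] for t in 1:49)     # x₅₀ = x₁ + 49v₁ + 48u₁ + ⋯ + u₄₈
```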

  19. Model simplification
      The dynamics, for t = 1, 2, …, 49:
          x_{t+1} = x_t + v_t
          v_{t+1} = v_t + u_t
      The boundary constraints can therefore be rewritten as:
          [48 47 ⋯ 2 1 0; 1 1 ⋯ 1 1 1] · [u₁; u₂; …; u₄₉] = [x₅₀ − x₁ − 49v₁; v₅₀ − v₁]
      so we don't need the intermediate variables x_t and v_t!
      Julia demo: Hover 1D.ipynb
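
      In code, the reduced model is just a 2×49 linear system in u. A minimal sketch for p = 2, where the minimum-norm input is available in closed form via the pseudoinverse (the Hover 1D.ipynb demo presumably handles the other norms with an optimization model):

```julia
using LinearAlgebra

M = [collect(48.0:-1:0)'; ones(1, 49)]   # 2×49 matrix [48 47 ⋯ 1 0; 1 1 ⋯ 1 1]
rhs = [100.0 - 0.0 - 49 * 0.0,           # x₅₀ − x₁ − 49v₁
       0.0 - 0.0]                        # v₅₀ − v₁

u_smooth = pinv(M) * rhs                 # minimum-‖u‖₂ input sequence
@assert M * u_smooth ≈ rhs               # reaches x₅₀ = 100 and v₅₀ = 0
```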

  20. Results
      [Plots of thrust u_t versus time for the three objectives:]
      1. Minimizing ‖u‖₂² (smooth)
      2. Minimizing ‖u‖₁ (sparse)
      3. Minimizing ‖u‖∞ (equalized)

  21. Tradeoff studies
      [Plots of thrust u_t versus time for three mixed objectives:]
      1. Minimizing ‖u‖₂² + λ‖u‖₁ (smooth and sparse)
      2. Minimizing ‖u‖∞ + λ‖u‖₁ (equalized and sparse)
      3. Minimizing ‖u‖₂² + λ‖u‖∞ (equalized and smooth)
