
Lecture 08: Ridge Regression, Equivalent Formulations and KKT Conditions



1. Lecture 08: Ridge Regression, Equivalent Formulations and KKT Conditions
   Instructor: Prof. Ganesh Ramakrishnan
   February 4, 2016

2. Recap: Duality and KKT Conditions
   For the previously mentioned formulation of the problem, the KKT conditions for differentiable functions $f, g_i, h_j$, with $\hat{w}$ primal optimal and $(\hat{\lambda}, \hat{\mu})$ dual optimal, are:
   $$\nabla f(\hat{w}) + \sum_{i=1}^{m} \hat{\lambda}_i \nabla g_i(\hat{w}) + \sum_{j=1}^{p} \hat{\mu}_j \nabla h_j(\hat{w}) = 0$$
   $$g_i(\hat{w}) \le 0, \quad 1 \le i \le m$$
   $$\hat{\lambda}_i \ge 0, \quad 1 \le i \le m$$
   $$\hat{\lambda}_i \, g_i(\hat{w}) = 0, \quad 1 \le i \le m$$
   $$h_j(\hat{w}) = 0, \quad 1 \le j \le p$$

3. Bound on λ in the regularized least-squares solution
   To minimize the error function subject to the constraint $\|w\|^2 \le \xi$, we apply the KKT conditions at the point of optimality $w^*$, with $f(w) = (\Phi w - y)^\top (\Phi w - y)$ and $g(w) = \|w\|^2 - \xi$:
   $$\nabla_{w^*} \big( f(w) + \lambda\, g(w) \big) = 0 \quad \text{(the first KKT condition)}$$
   Solving, we get $w^* = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top y$.
   From the second KKT condition, $\|w^*\|^2 \le \xi$.
   From the third KKT condition, $\lambda \ge 0$.
   From the fourth condition, $\lambda \|w^*\|^2 = \lambda \xi$.
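As a concrete check, the sketch below (my own, with synthetic data and hand-picked values of $\lambda$ and $\xi$ that are not from the lecture) computes $w^*$ from the stationarity condition and evaluates the remaining KKT conditions numerically.

```python
import numpy as np

# Minimal sketch, assuming synthetic data and hypothetical lambda and xi.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((50, 5))                 # design matrix Phi
y = Phi @ rng.standard_normal(5) + 0.1 * rng.standard_normal(50)
lam, xi = 2.0, 1.5                                 # hypothetical multiplier and norm budget

# First (stationarity) KKT condition solved in closed form:
# w* = (Phi^T Phi + lambda I)^{-1} Phi^T y
w_star = np.linalg.solve(Phi.T @ Phi + lam * np.eye(5), Phi.T @ y)

print("primal feasibility  ||w*||^2 <= xi :", w_star @ w_star <= xi)
print("dual feasibility    lambda >= 0    :", lam >= 0)
# Complementary slackness lambda*(||w*||^2 - xi) = 0 holds at a true optimum only if
# lambda = 0 or the constraint is active; printing it diagnoses whether this (lam, xi)
# pair is mutually consistent.
print("complementary slack lambda*(||w*||^2 - xi) =", lam * (w_star @ w_star - xi))
```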

4. Bound on λ in the regularized least-squares solution
   Values of $w^*$ and $\lambda$ that satisfy all these equations yield an optimal solution. Consider
   $$(\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top y = w^*$$
   Multiplying both sides by $(\Phi^\top \Phi + \lambda I)$ and taking norms,
   $$\|(\Phi^\top \Phi) w^* + \lambda I\, w^*\| = \|\Phi^\top y\|$$
   Using the triangle inequality we obtain
   $$\|(\Phi^\top \Phi) w^*\| + \lambda \|w^*\| \ge \|(\Phi^\top \Phi) w^* + \lambda I\, w^*\| = \|\Phi^\top y\|$$

5. Bound on λ in the regularized least-squares solution
   Since $\|(\Phi^\top \Phi) w^*\| \le \alpha \|w^*\|$ for some finite $\alpha$ (any $\alpha$ at least the largest eigenvalue of $\Phi^\top \Phi$ works), substituting in the previous inequality gives
   $$(\alpha + \lambda) \|w^*\| \ge \|\Phi^\top y\|$$
   $$\lambda \ge \frac{\|\Phi^\top y\|}{\|w^*\|} - \alpha$$
   Note that when $\|w^*\| \to 0$, $\lambda \to \infty$. (Any intuition?)
   Using $\|w^*\|^2 \le \xi$ we get
   $$\lambda \ge \frac{\|\Phi^\top y\|}{\sqrt{\xi}} - \alpha$$
   This is not the exact value of $\lambda$, but the bound proves the existence of such a $\lambda$ for some $\xi$ and $\Phi$.
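The chain of inequalities can be sanity-checked numerically. The sketch below is mine (synthetic $\Phi$, $y$, and a hand-picked $\lambda$), with $\alpha$ taken as the spectral norm of $\Phi^\top \Phi$.

```python
import numpy as np

# Sketch with made-up data: verify
#   ||Phi^T y|| <= ||(Phi^T Phi) w*|| + lambda ||w*|| <= (alpha + lambda) ||w*||,
# taking alpha = ||Phi^T Phi||_2 (spectral norm), so that ||(Phi^T Phi) w|| <= alpha ||w||.
rng = np.random.default_rng(1)
Phi = rng.standard_normal((40, 4))
y = rng.standard_normal(40)
lam = 3.0                                          # hypothetical lambda

w_star = np.linalg.solve(Phi.T @ Phi + lam * np.eye(4), Phi.T @ y)
alpha = np.linalg.norm(Phi.T @ Phi, ord=2)

lhs = np.linalg.norm(Phi.T @ y)
mid = np.linalg.norm(Phi.T @ Phi @ w_star) + lam * np.linalg.norm(w_star)
rhs = (alpha + lam) * np.linalg.norm(w_star)
print(lhs <= mid <= rhs)                           # True: triangle inequality + norm bound
print("implied lower bound on lambda:", lhs / np.linalg.norm(w_star) - alpha)
```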

6. Alternative objective function
   Substituting $g(w) = \|w\|^2 - \xi$ in the first KKT equation considered earlier:
   $$\nabla_{w^*} \big( f(w) + \lambda (\|w\|^2 - \xi) \big) = 0$$
   This is equivalent to solving
   $$\min_{w} \; \|\Phi w - y\|^2 + \lambda \|w\|^2$$
   for the same choice of $\lambda$. This form of regularized regression is often referred to as Ridge regression.
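As a quick consistency check (a sketch of my own with synthetic data, not part of the slides), the closed-form $w^*$ makes the gradient of this penalized objective vanish:

```python
import numpy as np

# Sketch, assuming synthetic Phi, y, and an arbitrary lambda.
rng = np.random.default_rng(2)
Phi = rng.standard_normal((30, 3))
y = rng.standard_normal(30)
lam = 0.5

w_star = np.linalg.solve(Phi.T @ Phi + lam * np.eye(3), Phi.T @ y)

# Gradient of ||Phi w - y||^2 + lambda ||w||^2 is 2 Phi^T (Phi w - y) + 2 lambda w.
grad = 2 * Phi.T @ (Phi @ w_star - y) + 2 * lam * w_star
print(np.allclose(grad, 0))                        # True: w* is a stationary point
```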

7. Regression so far
   Linear Regression:
   ▶ $y_i = w^\top \phi(x_i) + b + \epsilon_i$, where $y_i \in \mathbb{R}$ and $\epsilon_i$ is the error term
   ▶ Objective: $\min_{w, b} \sum_{i=1}^{n} (y_i - w^\top \phi(x_i) - b)^2$
   Ridge Regression:
   ▶ $\min_{w, b} \sum_{i=1}^{n} (y_i - w^\top \phi(x_i) - b)^2 + \lambda \|w\|^2$
   ▶ Here, regularization is applied to the linear regression objective to reduce overfitting on the training examples (we penalize model complexity); see the sketch below for one way to handle the unpenalized bias $b$.
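Note that the penalty is on $w$ only, not on the bias $b$. One common way to solve the objective in this form (a sketch under my own assumptions, not taken from the slides) is to append a constant column for $b$ and zero out its entry in the penalty matrix:

```python
import numpy as np

# Sketch with synthetic data; here phi(x_i) = x_i and the bias column is not regularized.
rng = np.random.default_rng(5)
X = rng.standard_normal((80, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.1 * rng.standard_normal(80)
lam = 1.0

Phi_aug = np.hstack([X, np.ones((80, 1))])         # last column plays the role of b
P = np.eye(4)
P[-1, -1] = 0.0                                    # no penalty on the bias term
wb = np.linalg.solve(Phi_aug.T @ Phi_aug + lam * P, Phi_aug.T @ y)
w, b = wb[:-1], wb[-1]
print(w, b)                                        # roughly [1, -2, 0.5] and b close to 3
```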

8. Closed-form solutions to regression
   Linear regression and Ridge regression both have closed-form solutions:
   ▶ For linear regression, $w^* = (\Phi^\top \Phi)^{-1} \Phi^\top y$
   ▶ For ridge regression, $w^* = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top y$ (linear regression is the special case $\lambda = 0$)
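The two closed forms are easy to compare directly. The sketch below (mine, with synthetic data) checks that ridge with $\lambda = 0$ reproduces the ordinary least-squares solution, and that larger $\lambda$ shrinks the weights.

```python
import numpy as np

# Sketch, assuming synthetic data.
rng = np.random.default_rng(3)
Phi = rng.standard_normal((60, 4))
y = rng.standard_normal(60)

def ridge_solution(Phi, y, lam):
    """w* = (Phi^T Phi + lambda I)^{-1} Phi^T y"""
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

w_ols = np.linalg.lstsq(Phi, y, rcond=None)[0]     # linear regression solution
print(np.allclose(ridge_solution(Phi, y, 0.0), w_ols))   # True: lambda = 0 recovers OLS
print(ridge_solution(Phi, y, 10.0))                # weights shrink for larger lambda
```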

9. Claim: The error obtained on the training data after minimizing the ridge regression objective is ≥ the error obtained on the training data after minimizing the linear regression objective.
   Goal: Do well on unseen (test) data as well. Therefore, a higher training error may be acceptable if the test error can be lower.
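The claim follows because ordinary least squares minimizes the training error over all $w$, so any other choice of $w$, including the ridge solution, cannot do better on the training set. Below is a small numerical illustration (my own, on synthetic data).

```python
import numpy as np

# Sketch with synthetic data: ridge training error >= OLS training error for every lambda >= 0.
rng = np.random.default_rng(4)
Phi = rng.standard_normal((25, 6))
y = rng.standard_normal(25)

def train_error(w):
    return np.sum((Phi @ w - y) ** 2)

w_ols = np.linalg.lstsq(Phi, y, rcond=None)[0]
for lam in [0.1, 1.0, 10.0]:
    w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(6), Phi.T @ y)
    print(lam, train_error(w_ridge) >= train_error(w_ols))   # True for each lambda
```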
