Lecture 08: Ridge Regression, Equivalent Formulations and KKT Conditions



SLIDE 1


Lecture 08: Ridge Regression, Equivalent Formulations and KKT Conditions

Instructor: Prof. Ganesh Ramakrishnan

February 4, 2016

SLIDE 2


Recap: Duality and KKT conditions

For the previously mentioned formulation of the problem, the KKT conditions for differentiable $f$, $g_i$, $h_j$, with $\hat{w}$ primal optimal and $(\hat{\lambda}, \hat{\mu})$ dual optimal, are:

$$\nabla f(\hat{w}) + \sum_{i=1}^{m} \hat{\lambda}_i \nabla g_i(\hat{w}) + \sum_{j=1}^{p} \hat{\mu}_j \nabla h_j(\hat{w}) = 0$$
$$g_i(\hat{w}) \leq 0, \quad 1 \leq i \leq m$$
$$\hat{\lambda}_i \geq 0, \quad 1 \leq i \leq m$$
$$\hat{\lambda}_i\, g_i(\hat{w}) = 0, \quad 1 \leq i \leq m$$
$$h_j(\hat{w}) = 0, \quad 1 \leq j \leq p$$

SLIDE 3


Bound on λ in the regularized least-squares solution

To minimize the error function subject to the constraint $\|w\|^2 \leq \xi$, we apply the KKT conditions at the point of optimality $w^*$.

From the first KKT condition, $\nabla_{w^*}\bigl(f(w) + \lambda g(w)\bigr) = 0$, where $f(w) = (\Phi w - y)^\top(\Phi w - y)$ and $g(w) = \|w\|^2 - \xi$. Solving, we get
$$w^* = (\Phi^\top\Phi + \lambda I)^{-1}\Phi^\top y$$
From the second KKT condition, $\|w^*\|^2 \leq \xi$.
From the third KKT condition, $\lambda \geq 0$.
From the fourth condition (complementary slackness), $\lambda\|w^*\|^2 = \lambda\xi$.
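A minimal NumPy sanity check of these conditions (not from the slides; the data, the choice of $\lambda$, and taking $\xi = \|w^*\|^2$ are illustrative assumptions — when $\lambda > 0$ the fourth condition forces $\|w^*\|^2 = \xi$, so this is the consistent choice):

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(30, 4))     # synthetic design matrix Phi (illustrative)
y = rng.normal(size=30)
lam = 2.0                          # a fixed lambda

# Stationarity (first condition) solved in closed form:
w_star = np.linalg.solve(Phi.T @ Phi + lam * np.eye(4), Phi.T @ y)
grad = 2 * Phi.T @ (Phi @ w_star - y) + 2 * lam * w_star
print(np.allclose(grad, 0))                            # first condition holds

xi = w_star @ w_star                                   # take xi = ||w*||^2
print(w_star @ w_star <= xi + 1e-12)                   # second condition: ||w*||^2 <= xi
print(lam >= 0)                                        # third condition: lambda >= 0
print(np.isclose(lam * (w_star @ w_star), lam * xi))   # fourth: lambda ||w*||^2 = lambda xi
```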

SLIDE 4


Bound on λ in the regularized least-squares solution

Values of $w^*$ and $\lambda$ that satisfy all these equations would yield an optimal solution. Consider
$$(\Phi^\top\Phi + \lambda I)^{-1}\Phi^\top y = w^*$$
Multiplying both sides by $(\Phi^\top\Phi + \lambda I)$ and taking norms, we obtain
$$\|(\Phi^\top\Phi)w^* + \lambda w^*\| = \|\Phi^\top y\|$$
Using the triangle inequality,
$$\|(\Phi^\top\Phi)w^*\| + \lambda\|w^*\| \geq \|(\Phi^\top\Phi)w^* + \lambda w^*\| = \|\Phi^\top y\|$$

SLIDE 5


Bound on λ in the regularized least-squares solution

Now, $\|(\Phi^\top\Phi)w^*\| \leq \alpha\|w^*\|$ for some $\alpha$, since $\|(\Phi^\top\Phi)w^*\|$ is finite. Substituting into the previous inequality,
$$(\alpha + \lambda)\|w^*\| \geq \|\Phi^\top y\|, \quad\text{i.e.}\quad \lambda \geq \frac{\|\Phi^\top y\|}{\|w^*\|} - \alpha$$
Note that when $\|w^*\| \to 0$, $\lambda \to \infty$. (Any intuition?)
Using $\|w^*\|^2 \leq \xi$ we get
$$\lambda \geq \frac{\|\Phi^\top y\|}{\sqrt{\xi}} - \alpha$$
This is not the exact solution for $\lambda$, but the bound proves the existence of $\lambda$ for some $\xi$ and $\Phi$.
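A quick numerical illustration of this chain (not from the slides), taking $\alpha$ to be the spectral norm of $\Phi^\top\Phi$ — one valid choice, though the slide only requires that some finite $\alpha$ exists:

```python
import numpy as np

rng = np.random.default_rng(2)
Phi = rng.normal(size=(30, 4))     # synthetic design matrix (illustrative)
y = rng.normal(size=30)
lam = 2.0

w_star = np.linalg.solve(Phi.T @ Phi + lam * np.eye(4), Phi.T @ y)

alpha = np.linalg.norm(Phi.T @ Phi, 2)    # spectral norm, so ||(Phi^T Phi) w|| <= alpha ||w||
lhs = (alpha + lam) * np.linalg.norm(w_star)
rhs = np.linalg.norm(Phi.T @ y)
print(lhs >= rhs)                                     # (alpha + lambda) ||w*|| >= ||Phi^T y||
print(lam >= rhs / np.linalg.norm(w_star) - alpha)    # lambda >= ||Phi^T y|| / ||w*|| - alpha
```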

SLIDE 6


Alternative objective function

Substituting $g(w) = \|w\|^2 - \xi$ into the first KKT equation considered earlier:
$$\nabla_{w^*}\bigl(f(w) + \lambda(\|w\|^2 - \xi)\bigr) = 0$$
This is equivalent to solving $\min_w \bigl(\|\Phi w - y\|^2 + \lambda\|w\|^2\bigr)$ for the same choice of $\lambda$. This form of regularized regression is often referred to as Ridge regression.
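A short worked step (implied by the slide, spelled out here for completeness) showing why this unconstrained objective reproduces the closed form quoted earlier — the $\xi$ term drops out of the gradient since it does not depend on $w$:

$$\nabla_w\bigl(\|\Phi w - y\|^2 + \lambda\|w\|^2\bigr) = 2\Phi^\top(\Phi w - y) + 2\lambda w = 0
\;\Longrightarrow\; (\Phi^\top\Phi + \lambda I)\,w = \Phi^\top y
\;\Longrightarrow\; w^* = (\Phi^\top\Phi + \lambda I)^{-1}\Phi^\top y$$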

SLIDE 7


Regression so far

Linear Regression:

▶ $y_i = w^\top\phi(x_i) + b + \epsilon_i$, where $y_i \in \mathbb{R}$ and $\epsilon_i$ is the error term

▶ Objective: $\min_{w,b} \sum_{i=1}^{n}\bigl(y_i - w^\top\phi(x_i) - b\bigr)^2$

Ridge Regression:

▶ $\min_{w,b} \sum_{i=1}^{n}\bigl(y_i - w^\top\phi(x_i) - b\bigr)^2 + \lambda\|w\|^2$

▶ Here, regularization is applied on the linear regression objective to reduce overfitting on the training examples (we penalize model complexity); see the fitting sketch after this list.
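A minimal sketch of fitting both objectives with scikit-learn (not from the slides; the data is synthetic and purely illustrative). Ridge's `alpha` plays the role of $\lambda$, and by default it penalizes only the weights, not the intercept, matching the objective above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic feature matrix standing in for phi(x_i), and targets y_i.
rng = np.random.default_rng(0)
phi = rng.normal(size=(50, 5))                             # n = 50 examples, 5 basis functions
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = phi @ w_true + 1.5 + rng.normal(scale=0.1, size=50)    # true b = 1.5, plus noise

# Linear regression: min_{w,b} sum_i (y_i - w^T phi(x_i) - b)^2
lin = LinearRegression().fit(phi, y)

# Ridge regression: same objective plus lambda * ||w||^2
ridge = Ridge(alpha=1.0).fit(phi, y)

print("linear w:", lin.coef_,   "b:", lin.intercept_)
print("ridge  w:", ridge.coef_, "b:", ridge.intercept_)
```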

SLIDE 8


Closed-form solutions to regression

Linear regression and Ridge regression both have closed-form solutions

▶ For linear regression, $w^* = (\Phi^\top\Phi)^{-1}\Phi^\top y$

▶ For ridge regression, $w^* = (\Phi^\top\Phi + \lambda I)^{-1}\Phi^\top y$ (linear regression is the special case $\lambda = 0$)
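A minimal NumPy sketch of both closed forms (function and variable names are illustrative, not from the slides). Using `np.linalg.solve` rather than forming the inverse explicitly is the standard, numerically safer way to evaluate these expressions:

```python
import numpy as np

def linear_regression_w(Phi, y):
    # w* = (Phi^T Phi)^{-1} Phi^T y
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

def ridge_regression_w(Phi, y, lam):
    # w* = (Phi^T Phi + lambda I)^{-1} Phi^T y; lam = 0 recovers linear regression
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)
```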

SLIDE 9


Claim: the training error obtained after minimizing the ridge regression objective is ≥ the training error obtained after minimizing the linear regression objective.

Goal: do well on unseen (test) data as well. Therefore, a higher training error may be acceptable if the test error can be lower.
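A quick numerical illustration of the claim on synthetic data (not from the slides; values are illustrative): unregularized least squares minimizes the training error by construction, so ridge can only match or exceed it on the training set.

```python
import numpy as np

rng = np.random.default_rng(3)
Phi = rng.normal(size=(40, 6))     # synthetic design matrix (illustrative)
y = rng.normal(size=40)

w_lin = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)                        # linear regression
w_ridge = np.linalg.solve(Phi.T @ Phi + 5.0 * np.eye(6), Phi.T @ y)    # ridge, lambda = 5

train_err_lin = np.sum((Phi @ w_lin - y) ** 2)
train_err_ridge = np.sum((Phi @ w_ridge - y) ** 2)
print(train_err_ridge >= train_err_lin)    # True: ridge training error >= linear training error
```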
