
Regularization Overview
J. McNames, Portland State University, ECE 4/557, Ver. 1.27



Regularization Overview (Outline)
• Problems & Multicollinearity
• Regularization Techniques
• Principal Components Analysis
• Principal Components Regression
• Ridge Regression
• Stepwise Regression
• Cross-validation Error

Regularization Overview
• We will discuss three popular methods for obtaining "better" estimates of the linear model coefficients
  – Principal components regression
  – Ridge regression
  – Stepwise regression
• These methods generate biased estimates
• Nonetheless, they may be more accurate if
  – The data is strongly collinear
  – p is close to n

Multicollinearity
• In many real applications, the model input variables are not independent of one another
• As with poor scaling, if the inputs are closely related to one another, the matrix A^T A may be ill-conditioned, so its inverse cannot be computed reliably
• This is similar to dividing by a very small number
• This can cause very large model coefficients and, ultimately, unstable predictions
• This problem occurs if two or more inputs have a linear relationship to one another,

      x_i \approx \sum_{j \neq i} \alpha_j x_j

  for some coefficients \alpha_j
• Generally, this problem is called multicollinearity

Multicollinearity Continued
• For example, suppose our statistical model is

      y = 3x_1 + 2x_2 + \varepsilon

• If x_1 = 2x_2 (the inputs are perfectly correlated), then this statistical model has many equivalent representations:

      y = 3x_1 + 2x_2 + \varepsilon
      y = 4x_1 + \varepsilon
      y = 2x_1 + 4x_2 + \varepsilon

• The data cannot tell us which one of these models is correct
• There are a number of measures that can be taken to reduce this effect
• We will discuss four of them
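Before those methods, it may help to quantify the ill-conditioning described above. The following is a minimal sketch, not from the original deck: the nearly collinear inputs x1 and x2 are illustrative, and it uses only MATLAB's built-in cond and corrcoef.

    % Sketch: quantify the ill-conditioning caused by nearly collinear inputs
    N  = 20;
    x1 = rand(N,1);
    x2 = x1 + 0.001*randn(N,1);     % Nearly a copy of x1
    A  = [ones(N,1) x1 x2];
    kappa = cond(A'*A)              % Huge condition number: inverse is unstable
    r = corrcoef(x1,x2)             % Off-diagonal entries near 1

A condition number this large means small perturbations in y can produce wildly different coefficient estimates, which is exactly what the next example demonstrates.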

Example 1: Multicollinearity

    N = 20;
    x1 = rand(N,1);
    x2 = 5*x1;
    om = [-1 2 3]';                 % True process coefficients
    A  = [ones(N,1) x1 x2];
    y  = A*om + 0.1*randn(N,1);     % Statistical model
    b  = y;
    w  = inv(A'*A)*A'*b             % Regression model coefficients

This returns

    Warning: Matrix is close to singular or badly scaled.
             Results may be inaccurate. RCOND = 3.801412e-018.
    > In Multicollinearity at 11

    w =
       -1.0088
       31.8756
       -1.0408

Singular Value Decomposition
• The A matrix can be decomposed as a product of three different matrices:

      A_{n\times p} = U_{n\times n} \Sigma_{n\times p} V^T_{p\times p}

• U and V are unitary matrices:

      U^T U = U U^T = I_{n\times n}, \qquad V^T V = V V^T = I_{p\times p}

• \Sigma is a diagonal matrix:

      \Sigma_{n\times p} =
      \begin{bmatrix}
        \sigma_1 & 0        & \cdots & 0        \\
        0        & \sigma_2 & \cdots & 0        \\
        \vdots   & \vdots   & \ddots & \vdots   \\
        0        & 0        & \cdots & \sigma_p \\
        0        & 0        & \cdots & 0        \\
        \vdots   & \vdots   &        & \vdots   \\
        0        & 0        & \cdots & 0
      \end{bmatrix}
      =
      \begin{bmatrix} \Sigma_+ \\ 0 \end{bmatrix}

Singular Value Decomposition Continued
• The matrix U can be written as

      U_{n\times n} = [\, U_+ \;\; U_- \,], \qquad U_+ : n\times p, \quad U_- : n\times(n-p)

• This enables us to decompose the A matrix slightly differently:

      A_{n\times p} = U_{n\times n} \Sigma_{n\times p} V^T_{p\times p} = U_+ \Sigma_+ V^T

• The elements along the diagonal of \Sigma_+ are called the singular values of A
• They are nonnegative
• Usually they are ordered such that

      \sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \cdots \ge \sigma_p \ge 0

Singular Value Decomposition & PCA
• The V matrix can be written in terms of its column vectors:

      V_{p\times p} = [\, v_1 \;\; v_2 \;\; \cdots \;\; v_p \,]

• The square of each singular value (\sigma_i^2) represents the 2nd moment of the data along the projection of A onto the vector v_i
• The input vectors are rotated to the directions that maximize the estimated second moment of the projected data:

      v_1 = \arg\max_{v^T v = 1} \|Av\|^2, \qquad \|Av_1\|^2 = (Av_1)^T (Av_1) = v_1^T A^T A v_1

• Locating these vectors and their projected variances is called principal components analysis (a numerical check follows)
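These identities are easy to verify numerically. The following sketch uses synthetic data (not the example above) and only built-in MATLAB functions:

    % Sketch: verify the SVD identities and the second-moment property
    A = randn(20,3);                % Synthetic data, n = 20, p = 3
    [U,S,V] = svd(A);               % Full SVD: A = U*S*V'
    norm(A - U*S*V')                % ~0: the factorization reproduces A
    norm(U'*U - eye(20))            % ~0: U is unitary
    norm(A*V(:,1))^2 - S(1,1)^2     % ~0: ||A*v1||^2 equals sigma_1^2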

Example 2: PCA without Centering

[Figure: scatter plot of x_2 versus x_1 with the two principal component directions drawn as arrows from the origin; title "Principal Components Analysis Without Centering"]

Example 2: MATLAB Code

    function [] = PCACentering()
    %clear;
    rand('state',8);
    randn('state',11);

    NP = 100;                       % Number of points
    x1 = 0.08*randn(NP,1);          % Input 1
    x2 = -x1 + 0.04*randn(NP,1);    % Input 2
    x1 = x1 + .5;
    x2 = x2 + .5;

    A = [x1 x2 ones(NP,1)];
    [U,S,V] = svd(A);               % Singular Value Decomposition
    V(:,1) = -V(:,1);

    figure;
    FigureSet(1,5,5);               % Function in my collection
    ax = axes('Position',[0.1 0.1 0.8 0.8]);
    h = plot(x1,x2,'r.');
    set(h,'MarkerSize',6);
    hold on;
    xlim([-0.10 1.00]);
    ylim([-0.10 1.00]);
    AxisLines;                      % Function in my collection

    p1 = [0 0];                     % Starting point
    p2 = V(1:2,1)*S(1,1)/15;        % Ending point
    h = DrawArrow([p1(1) p2(1)],[p1(2) p2(2)]);  % Function in my collection
    set(h,'HeadStyle','plain');
    p1 = [0 0];
    p2 = V(1:2,2)*S(2,2)/15;
    h = DrawArrow([p1(1) p2(1)],[p1(2) p2(2)]);
    set(h,'HeadStyle','plain');
    hold off;
    xlabel('x_1');
    ylabel('x_2');
    title('Principal Components Analysis Without Centering');
    AxisSet(8);
    print -depsc PCAUncentered.eps;

    x1c = mean(x1);                 % Find the average (center) of x1
    x2c = mean(x2);                 % Find the average (center) of x2
    xc  = [x1c x2c]';               % Collect into a vector
    A   = [x1-x1c x2-x2c];          % Recreate the A matrix
    [U,S,V] = svd(A);

    figure;
    FigureSet(1,5,5);
    ax = axes('Position',[0.1 0.1 0.8 0.8]);
    h = plot(x1,x2,'r.');
    set(h,'MarkerSize',6);
    hold on;
    xlim([-0.10 1.00]);
    ylim([-0.10 1.00]);
    AxisLines;

    p1 = xc;
    p2 = xc + V(:,1)*S(1,1)/2;
    h = DrawArrow([p1(1) p2(1)],[p1(2) p2(2)]);
    set(h,'HeadStyle','plain');
    p1 = xc;
    p2 = xc + V(:,2)*S(2,2)/2;
    h = DrawArrow([p1(1) p2(1)],[p1(2) p2(2)]);
    set(h,'HeadStyle','plain');
    hold off;
    axis('square');
    xlabel('x_1');
    ylabel('x_2');
    title('Principal Components Analysis With Centering');
    AxisSet(8);
    print -depsc PCACentered.eps;
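Note that FigureSet, AxisLines, DrawArrow, and AxisSet above come from the author's own function collection, so the code does not run as-is elsewhere. Here is a self-contained sketch of the centered case using only built-ins (quiver stands in for DrawArrow; the data generation follows the example):

    % Self-contained sketch: centered PCA with built-in plotting only
    randn('state',11);
    NP = 100;
    x1 = 0.08*randn(NP,1) + 0.5;
    x2 = -(x1 - 0.5) + 0.04*randn(NP,1) + 0.5;
    xc = [mean(x1) mean(x2)];            % Center of the data
    A  = [x1-xc(1) x2-xc(2)];            % Centered data matrix (no ones column)
    [U,S,V] = svd(A);
    plot(x1,x2,'r.');
    hold on;
    % Draw both principal directions from the center, scaled by sigma_i/2
    quiver(xc(1)*[1 1], xc(2)*[1 1], V(1,:).*diag(S)'/2, V(2,:).*diag(S)'/2, 0, 'k');
    hold off;
    axis square;
    xlabel('x_1'); ylabel('x_2');
    title('Principal Components Analysis With Centering');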

Principal Components Analysis
• In general, finding the directions of maximum variance is more useful than finding the directions that maximize the second moment
• This can be achieved by subtracting the average from all of the input vectors:

      A'_{n\times(p-1)} = [\, x'_1 \;\; x'_2 \;\; \cdots \;\; x'_{p-1} \,], \qquad x'_i = x_i - \bar{x}_i

• If A' is decomposed as A' = U_+ \Sigma_+ V^T, then

      \sigma_1^2 = \mathrm{var}(A' v_1), \quad \sigma_2^2 = \mathrm{var}(A' v_2), \quad \ldots

• Note that the column of ones is omitted from A'
• The vectors v_i now represent the directions of maximum variance

Example 3: PCA With Centering

[Figure: the same scatter plot of x_2 versus x_1, with the principal component arrows now drawn from the center of the data; title "Principal Components Analysis With Centering"]

PCA & SVD

      A = U_+ \Sigma_+ V^T, \qquad A^T A = V \Lambda V^T

• Often PCA is calculated using the eigenvalues and eigenvectors of A^T A instead of the singular value decomposition
• It can be shown that

      A^T A = V \Lambda V^T

  where

      \Lambda_{p\times p} =
      \begin{bmatrix}
        \lambda_1 & 0         & \cdots & 0         \\
        0         & \lambda_2 & \cdots & 0         \\
        \vdots    & \vdots    & \ddots & \vdots    \\
        0         & 0         & \cdots & \lambda_p
      \end{bmatrix}

• This is the same V matrix as computed using SVD on A
• The eigenvalues are related to the singular values by \lambda_i = \sigma_i^2

PCA & SVD Summary

      A = U_+ \Sigma_+ V^T, \qquad A^T A = V \Lambda V^T

• A can be expressed as a sum of p rank-1 matrices:

      A = \sum_{i=1}^{p} u_i \sigma_i v_i^T = \sum_{i=1}^{p} \sigma_i u_i v_i^T

• PCA is useful for compression
• If most of the variance is captured by the first few principal components, then we can omit the other components with minimal loss of information
• Just truncate the sum to get an approximation of A:

      A \approx \sum_{i=1}^{\rho} \sigma_i u_i v_i^T \qquad \text{for some } \rho < p

  (a numerical sketch follows)
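These relationships are easy to confirm numerically. A minimal sketch, with synthetic nearly rank-2 data and an illustrative choice of rho = 2, relating eig to svd and forming the truncated approximation:

    % Sketch: lambda_i = sigma_i^2, and low-rank approximation by truncation
    n = 50; p = 4;
    A = randn(n,2)*randn(2,p) + 0.01*randn(n,p);  % Nearly rank-2 data matrix
    [U,S,V] = svd(A,'econ');                      % Economy SVD: A = U*S*V'
    lambda = sort(eig(A'*A),'descend');
    max(abs(lambda - diag(S).^2))                 % ~0: lambda_i = sigma_i^2

    rho  = 2;                                     % Keep first two components
    Arho = U(:,1:rho)*S(1:rho,1:rho)*V(:,1:rho)'; % Sum of first rho rank-1 terms
    norm(A - Arho,'fro')                          % Small: little information lost

Because almost all of the variance sits in the first two singular values here, the Frobenius error of the rank-2 approximation stays near the noise level.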
