
Computational Tricks: Computationally Efficient Algorithms for Quadratic Regularisation
Simon Rückinger
Seminar: Modeling, Simulation and Inference of Complex Biological Systems (Dr. Korbinian Strimmer), SoSe 06
9 June 2006

Overview

Objective
Computationally efficient algorithms for quadratic regularisation.

Starting point
- A set of gene expression arrays X: large p (columns), small n (rows).
- An output value y we want to predict from the expression values.

Idea
Decompose X in a tricky way and reduce the computational effort (matrix inversion).

Table of contents
1 Motivation
2 Singular value decomposition
3 Examples
4 Summary

Data
Microarray data (p genes, n arrays):

    X = \begin{pmatrix} x_{11} & \dots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \dots & x_{np} \end{pmatrix}

What can we do with gene expression data?

Output value
Usually there exists an additional measure y for each array (individual, time, ...). This could be
- a cancer class
- a biological species
- a survival time
- any other quantitative or qualitative measure.

Making predictions
A typical goal for a statistician would be to find a model which
- shows the association between gene expression and the output value y
- is able to make further predictions for y.

Predicting y via linear regression

Ordinary least squares
Linear regression could be the model of choice for predicting y as a continuous measure. To get the model parameters we simply minimize the sum of squared differences between observed values y_i and predicted values x_i^T \beta:

    \min_{\beta} \sum_{i=1}^{n} (y_i - x_i^T \beta)^2

Solution for classic linear regression
This leads to the well-known solution

    \hat{\beta} = (X^T X)^{-1} X^T y.

Linear regression does not work for p > n

Normal equations
When p > n, the so-called normal equations

    X^T X \beta = X^T y,

which lead to \hat{\beta} = (X^T X)^{-1} X^T y, do not have a unique solution for \beta. Instead they provide infinitely many solutions which
- all fit the training data perfectly
- usually are not able to make reliable predictions.
In other words: X^T X is not invertible.

Quadratic penalization: ridge regression (Hoerl et al., 1970)

A classic approach for the p > n situation is to add a quadratic penalty to the OLS criterion:

    \min_{\beta} \sum_{i=1}^{n} (y_i - x_i^T \beta)^2 + \lambda \beta^T \beta,

which leads to the solution

    \hat{\beta} = (X^T X + \lambda I)^{-1} X^T y.

Benefits
- (X^T X + \lambda I) is an invertible matrix.
- We get a unique solution for \beta.

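To make the p > n problem concrete, here is a minimal R sketch (simulated data, arbitrary dimensions, not part of the original slides): it shows that X^T X is rank-deficient and that two different coefficient vectors, the minimum-norm least-squares solution and a second one shifted along the null space of X, both reproduce the training data exactly.

    # Minimal sketch (simulated data): with p > n there are infinitely many
    # coefficient vectors that fit the training data perfectly.
    set.seed(1)
    n <- 10; p <- 100
    X <- matrix(rnorm(n * p), n, p)
    y <- rnorm(n)

    qr(crossprod(X))$rank                # rank n (= 10) < p, so X^T X is not invertible

    s <- svd(X)                          # X = U D V^T
    beta_min <- s$v %*% ((1 / s$d) * crossprod(s$u, y))      # minimum-norm LS solution
    null_dir <- rnorm(p)
    null_dir <- null_dir - s$v %*% crossprod(s$v, null_dir)  # component in the null space of X
    beta_alt <- beta_min + 5 * null_dir                       # a second, different solution

    max(abs(X %*% beta_min - y))         # ~ 0: perfect fit on the training data
    max(abs(X %*% beta_alt - y))         # ~ 0: also a perfect fit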

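The ridge solution from the slide above can be computed directly; a minimal sketch with the same kind of simulated data (lambda = 1 is an arbitrary illustrative value). For small p this direct p x p solve is unproblematic; the next section explains why it becomes expensive when p is in the thousands.

    # Ridge solution (X^T X + lambda I)^{-1} X^T y via a direct p x p solve.
    set.seed(1)
    n <- 10; p <- 100
    X <- matrix(rnorm(n * p), n, p)
    y <- rnorm(n)
    lambda <- 1

    beta_ridge <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
    length(beta_ridge)                   # p coefficients, now uniquely determined
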
Expensive computation

Matrix inversion
(X^T X + \lambda I) is a p x p matrix. Thus the computational cost of inverting this matrix is O(p^3) operations.

p is large
Typical expression arrays include between 1,000 and 20,000 genes. For 10,000 genes this means we have to invert a 10,000 x 10,000 matrix, which involves O(10,000^3) operations.

How long does it take?
Even on a modern PC this is a matter of hours or even days.

Getting 'tricky': singular value decomposition (SVD)

Every matrix X with n rows and p columns (n < p) can be decomposed as

    X = U D V^T = R V^T,

where U is n x n, D = diag(d_1, ..., d_n) is n x n, V^T is n x p, and R = U D is an n x n matrix.

Applying SVD to ridge regression

Plugging X = R V^T into

    \hat{\beta} = (X^T X + \lambda I)^{-1} X^T y        (a p x p inversion)

one gets

    \hat{\beta} = V (R^T R + \lambda I)^{-1} R^T y      (an n x n inversion).

Conclusion
We can get the parameters for ridge regression by
- applying SVD to X,
- using R as the 'data matrix',
- multiplying the resulting coefficient vector by V.

About singular value decomposition

Computational costs
Getting the parameters for ridge regression via the SVD trick involves
- reducing p variables to n variables by SVD: O(p n^2)
- solving the n-dimensional ridge regression: O(n^3)
- transforming back to p dimensions: O(np).
Thus it saves inverting a p x p matrix with O(p^3) operations.

Software
SVD is a standard linear algebra tool and is implemented in most mathematical languages. In R:
- svd()
- fast.svd() from library(corpcor), which is faster for small n and large p.

Applying SVD to other models I

Linear predictors
Many models 'connect' the variables with the outcome through a linear predictor, e.g.
- logistic regression
- linear discriminant analysis
- support-vector machines.

Loss functions
The parameters are estimated by minimizing a loss function \sum_{i=1}^{n} L(y_i, \beta_0 + x_i^T \beta), which can be the
- squared error
- negative log-likelihood
- etc.

Applying SVD to other models II

Quadratic regularization
Like linear regression, all models using a linear predictor and some kind of loss function do not work properly when p >> n. This can be fixed by adding a quadratic penalty:

    \min_{\beta_0, \beta} \sum_{i=1}^{n} L(y_i, \beta_0 + x_i^T \beta) + \lambda \beta^T \beta.

The good news
- The SVD trick can be used with all models of the above form.
- All aspects of model evaluation can be performed in this reduced space.

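A sketch of the SVD trick for ridge regression described above, using simulated data and an arbitrary lambda: the n x n reduced-space solution, mapped back with V, agrees with the direct p x p solution. As noted on the software slide, fast.svd() from the corpcor package could be substituted for svd() when n is small and p is large.

    # SVD trick for ridge regression: solve an n x n system instead of p x p.
    set.seed(1)
    n <- 20; p <- 2000
    X <- matrix(rnorm(n * p), n, p)
    y <- rnorm(n)
    lambda <- 1

    # Direct solution: invert a p x p matrix, O(p^3)
    beta_direct <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))

    # SVD trick: X = U D V^T = R V^T with R = U D (n x n)
    s <- svd(X)
    R <- s$u %*% diag(s$d)
    theta <- solve(crossprod(R) + lambda * diag(n), crossprod(R, y))
    beta_svd <- s$v %*% theta            # map back to p dimensions

    max(abs(beta_direct - beta_svd))     # ~ 0: both routes give the same coefficients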

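A rough timing sketch of the cost comparison above (O(p^3) for the direct inversion versus O(p n^2) for the SVD route); the sizes are illustrative and the absolute times depend on the hardware and the BLAS your R installation uses.

    # Rough timing comparison; exact numbers vary by machine and BLAS.
    set.seed(1)
    n <- 50; p <- 3000
    X <- matrix(rnorm(n * p), n, p)
    y <- rnorm(n)
    lambda <- 1

    system.time(
      solve(crossprod(X) + lambda * diag(p), crossprod(X, y))     # O(p^3)
    )

    system.time({                                                 # O(p n^2)
      s <- svd(X)
      R <- s$u %*% diag(s$d)
      s$v %*% solve(crossprod(R) + lambda * diag(n), crossprod(R, y))
    })
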
Reduced space computations

Theorem 1
Let X = R V^T be the SVD of X, and denote by r_i the ith row of R, a vector of n predictor values for the ith observation. Consider the pair of optimization problems

    (\hat{\beta}_0, \hat{\beta}) = argmin_{\beta_0, \beta \in R^p} \sum_{i=1}^{n} L(y_i, \beta_0 + x_i^T \beta) + \lambda \beta^T \beta

    (\hat{\theta}_0, \hat{\theta}) = argmin_{\theta_0, \theta \in R^n} \sum_{i=1}^{n} L(y_i, \theta_0 + r_i^T \theta) + \lambda \theta^T \theta.

Then \hat{\beta}_0 = \hat{\theta}_0 and \hat{\beta} = V \hat{\theta}.

Proof: see Hastie et al., 2004.

Cross-validation I

Where to get \lambda from?
The regularization parameter \lambda is often selected by k-fold cross-validation.

How does cross-validation work?
- Divide the training data into k groups of size n/k.
- Fit the model to k - 1 of the groups and test it on the remaining 1/k of the data.
- Repeat this k separate times.
- Average the results.
This is done for a series of values of \lambda, and a preferred value is chosen.

Cross-validation II

Corollary 1
The entire model-selection process via cross-validation can be performed using a single reduced data set R. Hence, when we perform cross-validation, we simply sample from the rows of R.

Proof: Cross-validation relies on predictions x^T \beta, which are equivariant under orthogonal rotations.

Illustration / eigengenes (X -> R)
If the columns of X are centered, the columns of R are the principal components of X, also called eigengenes.

Derivatives I

Corollary 2
Define

    L(\beta) = \sum_{i=1}^{n} L(y_i, \beta_0 + x_i^T \beta),    L(\theta) = \sum_{i=1}^{n} L(y_i, \beta_0 + r_i^T \theta).

Then with \theta = V^T \beta, L(\beta) = L(\theta). If L is differentiable, then

    \partial L(\beta) / \partial \beta = V \, \partial L(\theta) / \partial \theta

    \partial^2 L(\beta) / \partial \beta \partial \beta^T = V \, (\partial^2 L(\theta) / \partial \theta \partial \theta^T) \, V^T,

with the partial derivatives on the right-hand side evaluated at \theta = V^T \beta.

Derivatives II

First and second derivatives
If the loss function is based on a log-likelihood we can
- perform score tests with its first derivative
- obtain variances of the parameters from its second derivative (information matrix / Hessian matrix).

Expensive computation
Differentiation in the p-dimensional space is a waste of time.

SVD once again
SVD helps us to obtain the p-dimensional derivatives from the n-dimensional versions.

Derivatives III

Proof of Corollary 2
The equivalence L(\beta) = L(\theta) follows immediately from the identity X = R V^T and the fact that x_i^T and r_i^T are the ith rows of X and R. The derivatives are simple applications of the chain rule to L(\beta) = L(\theta).

Parenthesis: Woodbury matrix identity

Matrix inversion lemma

    (A + U C V)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1},

where A is p x p, U is p x n, C is n x n, and V is n x p.

Link to SVD
For A = \lambda I and

    U C V = X^T X = (V D U^T)(U D V^T) = V D^2 V^T

we see the connection with SVD.

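As a concrete illustration of Corollary 1, here is a hedged sketch of k-fold cross-validation for the ridge parameter carried out entirely on the reduced data set R (simulated data; k = 5, the lambda grid, and the use of squared-error loss without an intercept are arbitrary choices, not taken from the slides).

    # k-fold cross-validation for lambda using only the reduced data R (Corollary 1).
    set.seed(1)
    n <- 50; p <- 2000; k <- 5
    X <- matrix(rnorm(n * p), n, p)
    y <- rnorm(n)

    s <- svd(X)
    R <- s$u %*% diag(s$d)               # n x n reduced data set

    folds   <- sample(rep(1:k, length.out = n))
    lambdas <- 10^seq(-2, 3, length.out = 20)

    cv_error <- sapply(lambdas, function(lambda) {
      errs <- sapply(1:k, function(fold) {
        train <- folds != fold
        theta <- solve(crossprod(R[train, ]) + lambda * diag(n),
                       crossprod(R[train, ], y[train]))
        mean((y[!train] - R[!train, ] %*% theta)^2)     # test error on the held-out fold
      })
      mean(errs)
    })

    lambdas[which.min(cv_error)]         # preferred value of lambda

The SVD is computed once on the full X; each fold then only samples rows of R, so no p-dimensional computation is needed until the final coefficients are mapped back with V.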

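The Woodbury identity can be checked numerically; a small sketch with simulated data, taking A = lambda I as on the slide above. For simplicity the factorization U = X^T, C = I, V = X is used here, which gives the same product X^T X as the slide's SVD-based factorization V D^2 V^T.

    # Numerical check of the Woodbury identity for (lambda I + X^T X)^{-1}:
    # only an n x n matrix has to be inverted on the right-hand side.
    set.seed(1)
    n <- 15; p <- 300
    X <- matrix(rnorm(n * p), n, p)
    lambda <- 2

    direct <- solve(lambda * diag(p) + crossprod(X))

    # Woodbury with A = lambda * I, U = X^T (p x n), C = I (n x n), V = X (n x p)
    woodbury <- diag(p) / lambda -
      (t(X) / lambda) %*% solve(diag(n) + X %*% t(X) / lambda) %*% (X / lambda)

    max(abs(direct - woodbury))          # ~ 0: the two inverses agree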