
Big Data - Lecture 2: High dimensional regression with the Lasso (S. Gadat)



  1. Big Data - Lecture 2: High dimensional regression with the Lasso. S. Gadat, Toulouse, October 2014.

  2. Schedule
     1 Introduction: Motivation; Trouble with large dimension; Goals; Important balance: bias-variance tradeoff
     2 Sparse High Dimensional Regression: Sparsity; Inducing sparsity
     3 Lasso estimation: Lasso Estimator; Solving the lasso - MM method; Statistical results
     4 Application

  3. I Introduction - Linear Model
     In a standard linear model, we have at our disposal observations $(X_i, Y_i)$ supposed to be linked by
     $Y_i = X_i^t \theta_0 + \epsilon_i, \quad 1 \le i \le n.$
     We aim to recover the unknown $\theta_0$. Generically, $(\epsilon_i)_{1 \le i \le n}$ is assumed to be i.i.d. replications of a centered, square-integrable noise: $\mathbb{E}[\epsilon] = 0$ and $\mathbb{E}[\epsilon^2] < \infty$.
     From a statistical point of view, we expect to find the important variables among the $p$ variables that describe $X$.
     Typical example: $Y_i$ is the expression level of one gene on sample $i$, and $X_i = (X_{i,1}, \ldots, X_{i,p})$ is the biological signal (DNA micro-arrays) observed on sample $i$. The goal is to discover a link between the DNA signal and the gene expression level.
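To make the model concrete, here is a minimal simulation sketch (not from the slides; the dimensions $n = 100$, $p = 5$, the Gaussian design, and the noise level are illustrative assumptions):

```python
import numpy as np

# Simulate data from the model Y_i = X_i^t theta_0 + eps_i (all values illustrative).
rng = np.random.default_rng(0)
n, p = 100, 5                          # sample size and dimension (assumed here)
theta0 = rng.normal(size=p)            # unknown parameter we want to recover
sigma = 0.5                            # noise level (assumed)

X = rng.normal(size=(n, p))                  # design matrix
eps = rng.normal(scale=sigma, size=n)        # centered, square-integrable noise
Y = X @ theta0 + eps                         # observed responses
```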

  4. I Introduction - Micro-array analysis - Biological datasets
     One measures micro-array datasets built from a huge number of gene expression profiles:
     - Number of genes $p$: of the order of thousands.
     - Number of samples $n$: of the order of a hundred.
     Questions: Diagnostic help: healthy or ill? Select the meaningful elements among the genes? Find an algorithm with good prediction of the response?

  5. I Introduction - Linear Model
     In matrix form, the linear model can be written as follows:
     $Y = X\theta_0 + \epsilon, \quad Y \in \mathbb{R}^n, \; X \in \mathcal{M}_{n,p}(\mathbb{R}), \; \theta_0 \in \mathbb{R}^p.$
     In this lecture, we will consider situations where $p$ varies (typically increases) with $n$.

  6. I Introduction - Linear Model
     Standard approach: $n \gg p$. The M.L.E. in the Gaussian case is the Least Squares Estimator
     $\hat\theta_n := \arg\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2,$ given by $\hat\theta_n = (X^t X)^{-1} X^t Y.$
     Proposition: $\hat\theta_n$ is an unbiased estimator of $\theta_0$ and, if $\epsilon \sim \mathcal{N}(0, \sigma^2)$,
     $\frac{\|X(\hat\theta_n - \theta_0)\|_2^2}{\sigma^2} \sim \chi^2_p \quad \text{and} \quad \mathbb{E}\left[\frac{\|X(\hat\theta_n - \theta_0)\|_2^2}{n}\right] = \frac{\sigma^2 p}{n}.$
     Most of the time ($n \gg p$), the prediction risk $\sigma^2 p / n$ is negligible.
     Main requirement: $X^t X$ must be full rank (invertible)!
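A minimal numerical sketch of this estimator and of the $\sigma^2 p / n$ risk formula (illustrative data; solving the normal equations directly assumes $X^t X$ is well conditioned, which holds generically when $n \gg p$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 5, 0.5
theta0 = rng.normal(size=p)
X = rng.normal(size=(n, p))
Y = X @ theta0 + rng.normal(scale=sigma, size=n)

# Least squares via the normal equations: theta_hat = (X^t X)^{-1} X^t Y.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Empirical prediction risk ||X(theta_hat - theta0)||_2^2 / n;
# its expectation is sigma^2 * p / n under Gaussian noise.
risk = np.sum((X @ (theta_hat - theta0)) ** 2) / n
print(risk, sigma**2 * p / n)   # the two numbers should be of the same order
```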

  7. I Introduction - Trouble with large dimension: $p \gg n$
     $X^t X$ is a $p \times p$ matrix, but its rank is at most $n$: if $n \ll p$, then $\mathrm{rk}(X^t X) \le n \ll p$.
     Consequence: the Gram matrix $X^t X$ is not invertible, and is even very ill-conditioned (most of its eigenvalues are equal to 0!). The least squares estimator $\hat\theta_n$ completely fails.
     One standard "improvement": use the ridge regression with an additional penalty:
     $\hat\theta_n^{Ridge} = \arg\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2 + \lambda \|\beta\|_2^2.$
     The ridge regression is a particular case of penalized regression. The penalization is still convex w.r.t. $\beta$ and can be easily solved. We will attempt to describe a penalized regression better suited to high dimensional problems.
     Our goal: find a method that produces an estimator $\hat\theta_n$ which:
     - selects features among the $p$ variables;
     - can be easily computed with numerical software;
     - possesses some statistical guarantees.
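A sketch of the ridge estimator in its closed form $(X^t X + \lambda I_p)^{-1} X^t Y$ (the data and the choice $\lambda = 1$ are illustrative; `ridge` is a hypothetical helper, not a function from the lecture):

```python
import numpy as np

def ridge(X, Y, lam):
    """Ridge estimator (X^t X + lam * I_p)^{-1} X^t Y; well defined even when
    p >> n, since X^t X + lam * I_p is invertible for any lam > 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

rng = np.random.default_rng(1)
n, p = 50, 500                           # high dimensional regime: p >> n
theta0 = np.zeros(p)
theta0[:5] = 3.0                         # only a few truly active variables
X = rng.normal(size=(n, p))
Y = X @ theta0 + rng.normal(scale=0.5, size=n)

theta_ridge = ridge(X, Y, lam=1.0)       # lam chosen arbitrarily for illustration
print(np.sum(theta_ridge != 0))          # ridge shrinks but never zeroes coefficients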

  8. I Introduction - Objective of high dimensional regression
     Remark: inconsistency of the standard linear model (and even of ridge regression) when $p \gg n$:
     $\mathbb{E}\left[\frac{\|X(\hat\theta_n - \theta_0)\|_2^2}{n}\right] \not\longrightarrow 0 \quad \text{when } (n, p) \longrightarrow +\infty \text{ with } p \gg n.$
     Important questions nowadays:
     - What is a good framework for high dimensional regression? A good model is required.
     - How can we estimate? An efficient algorithm is necessary.
     - How can we measure the performance: prediction of $Y$? Feature selection in $\theta$?
     - What are we looking for: statistical guarantees? Some mathematical theorems?
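A quick numerical illustration of this inconsistency, under the same illustrative Gaussian assumptions as the previous sketches. `np.linalg.lstsq` returns the minimum-norm least squares solution when $X^t X$ is singular; since the fitted values interpolate $Y$ when $p > n$, the normalized in-sample risk stays near $\sigma^2$ instead of vanishing:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.5
for n in (50, 200, 800):
    p = 10 * n                            # p grows much faster than n
    theta0 = rng.normal(size=p)
    X = rng.normal(size=(n, p))
    Y = X @ theta0 + rng.normal(scale=sigma, size=n)
    # Minimum-norm least squares solution (X^t X is singular here, so the
    # normal equations cannot be used).
    theta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    risk = np.sum((X @ (theta_hat - theta0)) ** 2) / n
    print(n, risk)                        # stays near sigma^2 = 0.25, never near 0
```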

  9. I Introduction - Bias-variance tradeoff
     In high dimension, should we optimize the fit to the observed data, or reduce the variability?
     Standard question: find the best curve... in what sense?

  10. I Introduction - Bias-variance tradeoff
     Several regressions (figure with three panels):
     - Left: fit the best line (1-D regression).
     - Middle: fit the best quadratic polynomial.
     - Right: fit the best 10-degree polynomial.
     Now we are interested in the prediction at the point $x = 0.5$. Which fit is the best? (A numerical version of this comparison is sketched below.)
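A sketch of this comparison with `numpy.polyfit` (the data-generating curve $\sin(2\pi x)$ and the noise level are illustrative assumptions, chosen only to mimic the figure):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, size=20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=20)  # illustrative data

for degree in (1, 2, 10):
    coeffs = np.polyfit(x, y, deg=degree)   # least squares polynomial fit
    pred = np.polyval(coeffs, 0.5)          # prediction at x = 0.5
    print(f"degree {degree:2d}: prediction at x=0.5 is {pred:+.3f}")
```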

  11. I Introduction - Bias-variance tradeoff
     If we are looking for the best possible fit, a high dimensional regressor will be convenient. Nevertheless, our goal is generally to predict $y$ for new points $x$, and the matching criterion is
     $C(\hat f) := \mathbb{E}_{(X,Y)}\left[(Y - \hat f(X))^2\right].$
     It is a quadratic loss here; it should be replaced by other criteria in other settings (in classification, for example).
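A sketch of how $C(\hat f)$ can be approximated by Monte Carlo on fresh data (same illustrative data-generating assumptions as above):

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: np.sin(2 * np.pi * x)          # hypothetical true regression function

# Fit f_hat on a small training sample.
x_tr = rng.uniform(0, 1, size=20)
y_tr = f(x_tr) + rng.normal(scale=0.3, size=20)
coeffs = np.polyfit(x_tr, y_tr, deg=10)

# Monte Carlo estimate of C(f_hat) = E_{(X,Y)}[(Y - f_hat(X))^2] on fresh data.
x_te = rng.uniform(0, 1, size=100_000)
y_te = f(x_te) + rng.normal(scale=0.3, size=100_000)
print(np.mean((y_te - np.polyval(coeffs, x_te)) ** 2))
```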

  12. I Introduction - Bias-variance tradeoff
     When the degree increases, the error on the observed data (red curve) always decreases. Over the rest of the population, the generalization error first decreases, then increases.
     Too simple sets of functions cannot contain the true function, and optimization over simple sets introduces a bias. Too complex sets of functions contain the true function, but are too rich and generate a high variance.

  13. I Introduction - Bias-variance tradeoff
     The former balance is illustrated by a very simple theorem. Assume $Y = f(X) + \epsilon$ with $\mathbb{E}[\epsilon] = 0$.
     Theorem. For any estimator $\hat f$, one has
     $C(\hat f) = \mathbb{E}[Y - \hat f(X)]^2 = \mathbb{E}\left[\left(f(X) - \mathbb{E}[\hat f(X)]\right)^2\right] + \mathbb{E}\left[\left(\hat f(X) - \mathbb{E}[\hat f(X)]\right)^2\right] + \mathbb{E}\left[(Y - f(X))^2\right].$
     The first term is a bias term, the second term is a variance term, and the third term is the Bayes risk, which is independent of the estimator $\hat f$.
     Statistical principle: the empirical squared loss $\|Y - \hat f(X)\|_{2,n}^2$ mimics the bias; there is an important need to control the variance of estimation, hence a statistical penalty is introduced to mimic the variance.
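A Monte Carlo sketch of the three terms of this decomposition for polynomial fits at a fixed point (all data-generating choices are illustrative; the evaluation point $x_0 = 0.25$ is chosen where the bias of a low-degree fit is visible):

```python
import numpy as np

rng = np.random.default_rng(5)
f = lambda x: np.sin(2 * np.pi * x)   # hypothetical true regression function
sigma, n, x0 = 0.3, 20, 0.25          # noise level, sample size, evaluation point

def fit_and_predict(degree):
    """Draw a fresh training set and return the fitted value at x0."""
    x = rng.uniform(0, 1, size=n)
    y = f(x) + rng.normal(scale=sigma, size=n)
    return np.polyval(np.polyfit(x, y, deg=degree), x0)

for degree in (1, 2, 10):
    preds = np.array([fit_and_predict(degree) for _ in range(2000)])
    bias2 = (preds.mean() - f(x0)) ** 2   # squared bias at x0
    var = preds.var()                     # variance at x0
    print(f"degree {degree:2d}: bias^2={bias2:.4f}  var={var:.4f}  "
          f"Bayes risk={sigma**2:.4f}")
```

As the degree grows, the bias term shrinks while the variance term grows; the Bayes risk $\sigma^2$ is untouched by the choice of estimator, exactly as the theorem states.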

  14. Schedule
     1 Introduction: Motivation; Trouble with large dimension; Goals; Important balance: bias-variance tradeoff
     2 Sparse High Dimensional Regression: Sparsity; Inducing sparsity
     3 Lasso estimation: Lasso Estimator; Solving the lasso - MM method; Statistical results
     4 Application
