Big Data - Lecture 2 High dimensional regression with the Lasso
S. Gadat
Toulouse, October 2014
Outline:
1 Introduction: Motivation / Trouble with large dimension / Goals / Important balance: bias-variance tradeoff
2 Sparse High Dimensional Regression: Sparsity / Inducing sparsity
3 Lasso estimation: Lasso Estimator / Solving the lasso - MM method / Statistical results
4 Application
In a standard linear model, we have at our disposal observations $(X_i, Y_i)$ supposed to be linked through
$Y_i = X_i^t \theta_0 + \epsilon_i, \quad 1 \le i \le n.$
We aim to recover the unknown $\theta_0$. Generically, $(\epsilon_i)_{1 \le i \le n}$ is assumed to be an i.i.d. replication of a centered and square integrable noise: $\mathbb{E}[\epsilon] = 0$, $\mathbb{E}[\epsilon^2] < \infty$. From a statistical point of view, we expect to find the important ones among the p variables that describe X.
Typical example:
$Y_i$: expression level of one gene on sample i.
$X_i = (X_{i,1}, \ldots, X_{i,p})$: biological signal (DNA micro-arrays) observed on sample i.
Discover a link between DNA and the gene expression level.
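As a concrete illustration, here is a minimal R sketch simulating data from such a sparse linear model (the dimensions n, p, the sparsity level and the noise variance below are arbitrary choices for the example, not values taken from the lecture):

# Simulate a sparse high dimensional linear model Y = X theta0 + eps
set.seed(1)
n <- 50; p <- 200; s <- 5                 # n samples, p variables, s non-zero coefficients
X <- matrix(rnorm(n * p), n, p)           # design matrix
theta0 <- c(rep(2, s), rep(0, p - s))     # sparse true parameter
sigma <- 1
eps <- rnorm(n, sd = sigma)               # centered, square integrable noise
Y <- X %*% theta0 + eps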
One measures micro-array datasets built from a huge number of gene expression profiles: the number of measured genes p is much larger than the number of samples n.
Diagnostic help: healthy or ill? Select the meaningful elements among the genes? Find an algorithm with a good prediction of the response?
From a matrix point of view, the linear model can be written as follows: $Y = X\theta_0 + \epsilon$, with $Y \in \mathbb{R}^n$, $X \in \mathcal{M}_{n,p}(\mathbb{R})$, $\theta_0 \in \mathbb{R}^p$. In this lecture, we will consider situations where p varies (typically increases) with n.
Standard approach: n >> p. The M.L.E. in the Gaussian case is the least squares estimator
$\hat\theta_n := \arg\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2,$
given by $\hat\theta_n = (X^tX)^{-1}X^tY$.
Proposition. $\hat\theta_n$ is an unbiased estimator of $\theta_0$ such that, if $\epsilon \sim \mathcal{N}(0, \sigma^2)$,
$\frac{\|X(\hat\theta_n - \theta_0)\|_2^2}{\sigma^2} \sim \chi^2_p \qquad \text{and} \qquad \mathbb{E}\Big[\frac{\|X(\hat\theta_n - \theta_0)\|_2^2}{n}\Big] = \frac{\sigma^2 p}{n}.$
Most of the time, $\|X(\hat\theta_n - \theta_0)\|_2^2 / n$, which is of order $\sigma^2 p / n$, is negligible when $p \ll n$.
Main requirement: $X^tX$ must be full rank (invertible)!
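A minimal R sketch of this estimator in the classical regime n >> p (so that $X^tX$ is invertible; the dimensions below are arbitrary):

# Least squares estimator theta_hat = (X'X)^{-1} X'Y
n0 <- 100; p0 <- 10
X0 <- matrix(rnorm(n0 * p0), n0, p0)
theta_star <- rnorm(p0)
Y0 <- X0 %*% theta_star + rnorm(n0)
theta_hat <- solve(t(X0) %*% X0, t(X0) %*% Y0)   # normal equations; equivalently coef(lm(Y0 ~ X0 - 1))
mean((X0 %*% (theta_hat - theta_star))^2)        # prediction error, of order sigma^2 * p0 / n0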
$X^tX$ is a p × p matrix, but its rank is at most n. If n << p, then $\mathrm{rk}(X^tX) \le n \ll p$. Consequence: the Gram matrix $X^tX$ is not invertible and is even very ill-conditioned (most of its eigenvalues are equal to 0!). The linear estimator $\hat\theta_n$ completely fails.
One standard "improvement": use the ridge regression with an additional penalty:
$\hat\theta_n^{Ridge} = \arg\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2 + \lambda\|\beta\|_2^2.$
The ridge regression is a particular case of penalized regression. The penalization is still convex w.r.t. β and the problem can be easily solved. We will attempt to describe a penalized regression better suited to high dimensional regression.
Our goal: find a method to build $\hat\theta_n$ that:
selects features among the p variables;
can be easily computed with numerical software;
possesses some statistical guarantees.
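A minimal R sketch of the ridge estimator, using its closed form $(X^tX + \lambda I_p)^{-1}X^tY$ on the sparse simulation above (the value of λ is an arbitrary choice):

# Ridge regression: the penalty makes X'X + lambda * I invertible even when p > n
lambda <- 1
theta_ridge <- solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% Y)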
Remark: inconsistency of the standard linear model (and even of the ridge regression) when p >> n:
$\mathbb{E}\|\hat\theta_n - \theta_0\|$ does not tend to 0 as $(n, p) \to +\infty$ with $p \gg n$.
Important questions nowadays:
What is a good framework for high dimensional regression? A good model is required.
How can we estimate? An efficient algorithm is necessary.
How can we measure the performances: prediction of Y? Feature selection in θ?
What are we looking for? Statistical guarantees? Some mathematical theorems?
In high dimension: Optimize the fit to the observed data? Reduce the variability? Standard question: find the best curve... In what sense?
Several regressions:
Left: fit the best line (1-D regression).
Middle: fit the best quadratic polynomial.
Right: fit the best degree-10 polynomial.
Now I am interested in the prediction at the point x = 0.5. Which one is the best?
If we are looking for the best possible fit, a high dimensional regressor will be convenient. Nevertheless, our goal is generally to predict y for new points x, and the matching criterion is $C(\hat f) := \mathbb{E}_{(X,Y)}[Y - \hat f(X)]^2$. It is a quadratic loss here, and should be replaced by other criteria (in classification for example).
When the degree increases, the fit to the observed data (red curve) always improves: the training error is always decreasing. Over the rest of the population, the generalization error first decreases and then increases. Too simple sets of functions cannot contain the good function, and optimization over simple sets introduces a bias. Too complex sets of functions contain the good function but are too rich and generate a high variance.
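A minimal R sketch of this phenomenon (the target function, sample sizes and noise level are arbitrary choices, not those of the slides): the training error decreases monotonically with the degree, while the error on fresh data eventually increases.

# Bias-variance tradeoff: training vs. generalization error of polynomial fits
set.seed(2)
f <- function(x) sin(2 * pi * x)
n <- 30
x <- runif(n); y <- f(x) + rnorm(n, sd = 0.3)
x_new <- runif(1000); y_new <- f(x_new) + rnorm(1000, sd = 0.3)
for (d in c(1, 2, 10)) {
  fit <- lm(y ~ poly(x, degree = d, raw = TRUE))
  train_err <- mean((y - fitted(fit))^2)
  test_err  <- mean((y_new - predict(fit, data.frame(x = x_new)))^2)
  cat("degree", d, ": train", round(train_err, 3), "- test", round(test_err, 3), "\n")
}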
The former balance is illustrated by a very simple theorem. Assume Y = f(X) + ε with E[ε] = 0.
Theorem. For any estimator $\hat f$, one has
$C(\hat f) = \mathbb{E}[Y - \hat f(X)]^2 = \mathbb{E}\big[f(X) - \mathbb{E}\hat f(X)\big]^2 + \mathbb{E}\big[\mathbb{E}\hat f(X) - \hat f(X)\big]^2 + \mathbb{E}[Y - f(X)]^2.$
The blue (first) term is a bias term. The red (second) term is a variance term. The green (third) term is the Bayes risk and is independent of the estimator $\hat f$.
Statistical principle: the empirical squared loss $\|Y - \hat f(X)\|_{2,n}^2$ mimics the bias; there is therefore an important need to control the variance of the estimation, through a statistical penalty that mimics the variance.
2 Sparse High Dimensional Regression: Sparsity / Inducing sparsity
An introductory example: in many applications, p >> n but . . . Important prior: many extracted features in X are irrelevant for the response Y. Equivalently: many coefficients in θ0 are not "almost zero" but "exactly zero". For example, if Y is the size of a tumor, it might be reasonable to suppose that it can be expressed as a linear combination of the genetic information in the genome described by X. BUT most coefficients in θ0 will be zero and most genes will be unimportant to predict Y:
We are looking for the few meaningful genes.
We are looking for the prediction of Y as well.
Dogmatic approach. Sparsity: assumption that the unknown θ0 we are looking for has only a few non-zero coordinates,
$s := \mathrm{Card}\,\{1 \le i \le p \,|\, \theta_0(i) \neq 0\}.$
Sparsity assumption: s << n. It permits to reduce the effective dimension of the problem. Assume that the effective support of θ0 were known: if S is the support of θ0, then $X_S^t X_S$ may be full rank, and the linear model can be applied.
Major issue: how could we find S?
Signal processing: in the 1990's, how could we find sparse representations of high resolution 1-, 2- or 3-dimensional signals? Before going further with the data: understand what they represent and try to obtain a naturally sparse representation. How: wavelet decompositions in signal processing.
Sparse representation: Y. Meyer (among others).
Efficient algorithm: S. Mallat.
Noise robustness and hard thresholding method: D. Donoho.
In statistics: in the 2000's, from a redundant representation, how could we find a sparse one? Statisticians do not get to improve the representation of the primary features of the data!
Statistical estimator of the LASSO: R. Tibshirani, 1996.
Efficient algorithm to solve the LASSO with the LARS: Efron, Johnstone, Hastie, and Tibshirani, 2002.
Other estimators: Dantzig Selector: Candes & Tao (2007). Boosting: Buhlmann & Yu (2003). Noise robustness and hard thresholding method: A. Tsybakov et al. (among others).
What is the LASSO method? How can we solve it? What about the statistical performances?
Ideally, we would like to find θ such that
$\hat\theta_n = \arg\min_{\theta : \|\theta\|_0 \le s} \|Y - X\theta\|_2^2,$
meaning that the minimization is embedded in an ℓ0 ball. In the previous lecture, we have seen that it is a constrained minimization problem of a convex function . . . A dual formulation is
$\arg\min_{\theta : \|Y - X\theta\|_2 \le \epsilon} \|\theta\|_0.$
But: the ℓ0 balls are not convex! The ℓ0 balls are not smooth!
First (illusive) idea: explore all ℓ0 subsets and minimize! Hopeless, since there are $\binom{p}{s}$ subsets and p is large!
Second idea (existing methods): run some heuristic and greedy methods to explore the ℓ0 balls and compute an approximation of $\hat\theta_n$ (see next lecture).
Good idea: use a convexification of the ℓ0 norm (also referred to as a convex relaxation method). How?
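To see why this exhaustive search is out of reach, a quick R computation of the number of candidate supports (the values of p and s are arbitrary):

# Number of possible supports of size s among p variables
choose(1000, 10)   # about 2.6e23 subsets
choose(5000, 20)   # astronomically larger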
Idea of the convex relaxation: instead of considering a variable z ∈ {0, 1}, imagine that z ∈ [0, 1].
Definition (Convex Envelope). The convex envelope $f^*$ of a function f is the largest convex function below f.
Theorem (Envelope of $\theta \mapsto \|\theta\|_0$). On $[-1, 1]^d$, the convex envelope of $\theta \mapsto \|\theta\|_0$ is $\theta \mapsto \|\theta\|_1$. On $[-R, R]^d$, the convex envelope of $\theta \mapsto \|\theta\|_0$ is $\theta \mapsto \|\theta\|_1 / R$.
Idea: instead of solving the minimization problem
$\forall s \in \mathbb{N} \quad \min_{\|\theta\|_0 \le s} \|Y - X\theta\|_2^2, \qquad (1)$
we are looking for
$\forall C > 0 \quad \min_{\|\cdot\|_0^*(\theta) \le C} \|Y - X\theta\|_2^2. \qquad (2)$
What's new? The function $\|\cdot\|_0^*$ is convex, and thus the above problem is a convex minimization problem with convex constraints. Since $\|\cdot\|_0^*(\theta) \le \|\theta\|_0$, it is rather reasonable to obtain sparse solutions. In fact, if we are looking for good solutions of (1), then there must exist even better solutions to (2).
Geometrical interpretation (in 2D):
Left: level sets of $\|Y - X\beta\|_2^2$ and their intersection with the ℓ1 ball. Right: the same with the ℓ2 ball.
The left constrained problem is likely to produce a sparse solution; on the contrary, the right one is not! In larger dimensions the balls are even more different.
Analytic point of view: why does the ℓ1 norm induce sparsity? From the KKT conditions (see Lecture 1), the constrained problem leads to a penalized criterion:
$\min_{\theta \in \mathbb{R}^p : \|\theta\|_1 \le C} \|Y - X\theta\|_2^2 \iff \min_{\theta \in \mathbb{R}^p} \|Y - X\theta\|_2^2 + \lambda\|\theta\|_1,$
where the penalty $\lambda\|\theta\|_1$ controls the variance.
In the 1d case, consider $\varphi_\lambda(x) := \frac{1}{2}|x - \alpha|^2 + \lambda|x|$:
The minimal value of $\varphi_\lambda$ is reached at a point $x^*$ when $0 \in \partial\varphi_\lambda(x^*)$.
$x^*$ is a minimizer iff either $x^* \neq 0$ and $(x^* - \alpha) + \lambda\,\mathrm{sgn}(x^*) = 0$, or $x^* = 0$ and $d\varphi_\lambda^+(0) \ge 0$ and $d\varphi_\lambda^-(0) \le 0$.
Proposition (Analytical minimization of $\varphi_\lambda$):
$x^* = \mathrm{sgn}(\alpha)\,[\,|\alpha| - \lambda\,]_+ = \arg\min_{x \in \mathbb{R}} \frac{1}{2}|x - \alpha|^2 + \lambda|x|.$
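A minimal R version of this soft-thresholding operator (it is reused in the MM sketch further below):

# Soft-thresholding: exact minimizer of 0.5 * (x - alpha)^2 + lambda * |x|
soft_threshold <- function(alpha, lambda) {
  sign(alpha) * pmax(abs(alpha) - lambda, 0)
}
soft_threshold(c(-3, -0.5, 0.2, 2), lambda = 1)   # -2  0  0  1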
3 Lasso estimation: Lasso Estimator / Solving the lasso - MM method / Statistical results
Taking it all together, we introduce the Least Absolute Shrinkage and Selection Operator (LASSO): for all λ > 0,
$\hat\theta_n^{Lasso} = \arg\min_{\theta \in \mathbb{R}^p} \|Y - X\theta\|_2^2 + \lambda\|\theta\|_1.$
The above criterion is convex w.r.t. θ. Efficient algorithms exist to solve the LASSO, even for very large p. The minimizer may not be unique since the above criterion is not strongly convex. The predictions $X\hat\theta_n^{Lasso}$ are always unique.
λ is a penalty constant that must be carefully chosen. A large value of λ leads to a very sparse solution, with an important bias. A low value of λ yields overfitting with almost no penalization (too much variance). We will see that a careful balance between s, n and p exists. These parameters, as well as the variance of the noise σ², influence a "good" choice of λ.
Alternative formulation:
$\hat\theta_n^{Lasso} = \arg\min_{\theta \in \mathbb{R}^p : \|\theta\|_1 \le C} \|Y - X\theta\|_2^2.$
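In practice the LASSO can be computed with standard packages. A minimal sketch with the glmnet package (assumed to be installed; note that glmnet scales the squared loss by 1/(2n), so its lambda parameter is not on the same scale as the λ above), reusing X and Y from the sparse simulation:

# Lasso fit by coordinate descent with glmnet
library(glmnet)
fit <- glmnet(X, Y, alpha = 1)   # alpha = 1 corresponds to the pure l1 penalty
coef(fit, s = 0.1)               # sparse coefficient vector at lambda = 0.1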
An algorithm to solve the minimization problem
$\arg\min_{\theta \in \mathbb{R}^p} \|Y - X\theta\|_2^2 + \lambda\|\theta\|_1$
is needed. An efficient method follows the "Majorize-Minimize" principle and is referred to as the MM method. MM methods are useful for the minimization of a convex function / the maximization of a concave one. (Geometric illustration.)
Idea: build a sequence $(\theta_k)_{k \ge 0}$ that converges to the minimum of $\varphi_\lambda$. A particular case of such a method is encountered with the E.M. algorithm, useful for clustering and mixture models. MM algorithms are powerful; in particular, they can convert non-differentiable problems into smooth ones.
1. A function $g(\cdot|\theta_k)$ is said to majorize f at the point $\theta_k$ if $g(\theta_k|\theta_k) = f(\theta_k)$ and $g(\theta|\theta_k) \ge f(\theta)$ for all $\theta \in \mathbb{R}^p$.
2. Then, we define $\theta_{k+1} = \arg\min_{\theta \in \mathbb{R}^p} g(\theta|\theta_k)$.
3. We wish to find each time a function $g(\cdot|\theta_k)$ whose minimization is easy.
4. An example with a quadratic majorizer of a non-smooth function. (Figure.)
5. Important remark: the MM scheme is a descent algorithm:
$f(\theta_{k+1}) = g(\theta_{k+1}|\theta_k) + \big(f(\theta_{k+1}) - g(\theta_{k+1}|\theta_k)\big) \le g(\theta_{k+1}|\theta_k) \le g(\theta_k|\theta_k) = f(\theta_k). \qquad (3)$
1. Defining the sequence $(\theta_k)_{k \ge 0}$ amounts to finding a suitable majorization.
2. $g : \theta \mapsto \|Y - X\theta\|^2$ is convex, with Hessian matrix $2X^tX$. A Taylor expansion leads to
$\forall y \in \mathbb{R}^p \quad g(y) \le g(x) + \langle \nabla g(x), y - x\rangle + \rho(X)\|y - x\|^2,$
where $\rho(X)$ is the spectral radius of $X^tX$.
3. We are naturally driven to upper bound $\varphi_\lambda$ as
$\varphi_\lambda(\theta) \le g(\theta_k) + \langle \nabla g(\theta_k), \theta - \theta_k\rangle + \rho(X)\|\theta - \theta_k\|_2^2 + \lambda\|\theta\|_1 = \psi(\theta_k) + \rho(X)\Big\|\theta - \Big(\theta_k - \tfrac{\nabla g(\theta_k)}{2\rho(X)}\Big)\Big\|_2^2 + \lambda\|\theta\|_1,$
where $\psi(\theta_k)$ gathers the terms that do not depend on θ.
4. To minimize this majorization of $\varphi_\lambda$, we then use the soft-thresholding proposition above, coordinate by coordinate:
Define $\tilde\theta_k^j := \theta_k^j - \nabla g(\theta_k)_j / (2\rho(X))$.
Compute $\theta_{k+1}^j = \mathrm{sgn}(\tilde\theta_k^j)\,\max\Big(|\tilde\theta_k^j| - \tfrac{\lambda}{2\rho(X)},\, 0\Big).$
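A minimal R sketch of this MM scheme (an ISTA-type iteration; the number of iterations is an arbitrary choice, and soft_threshold is the function defined above):

# MM / iterative soft-thresholding for min ||Y - X theta||_2^2 + lambda * ||theta||_1
lasso_mm <- function(X, Y, lambda, n_iter = 500) {
  rho <- max(eigen(t(X) %*% X, only.values = TRUE)$values)   # spectral radius of X'X
  theta <- rep(0, ncol(X))
  for (k in seq_len(n_iter)) {
    grad <- -2 * t(X) %*% (Y - X %*% theta)                   # gradient of ||Y - X theta||^2
    theta_tilde <- theta - grad / (2 * rho)                    # gradient step
    theta <- soft_threshold(theta_tilde, lambda / (2 * rho))   # soft-thresholding step
  }
  drop(theta)
}
# Example: theta_hat <- lasso_mm(X, Y, lambda = 1)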
Importance of the results: understand the difficulties from a statistical point of view. What could we expect? In expectation or with high probability:
Estimation/consistency: $\hat\theta_n \simeq \theta_0$.
Selection/support: $\mathrm{Supp}(\hat\theta_n) \simeq \mathrm{Supp}(\theta_0)$.
Prediction: $n^{-1}\|X(\hat\theta_n - \theta_0)\|_2^2 \simeq s_0/n$.
Statistical framework: we assume that $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ (for the sake of simplicity). High dimensional framework: s is the sparsity of θ0, and $n \to +\infty$ with $p = O(e^{n^{1-\delta}})$. It means that p may be much larger than n. We are looking for a rate of convergence involving s, p and n. Important point: the choice of λ (in terms of s, p, n and σ²).
We won’t provide a sharp presentation of the best known results to keep the level understandable. Important to have in mind the extreme situation of almost orthogonal design: XtX n ≃ Ip . Solving the lasso is equivalent to solving min
w
1 2n Xty − w2
2 + λw1
Solutions are given by ST (Soft-Thresholding): wj = STλ 1 n Xt
jy
j + 1
n Xt
jǫ
We would like to keep the useless coefficients at 0. It requires that
$\lambda \ge \Big|\frac{1}{n}X_j^t\epsilon\Big|, \quad \forall j \in J_0^c.$
The r.v. $\frac{1}{n}X_j^t\epsilon$ are i.i.d. with variance $\sigma^2/n$. The expectation of the maximum of p − s standard Gaussian variables is $\simeq \sqrt{2\log(p-s)}$. It leads to
$\lambda = A\sigma\sqrt{\frac{\log p}{n}}, \quad \text{with } A > \sqrt{2}.$
Precisely, one controls $\mathbb{P}\big(\forall j \in J_0^c : |X_j^t\epsilon| \le n\lambda\big)$, which is close to 1 for this choice of λ.
We expect that $ST_\lambda \to \mathrm{Id}$ to obtain a consistency result. It means that $\lambda \to 0$, so that $\frac{\log p}{n} \to 0$.
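A minimal R illustration of this calibration (A, σ, n and p take arbitrary values for the example):

# Theoretical calibration lambda = A * sigma * sqrt(log(p) / n), with A > sqrt(2)
n <- 100; p <- 5000; sigma <- 1; A <- 2
A * sigma * sqrt(log(p) / n)   # about 0.58; small only when log(p) / n is small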
Theorem. Assume that log p << n, that all columns of X have norm 1 and that $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$. Then, under a coherence assumption on the design matrix $X^tX$, one has:
i) With high probability, $J(\hat\theta_n) \subset J_0$.
ii) There exists C such that, with high probability,
$\frac{\|X(\hat\theta_n - \theta_0)\|_2^2}{n} \le \frac{C}{\kappa^2}\,\frac{\sigma^2 s_0 \log p}{n},$
where $\kappa^2$ is a positive constant that depends on the correlations in $X^tX$.
One can also find results on exact support recovery, as well as some weaker results without any coherence assumption.
N.B.: Such a coherence is measured through the almost orthogonality of the columns of X. It can be translated in terms of $\sup_{i \neq j} |\langle X_i, X_j\rangle| \le \epsilon$.
4 Application
CRAN software: http://cran.r-project.org/web/packages/lars/
R code:
library(lars)
data(diabetes)
attach(diabetes)
fit = lars(x, y)
plot(fit)
Lars algorithm: solves the Lasso, though less efficiently than the coordinate descent algorithm. Typical output of the Lars software: the greater the ℓ1 norm, the lower λ; sparse solutions correspond to small values of the ℓ1 norm.
Signal processing example: we have n = 60 noisy observations $Y(i) = f(i/n) + \epsilon_i$. f is an unknown periodic function defined on [0, 1], sampled at the points (i/n). The $\epsilon_i$ are independent realizations of a Gaussian r.v. We use the first 50 Fourier coefficients, $\varphi_0(x) = 1$, $\varphi_{2j}(x) = \sin(2j\pi x)$, $\varphi_{2j+1}(x) = \cos(2j\pi x)$, to approximate f. The OLS estimator is
$\hat f^{OLS}(x) = \sum_{j=0}^{p} \hat\beta_j^{OLS}\varphi_j(x) \quad \text{with} \quad \hat\beta^{OLS} = \arg\min_\beta \sum_{i=1}^n \Big(Y_i - \sum_{j=0}^p \beta_j\varphi_j(i/n)\Big)^2.$
The OLS does not perform well on this example.
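A minimal R sketch of this experiment (the periodic target function and noise level below are arbitrary choices, not those of the slides):

# Fourier design with 1 + 2 * 25 = 51 basis functions, n = 60 noisy observations
n_sp <- 60; J <- 25
t_grid <- (1:n_sp) / n_sp
f_true <- function(t) sin(2 * pi * t) + 0.5 * cos(6 * pi * t)
y_sp <- f_true(t_grid) + rnorm(n_sp, sd = 0.5)
Phi <- cbind(1, do.call(cbind, lapply(1:J, function(j) cbind(sin(2 * j * pi * t_grid), cos(2 * j * pi * t_grid)))))
beta_ols <- solve(t(Phi) %*% Phi, t(Phi) %*% y_sp)   # 51 coefficients from 60 observations: overfits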
We experiment here with the Lasso estimator, with $\lambda = 3\sigma\sqrt{\frac{\log p}{n}}$ (following the calibration above), and obtain: the Lasso estimator reproduces the oscillations of f, but these oscillations are shrunk toward 0. When considering the initial minimization problem, the ℓ1 penalty selects the good features nicely, but also introduces a bias (a shrinkage of the parameters).
Strategy: select the features with the Lasso and run an OLS estimator using the selected variables.
We define $\hat f^{Gauss} = \pi_{\hat J_0}(Y)$ with $\hat J_0 = \mathrm{Supp}(\hat\theta^{Lasso})$, where $\pi_{\hat J_0}$ is the $L^2$ projection of the observations on the features selected by the Lasso.
The Adaptive Lasso is almost equivalent:
$\hat\beta^{Adaptive\ Lasso} = \arg\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2 + \mu \sum_{j=1}^p \frac{|\beta_j|}{|\hat\beta_j^{Gauss}|}.$
This minimization remains convex and the penalty term aims to mimic the ℓ0 penalty. The Adaptive Lasso is very popular and tends to select the variables more accurately than the Gauss-Lasso estimator.
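A minimal R sketch of the two-step Gauss-Lasso strategy (glmnet assumed installed; the value of λ is an arbitrary choice), reusing X and Y from the sparse simulation:

# Gauss-Lasso: select the support with the Lasso, then refit by OLS on the selected variables
library(glmnet)
fit <- glmnet(X, Y, alpha = 1)
beta_lasso <- as.vector(coef(fit, s = 0.1))[-1]   # drop the intercept
J0_hat <- which(beta_lasso != 0)                  # estimated support
refit <- lm(drop(Y) ~ X[, J0_hat] - 1)            # OLS restricted to the selected features
coef(refit)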