Sparse Linear Models


SLIDE 1

Sparse Linear Models

Trevor Hastie, Stanford University. PIMS Public Lecture, Year of Statistics 2013.

joint work with Jerome Friedman, Rob Tibshirani and Noah Simon

SLIDE 2

Year of Statistics

  • Statistics in the news

News clipping: "How IBM built Watson, its Jeopardy-playing supercomputer" by Dawn Kawamoto, DailyFinance, 02/08/2011. According to David Ferrucci (PI of Watson DeepQA technology for IBM Research), Watson's software is wired for more than handling natural language processing: "Its machine learning allows the computer to become smarter as it tries to answer questions, and to learn as it gets them right or wrong."

News clipping: "For Today's Graduate, Just One Word: Statistics" by Steve Lohr, New York Times, August 5, 2009. At Harvard, Carrie Grimes majored in anthropology and archaeology and ventured to places like Honduras, where she studied Mayan settlement patterns by mapping where artifacts were found. "People think of field archaeology as Indiana Jones, but much of what you really do is data analysis," she said. Now Ms. Grimes does a different kind of digging: as a senior staff engineer at Google, she uses statistical analysis of mounds of data to come up with ways to improve its search engine. Ms. Grimes is an Internet-age statistician, one of many who are changing the image of the profession as a place for dronish number nerds; they are finding themselves increasingly in demand, and even cool.

Quote of the Day, New York Times, August 5, 2009: "I keep saying that the sexy job in the next 10 years will be statisticians. And I'm not kidding." (Hal Varian, chief economist at Google)

Data Science is everywhere. There has never been a better time to be a statistician. Nerds rule!


SLIDE 4

Linear Models for Wide Data

As datasets grow wide, i.e. many more features than samples, the linear model has regained favor as the tool of choice.

  • Document classification: bag-of-words easily leads to p = 20K features and N = 5K document samples. Much more if bigrams, trigrams etc. are included, or the documents come from Facebook, Google, Yahoo!
  • Genomics, microarray studies: p = 40K genes are measured for each of N = 300 subjects.
  • Genome-wide association studies: p = 1–2M SNPs measured for N = 2000 case-control subjects.

In examples like these we tend to use linear models, e.g. linear regression, logistic regression, the Cox model. Since p ≫ N, we cannot fit these models using standard approaches.

SLIDE 5

Forms of Regularization

We cannot fit linear models with p > N without some constraints. Common approaches are:

  • Forward stepwise adds variables one at a time and stops when overfitting is detected. It has regained popularity for p ≫ N, since it is the only feasible method among its subset cousins (backward stepwise, best-subsets).
  • Ridge regression fits the model subject to the constraint $\sum_{j=1}^p \beta_j^2 \le t$. It shrinks coefficients toward zero, and hence controls variance. It allows linear models of arbitrary size p to be fit, although the coefficients always lie in the row space of X.

SLIDE 6

Lasso regression (Tibshirani, 1995) fits the model subject to the constraint $\sum_{j=1}^p |\beta_j| \le t$.

Lasso does variable selection and shrinkage, while ridge only shrinks.

[Figure: the lasso (diamond) and ridge (disk) constraint regions in the (β1, β2) plane, with the least-squares estimate β̂ and elliptical contours of the loss.]
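A minimal sketch of the two fits in R (simulated data; the variable names are ours): in glmnet, alpha = 0 gives the ridge penalty and alpha = 1 the lasso.

library(glmnet)
set.seed(1)
N <- 100; p <- 500                              # p >> N
x <- matrix(rnorm(N * p), N, p)
y <- drop(x[, 1:5] %*% rep(2, 5)) + rnorm(N)    # 5 true signal variables
ridge <- glmnet(x, y, alpha = 0)   # shrinks all p coefficients, none exactly zero
lasso <- glmnet(x, y, alpha = 1)   # selects and shrinks: sparse coefficient paths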

SLIDE 7

Lasso Coefficient Path

[Figure: standardized coefficient paths plotted against $\|\hat\beta(\lambda)\|_1 / \|\hat\beta(0)\|_1$, with curves labeled by predictor number.]

Lasso: $\hat\beta(\lambda) = \operatorname{argmin}_\beta \ \frac{1}{N}\sum_{i=1}^N (y_i - \beta_0 - x_i^T\beta)^2 + \lambda\|\beta\|_1$, fit using the lars package in R (Efron, Hastie, Johnstone, Tibshirani 2002).

SLIDE 8

Ridge versus Lasso

[Figure: coefficient paths for the prostate-cancer data. Left: ridge, coefficients plotted against df(λ). Right: lasso, coefficients plotted against $\|\hat\beta(\lambda)\|_1 / \|\hat\beta(0)\|_1$. Predictors: lcavol, lweight, age, lbph, svi, lcp, gleason, pgg45.]

SLIDE 9

Cross Validation to select λ

[Figure: 10-fold cross-validated Poisson deviance vs log(λ); the top axis counts nonzero coefficients (97 down to 2).]

Poisson family. K-fold cross-validation is easy and fast. Here K = 10, and the true model had 10 out of 100 nonzero coefficients.
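A hedged sketch of this setup (simulated data mimicking the figure: 10 of 100 coefficients nonzero):

library(glmnet)
set.seed(1)
N <- 500; p <- 100
x <- matrix(rnorm(N * p), N, p)
eta <- drop(x[, 1:10] %*% rep(0.3, 10))   # true model: 10 nonzero coefficients
y <- rpois(N, exp(eta))
cvfit <- cv.glmnet(x, y, family = "poisson", nfolds = 10)
plot(cvfit)            # Poisson deviance vs log(lambda), as in the figure
cvfit$lambda.1se       # largest lambda within 1 SE of the minimum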

SLIDE 10

History of Path Algorithms

Efficient path algorithms for $\hat\beta(\lambda)$ allow for easy and exact cross-validation and model selection.

  • In 2001 the LARS algorithm (Efron et al) provided a way to compute the entire lasso coefficient path efficiently, at the cost of a full least-squares fit.
  • 2001-2008: path algorithms popped up for a wide variety of related problems: group lasso (Yuan & Lin 2006), support-vector machine (Hastie, Rosset, Tibshirani & Zhu 2004), elastic net (Zou & Hastie 2004), quantile regression (Li & Zhu, 2007), logistic regression and GLMs (Park & Hastie, 2007), Dantzig selector (James & Radchenko 2008), ...
  • Many of these do not enjoy the piecewise-linearity of LARS, and seize up on very large problems.

SLIDE 11

glmnet and coordinate descent

  • Solve the lasso problem by coordinate descent: optimize each parameter separately, holding all the others fixed. Updates are trivial. Cycle around till the coefficients stabilize.
  • Do this on a grid of λ values, from λmax down to λmin (uniform on the log scale), using warm starts.
  • Can do this with a variety of loss functions and additive penalties.

Coordinate descent achieves dramatic speedups over all competitors, by factors of 10, 100 and more.

Example: Newsgroup data: 11K obs, 778K features (sparse), 100 values of λ across the entire range, lasso logistic regression; time 29s on a Macbook Pro.

References: Friedman, Hastie and Tibshirani 2008, plus a long list of others who have also worked with coordinate descent.

SLIDE 12

LARS and GLMNET

[Figure: coefficient paths (coefficients vs L1 norm) computed by LARS and GLMNET.]

SLIDE 13

glmnet package in R

Fits coefficient paths for a variety of different GLMs and the elastic net family of penalties. Some features of glmnet:

  • Models: linear, logistic, multinomial (grouped or not), Poisson, Cox model, and multiple-response grouped linear.
  • The elastic-net penalty includes ridge and lasso, and hybrids in between (more to come).
  • Speed!
  • Can handle a large number of variables p. Along with screening rules we can fit GLMs on the GWAS scale (more to come).
  • Cross-validation functions for all models.
  • Can allow sparse matrix formats for X, and hence massive problems (e.g. N = 11K, p = 750K logistic regression).

SLIDE 14

  • Can provide lower and upper bounds for each coefficient; e.g. the positive lasso.
  • Useful bells and whistles:
    – Offsets: as in glm, part of the linear predictor can be given and not fit. Often used in Poisson models (sampling frame).
    – Penalty strengths: can alter the relative strength of the penalty on different variables. Zero penalty means a variable is always in the model. Useful for adjusting for demographic variables.
    – Observation weights allowed.
    – Can fit no-intercept models.
    – Session-wise parameters can be set with the new glmnet.options command.
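A short sketch of some of these options (simulated data; the particular calls are ours, but offset, penalty.factor, and lower.limits are glmnet arguments):

library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rpois(100, exp(0.5 * x[, 1] + log(2)))
fit <- glmnet(x, y, family = "poisson",
              offset = rep(log(2), 100),          # given part of the linear predictor
              penalty.factor = c(0, rep(1, 19)),  # zero penalty: variable 1 always in
              lower.limits = 0)                   # nonnegative coefficients (positive lasso)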

SLIDE 15

Coordinate descent for the lasso

$$\min_\beta\ \frac{1}{2N}\sum_{i=1}^N\Big(y_i-\sum_{j=1}^p x_{ij}\beta_j\Big)^2+\lambda\sum_{j=1}^p|\beta_j|$$

Suppose the p predictors and response are standardized to have mean zero and variance 1. Initialize all the βj = 0. Cycle over j = 1, 2, ..., p, 1, 2, ... till convergence:

  • Compute the partial residuals $r_{ij} = y_i - \sum_{k \ne j} x_{ik}\beta_k$.
  • Compute the simple least-squares coefficient of these residuals on the jth predictor: $\beta_j^* = \frac{1}{N}\sum_{i=1}^N x_{ij} r_{ij}$.
  • Update βj by soft-thresholding: $\beta_j \leftarrow S(\beta_j^*, \lambda) = \operatorname{sign}(\beta_j^*)\,(|\beta_j^*| - \lambda)_+$.

[Figure: the soft-thresholding function S(·, λ), equal to zero on (−λ, λ).]
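A naive R transcription of this algorithm (a teaching sketch; glmnet's compiled code adds warm starts, screening, and sparsity tricks):

soft <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)  # S(z, lambda)

lasso_cd <- function(x, y, lambda, tol = 1e-7, maxit = 1000) {
  # assumes the columns of x and y have mean 0 and variance 1
  N <- nrow(x); p <- ncol(x)
  beta <- rep(0, p)
  for (it in 1:maxit) {
    beta_old <- beta
    for (j in 1:p) {
      r <- y - drop(x[, -j, drop = FALSE] %*% beta[-j])  # partial residuals
      bstar <- sum(x[, j] * r) / N                       # simple LS coefficient
      beta[j] <- soft(bstar, lambda)                     # soft-threshold update
    }
    if (max(abs(beta - beta_old)) < tol) break
  }
  beta
}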

SLIDE 16

Elastic-net penalty family

Family of convex penalties proposed in Zou and Hastie (2005) for p ≫ N situations, where predictors are correlated in groups:

$$\min_\beta\ \frac{1}{2N}\sum_{i=1}^N\Big(y_i-\sum_{j=1}^p x_{ij}\beta_j\Big)^2+\lambda\sum_{j=1}^p P_\alpha(\beta_j)$$

with $P_\alpha(\beta_j)=\frac{1}{2}(1-\alpha)\beta_j^2+\alpha|\beta_j|$.

α creates a compromise between the lasso and ridge. The coordinate update is now

$$\beta_j \leftarrow \frac{S(\beta_j^*, \lambda\alpha)}{1+\lambda(1-\alpha)}$$

where $\beta_j^* = \frac{1}{N}\sum_{i=1}^N x_{ij} r_{ij}$ as before.
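The corresponding one-line change to the coordinate update, reusing soft() from the lasso sketch above:

# elastic-net update: soft-threshold at lambda*alpha, then shrink by 1 + lambda*(1 - alpha)
enet_update <- function(bstar, lambda, alpha)
  soft(bstar, lambda * alpha) / (1 + lambda * (1 - alpha))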

SLIDE 17

[Figure: coefficient paths over the first 10 steps for the lasso, elastic net (α = 0.4), and ridge. Leukemia data, logistic regression, N = 72, p = 3571.]

SLIDE 18

Screening Rules

Logistic regression for GWAS: p ∼ 1 million, N = 2000 (Wu et al, 2009).

  • Compute $|\langle x_j, y - \bar y\rangle|$ for each SNP $j = 1, 2, \ldots, 10^6$, where $\bar y$ is the mean of the (binary) y. Note: the largest of these is λmax, the smallest value of λ for which all coefficients are zero.
  • Fit the lasso logistic-regression path using only the largest 1000 (typically we fit models of size around 20 or 30 in GWAS).
  • Simple confirmations check that the omitted SNPs would not have entered the model.
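A sketch of this screen-then-fit recipe (a small simulated stand-in; the slide's p is on the order of 10^6):

library(glmnet)
set.seed(1)
N <- 1000; p <- 5000
x <- matrix(rbinom(N * p, 2, 0.3), N, p)        # SNP-like 0/1/2 genotype codes
y <- rbinom(N, 1, plogis(0.6 * x[, 1] - 0.6 * x[, 2]))
score <- abs(drop(crossprod(x, y - mean(y))))   # |<x_j, y - ybar>| for each SNP
keep <- order(score, decreasing = TRUE)[1:1000] # keep the largest 1000
fit <- glmnet(x[, keep], y, family = "binomial")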

SLIDE 19

Safe and Strong Rules

  • El Ghaoui et al (2010), improved by Wang et al (2012), propose SAFE rules for the lasso for screening predictors; these can be quite conservative.
  • Tibshirani et al (2012) improve these with STRONG screening rules. Suppose the fit at $\lambda_\ell$ is $X\hat\beta(\lambda_\ell)$, and we want to compute the fit at $\lambda_{\ell+1} < \lambda_\ell$. Note that $|\langle x_j, y - X\hat\beta(\lambda_\ell)\rangle| = \lambda_\ell\ \forall j \in \mathcal{A}$, and $\le \lambda_\ell\ \forall j \notin \mathcal{A}$. The strong rules only consider the set
$$\{\, j : |\langle x_j, y - X\hat\beta(\lambda_\ell)\rangle| > \lambda_{\ell+1} - (\lambda_\ell - \lambda_{\ell+1}) \,\}$$
  • glmnet screens at every λ step and, after convergence, checks for violations.
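A sketch of the sequential strong-rule filter itself (the score scaling must match how the objective is normalized; here we use the raw inner product):

strong_set <- function(x, y, fitted_l, lambda_l, lambda_next) {
  score <- abs(drop(crossprod(x, y - fitted_l)))   # |<x_j, y - X beta(lambda_l)>|
  which(score > lambda_next - (lambda_l - lambda_next))
}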

SLIDE 20

[Figure: number of predictors remaining after filtering vs number of predictors in the model, for global DPP, global STRONG, sequential DPP, and sequential STRONG rules; the top axis shows the percent variance explained (0.15 up to 1).]

SLIDE 21

Example: multiclass classification

[Figure: 10-fold CV misclassification error vs log(λ); the top axis counts nonzero coefficients (352 down to 4).]

Pathwork Diagnostics: microarray classification, tissue of origin. 3220 samples, 22K genes, 17 classes (tissue type). A multinomial regression model with 17 × 22K = 374K parameters; elastic net (α = 0.25).

SLIDE 22

Example: HIV drug resistance

The paper looks at in vitro drug resistance of N = 1057 HIV-1 isolates, based on protease and reverse-transcriptase mutations. Here we focus on Lamivudine (a nucleoside RT inhibitor). There are p = 217 (binary) mutation variables. The paper compares 5 different regression methods: decision trees, neural networks, SVM regression, OLS and LAR (lasso).

Genotypic predictors of human immunodeficiency virus type 1 drug resistance. Soo-Yon Rhee, Jonathan Taylor, Gauhar Wadhera, Asa Ben-Hur, Douglas L. Brutlag, and Robert W. Shafer. PNAS, published online Oct 25, 2006; doi:10.1073/pnas.0607274103.

SLIDE 23

R code for fitting model

> require(glmnet)
> fit <- glmnet(xtr, ytr, standardize = FALSE)
> plot(fit)
> cv.fit <- cv.glmnet(xtr, ytr, standardize = FALSE)
> plot(cv.fit)
>
> mte <- predict(fit, xte)
> mte <- apply((mte - yte)^2, 2, mean)
> points(log(fit$lambda), mte, col = "blue", pch = "*")
> legend("topleft", legend = c("10-fold CV", "Test"), pch = "*", col = c("red", "blue"))

SLIDE 24

[Figure: lasso coefficient paths vs L1 norm for the Lamivudine data; the top axis counts nonzero coefficients (61, 118, 158, 180, 195).]

SLIDE 25

[Figure: 10-fold CV (red) and test (blue) mean-squared error vs log(λ); the top axis counts nonzero coefficients (195 down to 1).]

SLIDE 26

[Figure: lasso coefficient paths vs log λ; the top axis counts nonzero coefficients (195, 121, 36, 5, 1).]

> plot(fit, xvar = "lambda")

SLIDE 27

Inference?

  • Can become Bayesian! The lasso penalty corresponds to a Laplacian prior. However, one needs priors for everything, including λ (a variance ratio). It is easier to bootstrap, with similar results.
  • Covariance Test. Very exciting new developments here: "A Significance Test for the Lasso", Lockhart, Taylor, Ryan Tibshirani and Rob Tibshirani (2013).
  • We learned from the LARS project that at each step (knot) we spend one additional degree of freedom.
  • This test delivers a test statistic that is Exp(1) under the null hypothesis that the included variable is noise, but all the earlier variables are signal.

SLIDE 28

  • Suppose we want a p-value for predictor 2, entering at step 3.
  • Compute the "covariance" at λ4: $\langle y, X\hat\beta(\lambda_4)\rangle$.
  • Drop X2, yielding active set $\mathcal{A}$; refit at λ4, and compute the covariance at λ4: $\langle y, X_{\mathcal{A}}\hat\beta_{\mathcal{A}}(\lambda_4)\rangle$.

[Figure: coefficient paths plotted against −log(λ), with knots λ1, ..., λ5 marking the steps at which variables enter the model.]

SLIDE 29

Covariance Statistic and Null Distribution

Under the null hypothesis that all signal variables are in the model:

$$\frac{1}{\sigma^2}\Big(\langle y, X\hat\beta(\lambda_{j+1})\rangle - \langle y, X_{\mathcal{A}}\hat\beta_{\mathcal{A}}(\lambda_{j+1})\rangle\Big)\ \to\ \operatorname{Exp}(1) \quad\text{as } p, n \to \infty$$

[Figure: quantile-quantile plot of the covariance test statistic against the Exp(1) distribution.]

SLIDE 30

Summary and Generalizations

Many problems have the form

$$\min_{\{\beta_j\}_1^p}\ \Big[\, R(y, \beta) + \lambda \sum_{j=1}^p P_j(\beta_j) \,\Big].$$

  • If R and the Pj are convex, and R is differentiable, then coordinate descent converges to the solution (Tseng, 1988).
  • Often each coordinate step is trivial. E.g. for the lasso, it amounts to soft-thresholding, with many steps leaving β̂j = 0.
  • Decreasing λ slowly means not much cycling is needed.
  • Coordinate moves can exploit sparsity.
SLIDE 31

Other Applications

Undirected Graphical Models: learning dependence structure via the lasso. Model the inverse covariance Θ in the Gaussian family, with L1 penalties applied to its elements:

$$\max_\Theta\ \log\det\Theta - \operatorname{Tr}(S\Theta) - \lambda\|\Theta\|_1$$

Glasso: a modified block-wise lasso algorithm, which we solve by coordinate descent (FHT 2007). The algorithm is very fast, and solves moderately sparse graphs with 1000 nodes in under a minute.

Example: flow cytometry, p = 11 proteins measured in N = 7466 cells (Sachs et al 2003) (next page).
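A minimal sketch using the glasso package in R (a simulated stand-in for the protein data; rho is the L1 penalty):

library(glasso)
set.seed(1)
x <- matrix(rnorm(500 * 11), 500, 11)
S <- cov(x)                  # sample covariance
g <- glasso(S, rho = 0.1)    # L1-penalized inverse-covariance estimate
Theta <- g$wi                # sparse precision matrix; zeros = missing edges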

SLIDE 32

[Figure: estimated graphs over the flow-cytometry proteins (Raf, Mek, Plcg, PIP2, PIP3, Erk, Akt, PKA, PKC, P38, Jnk) at λ = 0, 7, 27, 36; the graph becomes sparser as λ increases.]

SLIDE 33

Group Lasso (Yuan and Lin, 2007; Meier, van de Geer, Bühlmann, 2008): each term Pj(βj) applies to a set of parameters:

$$R\Big(y, \sum_{j=1}^J X_j\beta_j\Big) + \lambda\sum_{j=1}^J \gamma_j\|\beta_j\|_2.$$

Example: each block represents the levels of a categorical predictor.

  • Either an entire group is zero, or all its elements are nonzero (see the block-update sketch below).
  • γj is a penalty modifier for group j; $\gamma_j = \|X_j\|_F$ is a good choice.
  • Leads to a block-updating form of coordinate descent.
  • Strong rules apply here: $\|X_j^T r\|_2 > \gamma_j[\lambda_{\ell+1} - (\lambda_\ell - \lambda_{\ell+1})]$.
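The block update rests on a group-wise soft-threshold; a sketch (assuming an orthonormal block X_j, so the update is the proximal operator of the 2-norm penalty):

group_soft <- function(z, thresh) {
  nz <- sqrt(sum(z^2))
  if (nz <= thresh) rep(0, length(z))   # whole group set to zero
  else (1 - thresh / nz) * z            # otherwise shrink the block toward zero
}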

SLIDE 34

Mixed Graphical Models

Project with PhD student Jason Lee. General Markov random field representation, with edge and node potentials:

$$p(x, y; \Theta) \propto \exp\Big(-\frac{1}{2}\sum_{s=1}^p\sum_{t=1}^p \beta_{st}\,x_s x_t + \sum_{s=1}^p \alpha_s x_s + \sum_{s=1}^p\sum_{j=1}^q \rho_{sj}(y_j)\,x_s + \sum_{j=1}^q\sum_{r=1}^q \phi_{rj}(y_r, y_j)\Big)$$

  • The pseudo-likelihood allows simple inference with mixed variables: the conditionals for continuous variables are Gaussian linear regression models, and for categorical variables binomial or multinomial logistic regressions.
  • Parameters come in symmetric blocks, and the inference should respect this symmetry (next slide).

SLIDE 35

Mixed Graphical Model: group-lasso penalties

Parameters come in blocks: e.g. an interaction between a pair of quantitative variables (red), a 2-level qualitative with a quantitative (blue), and an interaction between the 2-level and a 3-level qualitative. Maximize a pseudo-likelihood with lasso and group-lasso penalties on the parameter blocks:

$$\max_\Theta\ \ell(\Theta) - \lambda\Big[\sum_{s=1}^p\sum_{t=1}^{s-1}|\beta_{st}| + \sum_{s=1}^p\sum_{j=1}^q\|\rho_{sj}\|_2 + \sum_{j=1}^q\sum_{r=1}^{j-1}\|\phi_{rj}\|_F\Big]$$

  • Solved using a proximal Newton algorithm for a decreasing sequence of values of λ [Lee and Hastie, 2013].
SLIDE 36

Overlap Group Lasso (Jacob et al, 2009)

Example: consider the model η(X) = X1β1 + X1θ1 + X2θ2 with penalty $|\beta_1| + \sqrt{\theta_1^2 + \theta_2^2}$. The coefficient of X1 is nonzero if either group is nonzero; this allows one to enforce hierarchy. We look at two applications:

  • Modeling interactions with strong hierarchy: interactions present only when the main effects are present. Project with just-graduated PhD student Michael Lim.
  • Sparse additive models (SPAM, Ravikumar et al 2009). We use the overlap group lasso in a different approach to SPAM models. Work near completion with PhD student Alexandra Chouldechova.

SLIDE 37

Glinternet

Project with PhD student Michael Lim. Linear + first-order interaction models using the group lasso. Example: GWAS with p = 27K SNPs, each a 3-level factor, and a binary response, N = 3500.

  • Let Xj be the N × 3 indicator matrix for each SNP, and Xj:k = Xj ⋆ Xk the N × 9 interaction matrix.
  • We fit the model
$$\log\frac{\Pr(Y=1|X)}{\Pr(Y=0|X)} = \alpha + \sum_{j=1}^p X_j\beta_j + \sum_{j<k} X_{j:k}\theta_{j:k}$$
  • Note: Xj:k encodes both main effects and interactions.
SLIDE 38

  • Maximize the group-lasso penalized likelihood:
$$\ell(y, p) - \lambda\Big[\sum_{j=1}^p \|\beta_j\|_2 + \sum_{j<k}\|\theta_{j:k}\|_2\Big]$$
  • Solutions map to the traditional hierarchical main-effects/interactions model (with effects summing to zero).
  • Strong rules are essential here; parallel and distributed computing are useful too. The GWAS search space has 729M interactions!
  • Glinternet is very fast: two orders of magnitude faster than the competition, with similar performance.
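A hedged sketch with the glinternet package (a tiny simulated stand-in for GWAS; the package expects factors coded 0, 1, 2):

library(glinternet)
set.seed(1)
N <- 500; p <- 20
X <- matrix(sample(0:2, N * p, replace = TRUE), N, p)  # 3-level factors
y <- rbinom(N, 1, plogis(0.8 * (X[, 1] == 2) - 0.8 * (X[, 2] == 0)))
fit <- glinternet(X, y, numLevels = rep(3, p), family = "binomial")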

SLIDE 39

Sparse Generalized Additive Models

Work with PhD student Alexandra Chouldechova. Automatic, sticky selection between zero, linear, or nonlinear terms in GAMs; e.g. $y = \sum_{j=1}^p f_j(x_j) + \epsilon$. We minimize

$$\frac{1}{2}\Big\|y - \sum_{j=1}^p \alpha_j x_j - \sum_{j=1}^p U_j\beta_j\Big\|^2 + \lambda\sum_{j=1}^p|\alpha_j| + \gamma\lambda\sum_{j=1}^p\sqrt{\operatorname{Tr}(D^{-1})}\,\|\beta_j\|_D + \frac{1}{2}\sum_{j=1}^p \psi_j\,\beta_j^T D\,\beta_j$$

  • Uj = [xj p1(xj) · · · pk(xj)], where the pi are orthogonal Demmler-Reinsch spline basis functions of increasing degree.
  • D = diag(d1, ..., dk) is a diagonal penalty matrix with 0 < d1 ≤ d2 ≤ · · · ≤ dk.

SLIDES 40-52

[Figures: for the spam data, each predictor (word and character frequencies: make, address, all, 3d, our, over, remove, internet, order, mail, receive, will, people, report, addresses, free, business, email, you, credit, your, font, 000, money, hp, hpl, george, 650, lab, labs, telnet, 857, data, 415, 85, technology, 1999, parts, pm, direct, cs, meeting, original, project, re, edu, table, conference, ch;, ch(, ch[, ch!, ch$, ch#, crl.ave, crl.long, crl.tot) is classified as zero, linear, or spline. As λ decreases through 20, 14.3, 10, 7.3, 5.2, 4, 2.7, 2, 1.4, 1, 0.7, 0.5, 0.2, terms move from zero to linear to spline.]

SLIDE 53

Sparser than Lasso: Concave Penalties

Work with past PhD student Rahul Mazumder and Jerry Friedman (2010); extends the elastic-net family into the concave domain. There are many approaches; we propose a family that bridges ℓ1 and ℓ0, based on the MC+ penalty (Zhang 2010), with a coordinate-descent scheme for fitting model paths, implemented in sparsenet.

[Figure: the MC+ penalty family, bridging ℓ1 (lasso) and ℓ0 (best subset).]
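A hedged sketch of fitting MC+ paths with the sparsenet package (simulated data; the basic x, y interface is assumed):

library(sparsenet)
set.seed(1)
x <- matrix(rnorm(100 * 200), 100, 200)
y <- drop(x[, 1:5] %*% rep(1.5, 5)) + rnorm(100)
fit <- sparsenet(x, y)   # coefficient paths over a grid of (lambda, gamma) values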

SLIDE 54

Matrix Completion

  • Observe a matrix X with (many) missing entries.
  • Inspired by the SVD, we would like to find $Z_{n\times m}$ of (small) rank r such that the training error is small:
$$\min_Z \sum_{\text{Observed}(i,j)} (X_{ij} - Z_{ij})^2 \quad\text{subject to}\quad \operatorname{rank}(Z) \le r$$
  • We would then impute the missing Xij with Zij.
  • Only problem: this is a nonconvex optimization problem, and unlike the SVD for complete X, it has no closed-form solution.

SLIDE 55

[Figure: the true X, the observed X (with missing entries), the fitted Z, and the imputed X.]

SLIDE 56

Nuclear norm and SoftImpute

Use a convex relaxation of rank (Candes and Recht, 2008; Mazumder, Hastie and Tibshirani, 2010):

$$\min_Z \sum_{\text{Observed}(i,j)} (X_{ij} - Z_{ij})^2 + \lambda\|Z\|_*$$

where the nuclear norm $\|Z\|_*$ is the sum of the singular values of Z.

  • The nuclear norm is like the lasso penalty for matrices.
  • The solution involves iterative soft-thresholded SVDs of the current completed matrix.
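A minimal sketch with the softImpute package (simulated matrix; rank.max and lambda control the soft-thresholded SVD):

library(softImpute)
set.seed(1)
X <- matrix(rnorm(50 * 40), 50, 40)
X[sample(length(X), 800)] <- NA                  # punch many holes in the matrix
fit <- softImpute(X, rank.max = 10, lambda = 2)  # iterative soft-thresholded SVD
Xhat <- complete(X, fit)                         # impute missing X_ij with Z_ij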

SLIDE 57

Thank You!