Least Angle Regression

Tim Hesterberg, Insightful Corp.
16 June 2006
This is joint work with Chris Fraley, with support from NIH SBIR Phase I 1 R43 GM074313-01

Tim Hesterberg, Insightful Corp. Least Angle Regression

Overview

◮ Why is LARS important?
◮ Other packages
◮ GLARS package
◮ Issues
◮ Insightful Research


Why is LARS important?

◮ Variable Selection in Regression
  ◮ Important
  ◮ Many approaches: stagewise, boosting, LASSO, regularization, . . .

◮ Least Angle Regression: Efron, Hastie, Johnstone, Tibshirani (2004), Annals of Statistics (with discussion)
  1. Lasso
  2. Forward stagewise
  3. Least Angle Regression (LAR)
  ◮ Unifying explanation
  ◮ Fast implementation
  ◮ Fast way to choose tuning parameter

Ridge Regression

◮ Minimize $\sum_i (Y_i - \hat{Y}_i)^2 + \lambda \sum_j \hat{\beta}_j^2$

[Figure: ridge coefficient paths, beta against theta (axis ticks 0.0, 0.1, 1.0, 10.0), for predictors AGE, SEX, BMI, BP, S1, S2, S3, S4, S5, S6]
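For reference, the ridge criterion on this slide has a closed-form minimizer. Writing the standardized predictors as the columns of a matrix $X$ and the centered response as $y$ (notation assumed here, not taken from the slide):

```latex
\hat{\beta}^{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y
```

When $X^\top X$ is invertible this reduces to the least-squares fit as $\lambda \to 0$. As $\lambda$ grows, the coefficients shrink toward zero, generally without any of them becoming exactly zero.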


LASSO

◮ Minimize $\sum_i (Y_i - \hat{Y}_i)^2 + \lambda \sum_j |\hat{\beta}_j|$

◮ Forces small coefficients → 0; gives simpler models.
◮ Smaller penalty on large coefficients than ridge: less effect on important terms
◮ Implementation is more complicated and slower than for ridge

[Figure: two panels, LASSO and Ridge Regression, each plotting Standardized Coefficients against sum( |beta| ) for predictors AGE, SEX, BMI, BP, S1, S2, S3, S4, S5, S6]
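To make the contrast between the two penalties concrete, here is a minimal Python sketch. It assumes orthonormal predictors, a special case in which both criteria can be solved one coefficient at a time starting from the least-squares coefficient `b`. The function names `ridge_shrink` and `lasso_shrink` and the example numbers are illustrative assumptions, not taken from the talk.

```python
def ridge_shrink(b, lam):
    """Ridge solution for one coefficient when predictors are orthonormal.

    Minimizes (b - beta)^2 + lam * beta^2, which gives proportional shrinkage.
    """
    return b / (1.0 + lam)


def lasso_shrink(b, lam):
    """LASSO solution for one coefficient when predictors are orthonormal.

    Minimizes (b - beta)^2 + lam * |beta|, which gives soft thresholding
    at lam / 2.
    """
    magnitude = max(abs(b) - lam / 2.0, 0.0)
    return magnitude if b >= 0 else -magnitude


# Hypothetical least-squares coefficients: one large, one moderate, one small.
ols = [10.0, -3.0, 0.4]
lam = 1.0

ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [lasso_shrink(b, lam) for b in ols]

for b, r, l in zip(ols, ridge, lasso):
    print(f"OLS {b:6.2f}  ridge {r:6.2f}  lasso {l:6.2f}")
```

In this toy case the lasso sets the small coefficient exactly to zero and subtracts the same fixed amount from the other two, while ridge scales every coefficient by the same factor, so the largest coefficient is changed the most.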


Forward Stagewise Regression

(Forward Stagewise = Least Squares Boosting)

  1. Initialize: standardize predictors, center y, $r = y$, $\hat{\beta}_1 = \dots = \hat{\beta}_p = 0$
  2. Repeat many times
    ◮ Find the predictor $x_j$ most correlated with $r$
    ◮ $\delta = \epsilon \, \mathrm{sign}(r \cdot x_j)$
    ◮ $\hat{\beta}_j \leftarrow \hat{\beta}_j + \delta$
    ◮ $r \leftarrow r - \delta x_j$
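The two steps above can be sketched in plain Python as follows. This is a minimal illustration, not code from the talk or from any package: the function name `forward_stagewise`, the list-of-columns data layout, the step size `eps`, and the toy data are all assumptions made for the example.

```python
def forward_stagewise(X, y, eps=0.01, n_steps=1000):
    """Incremental forward stagewise regression.

    X is a list of predictor columns (lists of floats), assumed to be
    standardized already; y is the centered response.
    Returns the coefficient list and the final residual.
    """
    p = len(X)
    beta = [0.0] * p
    r = list(y)  # step 1: residual starts at y, all coefficients at zero
    for _ in range(n_steps):
        # Inner product of each predictor with the current residual; for
        # equally scaled predictors this ranks them by correlation with r.
        c = [sum(xi * ri for xi, ri in zip(X[j], r)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(c[k]))
        if c[j] == 0.0:
            break  # residual is orthogonal to every predictor
        delta = eps if c[j] > 0 else -eps  # delta = eps * sign(r . x_j)
        beta[j] += delta
        r = [ri - delta * xi for ri, xi in zip(r, X[j])]
    return beta, r


# Toy data (assumed): two centered, unit-length, orthogonal predictors,
# with y constructed as 3 * x1 + 1 * x2.
x1 = [0.5, 0.5, -0.5, -0.5]
x2 = [0.5, -0.5, 0.5, -0.5]
y = [3 * a + 1 * b for a, b in zip(x1, x2)]

beta, r = forward_stagewise([x1, x2], y, eps=0.01, n_steps=1000)
print(beta)
```

With a small `eps` and enough steps the coefficients end up within a few multiples of `eps` of the values used to build `y`. Stopping the loop earlier gives a more heavily regularized fit.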

Forward Stagewise and LASSO

[Figure, from Trevor Hastie, Stanford Statistics, March 2003: Prostate Cancer Data. Two panels of coefficient paths for lcavol, lweight, age, lbph, svi, lcp, gleason, pgg45. Lasso panel: Coefficients against $t = \sum_j |\beta_j|$. Forward Stagewise panel: Coefficients against Iteration.]

Similarity:

Are LASSO and infinitesimal forward stagewise identical?

◮ With orthogonal predictors, yes.
◮ Otherwise similar.

Least Angle Regression provides explanation, and fast implementation.



Stepwise, Forward Stagewise, Least Angle

Stepwise regression:

◮ Pick predictor most correlated with y
◮ Bring predictor completely into model (full LS fit)

Forward stagewise:

◮ Pick predictor most correlated with y
◮ Increment coefficient for predictor

Least Angle Regression:

◮ Pick predictor most correlated with y
◮ Bring predictor into model only to extent it is better than others
◮ Move in least-squares direction until another variable is as correlated
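The difference between the stepwise and least-angle rules can be made concrete for the first step. The sketch below assumes predictor columns already scaled to unit length. The names `dot` and `lar_first_step` are hypothetical, and the toy vectors are chosen for easy arithmetic (unit length but not centered). It computes how far the coefficient of the most correlated predictor moves before another predictor ties it in absolute correlation with the residual, and compares that with the full least-squares step that stepwise regression would take.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def lar_first_step(X, y):
    """First Least Angle Regression step for unit-length predictor columns.

    Returns (j, lar_step, ls_step): the index of the entering predictor, the
    signed LAR step for its coefficient, and the signed full least-squares
    step that stepwise regression would take on that predictor alone.
    """
    c = [dot(x, y) for x in X]            # correlations with y (up to scale)
    j = max(range(len(X)), key=lambda k: abs(c[k]))
    c_max = abs(c[j])
    s = 1.0 if c[j] > 0 else -1.0
    gamma = c_max                         # upper bound: full LS fit on x_j
    for k in range(len(X)):
        if k == j:
            continue
        a = s * dot(X[k], X[j])           # rate at which x_k's correlation drops
        # x_k ties with x_j when c_max - g == +/-(c[k] - g * a)
        for num, den in ((c_max - c[k], 1.0 - a), (c_max + c[k], 1.0 + a)):
            if den > 1e-12 and 0.0 < num / den < gamma:
                gamma = num / den
    return j, s * gamma, s * c_max


# Toy data (assumed): two unit-length predictors with inner product 0.6.
X = [[1.0, 0.0], [0.6, 0.8]]
y = [3.0, 1.0]

j, lar_step, ls_step = lar_first_step(X, y)
r = [yi - lar_step * xi for yi, xi in zip(y, X[j])]  # residual after LAR step
print(j, lar_step, ls_step)
print([dot(x, r) for x in X])  # both predictors now equally correlated with r
```

Here the LAR step is shorter than the full least-squares step, and after it both predictors have the same inner product with the residual, which is the point at which the second variable would join the model.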

Least Angle Regression

[Figure: geometric diagram with axes X1 and X2 and labeled points O, A, B, C, D, E]

C = projection of y onto space spanned by X1 and X2.
B = first step for least-angle regression
E = point on stagewise path


LARS - other packages

lars : Efron and Hastie (S-PLUS and R)

◮ Linear regression

glmpath : Park and Hastie (R)

◮ GLM and Cox Proportional Hazards

Methods: plot, print, predict, cv, coef


S+GLARS

◮ S-PLUS and R, open source
  ◮ Incorporate lars, glmpath
  ◮ Cleanup, consistent interface
  ◮ Incorporate future work by others; provide framework

◮ Extensions
  ◮ Numerically-accurate calculations
  ◮ Factors, splines, polynomials, interactions, . . .
  ◮ Other models (robust regression, . . . ), other penalties
  ◮ Missing data
  ◮ Massive data sets
  ◮ Diagnostics, tools for selecting tuning parameter

◮ User-friendly
  ◮ Consistent interface
  ◮ GUI
  ◮ Documentation


Issues

◮ Money
  ◮ NIH funding: requires commercial potential
  ◮ Insightful: indirect benefit
◮ Outside contributors
◮ Licensing; ability to ship with S-PLUS, I-Miner.


Insightful Research Department

◮ Turn research into software for wide use
  ◮ Higher standards than academic software (ease of use, robustness, testing)
◮ Collaboration
◮ Variety: resampling, missing data, group sequential designs, simulation-based econometric software, functional data, stable distributions, proteomics, microarrays, frailty models, causal modeling
◮ External funding: SBIR grants (NIH, NSF, . . . )
  ◮ Somewhat easier funding
  ◮ Commercial potential
  ◮ Risk, research element
◮ We’re hiring
◮ We’re looking for good projects and collaborators
