Sparse CCA using Lasso - Anastasia Lykou & Joe Whittaker (PowerPoint presentation)



SLIDE 1

Sparse CCA using Lasso

Anastasia Lykou & Joe Whittaker

Department of Mathematics and Statistics, Lancaster University

July 23, 2008

SLIDE 2

Outline

1. Introduction: Motivation
2. CCA: Definition; CCA as a least squares problem
3. Lasso: Definition; Lasso algorithms; The Lasso algorithms contrasted
4. Sparse CCA: SCCA; Algorithm for SCCA; Example
5. Summary

SLIDE 3

Motivation

- SCCA improves the interpretation of CCA
- sparse principal component analysis as precedent (SCoTLASS by Jolliffe et al. (2003) and SPCA by Zou et al. (2004))
- interesting data sets (market basket analysis)
- sparsity performs shrinkage and model selection simultaneously (may reduce the prediction error; can be extended to high-dimensional data sets)

SLIDE 4

Definition

Canonical Correlation Analysis

[Diagram: X1, ..., Xp combine into S; Y1, ..., Yq combine into T]

We seek linear combinations S = αᵀX and T = βᵀY such that ρ = max_{α,β} corr(S, T). S and T are the canonical variates; α and β are called the canonical loadings. The standard solution is through an eigen decomposition.
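The standard eigen-decomposition solution can be sketched in a few lines of NumPy (an illustrative sketch, not the authors' code; the function and variable names are my own):

```python
import numpy as np

def cca_first_pair(X, Y):
    """First canonical loadings alpha, beta and correlation rho,
    via whitening each block and an SVD of the cross-covariance."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    Wx = np.linalg.inv(np.linalg.cholesky(Sxx))   # Wx @ Sxx @ Wx.T = I
    Wy = np.linalg.inv(np.linalg.cholesky(Syy))
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy.T)
    alpha = Wx.T @ U[:, 0]                        # alpha' var(X) alpha = 1
    beta = Wy.T @ Vt[0]
    return alpha, beta, s[0]
```

The leading singular value s[0] is the canonical correlation ρ, and the loadings satisfy the unit-variance normalisation αᵀ var(X) α = βᵀ var(Y) β = 1 used throughout the talk.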

SLIDE 5

CCA as a least squares problem

1st dimension

Theorem 1. Let α, β be p- and q-dimensional vectors, respectively, and let

    (α̂, β̂) = argmin_{α,β} var(αᵀX − βᵀY)

subject to αᵀ var(X) α = βᵀ var(Y) β = 1. Then α̂, β̂ are proportional to the first-dimension ordinary canonical loadings.

SLIDE 6

CCA as a least squares problem

2nd dimension

Theorem 2. Let α, β be p- and q-dimensional vectors, and let

    (α̂, β̂) = argmin_{α,β} var(αᵀX − βᵀY)

subject to αᵀ var(X) α = βᵀ var(Y) β = 1 and α₁ᵀ var(X) α = β₁ᵀ var(Y) β = 0, where α₁, β₁ are the first canonical loadings. Then α̂, β̂ are proportional to the second-dimension ordinary canonical loadings.

The two theorems establish an alternating least squares (ALS) algorithm for CCA.


SLIDE 8

CCA as a least squares problem

ALS for CCA

Let the objective function be Q(α, β) = var(αᵀX − βᵀY), subject to αᵀ var(X) α = βᵀ var(Y) β = 1. Q is continuous with a closed and bounded domain, so Q attains its infimum.

ALS algorithm:
- Given α̂: set β̂ = argmin_β Q(α̂, β) subject to var(βᵀY) = 1.
- Given β̂: set α̂ = argmin_α Q(α, β̂) subject to var(αᵀX) = 1.

Q decreases over the iterations and is bounded from below, so Q converges.
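The ALS scheme can be sketched directly: each half-step is an ordinary least squares solve followed by a rescale back to unit variance (an illustrative sketch of the scheme above; names are my own):

```python
import numpy as np

def cca_als(X, Y, tol=1e-12, max_iter=1000):
    """Alternate the two least squares half-steps until the correlation
    of the current canonical variates stops improving."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    beta = np.ones(Y.shape[1])
    beta /= np.sqrt(beta @ Syy @ beta)              # var(beta'Y) = 1
    rho_old = -np.inf
    for _ in range(max_iter):
        alpha = np.linalg.solve(Sxx, Sxy @ beta)    # argmin_a var(a'X - beta'Y)
        alpha /= np.sqrt(alpha @ Sxx @ alpha)       # rescale: var(alpha'X) = 1
        beta = np.linalg.solve(Syy, Sxy.T @ alpha)
        beta /= np.sqrt(beta @ Syy @ beta)
        rho = alpha @ Sxy @ beta
        if rho - rho_old < tol:
            break
        rho_old = rho
    return alpha, beta, rho
```

From a generic starting point the iteration converges to the first canonical pair, matching the direct eigen/SVD solution.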


SLIDE 11

Definition

Lasso (least absolute shrinkage and selection operator)

Introduced by Tibshirani (1996); imposes the L1 norm on the linear regression coefficients.

Lasso:

    β̂_lasso = argmin_β var(Y − βᵀX)  subject to  Σ_{j=1}^{p} |βⱼ| ≤ t.

The L1 norm shrinks the coefficients towards zero, and exactly to zero if t is small enough.
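As a concrete illustration, the penalised (Lagrangian) form of this problem, equivalent to the bound t for a matching penalty λ, can be solved by cyclic coordinate descent with soft-thresholding (a minimal sketch of the lasso itself, not one of the algorithms contrasted later):

```python
import numpy as np

def soft_threshold(z, g):
    """Scalar lasso solution: shrink towards zero, exactly to zero if |z| <= g."""
    return np.sign(z) * max(abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """min_b (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, p = X.shape
    b = np.zeros(p)
    col_ms = (X ** 2).mean(axis=0)                  # per-column mean square
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]          # residual excluding column j
            z = X[:, j] @ r / n
            b[j] = soft_threshold(z, lam) / col_ms[j]
    return b
```

With λ large enough, coefficients of irrelevant covariates land exactly at zero, which is the selection property the slide describes.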

SLIDE 12

Lasso algorithms

Lasso algorithms available in the literature:

- Lasso by Tibshirani: expresses the problem as a least squares problem with 2^p inequality constraints and adapts the NNLS algorithm.
- Lars-Lasso: a modified version of the Lars algorithm introduced by Efron et al. (2004); Lasso estimates are calculated so that the angle between the active covariates and the residuals is always equal.

SLIDE 13

Lasso algorithms

Proposed algorithm: Lasso with positivity constraints

Suppose that the signs of the coefficients do not change while the coefficients are shrunk.

Positivity Lasso:

    β̂_lasso = argmin_β var(Y − βᵀX)  subject to  s₀ᵀβ ≤ t  and  s₀ⱼβⱼ ≥ 0 for j = 1, ..., p,

where s₀ is the sign vector of the OLS estimate.

- a simple but quite general algorithm
- a restricted version of the Lasso algorithms, since the signs of the coefficients cannot change
- up to p + 1 constraints imposed, far fewer than the 2^p constraints of Tibshirani's Lasso

SLIDE 14

Lasso algorithms

Numerical solution

The solution is obtained through quadratic programming. Positivity Lasso solution:

    β̂ = b₀ − λ var(X)⁻¹ s₀ + var(X)⁻¹ diag(s₀) μ

- b₀ is the OLS estimate
- λ is the shrinkage parameter; there is a one-to-one correspondence between λ and t
- μ is zero for active coefficients and positive for nonactive coefficients
- λ and μ are calculated to satisfy the KKT conditions under the positivity constraints
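A generic quadratic-programming sketch of the positivity Lasso, using SciPy's SLSQP solver rather than the closed-form KKT path derived in the talk (function and variable names are my own):

```python
import numpy as np
from scipy.optimize import minimize

def positivity_lasso(X, y, t):
    """min var(y - X b) subject to s0'b <= t and s0j*bj >= 0,
    where s0 is the sign vector of the OLS estimate."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    s0 = np.sign(b_ols)
    constraints = [
        {"type": "ineq", "fun": lambda b: t - s0 @ b},   # L1-type bound
        {"type": "ineq", "fun": lambda b: s0 * b},       # signs cannot flip
    ]
    res = minimize(lambda b: np.mean((y - X @ b) ** 2),
                   x0=np.zeros_like(b_ols), constraints=constraints,
                   method="SLSQP")
    return res.x, s0
```

Because s₀ⱼbⱼ ≥ 0 for every j, the bound s₀ᵀb ≤ t is exactly the L1 bound ‖b‖₁ ≤ t restricted to the orthant of the OLS signs.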

SLIDE 15

The Lasso algorithms contrasted

Diabetes data set: 442 observations; the covariates are age, sex, body mass index, average blood pressure and six blood serum measurements; the response is disease progression one year after baseline.

[Figure: coefficient paths plotted against sum|b|/sum|b_OLS| for the Lars-Lasso (left panel) and the Positivity Lasso (right panel) on the diabetes data]

SLIDE 16

The Lasso algorithms contrasted

Simulation studies

We simulate 200 data sets of 100 observations each from the model

    Y = βᵀX + σε,  with corr(Xᵢ, Xⱼ) = ρ^|i−j|.

Dataset   n     p   β                              σ   ρ
1         100   8   (3, 1.5, 0, 0, 2, 0, 0, 0)ᵀ    3   0.50
2         100   8   (3, 1.5, 0, 0, 2, 0, 0, 0)ᵀ    3   0.90
3         100   8   βⱼ = 0.85 for all j            3   0.50
4         100   8   (5, 0, 0, 0, 0, 0, 0, 0)ᵀ      2   0.50

Table: Proportion of cases in which the correct model was selected.

Dataset   Tibs-Lasso   Lars-Lasso   Pos-Lasso
1         0.06         0.13         0.14
2         0.02         0.04         0.04
3         0.84         0.89         0.87
4         0.09         0.19         0.19

Table: Proportion of agreement between Pos-Lasso and the other two algorithms.

Dataset   Tibs-Lasso   Lars-Lasso
1         0.76         0.83
2         0.63         0.65
3         0.95         0.98
4         0.77         0.78
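The simulation design in the first table can be reproduced as follows (a sketch; the ρ^|i−j| correlation is built as an explicit covariance matrix):

```python
import numpy as np

def simulate_dataset(n, beta, sigma, rho, rng):
    """One data set from Y = beta'X + sigma*eps with corr(Xi, Xj) = rho^|i-j|."""
    p = len(beta)
    idx = np.arange(p)
    corr = rho ** np.abs(idx[:, None] - idx[None, :])   # AR(1)-type correlation
    X = rng.multivariate_normal(np.zeros(p), corr, size=n)
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y

# e.g. dataset 1 of the table
rng = np.random.default_rng(0)
X, y = simulate_dataset(100, np.array([3, 1.5, 0, 0, 2, 0, 0, 0.0]), 3, 0.50, rng)
```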

SLIDE 17

SCCA

ALS for CCA and Lasso

First dimension. Given the canonical variate T = βᵀY,

    α̂ = argmin_α var(T − αᵀX)  subject to  var(αᵀX) = 1  and  ‖α‖₁ ≤ t.

We seek an algorithm solving this optimization problem: modify the Lasso algorithm to incorporate the equality constraint.


SLIDE 20

Algorithm for SCCA

ALS for CCA and Lasso

- Tibshirani's Lasso: the NNLS algorithm cannot incorporate the equality constraint.
- Lars-Lasso: the equality constraint violates the equiangular condition.
- Positivity Lasso: by additionally imposing positivity constraints, the above optimization problem can be solved.


SLIDE 23

Algorithm for SCCA

SCCA with positivity

First dimension:

    min_α var(T − αᵀX)  subject to  αᵀ var(X) α = 1,  s₀ᵀα ≤ t,  and  s₀ⱼαⱼ ≥ 0 for j = 1, ..., p.

- The entire Lasso path is derived from the KKT conditions.
- Cross-validation methods select the shrinkage level applied.
- α_sp and β_sp for each set of variables are derived alternately until corr(S_sp, T_sp) converges to its maximum.
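The alternating scheme can be sketched generically: each half-step fits one set of loadings to the current canonical variate of the other set with a sparsity-constrained solver, then rescales to unit variance. Here `sparse_step` is a placeholder for any such solver (e.g. a sign-constrained lasso); this is an assumption-laden sketch rather than the authors' implementation, and the rescale can loosen the L1 bound slightly:

```python
import numpy as np

def scca_first_pair(X, Y, sparse_step, n_iter=100, tol=1e-9):
    """Alternate sparse regressions until corr(S, T) stops improving.
    sparse_step(Z, target) must return loadings approximately minimising
    var(target - Z @ coef) under some sparsity constraint (placeholder)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    beta = np.ones(Y.shape[1])
    beta /= (Y @ beta).std()                 # var(T) = 1
    rho_old = -np.inf
    for _ in range(n_iter):
        alpha = sparse_step(X, Y @ beta)
        alpha /= (X @ alpha).std()           # enforce var(alpha'X) = 1
        beta = sparse_step(Y, X @ alpha)
        beta /= (Y @ beta).std()
        rho = np.corrcoef(X @ alpha, Y @ beta)[0, 1]
        if rho - rho_old < tol:
            break
        rho_old = rho
    return alpha, beta, rho
```

Plugging in plain least squares as `sparse_step` recovers the ordinary ALS-CCA iteration; plugging in a positivity-Lasso step yields sparse loadings.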


SLIDE 26

Algorithm for SCCA

SCCA with positivity

Second dimension:

    min_α var(T − αᵀX)  subject to  αᵀ var(X) α = 1,  α₁ᵀ var(X) α = 0,  s₀ᵀα ≤ t,  and  s₀ⱼαⱼ ≥ 0 for j = 1, ..., p,

where α₁ is the first-dimension loading.

- Cross-validation methods select the shrinkage level.
- The alternating algorithm again derives the second-dimension canonical loadings.


SLIDE 29

Example

Simulations

We simulate 300 observations from the following model.

[Diagram: X1, ..., X4 and Y1, ..., Y3 load on the first canonical pair (S1, T1) with r = 0.98; X5, ..., X10 and Y4, ..., Y7 load on the second pair (S2, T2) with r = 0.90]

SLIDE 30

Example

Simulations

Canonical loadings (blank SCCA cells are loadings shrunk exactly to zero):

                   1st dim            2nd dim
Variable       CCA      SCCA      CCA      SCCA
X1             0.229    0.248     0.122    0.056
X2             0.350    0.366    -0.052
X3             0.337    0.341     0.027
X4             0.304    0.298     0.114
X5             0.135    0.014     0.198    0.208
X6            -0.037              0.381    0.472
X7            -0.052              0.212    0.183
X8            -0.052              0.205    0.266
X9            -0.111              0.166    0.177
X10           -0.019              0.168
Y1             0.402    0.419     0.112    0.014
Y2             0.460    0.444    -0.018
Y3             0.309    0.325     0.085
Y4            -0.018              0.279    0.421
Y5             0.032              0.183    0.008
Y6            -0.113   -0.028     0.395    0.361
Y7            -0.089   -0.025     0.384    0.427

ρ              0.745    0.737     0.654    0.638
RdX (%)        14.2     13.9      13.4     12.6
RdY (%)        16.6     16.4      15       14
Var.Ext of X (%)   25.7   25.5    31.4     30.9
Var.Ext of Y (%)   30     30.1    35       33.8

SLIDE 31

Summary

Extra work:
- Sparse CCA without positivity constraints, using the Lars-Lasso algorithm

Further work:
- Compare the performance of SCCA with and without positivity constraints
- Bayesian model selection: imposing different Lasso penalties; using GVS (Dellaportas et al., 2002); a Bayesian version of the SCCA

SLIDE 32

Appendix: References

Literature

Dellaportas, P., Forster, J., and Ntzoufras, I. (2002). On Bayesian model and variable selection using MCMC. Statistics and Computing, 12:27–36.

Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32:407–499.

Jolliffe, I., Trendafilov, N., and Uddin, M. (2003). A modified principal component technique based on the lasso. Journal of Computational and Graphical Statistics, 12(3):531–547.

Lawson, C. and Hanson, R. (1974). Solving Least Squares Problems. Prentice Hall, Englewood Cliffs, NJ.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288.

Zou, H., Hastie, T., and Tibshirani, R. (2004). Sparse principal component analysis. To appear, Journal of Computational and Graphical Statistics.
SLIDE 33

SCCA

Figure: CCA and SCCA with positivity. [Diagram: S1 = a1'X, T1 = b1'Y (correlation ρ1); S2 = a2'X, T2 = b2'Y (ρ2)]

Figure: SCCA without positivity. [Diagram: S1 = a1'X, T1 = b1'Y (ρ1); S2 = a2'X_res, T2 = b2'Y_res (ρ2)]

SLIDE 34

NNLS

Lawson and Hanson (1974) define the following problems:

- LSI problem: min_β ‖Y − βᵀX‖ subject to Gβ ≥ h
- NNLS problem: min_β ‖Y − βᵀX‖ subject to β ≥ 0
- LDP problem: min_β ‖β‖ subject to Gβ ≥ h

The Lasso is equivalent to an LSI problem, which is reduced to an LDP problem and then to NNLS.
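SciPy ships Lawson and Hanson's NNLS solver, so the final reduction step can be illustrated directly (the example data are my own):

```python
import numpy as np
from scipy.optimize import nnls

# min ||A b - y||_2 subject to b >= 0: the unconstrained least squares
# solution here is (2, -1), so NNLS clamps the second coordinate to zero
# and refits the first over the active set.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([2.0, -1.0, 1.0])
b, rnorm = nnls(A, y)   # b -> [1.5, 0.0]
```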