

1. From simple structure to sparse components: a comparative introduction
   Nickolay T. Trendafilov
   Department of Mathematics and Statistics, The Open University, UK
   ERCIM'12, Session ES11: Sparse dimension reduction, 1-3 December 2012, Conference Centre, Oviedo, Spain

2. Contents
   - Intro/Motivation
   - Classic vs. sparse PCA
     - Simple structure rotation/concept in PCA (and FA)
     - PCA interpretation via rotation methods
     - Example: the Pitprop data
   - Analyzing high-dimensional multivariate data
     - Abandoning the rotation methods
     - Algorithms for sparse component analysis
     - Taxonomy of PCA subject to an ℓ1 (LASSO) constraint
   - Function-constrained sparse components
     - Orthonormal sparse loadings and correlated components
     - Uncorrelated sparse components
   - Application to simple structure rotation
     - Thurstone's 26-box problem
     - Twenty-four psychological tests

3. Intro/Motivation: Why sparse PCA?

   Main goal: analyzing high-dimensional multivariate data.
   Main tools: low-dimensional data representation (e.g. PCA) and its interpretation.
   Main problems:
   - PCA might be too slow;
   - the results involve all input variables, which complicates the interpretation.

4. Classic vs. sparse PCA: Simple structure rotation in PCA (and FA)

   Steps:
   - low-dimensional data approximation, ...
   - followed by rotation of the PC loadings.

   The rotation is found by optimizing a criterion that defines/formalizes the perception of simple (interpretable) structure.

   Drawbacks of the rotated components:
   - loadings that are still difficult to interpret;
   - correlated components, which moreover no longer explain successively decreasing amounts of variance.
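For concreteness, here is a minimal NumPy sketch of one widely used analytic rotation criterion, varimax, implemented with Kaiser's standard SVD-based iteration (the function and its defaults are illustrative assumptions, not code from the talk):

```python
import numpy as np

def varimax(A, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a p x k loading matrix A to maximize the
    varimax criterion (Kaiser, 1958), via the standard SVD iteration."""
    p, k = A.shape
    R = np.eye(k)                       # accumulated orthogonal rotation
    d = 0.0
    for _ in range(max_iter):
        L = A @ R
        u, s, vt = np.linalg.svd(
            A.T @ (L**3 - (gamma / p) * L * np.sum(L**2, axis=0)))
        R = u @ vt                      # nearest orthogonal matrix
        d_new = s.sum()
        if d > 0 and d_new / d < 1 + tol:
            break                       # criterion stopped increasing
        d = d_new
    return A @ R                        # rotated loadings
```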

5. Classic vs. sparse PCA: Thurstone's simple structure concept... (Thurstone, 1947, p. 335); rules 1 and 2 are made operational in the sketch after this list.

   1. Each row of the factor matrix should have at least one zero.
   2. If there are r common factors, each column of the factor matrix should have at least r zeros.
   3. For every pair of columns of the factor matrix, there should be several variables whose entries vanish in one column but not in the other.
   4. For every pair of columns of the factor matrix, a large proportion of the variables should have vanishing entries in both columns when there are four or more factors.
   5. For every pair of columns of the factor matrix, there should be only a small number of variables with non-vanishing entries in both columns.
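A minimal sketch of such a check (the function, and treating entries with |loading| < eps as "vanishing", are illustrative assumptions):

```python
import numpy as np

def check_thurstone_rules_1_2(L, eps=1e-6):
    """Check Thurstone's rules 1 and 2 for a p x r loading matrix L."""
    zeros = np.abs(L) < eps                        # 'vanishing' entries
    r = L.shape[1]
    rule1 = bool(np.all(zeros.sum(axis=1) >= 1))   # a zero in every row
    rule2 = bool(np.all(zeros.sum(axis=0) >= r))   # >= r zeros per column
    return rule1, rule2
```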

6. Classic vs. sparse PCA: ... and implementing it

   Rotation approaches:
   1. graphical (subjective?);
   2. analytical (too many criteria!);
   3. hyperplane counting: maxplane, functionplane, hyball, recently revived as CLF/CLC;
   4. hyperplane-fitting rotations: promax, promaj, promin;
   5. rotation to independent components: ICA as a rotation method (applicable for p ≫ n).

   Main problems:
   1. formalizing Thurstone's rules into a single formula;
   2. achieving vanishing entries, i.e. exact zeros;
   3. correlated components;
   4. components that do not explain successively decreasing amounts of variance;
   5. impractical for modern applications where p ≫ n.

7. Classic vs. sparse PCA: PCA interpretation via rotation methods

   The interpretation issue. Traditionally, PCs are considered easily interpretable if there are plenty of small component loadings indicating the negligible importance of the corresponding variables.

   Jolliffe (2002, p. 269): "The most common way of doing this is to ignore (effectively set to zero) coefficients whose absolute values fall below some threshold."

   Thus, implicitly, the simplicity and interpretability of the PCs are associated with the sparseness of the component loadings.
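In code, the practice Jolliffe describes is plain hard-thresholding; a minimal sketch (the names and the cut-off value are assumptions):

```python
import numpy as np

def threshold_loadings(loadings, cutoff=0.3):
    """Effectively set to zero all loadings with |value| below cutoff."""
    L = loadings.copy()
    L[np.abs(L) < cutoff] = 0.0
    return L
```

Note that the zeros produced this way are imposed after the fact; they do not change the components themselves.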

8. Classic vs. sparse PCA: The interpretation issue (continued)

   However, ignoring the small loadings is subjective and misleading, especially for PCs extracted from a covariance matrix (Cadima & Jolliffe, 1995):

   "One of the reasons for this is that it is not just loadings but also the size (standard deviation) of each variable which determines the importance of that variable in the linear combination. Therefore it may be desirable to put more emphasis on simplicity than on variance maximization."

9. Classic vs. sparse PCA: Example: the Pitprop data

   The Pitprop data consist of 14 variables measured for each of 180 pitprops cut from Corsican pine timber. One variable is compressive strength; the other 13 are physical measurements on the pitprops (Jeffers, 1967).

   Table: Jeffers's Pitprop data: loadings of the first six PCs, and their interpretation obtained by normalizing each column and then keeping only normalized loadings greater than .7 (Jeffers, 1967).

                Component loadings (AD)               Jeffers's interpretation
   Vars        1     2     3     4     5     6       1     2     3     4     5     6
   topdiam   .83   .34  -.28  -.10   .08   .11     1.0
   length    .83   .29  -.32  -.11   .11   .15     1.0
   moist     .26   .83   .19   .08  -.33  -.25           1.0
   testsg    .36   .70   .48   .06  -.34  -.05           .84   .73
   ovensg    .12  -.26   .66   .05  -.17   .56                 1.0               1.0
   ringtop   .58  -.02   .65  -.07   .30   .05     .70         .99
   ringbut   .82  -.29   .35  -.07   .21   .00     .99
   bowmax    .60  -.29  -.33   .30  -.18  -.05     .72
   bowdist   .73   .03  -.28   .10   .10   .03     .88
   whorls    .78  -.38  -.16  -.22  -.15  -.16     .93
   clear    -.02   .32  -.10   .85   .33   .16                       1.0
   knots    -.24   .53   .13  -.32   .57  -.15                             1.0
   diaknot  -.23   .48  -.45  -.32  -.08   .57                                   1.0
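The interpretation columns above can be reproduced mechanically from the loadings; a sketch of that rule, assuming `A` holds the 13 x 6 loading matrix from the table:

```python
import numpy as np

def jeffers_interpretation(A, cutoff=0.7):
    """Normalize each column by its largest absolute loading, then keep
    only entries whose normalized absolute value exceeds cutoff."""
    N = A / np.abs(A).max(axis=0)                    # column-wise normalization
    return np.where(np.abs(N) > cutoff, N, np.nan)   # NaN marks a blank cell
```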

10. Classic vs. sparse PCA: Example: the Pitprop data (continued)

    Table: Jeffers's Pitprop data: loadings rotated by varimax, and their interpretation obtained by normalizing each column and then keeping only normalized loadings greater than .59 in absolute value.

                 VARIMAX loadings                      Normalized loadings > .59
    Vars        1     2     3     4     5     6       1     2     3     4     5     6
    topdiam   .91   .26  -.01   .03   .01   .08     .97
    length    .94   .19  -.00   .03   .00   .10    1.0
    moist     .13   .96  -.14   .08   .08   .04          1.0
    testsg    .13   .95   .24   .03   .06  -.03           .98
    ovensg   -.14   .03   .90  -.03  -.18  -.03                1.0
    ringtop   .36   .19   .61  -.03   .28  -.49                .68
    ringbut   .62  -.02   .47  -.13  -.01  -.55     .66
    bowmax    .54  -.10  -.10   .11  -.56  -.23                            -.64
    bowdist   .77   .03  -.03   .12  -.16  -.12     .82
    whorls    .68  -.10   .02  -.40  -.35  -.34     .73
    clear     .03   .08  -.04   .97   .00  -.00                      1.0
    knots    -.06   .14  -.14   .04   .87   .09                            1.0
    diaknot   .10   .04  -.07  -.01   .15   .93                                  1.0

11. Analyzing high-dimensional multivariate data: Abandoning the rotation methods

    Alternative to rotation: modify PCA so that it produces explicitly simple principal components.
    - The first method to construct sparse components directly was proposed by Hausman (1982): it finds PC loadings from a prescribed subset of values, say S = {−1, 0, 1}.
    - Jolliffe & Uddin (2000) were the first to modify the original PCs to additionally satisfy the varimax criterion (Simplified Component Technique, SCoT).
    - Jolliffe, Trendafilov & Uddin (2003) were the first to modify the original PCs to additionally satisfy a LASSO constraint, which drives many loadings to exact zeros (SCoTLASS); see the sketch after this list.
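To make the LASSO-constrained problem concrete: SCoTLASS maximizes a'Ra subject to a'a = 1 and ||a||_1 <= t. The sketch below is an illustrative heuristic for this type of problem, combining power iterations with a soft-thresholding step in the spirit of Witten, Tibshirani & Hastie's penalized matrix decomposition; it is not the original SCoTLASS algorithm of Jolliffe et al. (2003):

```python
import numpy as np

def soft(x, c):
    """Soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - c, 0.0)

def sparse_first_pc(R, t, n_iter=200):
    """Heuristic for  max a'Ra  s.t.  ||a||_2 = 1, ||a||_1 <= t,
    with 1 <= t <= sqrt(p); R is a p x p covariance/correlation matrix."""
    p = R.shape[0]
    rng = np.random.default_rng(0)
    a = rng.standard_normal(p)
    a /= np.linalg.norm(a)
    for _ in range(n_iter):
        v = R @ a                          # power-method step
        lo, hi = 0.0, np.abs(v).max()      # binary search for the threshold
        for _ in range(50):
            mid = (lo + hi) / 2.0
            u = soft(v, mid)
            nrm = np.linalg.norm(u)
            if nrm > 0 and np.abs(u).sum() / nrm > t:
                lo = mid                   # still violates the l1 budget
            else:
                hi = mid                   # feasible; try a smaller threshold
        u = soft(v, hi)
        a = u / np.linalg.norm(u)
    return a                               # sparse loading vector, ||a||_2 = 1
```

For t = sqrt(p) the constraint is inactive and the iteration reduces to the ordinary power method for the first PC; as t decreases towards 1, more loadings become exactly zero.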

12. Analyzing high-dimensional multivariate data: Algorithms for sparse component analysis

    A great number of efficient numerical procedures:
    - Zou, Hastie & Tibshirani (2006) transform standard PCA into a regression form to propose a fast algorithm (SPCA) for sparse PCA, applicable to large data;
    - Moghaddam, Weiss & Avidan (2006) use spectral bounds of submatrices of the sample correlation matrix to identify the subset of m variables explaining the maximum variance among all possible subsets of size m;
    - d'Aspremont, Ghaoui, Jordan & Lanckriet (2007) replace the LASSO by a cardinality constraint and apply semidefinite programming, SDP (sound theory, but not very fast!);
    - d'Aspremont, Bach & Ghaoui (2008) use another SDP relaxation to construct a greedy algorithm more efficient than those of d'Aspremont et al. (2007) and Moghaddam et al. (2006).
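For experimenting with a regression-based formulation, scikit-learn ships a LASSO-penalized sparse PCA in the same spirit as Zou et al.'s SPCA (a dictionary-learning variant, not an exact reimplementation); the random data matrix below is a stand-in:

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
X = rng.standard_normal((180, 13))   # stand-in, sized like the Pitprop data

spca = SparsePCA(n_components=6, alpha=1.0, random_state=0)
scores = spca.fit_transform(X)       # 180 x 6 component scores
loadings = spca.components_.T        # 13 x 6; many entries are exactly zero

print("fraction of exact zeros:", np.mean(loadings == 0.0))
```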
