

SLIDE 1

ERCIM’12, Session ES11: Sparse dimension reduction, 1-3 December 2012, Conference Centre, Oviedo, Spain

From simple structure to sparse components: a comparative introduction

Nickolay T. Trendafilov

Department of Mathematics and Statistics The Open University, UK


SLIDE 2

Contents

- Intro/Motivation
- Classic vs. sparse PCA
  - Simple structure rotation/concept in PCA (and FA)
  - PCA interpretation via rotation methods
  - Example: the Pitprop data
- Analyzing high-dimensional multivariate data
  - Abandoning the rotation methods
  - Algorithms for sparse component analysis
  - Taxonomy of PCA subject to ℓ1 constraint (LASSO)
- Function-constrained sparse components
  - Orthonormal sparse loadings and correlated components
  - Uncorrelated sparse components
- Application to simple structure rotation
  - Thurstone's 26 box problem
  - Twenty-four psychological tests

SLIDE 3

Why sparse PCA?

Main goal: analyzing high-dimensional multivariate data

Main tools:
- low-dimensional data representation, e.g. PCA
- interpretation

Main problems:
- PCA might be too slow
- the results involve all input variables, which complicates the interpretation

SLIDE 4

Simple structure rotation in PCA (and FA)

Steps:
- low-dimensional data approximation, ...
- followed by rotation of the PC loadings
- the rotation is found by optimizing a certain criterion which defines/formalizes the perception of simple (interpretable) structure

Drawbacks of the rotated components:
- the loadings are still difficult to interpret
- the components are correlated, and they also do not explain decreasing amounts of variance

SLIDE 5

Thurstone's simple structure concept...

(Thurstone, 1947, p. 335)

1. Each row of the factor matrix should have at least one zero.
2. If there are r common factors, each column of the factor matrix should have at least r zeros.
3. For every pair of columns of the factor matrix, there should be several variables whose entries vanish in one column but not in the other.
4. For every pair of columns of the factor matrix, a large proportion of the variables should have vanishing entries in both columns when there are four or more factors.
5. For every pair of columns of the factor matrix, there should be only a small number of variables with non-vanishing entries in both columns.

SLIDE 6

... and implementing it

Rotation:
1. Graphical (subjective?)
2. Analytical (too many criteria!)
3. Hyperplane counting: maxplane, functionplane, hyball, recently revived as CLF/CLC
4. Hyperplane fitting rotations: promax, promaj, promin
5. Rotation to independent components: ICA as a rotation method (applicable for p ≫ n)

Main problems:
1. Formalizing Thurstone's rules into a single formula
2. Achieving vanishing entries, i.e. exact zeros
3. Correlated components
4. The components do not explain decreasing amounts of variance
5. Impractical for modern applications when p ≫ n

SLIDE 7

The interpretation issue

Traditionally, PCs are considered easily interpretable if there are plenty of small component loadings indicating the negligible importance of the corresponding variables (Jolliffe, 2002, p. 269): "The most common way of doing this is to ignore (effectively set to zero) coefficients whose absolute values fall below some threshold." Thus, implicitly, the PCs' simplicity and interpretability are associated with the sparseness of the component loadings.
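As a minimal illustration of this thresholding practice (my sketch, not code from the talk), the loadings of the first few PCs of a correlation matrix can be truncated in a few NumPy lines; the data and the cut-off value are arbitrary assumptions:

```python
# Sketch of the informal "set small loadings to zero" practice quoted above.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(180, 14))            # stand-in for a data matrix
R = np.corrcoef(X, rowvar=False)          # 14 x 14 correlation matrix

eigval, eigvec = np.linalg.eigh(R)        # eigenvalues in ascending order
A = eigvec[:, ::-1][:, :6]                # loadings of the first 6 PCs

threshold = 0.3                           # arbitrary, subjective cut-off
A_interpreted = np.where(np.abs(A) >= threshold, A, 0.0)
```

This is exactly the subjective step criticized on the next slides: the threshold is chosen by eye, and the zeros are imposed after the fact rather than produced by the method.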

SLIDE 8

The interpretation issue (continued)

However, ignoring the small loadings is subjective and misleading, especially for PCs from a covariance matrix (Cadima & Jolliffe, 1995): "One of the reasons for this is that it is not just loadings but also the size (standard deviation) of each variable which determines the importance of that variable in the linear combination. Therefore it may be desirable to put more emphasis on simplicity than on variance maximization."

SLIDE 9

The Pitprop data consist of 14 variables which were measured for each of 180 pitprops cut from Corsican pine timber. One variable is compressive strength, and the other 13 variables are physical measurements on the pitprops (Jeffers, 1967).

Table: Jeffers's Pitprop data: loadings of the first six PCs, and their interpretation by normalizing each column and then taking loadings greater than .7 only (Jeffers, 1967)

Vars      Component loadings (AD)            Jeffers's interpretation
           1    2    3    4    5    6     |   1    2    3    4    5    6
topdiam  .83  .34 -.28 -.10  .08  .11     | 1.0
length   .83  .29 -.32 -.11  .11  .15     | 1.0
moist    .26  .83  .19  .08 -.33 -.25     |      1.0
testsg   .36  .70  .48  .06 -.34 -.05     |      .84  .73
ovensg   .12 -.26  .66  .05 -.17  .56     |           1.0                 1.0
ringtop  .58 -.02  .65 -.07  .30  .05     | .70       .99
ringbut  .82 -.29  .35 -.07  .21  .00     | .99
bowmax   .60 -.29 -.33  .30 -.18 -.05     | .72
bowdist  .73  .03 -.28  .10  .10  .03     | .88
whorls   .78 -.38 -.16 -.22 -.15 -.16     | .93
clear   -.02  .32 -.10  .85  .33  .16     |                1.0
knots   -.24  .53  .13 -.32  .57 -.15     |                     1.0
diaknot -.23  .48 -.45 -.32 -.08  .57     |                          1.0
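A small sketch of the interpretation rule used in the caption (normalize each column by its largest absolute loading, then keep only entries above .7); the function name and the cutoff default are mine, not from the talk:

```python
import numpy as np

def jeffers_interpretation(A, cutoff=0.7):
    """Normalize each loadings column by its largest absolute value,
    then keep only entries exceeding the cutoff in absolute value."""
    A = np.asarray(A, dtype=float)
    normalized = A / np.abs(A).max(axis=0)   # column-wise rescaling
    return np.where(np.abs(normalized) > cutoff, np.round(normalized, 2), 0.0)
```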

SLIDE 10

Table: Jeffers's Pitprop data: rotated loadings by varimax, and their interpretation by normalizing each column and then taking loadings greater than .59 only

Vars      VARIMAX loadings                   Normalized loadings greater than .55
           1    2    3    4    5    6     |   1    2    3    4    5    6
topdiam  .91  .26 -.01  .03  .01  .08     | .97
length   .94  .19 -.00  .03  .00  .10     | 1.0
moist    .13  .96 -.14  .08  .08  .04     |      1.0
testsg   .13  .95  .24  .03  .06 -.03     |      .98
ovensg  -.14  .03  .90 -.03 -.18 -.03     |           1.0
ringtop  .36  .19  .61 -.03  .28 -.49     |           .68
ringbut  .62 -.02  .47 -.13 -.01 -.55     | .66
bowmax   .54 -.10 -.10  .11 -.56 -.23     |                    -.64
bowdist  .77  .03 -.03  .12 -.16 -.12     | .82
whorls   .68 -.10  .02 -.40 -.35 -.34     | .73
clear    .03  .08 -.04  .97  .00 -.00     |                1.0
knots   -.06  .14 -.14  .04  .87  .09     |                     1.0
diaknot  .10  .04 -.07 -.01  .15  .93     |                          1.0

SLIDE 11

Alternative to rotation: modify PCA to produce explicitly simple principal components

- The first method to directly construct sparse components was proposed by Hausman (1982): it finds PC loadings from a prescribed subset of values, say S = {−1, 0, 1}.
- Jolliffe & Uddin (2000) were the first to modify the original PCs to additionally satisfy the Varimax criterion (simplified component technique, SCoT).
- Jolliffe, Trendafilov & Uddin (2003) were the first to modify the original PCs to additionally satisfy the LASSO constraint, which drives many loadings to exact zeros (SCoTLASS).

SLIDE 12

A great number of efficient numerical procedures:

- Zou, Hastie & Tibshirani (2006) transform the standard PCA into a regression form to propose a fast algorithm (SPCA) for sparse PCA, applicable to large data.
- Moghaddam, Weiss & Avidan (2006) use spectral bounds of submatrices of the sample correlation matrix to identify the subset of m variables explaining the maximum variance among all possible subsets of size m.
- d'Aspremont, Ghaoui, Jordan & Lanckriet (2007) replace LASSO by a cardinality constraint and apply semidefinite programming, SDP (sound theory, but not very fast!).
- d'Aspremont, Bach & Ghaoui (2008): another SDP relaxation, used to construct a more efficient greedy algorithm than those of d'Aspremont et al. (2007) and Moghaddam et al. (2006).

SLIDE 13

...more numerical procedures for sparse PCs:

- Shen & Huang (2008) use the connection between PCA and the SVD of the data, and extract sparse PCs by solving a low-rank matrix approximation problem subject to regularization penalties which promote sparsity in the PC loadings (sparse PCA via regularized SVD, sPCA-rSVD).
- Journée, Nesterov, Richtárik & Sepulchre (2008) treat both LASSO and cardinality constraints (most elegant!), with a generalized power-like method.
- Witten, Tibshirani & Hastie (2009) proposed a general algorithm for sparse SVD, which generalizes and simplifies the previous sparse PCA definitions given by SCoTLASS, SPCA and sPCA-rSVD; it can also be applied for sparse CCA.
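The alternating scheme behind sPCA-rSVD can be sketched in a few lines. This is a simplified rank-1 illustration under my own assumptions (plain soft-thresholding of the right vector, unit-norm rescaling), not the authors' implementation:

```python
# Rank-1 sketch in the spirit of sPCA-rSVD (Shen & Huang, 2008):
# alternate between the score vector u and a soft-thresholded loadings
# vector v. Penalty handling and deflation are simplified here.
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def rank1_sparse_svd(X, lam=0.1, n_iter=100):
    """Return (u, v): v is a sparse loadings vector, u the component scores."""
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    u, v = u[:, 0], vt[0]                    # warm start from the ordinary SVD
    for _ in range(n_iter):
        v = soft_threshold(X.T @ u, lam)     # sparsify the loadings
        norm = np.linalg.norm(v)
        if norm == 0:                        # penalty too aggressive
            break
        v /= norm
        u = X @ v
        u /= np.linalg.norm(u)
    return u, v

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 26))
u, v = rank1_sparse_svd(X, lam=0.5)          # v now contains exact zeros
```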

SLIDE 14

...and the very recent algorithms for sparse PCs:

- Sriperumbudur, Torres & Lanckriet (2011) consider a sparse version of the generalized EVD, to be used for PCA, CCA and LDA (n > p). Instead of constraining the cardinality ‖a‖₀ of the loadings a ∈ ℝᵖ, or ‖a‖₁, they use

      ‖a‖_ε = Σ_{i=1}^{p} log(1 + |a_i|/ε) / log(1 + 1/ε), for some small ε.

- Qi, Luo & Zhao (2012) construct sparse PCs making use of ‖a‖²_λ = (1 − λ)‖a‖₂² + λ‖a‖₁². In contrast to SCoTLASS and other methods, the constraint set is strictly convex for 0 ≤ λ < 1, which gives unique sparse PCs. The solution is found by a new type of thresholding, different from the popular soft- and hard-thresholding.
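Both sparseness measures are easy to compute. The following sketch (mine, not the authors' code) evaluates them on a toy vector; note how the ε-"norm" approximately counts the nonzero entries:

```python
import numpy as np

def eps_norm(a, eps=1e-3):
    """Log-based sparseness measure of Sriperumbudur et al. (2011)."""
    a = np.asarray(a, dtype=float)
    return np.sum(np.log1p(np.abs(a) / eps)) / np.log1p(1.0 / eps)

def lambda_norm_sq(a, lam=0.5):
    """Blended squared norm of Qi et al. (2012): (1-lam)*l2^2 + lam*l1^2."""
    a = np.asarray(a, dtype=float)
    l1, l2 = np.abs(a).sum(), np.linalg.norm(a)
    return (1.0 - lam) * l2**2 + lam * l1**2

a = np.array([0.8, 0.0, 0.0, 0.6])
print(eps_norm(a))          # close to 2, the number of nonzeros
print(lambda_norm_sq(a))
```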

SLIDE 15

Major weakness of the existing methods is that:

- they produce sparse loadings that are not completely orthonormal, and
- the corresponding sparse components are correlated.

Only SCoTLASS (Jolliffe et al., 2003; Trendafilov and Jolliffe, 2006) and the method recently proposed by Qi et al. (2012) are capable of producing either orthonormal loadings or uncorrelated sparse components. In the next section, new definitions of sparse PCA are considered, some of which can result simultaneously in nearly orthonormal loadings and nearly uncorrelated sparse components.

SLIDE 16

Types of problems seeking sparse minimizers x of f(x) through the ℓ1 norm (Wright, S. (2011). Gradient algorithms for regularized optimization. SPARS11, Edinburgh, Scotland, http://pages.cs.wisc.edu/~swright):

- Weighted form: min f(x) + τ‖x‖₁, for some τ > 0;
- ℓ1-constrained form (variable selection): min f(x) subject to ‖x‖₁ ≤ τ;
- Function-constrained form: min ‖x‖₁ subject to f(x) ≤ f̄.

SLIDE 17

...and restated accordingly for sparse PCA:

For a given p × p correlation matrix R, find a vector of loadings a (‖a‖₂ = 1) by solving one of the following:

- Weighted form: max a⊤Ra − τ‖a‖₁, for some τ > 0;
- ℓ1-constrained form (variable selection): max a⊤Ra subject to ‖a‖₁ ≤ τ, τ ∈ [1, √p];
- Function-constrained form: min ‖a‖₁ subject to a⊤Ra ≤ λ, where λ is an eigenvalue of R.
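As an illustration only, the weighted form can be attacked with a crude projected-subgradient ascent on the unit sphere. This sketch is my assumption, not the talk's algorithm, and it yields small rather than exactly zero loadings unless an extra thresholding or proximal step is added:

```python
# Projected subgradient ascent for: max a'Ra - tau*||a||_1, ||a||_2 = 1.
import numpy as np

def weighted_sparse_pc(R, tau=0.5, step=0.05, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.normal(size=R.shape[0])
    a /= np.linalg.norm(a)
    for _ in range(n_iter):
        grad = 2 * R @ a - tau * np.sign(a)   # subgradient of the objective
        a = a + step * grad                   # ascent step
        a /= np.linalg.norm(a)                # project back onto the sphere
    return a

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))
R = np.corrcoef(X, rowvar=False)
a = weighted_sparse_pc(R, tau=1.0)
```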

SLIDE 18

Weakly correlated sparse components (WCPC)

The following version of sparse PCA:

    min_{A⊤A = I_r}  ‖A‖₁ + µ ‖A⊤RA − D²‖_F ,

seeks sparse loadings A which additionally diagonalize R, i.e. which are supposed to produce sparse components that are as weakly correlated as possible. D² is the diagonal matrix of the original PCs' variances.
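One way to sketch WCPC numerically (an assumption on my part, not the talk's implementation) is gradient descent on a smoothed version of the objective, with a polar retraction that restores A⊤A = I_r after every step:

```python
# Smoothed-l1 gradient descent on the Stiefel manifold for the WCPC objective.
import numpy as np

def polar_retraction(A):
    """Nearest matrix with orthonormal columns (via the thin SVD)."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

def wcpc(R, r, mu=10.0, step=1e-3, n_iter=2000, delta=1e-6):
    eigval, eigvec = np.linalg.eigh(R)
    A = eigvec[:, ::-1][:, :r]                 # start from ordinary PCA
    D2 = np.diag(eigval[::-1][:r])             # target variances
    for _ in range(n_iter):
        M = A.T @ R @ A - D2                   # diagonality misfit
        fit = np.linalg.norm(M, "fro") + 1e-12
        # gradient of sum(sqrt(A^2 + delta)) + mu*||A'RA - D^2||_F
        grad = A / np.sqrt(A**2 + delta) + mu * (2 * R @ A @ M) / fit
        A = polar_retraction(A - step * grad)  # keep A'A = I
    return A
```

The smoothing parameter delta is a convenience assumption that makes the ℓ1 term differentiable; a proximal or manifold-aware solver would handle it exactly.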

SLIDE 19

Sparse components approximating the PCs' variances (VarFit)

The next version of sparse PCA is:

    min_{A⊤A = I_r}  ‖A‖₁ + µ ‖diag(A⊤RA) − D²‖_F ,

in which the variances of the sparse components should fit the initial variances D² better, without paying attention to the off-diagonal elements of R. As a result, A⊤RA is expected to be less similar to a diagonal matrix than in WCPC, and the resulting sparse components to be more correlated.

SLIDE 20

Table: Function-constrained sparse loadings, Jeffers’s Pitprop data

Sparse component loadings A: WCPC (45/78 zero loadings) and VarFit (63/78 zero loadings). The flattened extraction lost the column positions of the individual entries; each variable's nonzero loadings are listed in reading order (WCPC block first, then VarFit):

topdiam   -.477   .136  -.471
length    -.537   .099   .001  -.484
moist      .001   .331  -.087   .707
testsg    -.071  -.001   .880   .707
ovensg    -.036  -.619   .749   .044   .663
ringtop   -.128  -.292   .743
ringbut   -.247  -.109   .224  -.329   .089
bowmax    -.179  -.338  -.309
bowdist   -.460   .001  -.405
whorls    -.389   .001  -.418
clear    -1.00   1.00
knots      .057   .025   .030   .971  -1.00
diaknot    .043   .700   .663  1.00

           WCPC                                 VarFit
%Var       29.5 12.7 12.2  7.7  6.6  6.1        27.8 14.8 13.1  7.7  7.7  7.7
%Cvar      29.5 42.2 54.4 62.1 68.7 74.8        27.8 42.6 55.7 63.4 71.1 78.8
%Cvaradj   29.5 42.1 54.2 61.8 68.0 73.8        27.8 41.0 53.7 61.1 68.0 74.3

Correlations among the WCPC components:
 1.00   .09  -.07  -.01   .06   .07
  .09  1.00  -.02  -.08   .06  -.03
 -.07  -.02  1.00  -.09   .23   .17
 -.01  -.08  -.09  1.00   .01   .07
  .06   .06   .23   .01  1.00   .00
  .07  -.03   .17   .07   .00  1.00

Correlations among the VarFit components:
 1.00  -.18  -.27  -.21   .11  -.02
 -.18  1.00   .19  -.20   .07  -.13
 -.27   .19  1.00   .08  -.34   .08
 -.21  -.20   .08  1.00  -.18   .03
  .11   .07  -.34  -.18  1.00  -.01
 -.02  -.13   .08   .03  -.01  1.00

SLIDE 21

Sparse components approximating the total variance (TvarFit)

Instead of fitting the variances of the individual sparse PCs to the initial ones as in VarFit, one can consider sparse PCs whose total variance fits the total variance of the first r PCs:

    min_{A⊤A = I_r}  ‖A‖₁ + µ [trace(A⊤RA) − trace(D²)]² .

The obvious drawback of this sparse PCA formulation is that the resulting sparse PCs will not be ordered according to the magnitudes of their variances. As only the total variance is fitted, one expects the explained variance to be higher than with the previous formulations WCPC and VarFit.

SLIDE 22

Sequential sparse components approximating the PCs' variances (sVarFit)

Finally, problem VarFit can be rewritten in the following vectorial form:

    min_{a⊤a = 1, a ⊥ A_{i−1}}  ‖a‖₁ + µ (a⊤Ra − d_i²)² ,

where A₀ := 0 and A_{i−1} = [a₁, a₂, ..., a_{i−1}].
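A sequential sketch of sVarFit (again my assumption, simplified): each new loadings vector is optimized over a basis of the orthogonal complement of the previous ones, which enforces a ⊥ A_{i−1} by construction:

```python
# Sequential extraction: one sparse loadings vector at a time, each
# constrained to the orthogonal complement of the earlier vectors.
import numpy as np
from scipy.optimize import minimize

def svarfit(R, r, mu=10.0, delta=1e-6, seed=0):
    p = R.shape[0]
    d2 = np.sort(np.linalg.eigvalsh(R))[::-1][:r]  # target PC variances
    rng = np.random.default_rng(seed)
    A = np.zeros((p, 0))
    for i in range(r):
        # Orthonormal basis Q of the complement of the previous loadings.
        Q = np.linalg.svd(np.eye(p) - A @ A.T)[0][:, : p - i]
        def objective(z):
            a = Q @ (z / np.linalg.norm(z))        # unit vector with a ⊥ A
            return (np.sum(np.sqrt(a**2 + delta))  # smoothed ||a||_1
                    + mu * (a @ R @ a - d2[i])**2)
        z = minimize(objective, rng.normal(size=p - i), method="BFGS").x
        a = Q @ (z / np.linalg.norm(z))
        A = np.column_stack([A, a])
    return A
```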

SLIDE 23

Table: Function-constrained sparse loadings, Jeffers’s Pitprop data

Sparse component loadings A: TvarFit (57/78 zero loadings) and sVarFit (63/78 zero loadings). The flattened extraction lost the column positions of the individual entries; each variable's nonzero loadings are listed in reading order (TvarFit block first, then sVarFit):

topdiam    .592   .472
length     .606   .485
moist      .707   .707
testsg     .707   .707
ovensg    1.00    .999
ringtop    .006  -.633   .132
ringbut   -.618   .084   .382
bowmax    -.617   .001   .253
bowdist    .466   .383
whorls    -.440  -.001  -.009   .175   .410  -.001   .001
clear     1.00  -1.00
knots      .652   .118  1.00
diaknot    .465   .116  1.00

           TvarFit                              sVarFit
%Var       14.5 13.8  7.7  7.7 16.1 21.8        30.06 14.48  7.70  7.69  7.68  7.68
%Cvar      14.5 28.3 36.0 43.7 59.8 81.5        30.06 44.54 52.24 59.93 67.61 75.29
%Cvaradj   14.5 28.0 35.6 43.1 55.0 68.3        30.06 43.96 51.49 58.95 65.72 72.83

Correlations among the TvarFit components:
 1.00  -.14  -.13  -.04   .13  -.29
 -.14  1.00   .09  -.08   .36  -.43
 -.13   .09  1.00  -.09   .07   .04
 -.04  -.08  -.09  1.00  -.35  -.10
  .13   .36   .07  -.35  1.00  -.35
 -.29  -.43   .04  -.10  -.35  1.00

Correlations among the sVarFit components:
 1.00  -.20  -.03   .14  -.20  -.02
 -.20  1.00  -.13   .07  -.20   .04
 -.03  -.13  1.00  -.01   .03   .09
  .14   .07  -.01  1.00  -.18  -.21
 -.20  -.20   .03  -.18  1.00   .15
 -.02   .04   .09  -.21   .15  1.00

SLIDE 24

Sequential sparse components approximating the PCs' variances (sVarFitOb)

The problem sVarFit can be modified to obtain uncorrelated sparse components (however, losing A⊤A = I_r) as follows:

    min_{a⊤a = 1, Ra ⊥ A_{i−1}}  ‖a‖₁ + µ (a⊤Ra − d_i²)² ,

where A₀ := 0 and A_{i−1} = [a₁, a₂, ..., a_{i−1}].

SLIDE 25

Weakly correlated sparse components with oblique loadings (WCPCob)

Along with problem WCPC, one can consider solving the following matrix optimization problem:

    min_{diag(A⊤A) = I_r}  ‖A‖₁ + µ ‖A⊤RA − D²‖_F .

As in WCPC, these components are approximately uncorrelated. This sparse PCA formulation is very interesting because, for some reason (not completely explained yet), the resulting loadings A stay nearly orthonormal.

SLIDE 26

Table: Function-constrained sparse loadings, Jeffers’s Pitprop data

Sparse component loadings A: sVarFitOb (42/78 zero loadings) and WCPCob (50/78 zero loadings). The flattened extraction lost the column positions of the individual entries; each variable's nonzero loadings are listed in reading order (sVarFitOb block first, then WCPCob):

topdiam   -.472  -.234  -.001   .491   .189   .001   .001
length    -.485  -.246  -.146   .258   .518   .215   .216   .001
moist      .542  -.806  -.001   .075
testsg     .808   .154  -.288   .071  -.551
ovensg     .562   .357  -.687  -.347   .852
ringtop   -.132   .375   .001   .065  -.385
ringbut   -.382   .186  -.059   .001   .313  -.216  -.226
bowmax    -.253   .278
bowdist   -.383   .404   .001   .001
whorls    -.410  -.178   .001  -.063   .385
clear     -.981   .031  -.153  1.00
knots      .850   .220  -.971
diaknot   -.627   .105  -.669   .776   .523

           sVarFitOb                            WCPCob
%Var       30.1 13.8 13.1  7.4  5.4  5.7        29.3 13.4 12.4  7.7  6.7  6.3
%Cvar      30.1 43.9 57.0 64.4 69.8 75.5        29.3 42.8 55.2 62.9 69.5 75.8

A⊤A (sVarFitOb):
 1.00   .11   .14   .02  -.13   .03
  .11  1.00   .04  -.08   .14   .03
  .14   .04  1.00   .12  -.27   .07
  .02  -.08   .12  1.00  -.08   .09
 -.13   .14  -.27  -.08  1.00  -.08
  .03   .03   .07   .09  -.08  1.00

A⊤A \ A⊤RA (WCPCob):
 1.00  -.03  -.05  -.01   .04  -.06
  .11  1.00  -.03   .08  -.02  -.05
  .11   .05  1.00  -.13   .14  -.06
  .00  -.00   .00  1.00   .01  -.08
 -.07   .05  -.06   .00  1.00   .02
  .00   .11   .00   .00   .00  1.00

SLIDE 27

Revisiting sparse components fitting the total variance (TvarFit)

The obvious drawback of this sparse PCA formulation is that the resulting sparse components will not be ordered according to the magnitudes of their variances. However, they approximate the total variance of the initial PCs. These features are reminiscent of the old rotation methods in PCA, where the rotated components are also not ordered according to the magnitudes of their variances, but preserve exactly the total variance of the initial PCs. So, why not compare TvarFit with the rotation methods?

SLIDE 28

Thurstone's 26-variable box data:

L. L. Thurstone collected 20 boxes and measured their three dimensions x (length), y (width) and z (height):

 #  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 x  3  3  3  3  3  4  4  4  4  4  4  4  4  5  5  5  5  5  5  5
 y  2  2  3  3  3  2  2  3  3  3  4  4  4  2  2  3  3  4  4  4
 z  1  2  1  2  3  1  2  1  2  3  1  2  3  1  2  2  3  1  2  3

The variables are 26 functions of x, y and z, i.e. n = 20, p = 26 and k = 3 (Harman, H. (1976). Modern Factor Analysis, 3rd edition, Table 8.9, p. 157).
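The box data are easy to rebuild. This sketch (mine) assembles the 26 variables from the function list shown on the next slide and forms the correlation matrix from which the loadings tables are computed:

```python
# Rebuild Thurstone's 26 box variables from the 20 (x, y, z) triples above.
import numpy as np

x = np.array([3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5], dtype=float)
y = np.array([2,2,3,3,3,2,2,3,3,3,4,4,4,2,2,3,3,4,4,4], dtype=float)
z = np.array([1,2,1,2,3,1,2,1,2,3,1,2,3,1,2,2,3,1,2,3], dtype=float)

columns = [x, y, z, x*y, x*z, y*z,
           x**2*y, x*y**2, x**2*z, x*z**2, y**2*z, y*z**2,
           x/y, y/x, x/z, z/x, y/z, z/y,
           2*x + 2*y, 2*x + 2*z, 2*y + 2*z,
           np.sqrt(x**2 + y**2), np.sqrt(x**2 + z**2), np.sqrt(y**2 + z**2),
           x*y*z, np.sqrt(x**2 + y**2 + z**2)]
X = np.column_stack(columns)            # n = 20 boxes, p = 26 variables
R = np.corrcoef(X, rowvar=False)        # 26 x 26 correlation matrix
```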

SLIDE 29

Table: Thurstone's 26 box problem: loadings from Thurstone's solution, the CLF rotation, and the sparse method (µ = 20); blank cells are zeros

Vars           |  Thurstone      |  CLF                |  Sparse (µ = 20)
               |   1    2    3   |    1     2     3    |    1     2     3
x1             | .95  .01  .01   | .994 -.002 -.007    | 1.04
x2             | .02  .92  .01   | .069  .945  .060    |       1.01
x3             | .02  .05  .91   |       .080  .965    |             1.02
x1x2           | .59  .64 -.03   | .643  .648 -.004    | .552  .616
x1x3           | .60  .00  .62   | .595  .024  .644    | .539        .629
x2x3           |-.04  .60  .58   |-.016  .623  .648    |       .563  .576
x1²x2          | .81  .38  .01   | .838  .393  .014    | .854  .279
x1x2²          | .35  .79  .01   | .393  .817  .044    | .219  .876
x1²x3          | .79 -.01  .41   | .788 -.001  .419    | .811        .320
x1x3²          | .40 -.02  .79   | .403        .805    | .270        .848
x2²x3          |-.04  .74  .40   |-.010  .773  .461    |       .792  .308
x2x3²          |-.02  .41  .74   |-.022  .456  .786    |       .325  .786
x1/x2          | .74 -.77  .06   | .734 -.813          | .886 -.919
x2/x1          |-.74  .77 -.06   |-.734  .813          |-.886  .919
x1/x3          | .74  .02 -.73   | .814       -.819    | .929       -.912
x3/x1          |-.74 -.02 -.73   |-.814        .819    |-.929        .912
x2/x3          |-.07  .80 -.76   |       .826 -.781    |       .980 -.953
x3/x2          | .07 -.80  .76   |      -.826  .781    |      -.980  .953
2x1+2x2        | .51  .70 -.03   | .556  .716 -.011    | .425  .712
2x1+2x3        | .56 -.04  .69   | .556 -.001  .687    | .478        .680
2x2+2x3        |-.02  .60  .58   | .002  .628  .637    |       .575  .565
√(x1²+x2²)     | .50  .69 -.03   | .546  .708  .001    | .418  .706
√(x1²+x3²)     | .52 -.01  .68   | .526  .010  .683    | .443        .683
√(x2²+x3²)     |-.01  .60  .55   | .022  .629  .606    |       .587  .533
x1x2x3         | .43  .46  .45   | .458  .493  .472    | .357  .439  .415
√(x1²+x2²+x3²) | .31  .51  .46   | .348  .542  .494    | .205  .512  .447
# zeros        |                 |    3     3     2    |    9     9     9

SLIDE 30

Table: Loadings matrices for HH24 data from two algorithms.

Loadings for the HH24 data, CLF and Sparse (µ = 5), 24 variables by 4 components each. The flattened extraction lost the column positions of the individual entries; each variable's nonzero loadings are listed in reading order (CLF entries first, then Sparse):

 1   .130   .731   .773
 2  -.073  -.010   .001   .618   .500
 3   .015   .056  -.149   .690  -.001   .583
 4  -.092   .160  -.003   .611   .619
 5   .767   .170  -.855
 6   .115   .803   .034  -.863
 7  -.052   .859   .098  -.015  -.888
 8  -.021   .583   .178   .199  -.741   .001
 9   .104   .848  -.013   .013  -.883
10   .001   .011   .901  -.211   .968  -.224
11   .259   .643   .743
12  -.095  -.186   .833   .143   .060   .819
13  -.135   .606   .381   .724   .001
14   .676   .140   .016  -.033   .763
15   .667   .003  -.001   .088   .768
16   .525  -.095  -.046   .502   .624
17   .647   .217  -.001   .729
18   .405  -.223   .360   .351   .311   .280
19   .358   .006   .149   .252   .001   .435
20   .198   .345   .388   .705
21   .024   .001   .467   .400   .359   .357
22   .190   .322   .048   .367   .707
23   .066   .294   .162   .478   .786
24   .127   .257   .540   .060  -.075   .658   .001

# zeros: CLF 4 2 2 2; Sparse 21 17 15 9