Regularized generalized CCA (RGCCA)
Arthur Tenenhaus (SUPELEC), Michel Tenenhaus (HEC Paris)

Regularized generalized CCA
A generalization to more than two blocks of regularized canonical correlation analysis

References
Paper: Arthur & Michel Tenenhaus, "Regularized Generalized CCA", Psychometrika (June 2011)

New package RGCCA with initial version 1.0
Title: Regularized Generalized Canonical Correlation Analysis
Version: 1.0
Date: 2010-06-08
Author: Arthur Tenenhaus
Repository: CRAN
Date/Publication: 2010-10-15 14:58:02
More information about RGCCA at CRAN
Agricultural inequality
GINI: Inequality of land distributions
FARM: % farmers that own half
RENT: % farmers that rent all their land

Industrial development
GNPR: Gross national product per capita ($ 1955)
LABO: % of labor force employed in agriculture

Political instability
INST: Instability of executive (45-61)
ECKS: Number of violent internal war incidents (46-61)
DEAT: Number of people killed as a result of civic group violence (50-62)
D-STAB: Stable democracy
D-UNST: Unstable democracy
DICT: Dictatorship
Country Gini Farm Rent Gnpr Labo Inst Ecks Deat Demo
Argentina 86.3 98.2 32.9 374 25 13.6 57 217 2
Australia 92.9 99.6 * 1215 14 11.3 1
Austria 74.0 97.4 10.7 532 32 12.8 4 2
France 58.3 86.1 26.0 1046 26 16.3 46 1 2
Yugoslavia 43.7 79.8 0.0 297 67 0.0 9 3

Demo: 1 = Stable democracy, 2 = Unstable democracy, 3 = Dictatorship
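The variable list gives three indicators (D-STAB, D-UNST, DICT) while the data table holds a single Demo code, so the code presumably has to be expanded into 0/1 indicators before analysis. A minimal sketch of that expansion follows; the function name `demo_to_dummies` is hypothetical and the mapping simply follows the legend.

```python
# Mapping taken from the legend: 1 = stable democracy, 2 = unstable democracy,
# 3 = dictatorship. The indicator names follow the variable list.
DEMO_LEVELS = {1: "D-STAB", 2: "D-UNST", 3: "DICT"}


def demo_to_dummies(code):
    """Return the three 0/1 indicators for one Demo code (hypothetical helper)."""
    if code not in DEMO_LEVELS:
        raise ValueError(f"unknown Demo code: {code!r}")
    return {name: int(level == code) for level, name in DEMO_LEVELS.items()}


for code in (1, 2, 3):
    print(code, demo_to_dummies(code))
```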
[Path diagram: three blocks of variables]
Agricultural inequality (X1): GINI, FARM, RENT
Industrial development (X2): GNPR, LABO
Political instability (X3): INST, ECKS, DEAT, D-STB, D-INS, DICT

Links between blocks: C13 = 1, C23 = 1, C12 = 0
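The links C13 = 1, C23 = 1, C12 = 0 can be stored as a 0/1 design matrix. The sketch below assumes the matrix is symmetric (a link between two blocks is recorded in both directions); the helper name `design_matrix` is hypothetical.

```python
def design_matrix(n_blocks, links):
    """Build a symmetric 0/1 matrix C with C[j][k] = 1 when blocks j+1 and k+1
    are linked. Blocks are numbered from 1 in `links`, as on the slide."""
    c = [[0] * n_blocks for _ in range(n_blocks)]
    for j, k in links:
        if j == k:
            raise ValueError("a block cannot be linked to itself")
        c[j - 1][k - 1] = 1
        c[k - 1][j - 1] = 1  # symmetry is an assumption of this sketch
    return c


# X1 and X2 are each linked to X3, but not to each other.
C = design_matrix(3, [(1, 3), (2, 3)])
print(C)  # [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
```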
Maximize

$$\max_{a_1, a_2, a_3} \; g(\mathrm{Cov}(X_1 a_1, X_3 a_3)) + g(\mathrm{Cov}(X_2 a_2, X_3 a_3))$$

subject to the constraints

$$(1 - \tau_j)\,\mathrm{Var}(X_j a_j) + \tau_j \|a_j\|^2 = 1, \quad j = 1, 2, 3$$

$0 \le \tau_j \le 1$, g = identity, square or absolute value
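To make the constraint concrete, here is a minimal sketch that evaluates (1 - τ) Var(Xa) + τ ||a||² for a weight vector and rescales the vector so the constraint equals 1. The helper names and the toy block are hypothetical, and the variance uses divisor n, which is an assumption of this sketch.

```python
import math


def variance(values):
    """Variance with divisor n (an assumption of this sketch)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def component(block, weights):
    """Return the component X a for a block stored as a list of rows."""
    return [sum(x * w for x, w in zip(row, weights)) for row in block]


def constraint_value(block, weights, tau):
    """Evaluate (1 - tau) * Var(X a) + tau * ||a||^2."""
    squared_norm = sum(w * w for w in weights)
    return (1 - tau) * variance(component(block, weights)) + tau * squared_norm


def normalize(block, weights, tau):
    """Rescale a so the constraint equals 1. Both terms are quadratic in a,
    so dividing a by the square root of the current value is enough."""
    value = constraint_value(block, weights, tau)
    if value <= 0:
        raise ValueError("constraint value must be positive")
    scale = 1 / math.sqrt(value)
    return [w * scale for w in weights]


# Toy block with 4 observations and 2 variables (made-up numbers).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
for tau in (0.0, 0.5, 1.0):
    a = normalize(X, [1.0, 1.0], tau)
    print(tau, a, constraint_value(X, a, tau))
```

With τ = 1 the rescaling gives a unit-norm vector; with τ = 0 it gives a component of unit variance; intermediate values mix the two.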
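Method / Criterion / Constraints

PLS regression
Criterion: Maximize $\mathrm{Cov}(X_1 a_1, X_2 a_2)$
Constraints: $\|a_1\| = \|a_2\| = 1$

Canonical Correlation Analysis
Criterion: Maximize $\mathrm{Cor}(X_1 a_1, X_2 a_2)$
Constraints: $\mathrm{Var}(X_1 a_1) = \mathrm{Var}(X_2 a_2) = 1$

Redundancy analysis of X1 with respect to X2
Criterion: Maximize $\mathrm{Cor}(X_1 a_1, X_2 a_2)\,\mathrm{Var}(X_1 a_1)^{1/2}$
Constraints: $\|a_1\| = 1$, $\mathrm{Var}(X_2 a_2) = 1$

Maximize $\mathrm{Cov}(X_1 a_1, X_2 a_2)$ subject to $(1 - \tau_j)\,\mathrm{Var}(X_j a_j) + \tau_j \|a_j\|^2 = 1$, $j = 1, 2$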
Components X1a1 and X2a2 are well correlated. The 1st component is stable; there is no stability condition for the 2nd component.

Method / Criterion / Comments

PLS regression
Criterion: Maximize $\mathrm{Cov}(X_1 a_1, X_2 a_2)$ subject to $\|a_1\| = \|a_2\| = 1$
Comment: Favors stability too much with respect to correlation

Canonical Correlation Analysis
Criterion: Maximize $\mathrm{Cor}(X_1 a_1, X_2 a_2)$
Comment: Favors correlation too much with respect to stability

Maximize $\mathrm{Cov}(X_1 a_1, X_2 a_2)$ subject to $(1 - \tau_j)\,\mathrm{Var}(X_j a_j) + \tau_j \|a_j\|^2 = 1$, $j = 1, 2$
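The comments on PLS regression and canonical correlation analysis can be read off a standard identity relating covariance and correlation. The block below is a reading aid, not a formula taken from the slides; it assumes that "stability" refers to components with large variance.

```latex
\mathrm{Cov}(X_1 a_1, X_2 a_2)
  = \mathrm{Cor}(X_1 a_1, X_2 a_2)\,
    \mathrm{Var}(X_1 a_1)^{1/2}\,
    \mathrm{Var}(X_2 a_2)^{1/2}
```

Under unit-norm weights, maximizing the covariance rewards both variance factors as well as the correlation, which matches the remark about favoring stability. Under unit-variance components, the two variance factors equal 1 and only the correlation is left to maximize.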
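$$\max_{a_1, \ldots, a_J} \; \sum_{j,k=1,\; j \neq k}^{J} c_{jk}\, g(\mathrm{Cov}(X_j a_j, X_k a_k))$$

subject to the constraints

$$(1 - \tau_j)\,\mathrm{Var}(X_j a_j) + \tau_j \|a_j\|^2 = 1, \quad j = 1, \ldots, J$$

where: $c_{jk}$ = 1 if blocks $X_j$ and $X_k$ are linked, 0 otherwise
and: $0 \le \tau_j \le 1$, g = identity, square or absolute value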
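As an illustration of the multi-block criterion, the sketch below sums c_jk g(Cov(X_j a_j, X_k a_k)) over all pairs j ≠ k for given components. The function names, the toy components, and the divisor n in the covariance are assumptions of this sketch.

```python
def covariance(u, v):
    """Covariance with divisor n (an assumption of this sketch)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n


# The three choices of g named on the slides.
G_FUNCTIONS = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
    "absolute": abs,
}


def multiblock_criterion(components, design, g):
    """Sum of c_jk * g(Cov(y_j, y_k)) over all ordered pairs j != k,
    where y_j stands for the component X_j a_j."""
    total = 0.0
    for j, y_j in enumerate(components):
        for k, y_k in enumerate(components):
            if j != k and design[j][k]:
                total += design[j][k] * g(covariance(y_j, y_k))
    return total


# Made-up components for three blocks; blocks 1 and 2 are linked to block 3.
y = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]]
C = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
for name, g in G_FUNCTIONS.items():
    print(name, multiblock_criterion(y, C, g))
```

Because the design matrix is symmetric here, each linked pair is counted twice; that only rescales the criterion and does not change its maximizer.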
Initial step

Outer Estimation (explains the block)

Constraint: $(1 - \tau_j)\,\mathrm{Var}(X_j a_j) + \tau_j \|a_j\|^2 = 1$

$$a_j = \frac{\left[\tau_j I + (1 - \tau_j)\tfrac{1}{n} X_j^t X_j\right]^{-1} X_j^t z_j}{\sqrt{z_j^t X_j \left[\tau_j I + (1 - \tau_j)\tfrac{1}{n} X_j^t X_j\right]^{-1} X_j^t z_j}}$$

Inner Estimation (takes into account relations between blocks)

$$z_j = \sum_{k \neq j} e_{jk} X_k a_k$$

Choice of inner weights $e_{jk}$:
$c_{jk}$ = 1 if blocks are linked, 0 otherwise

Iterate until convergence
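The outer estimation step can be sketched in plain Python. The code assumes the update a_j = M⁻¹ X_jᵗ z_j / sqrt(z_jᵗ X_j M⁻¹ X_jᵗ z_j) with M = τ_j I + (1 - τ_j) X_jᵗ X_j / n, that the block is column-centered, and that M is invertible. The names `solve` and `outer_weights` and the toy data are hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; a larger tau may help")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def outer_weights(block, z, tau):
    """Return a = M^{-1} X'z / sqrt(z'X M^{-1} X'z),
    with M = tau * I + (1 - tau) * X'X / n."""
    n, p = len(block), len(block[0])
    xtz = [sum(block[i][c] * z[i] for i in range(n)) for c in range(p)]
    m = [[(1 - tau) * sum(block[i][r] * block[i][c] for i in range(n)) / n
          + (tau if r == c else 0.0)
          for c in range(p)] for r in range(p)]
    direction = solve(m, xtz)                      # M^{-1} X'z
    scale = math.sqrt(sum(d * v for d, v in zip(direction, xtz)))
    return [d / scale for d in direction]


# Column-centered toy block (made-up numbers) and an inner component z.
X = [[-1.5, -1.0], [-0.5, 1.0], [0.5, -2.0], [1.5, 2.0]]
z = [-1.0, 0.0, -0.5, 1.5]
for tau in (0.0, 0.3, 1.0):
    print(tau, outer_weights(X, z, tau))
```

The division by the square root is what makes the weights satisfy the constraint (1 - τ_j) Var(X_j a_j) + τ_j ||a_j||² = 1 for a centered block.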
SUMCOR (Horst, 1961)
$$\max_{\mathrm{Var}(X_j a_j) = 1} \; \sum_{j,k,\; j \neq k} \mathrm{Cor}(X_j a_j, X_k a_k)$$

SSQCOR (Kettenring, 1971)
$$\max_{\mathrm{Var}(X_j a_j) = 1} \; \sum_{j,k,\; j \neq k} \mathrm{Cor}^2(X_j a_j, X_k a_k)$$

SABSCOR (Mathes, 1993; Hanafi, 2004)
$$\max_{\mathrm{Var}(X_j a_j) = 1} \; \sum_{j,k,\; j \neq k} |\mathrm{Cor}(X_j a_j, X_k a_k)|$$

MAXDIFF (Van de Geer, 1984) [SUMCOV]
$$\max_{\|a_j\| = 1} \; \sum_{j,k,\; j \neq k} \mathrm{Cov}(X_j a_j, X_k a_k)$$

MAXDIFF B (Hanafi & Kiers, 2006) [SSQCOV]
$$\max_{\|a_j\| = 1} \; \sum_{j,k,\; j \neq k} \mathrm{Cov}^2(X_j a_j, X_k a_k)$$

SABSCOV (Krämer, 2007)
$$\max_{\|a_j\| = 1} \; \sum_{j,k,\; j \neq k} |\mathrm{Cov}(X_j a_j, X_k a_k)|$$
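The three correlation-based criteria differ only in how each pairwise correlation enters the sum. A minimal sketch, with hypothetical function names and made-up components, that evaluates all three for fixed components:

```python
import math


def correlation(u, v):
    """Pearson correlation of two equally long sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def correlation_criteria(components):
    """Evaluate SUMCOR, SSQCOR and SABSCOR over all ordered pairs j != k."""
    cors = [correlation(components[j], components[k])
            for j in range(len(components))
            for k in range(len(components)) if j != k]
    return {
        "SUMCOR": sum(cors),
        "SSQCOR": sum(c * c for c in cors),
        "SABSCOR": sum(abs(c) for c in cors),
    }


# Made-up components: the second one is the first one reversed.
y = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [1.0, 2.0, 3.0, 4.0]]
print(correlation_criteria(y))
```

In this toy case the negative correlations cancel part of the SUMCOR total, while SSQCOR and SABSCOR treat them as strong links.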
[Figure: hierarchical models. (a) One second order block. (b) Several second order blocks.]

Very often: $X_{J+1} = [X_1, \ldots, X_J]$
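For case (a), a minimal sketch of how the second order block and its links could be set up, assuming the second order block is the column-wise concatenation X1|X2|…|XJ of the first order blocks and is linked to each of them. The helper names are hypothetical.

```python
def superblock(blocks):
    """Concatenate blocks column-wise, row by row."""
    n = len(blocks[0])
    if any(len(block) != n for block in blocks):
        raise ValueError("all blocks must have the same number of rows")
    return [[value for block in blocks for value in block[i]] for i in range(n)]


def hierarchical_design(n_first_order):
    """Design matrix in which every first order block is linked only to the
    second order block, stored as the last row and column."""
    size = n_first_order + 1
    c = [[0] * size for _ in range(size)]
    for j in range(n_first_order):
        c[j][size - 1] = 1
        c[size - 1][j] = 1
    return c


# Two made-up first order blocks with 2 observations each.
X1 = [[1, 2], [3, 4]]
X2 = [[5], [6]]
print(superblock([X1, X2]))      # [[1, 2, 5], [3, 4, 6]]
print(hierarchical_design(2))    # [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
```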
Method / Criterion / Constraints

Hierarchical PLS regression
Criterion: $$\max_{a_1, \ldots, a_{J+1}} \; \sum_{j=1}^{J} g(\mathrm{Cov}(X_j a_j, X_{J+1} a_{J+1}))$$
Constraints: $\|a_j\| = 1$, $j = 1, \ldots, J+1$

Hierarchical Canonical Correlation Analysis
Criterion: $$\max_{a_1, \ldots, a_{J+1}} \; \sum_{j=1}^{J} g(\mathrm{Cor}(X_j a_j, X_{J+1} a_{J+1}))$$
Constraints: $\mathrm{Var}(X_j a_j) = 1$, $j = 1, \ldots, J+1$

Hierarchical Redundancy analysis of the $X_j$'s with respect to $X_{J+1}$
Criterion: $$\max_{a_1, \ldots, a_{J+1}} \; \sum_{j=1}^{J} g(\mathrm{Cor}(X_j a_j, X_{J+1} a_{J+1})\,\mathrm{Var}(X_j a_j)^{1/2})$$
Constraints: $\|a_j\| = 1$, $j = 1, \ldots, J$; $\mathrm{Var}(X_{J+1} a_{J+1}) = 1$
Comment: Stable predictors and good prediction

Hierarchical Redundancy analysis of $X_{J+1}$ with respect to the $X_j$'s
Criterion: $$\max_{a_1, \ldots, a_{J+1}} \; \sum_{j=1}^{J} g(\mathrm{Cor}(X_j a_j, X_{J+1} a_{J+1})\,\mathrm{Var}(X_{J+1} a_{J+1})^{1/2})$$
Constraints: $\mathrm{Var}(X_j a_j) = 1$, $j = 1, \ldots, J$; $\|a_{J+1}\| = 1$
Comment: Good predictors and stable response

g = identity, square or absolute value
2 1 1 1 1
J j j j J J J j t j j j
[Figure: blocks X1, X2, …, XJ with components y1, y2, …, yJ, and block XJ+1 = X1|X2|…|XJ with component yJ+1]
Method / Criterion / Constraints

SUMCOR (Horst, 1961)
Criterion: $$\max_{a_1, \ldots, a_{J+1}} \; \sum_{j=1}^{J} \mathrm{Cor}(X_j a_j, X_{J+1} a_{J+1})$$
Constraints: $\mathrm{Var}(X_j a_j) = 1$, $j = 1, \ldots, J+1$

Generalized CCA (Carroll, 1968a,b)
Criterion: $$\max_{a_1, \ldots, a_{J+1}} \; \sum_{j=1}^{J_1} \mathrm{Cor}^2(X_j a_j, X_{J+1} a_{J+1}) + \sum_{j=J_1+1}^{J} \mathrm{Cov}^2(X_j a_j, X_{J+1} a_{J+1})$$
Constraints: $\mathrm{Var}(X_j a_j) = 1$, $j = 1, \ldots, J_1, J+1$; $\|a_j\| = 1$, $j = J_1+1, \ldots, J$

with $X_{J+1} = [X_1, \ldots, X_J]$
$$X_{J+1} = [X_1, \ldots, X_J]$$
1 1
2 1 1 ,..., 1 1 1
J
J j j J J j j J J
a a
Special case of Carroll’s GCCA
$c_{jk}$ = 1 if response block $X_k$ is connected to predictor block $X_j$, 0 otherwise
Maximize

$$\max_{a_1, \ldots, a_J} \; \sum_{j,k=1,\; j \neq k}^{J} c_{jk}\, g(\mathrm{Cov}(X_j a_j, X_k a_k))$$

subject to the constraints:

$$(1 - \tau_j)\,\mathrm{Var}(X_j a_j) + \tau_j \|a_j\|^2 = 1, \quad j = 1, \ldots, J$$

g = identity, square or absolute value
Maximize

$$\max_{a_1, \ldots, a_J} \; \sum_{j,k=1,\; j \neq k}^{J} \mathrm{Cov}(X_j a_j, X_k a_k)$$

subject to the constraints:

$$\|a_j\| = 1, \quad j = 1, \ldots, J$$
Initial step

Outer Estimation (explains the block)

Constraint: $\mathrm{Var}(X_j a_j) = 1$

$$a_j = \frac{\left[\tfrac{1}{n} X_j^t X_j\right]^{-1} X_j^t z_j}{\sqrt{z_j^t X_j \left[\tfrac{1}{n} X_j^t X_j\right]^{-1} X_j^t z_j}}$$

Inner Estimation (takes into account relations between blocks)

$$z_j = \sum_{k \neq j} e_{jk} X_k a_k$$

Choice of inner weights $e_{jk}$:
$c_{jk}$ = 1 if blocks are linked, 0 otherwise

Iterate until convergence
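As a consistency check on the unit-variance case, the short derivation below verifies that outer weights of the form M_j⁻¹ X_jᵗ z_j / sqrt(z_jᵗ X_j M_j⁻¹ X_jᵗ z_j) give a component of variance 1. It assumes X_j is column-centered, so that Var(X_j a_j) = a_jᵗ M_j a_j, and it writes M_j for (1/n) X_jᵗ X_j, a shorthand introduced here.

```latex
\begin{aligned}
M_j &= \tfrac{1}{n} X_j^t X_j, \qquad
a_j = \frac{M_j^{-1} X_j^t z_j}{\sqrt{z_j^t X_j M_j^{-1} X_j^t z_j}} \\[4pt]
\mathrm{Var}(X_j a_j) &= a_j^t M_j a_j
  = \frac{z_j^t X_j M_j^{-1}\, M_j\, M_j^{-1} X_j^t z_j}
         {z_j^t X_j M_j^{-1} X_j^t z_j}
  = 1
\end{aligned}
```

The same cancellation works with M_j = τ_j I + (1 - τ_j)(1/n) X_jᵗ X_j, in which case a_jᵗ M_j a_j = (1 - τ_j) Var(X_j a_j) + τ_j ||a_j||².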
Initial step

Outer Estimation (explains the block)

Constraint: $\|a_j\|^2 = 1$

$$a_j = \frac{X_j^t z_j}{\|X_j^t z_j\|}$$

Inner Estimation (takes into account relations between blocks)

$$z_j = \sum_{k \neq j} e_{jk} X_k a_k$$

Choice of inner weights $e_{jk}$:
$c_{jk}$ = 1 if blocks are linked, 0 otherwise

Iterate until convergence
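The alternation between inner and outer estimation under unit-norm weights can be sketched as follows. The code assumes the simplest inner weights, e_jk = c_jk, which is an assumption of this sketch rather than a choice stated on the slide. It also assumes column-centered blocks and updates one block at a time with the latest weights of the others. The function names and the toy data are hypothetical.

```python
import math


def matvec(block, weights):
    """Return the component X a for a block stored as a list of rows."""
    return [sum(x * w for x, w in zip(row, weights)) for row in block]


def unit(vector):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


def iterate_unit_norm(blocks, design, n_iter=100, tol=1e-10):
    """Alternate inner and outer estimation with ||a_j|| = 1 and e_jk = c_jk."""
    n = len(blocks[0])
    # Initial step: arbitrary unit-norm weights.
    weights = [unit([1.0] * len(block[0])) for block in blocks]
    for _ in range(n_iter):
        change = 0.0
        for j, block in enumerate(blocks):
            comps = [matvec(b, a) for b, a in zip(blocks, weights)]
            # Inner estimation: z_j = sum over k != j of c_jk * X_k a_k.
            z = [sum(design[j][k] * comps[k][i]
                     for k in range(len(blocks)) if k != j)
                 for i in range(n)]
            # Outer estimation: a_j = X_j' z_j / ||X_j' z_j||.
            xtz = [sum(block[i][c] * z[i] for i in range(n))
                   for c in range(len(block[0]))]
            new = unit(xtz)
            change = max(change, max(abs(u - v) for u, v in zip(new, weights[j])))
            weights[j] = new
        if change < tol:  # stop once the outer weights no longer move
            break
    return weights


# Made-up, column-centered blocks; blocks 1 and 2 are linked to block 3.
X1 = [[-1.5, -1.0], [-0.5, 1.0], [0.5, -2.0], [1.5, 2.0]]
X2 = [[-1.0, 0.5], [0.0, -1.5], [-1.0, 0.5], [2.0, 0.5]]
X3 = [[-2.0, -1.0, 0.0], [0.0, 1.0, -1.0], [-1.0, -1.0, 0.0], [3.0, 1.0, 1.0]]
blocks = [X1, X2, X3]
design = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
print(iterate_unit_norm(blocks, design))
```

With a symmetric design and this choice of inner weights, each block update maximizes the sum of covariances over a_j while the other weights are held fixed, so the criterion cannot decrease from one update to the next.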
Initial step

Outer Estimation (explains the block)

Constraint: $\|a_j\|^2 = 1$

$$a_j = \frac{X_j^t z_j}{\|X_j^t z_j\|}$$

Inner Estimation (takes into account relations between blocks)

$$z_j = \sum_{k \neq j} e_{jk} X_k a_k$$

Choice of inner weights $e_{jk}$:
$c_{jk}$ = 1 if blocks are linked, 0 otherwise

Iterate until convergence of the outer weights. (No proof, but almost always true in practice!?)