Sliced Inverse Regression with Interaction (SIRI) Detection for non-Gaussian BN Learning
Jun S. Liu, Department of Statistics, Harvard University
Joint work with Bo Jiang
General: Regression and Classification
         Responses   Covariates
Ind 1    Y1          x11, x12, …, x1p
Ind 2    Y2          x21, x22, …, x2p
  ⋮       ⋮           ⋮
Ind N    YN          xN1, xN2, …, xNp
How to model this?
[Diagram: naïve Bayes structure — class node Y with children X1, X2, X3, …, Xm]
[Diagram: TAN (tree-augmented naïve Bayes) — Y connected to X1, …, X6, with tree edges among the X's (Pearl 1988; Friedman 1997)]
[Diagram: grouped predictor structure linked to Y — Group 0: X01, X02; Group 1: X11, X12, X13; Group 21: X2.11, X2.12, X2.13; Group 22: X2.21, X2.22]
Multiple-index model: Y = f(β1ᵀX, …, βKᵀX, ε), where f is an unknown link function and K < p.
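As a sketch, data from such a multiple-index model can be simulated as follows. The link function, coefficients, and noise level here are illustrative assumptions, not the ones used on the slides:

```python
import numpy as np

# Simulate Y = f(beta_1' X, ..., beta_K' X, eps) for an illustrative f
# with an interaction between the two indices (assumed for this sketch).
rng = np.random.default_rng(0)
n, p, K = 500, 10, 2
X = rng.standard_normal((n, p))

B = np.zeros((p, K))
B[0, 0] = 1.0                      # index 1 loads on X_1
B[1, 1] = 1.0                      # index 2 loads on X_2
U = X @ B                          # n x K matrix of index values

# Y depends on X only through the K indices (plus noise)
Y = U[:, 0] + U[:, 0] * U[:, 1] + 0.1 * rng.standard_normal(n)
print(X.shape, Y.shape)
```

Note that Y depends on X1 and X2 jointly through a product term, the kind of interaction SIRI is designed to detect.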
SIR procedure (Li, 1991):
1. Standardize: z_i = Σ̂^(−1/2) (x_i − x̄).
2. Slice the range of Y into S slices H_1, …, H_S; compute slice means z̄_s = n_s^(−1) Σ_{i∈H_s} z_i.
3. Form M̂ = n^(−1) Σ_{s=1}^S n_s z̄_s z̄_sᵀ and take its top-K eigenvectors η̂_1, …, η̂_K.
4. Estimated directions: β̂_k = Σ̂^(−1/2) η̂_k.
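A minimal implementation of these four steps, assuming equal-count slices on sorted Y (function and variable names are ours, not from the slides):

```python
import numpy as np

def sir_directions(X, y, n_slices=10, K=2):
    """Sliced inverse regression: estimate the top-K e.d.r. directions."""
    n, p = X.shape
    # Step 1: standardize, z_i = Sigma^{-1/2} (x_i - x_bar)
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ Sigma_inv_half
    # Step 2: slice sorted y and average z within each slice
    slices = np.array_split(np.argsort(y), n_slices)
    # Step 3: M_hat = n^{-1} * sum_s n_s * zbar_s zbar_s^T
    M = np.zeros((p, p))
    for H in slices:
        zbar = Z[H].mean(axis=0)
        M += (len(H) / n) * np.outer(zbar, zbar)
    w, V = np.linalg.eigh(M)
    eta = V[:, np.argsort(w)[::-1][:K]]      # top-K eigenvectors of M_hat
    # Step 4: beta_k = Sigma^{-1/2} eta_k (identified up to scale)
    return Sigma_inv_half @ eta
```

For example, with Y = X1 + noise the leading estimated direction should align with the first coordinate axis.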
Let λ̂_k^A denote the k-th largest eigenvalue of the SIR matrix based on predictor set A. For a candidate predictor j ∉ A, define

  T_k^(A+j) = n(λ̂_k^(A+j) − λ̂_k^A),  k = 1, …, K.

Under the null (X_j adds no information given X_A), and with p = O(n^r), r < 1/2, the T_k^(A+j) (k = 1, …, K) are asymptotically χ²(1), and

  D^(A+j) = Σ_{k=1}^K T_k^(A+j)

is asymptotically χ²(K).
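A sketch of how this statistic could be computed and thresholded. The eigenvalues below are made-up toy numbers, and 5.991 is the known 95% quantile of χ²(2):

```python
import numpy as np

def siri_add_statistic(lam_A, lam_Aj, n):
    """T_k = n * (lam_k^{A+j} - lam_k^A); D = sum_k T_k ~ chi2(K) under the null."""
    T = n * (np.asarray(lam_Aj, float) - np.asarray(lam_A, float))
    return T, float(T.sum())

# Toy example: K = 2, n = 400; eigenvalues barely increase, so j looks null
T, D = siri_add_statistic([0.300, 0.100], [0.305, 0.102], n=400)
CHI2_2_Q95 = 5.991                # 95% quantile of chi2(2)
print(round(D, 2), D > CHI2_2_Q95)
```

A candidate j would be added to A only when D exceeds the χ²(K) critical value.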
Conditions and interpretation (d = |A|):
• Linearity condition: E(X_A | β1ᵀX_A, …, βKᵀX_A) belongs to a K-dimensional affine space (K < d).
• Under this condition, span(β1, …, βK) coincides with the SIR directions.

For stepwise selection, each candidate X_j is judged by comparing two conditional models given the current set A:
• M1(X_j | X_A, Y): the distribution of X_j depends on Y given X_A (X_j is relevant);
• M0(X_j | X_A, Y): X_j is conditionally independent of Y given X_A (X_j is irrelevant).
The log-likelihood ratio between M1 and M0 reduces to a function of the SIR eigenvalues:

  Σ_{k=1}^K log(1 − λ̂_k^A) − Σ_{k=1}^K log(1 − λ̂_k^(A+j))
  ≈ Σ_{k=1}^K (λ̂_k^(A+j) − λ̂_k^A)  when the eigenvalues are small.

Scaling by n gives T_k^(A+j) = n(λ̂_k^(A+j) − λ̂_k^A), each asymptotically χ²(1), so that

  D^(A+j) = Σ_{k=1}^K T_k^(A+j)

is asymptotically χ²(K).
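A quick numeric check of the first-order approximation log(1 − a) − log(1 − b) ≈ b − a used above; the values of a and b are arbitrary small toy eigenvalues:

```python
import math

a, b = 0.04, 0.05                          # two small eigenvalues (toy values)
exact = math.log(1 - a) - math.log(1 - b)  # log-likelihood-ratio form
approx = b - a                             # eigenvalue-difference form
print(round(exact, 4), round(approx, 4))   # close when a, b are small
```

The gap between the two forms is second order in the eigenvalues, which is why the χ² calibration applies to both.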
Simulation: covariates with correlation depending on ∣i − j∣ (AR-type design).
Method                                        FP (0–990)    FN (0–10)
SIRI-C (CV minimizing classification error)   1.86 (0.222)  1.66 (0.117)
SIRI-M (CV minimizing mean square error)      0.76 (0.120)  1.75 (0.114)
COP                                           1.62 (0.165)  1.67 (0.118)
SIS-SCAD                                      0.10 (0.030)  0.64 (0.069)
LASSO                                         5.40 (0.188)  0.00 (0.000)
Example 2 (8 true predictors):
Method     FP (0–992)    FN (0–8)
SIRI-C     2.00 (0.163)  0.42 (0.138)
SIRI-M     0.43 (0.079)  4.60 (0.274)
COP        1.26 (0.128)  3.32 (0.192)
SIS-SCAD   3.23 (0.356)  8.00 (0.000)
LASSO      0.64 (0.255)  8.00 (0.000)
Method       FP (0–997)    FN (0–3)
SIRI-C       0.39 (0.115)  0.12 (0.046)
SIRI-M       0.03 (0.017)  0.04 (0.020)
SIS-SCAD-2   0.00 (0.000)  0.45 (0.068)
• Cross-validation to select the dimension and thresholds
• Back to a full Bayesian model with dynamic slicing
• We want flexibility in choosing slicing boundaries
• Connection with the Mutual-Information Criterion (MIC)
• Many interesting possibilities
• Robustness to the distribution of predictors