Supervised Principal Component Regression for Functional Data with High Dimensional Predictors
Xinyi(Cindy) Zhang
University of Toronto xyi.zhang@mail.utoronto.ca
July 10, 2018
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 1 / 32
Supervised Principal Component Regression for Functional Data with - - PowerPoint PPT Presentation
Supervised Principal Component Regression for Functional Data with High Dimensional Predictors Xinyi(Cindy) Zhang University of Toronto xyi.zhang@mail.utoronto.ca July 10, 2018 Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 1
Xinyi(Cindy) Zhang
University of Toronto xyi.zhang@mail.utoronto.ca
July 10, 2018
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 1 / 32
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 2 / 32
1
Motivation
2
Methodology SPCR
3
Theoretical Properties Equivalence Estimation Convergence
4
Numerical Studies Simulation Real Data Application
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 3 / 32
Functional magnetic resonance imaging (fMRI) is a noninvasive technique for studying brain activity.
Image courtesy of the Rebecca Saxe laboratory, MIT news, http://news.mit.edu/2011/brain-language-0301 Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 4 / 32
fMRI dataset of each subject contains a time series of 3-D images.
(a) (b)
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 5 / 32
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 6 / 32
Collection of a large dimensional set of clinical/demographic variables.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 6 / 32
Collection of a large dimensional set of clinical/demographic variables. Association hasn’t been well understood.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 6 / 32
PCA
Principal component analysis (PCA) can be applied to extract a lower-dimensional subspace that captures the most of variation in the covariates.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 7 / 32
Potential problems
But PCA fails to capture any information when the principal subspace extracted from the covariates is orthogonal to the vectors of regression parameters.
Supervised Principal Component Regression
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 8 / 32
Some notations
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 9 / 32
Some notations
Covariance matrix Σx = E(XXT); cross-covarinace matrix Σxy =
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 9 / 32
Some notations
Covariance matrix Σx = E(XXT); cross-covarinace matrix Σxy =
{(Xi, Yi(t)), i = 1, . . . , n} iid ∼ {X, Y(t)}.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 9 / 32
Some notations
Covariance matrix Σx = E(XXT); cross-covarinace matrix Σxy =
{(Xi, Yi(t)), i = 1, . . . , n} iid ∼ {X, Y(t)}. Empirical estimation Σx = n−1XTX, where X = (X1, . . . , Xn)T ∈ Rn×p.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 9 / 32
Some notations
Covariance matrix Σx = E(XXT); cross-covarinace matrix Σxy =
{(Xi, Yi(t)), i = 1, . . . , n} iid ∼ {X, Y(t)}. Empirical estimation Σx = n−1XTX, where X = (X1, . . . , Xn)T ∈ Rn×p. Empirical estimation Σxy = n−2
T XTY(t)Y(t)TXdt, where
Y(t) = (Y1(t), . . . , Yn(t))T ∈ Rn.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 9 / 32
Start with p < n. Regressing Y(t) on the projection XTw1, the optimal regression function γ∗(t) is the minimizer of the expected integrated residual sum of squares defined as IRSS =
E
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 10 / 32
= ⇒
γ∗(t) = {wT
1 E(XXT)w1}−1wT 1 E{XY(t)}.
Plugging in γ∗(t) into IRSS yields
IRSS(γ∗) =
(E{YT(t)Y(t)} − [E{XY(t)}]Tw1{wT
1 E(XXT)w1}−1wT 1 [E{XY(t)}])dt.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 11 / 32
Among all the possible directions of w1, the one minimizing IRSS(γ∗) satisfies
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 12 / 32
Among all the possible directions of w1, the one minimizing IRSS(γ∗) satisfies
Proposition
If w10 is a minimizer of IRSS(γ∗), then w10 satisfies w10 = arg max
w1
wT
1 Σxyw1
wT
1 Σxw1
. (2.1)
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 12 / 32
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 13 / 32
For c = 0, cw10 is also a maximizer of equation (2.1).
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 13 / 32
For c = 0, cw10 is also a maximizer of equation (2.1). Another constraint wT
1 Σxw1 =1 to adjust the effect of potential
different scales in the predictor space.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 13 / 32
A sequence of generalized Rayleigh quotient problems (NP hard): w∗
k
= arg maxwk wk
TΣxy wk s.t.
wk
TΣx wk = 1, wk TΣx w∗ j
= 0, where 1 ≤ j < k. Define W∗ = (w∗
1 , . . . , w∗ K )
Equivalent optimization problem: W∗ = max
W∈Rp×K tr(WTΣxy W) s.t.
WTΣx W = IK Convex simultaneous regression problem: Σxy = UUT + Σǫ = K
i=1 λi vi vT i
+ Σǫ, V∗ = argmin
V 1 2 U − Σx V2 F
The Rayleigh quotient problems − → a convex simultaneous regression problem which recovers the same principal space, i.e. V∗V∗T = W∗W∗T under some mild conditions.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 14 / 32
In reality, the covariance matrices Σx and Σxy are unknown, and the
V
1 2 U − ΣxV2
F,
and U satisfies Σxy = U UT + Σǫ = B + Σǫ.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 15 / 32
When p is relatively large compared with n, or p > n, by adding an ℓ1 penalty to our reformulated problem, one can easily estimate V
V
1 2 U − ΣxV2
F + λV1,1
where · 1,1 denotes (A·11, A·21, · · · , A·m1)1, for a matrix A ∈ Rn×m.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 16 / 32
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 17 / 32
LASSO.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 17 / 32
LASSO. Extended BIC (Chen and Chen, 2008) to select λK for fixed K.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 17 / 32
LASSO. Extended BIC (Chen and Chen, 2008) to select λK for fixed K. 5-fold CV to select K.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 17 / 32
To make the signal and residual separable with respect to Σxy, we need the separability condition: λmin(Σ−1/2
xy
(UUT)Σ−1/2
xy
) > λmax(Σ−1/2
xy
ΣǫΣ−1/2
xy
).
Theorem (Equivalence)
When p < n, V = span(V∗) can recover W = span(W∗) exactly, that is V = W or equivalently V∗V∗T = W∗W∗T if the separability condition holds.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 18 / 32
Theorem (Estimation Error)
Under proper conditions, with probability going to 1, V converges to V∗.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 19 / 32
Simulation I
Y(t) = Xβ(t) + ǫ(t).
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 20 / 32
Simulation I
Xi
iid
∼ Np(0, Σ), where Σjj′ = 0.5|j−j′| for 1 ≤ j, j′ ≤ p.
Y(t) = Xβ(t) + ǫ(t).
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 20 / 32
Simulation I
Xi
iid
∼ Np(0, Σ), where Σjj′ = 0.5|j−j′| for 1 ≤ j, j′ ≤ p. Compact support T = [0, 1].
Y(t) = Xβ(t) + ǫ(t).
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 20 / 32
Simulation I
Xi
iid
∼ Np(0, Σ), where Σjj′ = 0.5|j−j′| for 1 ≤ j, j′ ≤ p. Compact support T = [0, 1]. ǫi(t)
iid
∼ a gaussian process with mean 0 and covariance function K(s, t) = exp{−3(s − t)2} for 0 ≤ s, t ≤ 1.
Y(t) = Xβ(t) + ǫ(t).
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 20 / 32
Simulation I
Xi
iid
∼ Np(0, Σ), where Σjj′ = 0.5|j−j′| for 1 ≤ j, j′ ≤ p. Compact support T = [0, 1]. ǫi(t)
iid
∼ a gaussian process with mean 0 and covariance function K(s, t) = exp{−3(s − t)2} for 0 ≤ s, t ≤ 1. β(t) = V∗γ(t), where V∗ = (V1, V2, V3)T ∈ Rp×3; γ(t) = (γ1(t), γ2(t), γ3(t))T.
V1 = (1, 1, 0, 0, · · · , 0, 0, 0, 0)T V2 = (0, 0, 1, 1, 0, 0, · · · , 0, 0)T V3 = (0, 0, 0, 0, · · · , 0, 0, 1, 1)T γ1(t) = 5cos(πt) γ2(t) = 7cos(2πt) γ3(t) = 9cos(3πt) + 5sin2(3πt)
Y(t) = Xβ(t) + ǫ(t).
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 20 / 32
Simulation I
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 21 / 32
Simulation I
Independent testing dataset (Y∗
i (t), X∗ i ) with size n∗ = 5000.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 21 / 32
Simulation I
Independent testing dataset (Y∗
i (t), X∗ i ) with size n∗ = 5000.
Optimal direction V; least square estimates β
V(t) obtained from
regressing Y(t) on X V.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 21 / 32
Simulation I
(n, p) Method Prediction Error
F
(50, 200) SPCR 12.2496 (0.2776) 0.0657 (0.0039) 5.1000 (0.0915) UPCR 445.0968 (3.8092) NA NA (100, 200) SPCR 8.5161 (0.1474) 0.0319 (0.0015) 3.9400 (0.0632) UPCR 414.3093 (5.6964) NA NA (500, 200) SPCR 6.0579 (0.0569) 0.0059 (0.0003) 3.0600 (0.0238) UPCR 367.6252 (5.8318) NA NA (100, 1000) SPCR 10.5444 (0.1598) 0.0365 (0.0011) 5.9800 (0.0425) UPCR 481.8148 (0.9669) NA NA (200, 1000) SPCR 7.0769 (0.1084) 0.0135 (0.0006) 4.5100 (0.0771) UPCR 427.6922 (0.7265) NA NA
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 22 / 32
Simulation II
In this simulation, we consider V∗ as non-sparse and not residing in the low dimensional space.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 23 / 32
Simulation II
In this simulation, we consider V∗ as non-sparse and not residing in the low dimensional space. Set βj(t) = 5cos{(j+9)
100 πt} for 1 ≤ j ≤ p.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 23 / 32
Simulation II
In this simulation, we consider V∗ as non-sparse and not residing in the low dimensional space. Set βj(t) = 5cos{(j+9)
100 πt} for 1 ≤ j ≤ p.
Other settings same as simulation I.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 23 / 32
Simulation II
(n, p) Method Prediction Error (50, 200) SPCR 315.9588 (3.1784) UPCR 408.2328 (4.0381) (100, 200) SPCR 146.7840 (1.5825) UPCR 349.1845 (5.1075) (100, 1000) SPCR 463.9981 (1.3050) UPCR 492.6754 (4.9588) (200, 1000) SPCR 337.8412 (1.2150) UPCR 380.4472 (2.7625) (500, 200) SPCR 11.8829 (0.0876) UPCR 330.2885 (5.8221)
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 24 / 32
Human Connectome Project fMRI data Application
Application to the cortical surface motor task related fMRI data from HCP Dataset:
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 25 / 32
Human Connectome Project fMRI data Application
Application to the cortical surface motor task related fMRI data from HCP Dataset:
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 25 / 32
Human Connectome Project fMRI data Application
Application to the cortical surface motor task related fMRI data from HCP Dataset:
data.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 25 / 32
Human Connectome Project fMRI data Application
Application to the cortical surface motor task related fMRI data from HCP Dataset:
data.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 25 / 32
Human Connectome Project fMRI data Application
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 26 / 32
Human Connectome Project fMRI data Application
Data:
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 27 / 32
Human Connectome Project fMRI data Application
Data: Average the blood oxygenation level dependent (BOLD) signals of all pixels in the left caudal anterior cingulate region − → Yi(t).
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 27 / 32
Human Connectome Project fMRI data Application
Data: Average the blood oxygenation level dependent (BOLD) signals of all pixels in the left caudal anterior cingulate region − → Yi(t). Covariates X ∈ R280 belongs to {demographic, language, emotion, motor and free surfer brain summary statistics}.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 27 / 32
Human Connectome Project fMRI data Application
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 28 / 32
Human Connectome Project fMRI data Application
Comparing with UPCR:
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 29 / 32
Human Connectome Project fMRI data Application
Comparing with UPCR: Randomly pick ⌊2n/3⌋ subjects as training data and the remaining as test data.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 29 / 32
Human Connectome Project fMRI data Application
Comparing with UPCR: Randomly pick ⌊2n/3⌋ subjects as training data and the remaining as test data. Repeat 100 times.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 29 / 32
Human Connectome Project fMRI data Application
Comparing with UPCR: Randomly pick ⌊2n/3⌋ subjects as training data and the remaining as test data. Repeat 100 times. Report the average prediction error of 100 repetitions.
SPCR : 239.04(1.10) UPCR : 243.48(1.03)
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 29 / 32
Human Connectome Project fMRI data Application
Comparing with UPCR: Randomly pick ⌊2n/3⌋ subjects as training data and the remaining as test data. Repeat 100 times. Report the average prediction error of 100 repetitions.
SPCR : 239.04(1.10) UPCR : 243.48(1.03)
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 29 / 32
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 30 / 32
Statistical inference: testing & confidence regions.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 30 / 32
Statistical inference: testing & confidence regions. Multiple functional responses.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 30 / 32
Statistical inference: testing & confidence regions. Multiple functional responses. Nonlinear relationship.
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 30 / 32
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 31 / 32
Xinyi(Cindy) Zhang (University of Toronto) SPCR July 10, 2018 32 / 32