

1. Supervised Principal Component Regression for Functional Data with High Dimensional Predictors
Xinyi (Cindy) Zhang, University of Toronto
xyi.zhang@mail.utoronto.ca
July 10, 2018

2. Joint work with

3. Overview
  1. Motivation
  2. Methodology: SPCR
  3. Theoretical Properties: Equivalence; Estimation Convergence
  4. Numerical Studies: Simulation; Real Data Application

4. Motivation: Functional magnetic resonance imaging (fMRI) is a noninvasive technique for studying brain activity. (Image courtesy of the Rebecca Saxe laboratory, MIT News, http://news.mit.edu/2011/brain-language-0301)

5. Motivation: The fMRI dataset for each subject contains a time series of 3-D images.

6–8. Motivation: A large set of clinical/demographic variables is collected for each subject, but the association between these variables and brain activity has not been well understood.

9. Related Methodology — PCA: Principal component analysis (PCA) can be applied to extract a lower-dimensional subspace that captures most of the variation in the covariates.

10. Related Methodology — Potential problems: PCA fails to capture any information relevant to the regression when the principal subspace extracted from the covariates is orthogonal to the vector of regression parameters. ⇒ Supervised Principal Component Regression.

11–15. Methodology — Some notation:
  - Covariance matrix $\Sigma_x = E(XX^T)$; cross-covariance matrix $\Sigma_{xy} = \int_{\mathcal{T}} E\{XY(t)\}\,[E\{XY(t)\}]^T\,dt$, where $\mathcal{T}$ is a compact support.
  - Sample: $\{(X_i, Y_i(t)),\ i = 1, \dots, n\} \overset{\mathrm{iid}}{\sim} \{X, Y(t)\}$.
  - Empirical estimate $\hat\Sigma_x = n^{-1}\mathbb{X}^T\mathbb{X}$, where $\mathbb{X} = (X_1, \dots, X_n)^T \in \mathbb{R}^{n \times p}$.
  - Empirical estimate $\hat\Sigma_{xy} = n^{-2}\int_{\mathcal{T}} \mathbb{X}^T\mathbb{Y}(t)\,\mathbb{Y}(t)^T\mathbb{X}\,dt$, where $\mathbb{Y}(t) = (Y_1(t), \dots, Y_n(t))^T \in \mathbb{R}^n$.
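The two empirical estimators above can be computed directly from data observed on a time grid; a minimal NumPy sketch (the function name `empirical_covariances` and the trapezoidal-rule approximation of the integral over $\mathcal{T}$ are illustrative assumptions):

```python
import numpy as np

def empirical_covariances(X, Y, t_grid):
    """Empirical estimates of Sigma_x and Sigma_xy.

    X : (n, p) predictor matrix (rows X_i).
    Y : (n, m) functional responses observed on t_grid (m time points).
    The integral over the compact support T is approximated by the
    trapezoidal rule on t_grid.
    """
    n = X.shape[0]
    Sigma_x = X.T @ X / n                  # n^{-1} X^T X
    XtY = X.T @ Y                          # column j holds X^T Y(t_j)
    # Trapezoidal quadrature weights over t_grid.
    dt = np.diff(t_grid)
    w = np.zeros_like(t_grid)
    w[:-1] += dt / 2.0
    w[1:] += dt / 2.0
    # Integrate the (p, p) outer products X^T Y(t) Y(t)^T X over t.
    Sigma_xy = np.einsum('pj,qj,j->pq', XtY, XtY, w) / n**2
    return Sigma_x, Sigma_xy
```

By construction $\hat\Sigma_{xy}$ is a weighted sum of outer products, hence symmetric and positive semi-definite.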

16. Methodology: Start with $p < n$. Regressing $Y(t)$ on the projection $X^T w_1$, the optimal regression function $\gamma^*(t)$ is the minimizer of the expected integrated residual sum of squares
$$\mathrm{IRSS} = E\Big[\int_{\mathcal{T}} \{Y(t) - X^T w_1 \gamma(t)\}^T \{Y(t) - X^T w_1 \gamma(t)\}\,dt\Big].$$

17. Methodology:
$$\Rightarrow\quad \gamma^*(t) = \{w_1^T E(XX^T) w_1\}^{-1} w_1^T E\{XY(t)\}.$$
Plugging $\gamma^*(t)$ into IRSS yields
$$\mathrm{IRSS}(\gamma^*) = \int_{\mathcal{T}} \big(E\{Y^T(t)Y(t)\} - [E\{XY(t)\}]^T w_1 \{w_1^T E(XX^T) w_1\}^{-1} w_1^T E\{XY(t)\}\big)\,dt.$$

18–19. Methodology: Among all possible directions of $w_1$, the one minimizing $\mathrm{IRSS}(\gamma^*)$ satisfies:
Proposition. If $w_{10}$ is a minimizer of $\mathrm{IRSS}(\gamma^*)$, then $w_{10}$ satisfies
$$w_{10} = \arg\max_{w_1} \frac{w_1^T \Sigma_{xy} w_1}{w_1^T \Sigma_x w_1}. \qquad (2.1)$$
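The maximizer in (2.1) is the leading generalized eigenvector of the pair $(\Sigma_{xy}, \Sigma_x)$, which standard linear algebra routines compute directly; a minimal SciPy sketch (the particular matrices below are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
Sigma_x = A @ A.T + 5 * np.eye(p)   # illustrative positive-definite Sigma_x
u = rng.standard_normal((p, 1))
Sigma_xy = u @ u.T                  # illustrative rank-one Sigma_xy

# Generalized eigenproblem Sigma_xy w = lambda Sigma_x w; eigh returns
# eigenvalues in ascending order and Sigma_x-orthonormal eigenvectors.
vals, vecs = eigh(Sigma_xy, Sigma_x)
w10 = vecs[:, -1]                   # maximizer of (2.1), up to scale

print(w10 @ Sigma_x @ w10)          # ≈ 1.0: eigh normalizes w^T Sigma_x w = 1
```

Conveniently, `eigh` already returns the eigenvector normalized so that $w^T \Sigma_x w = 1$, matching the scale constraint $w_1^T \Sigma_x w_1 = 1$ imposed later in the deck.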

20–22. Methodology: For $c \neq 0$, $c\,w_{10}$ is also a maximizer of (2.1). We therefore impose the additional constraint $w_1^T \Sigma_x w_1 = 1$, which also adjusts for potentially different scales in the predictor space.

23. Methodology: Decompose $\Sigma_{xy} = UU^T + \Sigma_\epsilon = \sum_{i=1}^K \lambda_i v_i v_i^T + \Sigma_\epsilon$.
Equivalent optimization problem:
$$W^* = \arg\max_{W \in \mathbb{R}^{p \times K}} \mathrm{tr}(W^T \Sigma_{xy} W) \quad \text{s.t.}\quad W^T \Sigma_x W = I_K.$$
This corresponds to a sequence of generalized Rayleigh quotient problems (NP hard):
$$w_k^* = \arg\max_{w_k} w_k^T \Sigma_{xy} w_k \quad \text{s.t.}\quad w_k^T \Sigma_x w_k = 1,\ \ w_k^T \Sigma_x w_j^* = 0 \ \text{for}\ 1 \le j < k,$$
and define $W^* = (w_1^*, \dots, w_K^*)$.
Convex simultaneous regression problem:
$$V^* = \arg\min_V \tfrac{1}{2}\|U - \Sigma_x V\|_F^2.$$
The Rayleigh quotient problems reduce to this convex simultaneous regression problem, which recovers the same principal space, i.e. $V^* V^{*T} = W^* W^{*T}$, under some mild conditions.

24. Methodology: In reality, the covariance matrices $\Sigma_x$ and $\Sigma_{xy}$ are unknown, and the optimization problem we actually solve is
$$\hat V = \arg\min_V \tfrac{1}{2}\|\hat U - \hat\Sigma_x V\|_F^2,$$
where $\hat U$ satisfies $\hat\Sigma_{xy} = \hat U \hat U^T + \hat\Sigma_\epsilon = \hat B + \hat\Sigma_\epsilon$.

25. Optimization Problem in High Dimensions: When $p$ is relatively large compared with $n$, or $p > n$, adding an $\ell_1$ penalty to the reformulated problem lets one estimate $\hat V$:
$$\hat V = \arg\min_V \tfrac{1}{2}\|\hat U - \hat\Sigma_x V\|_F^2 + \lambda\|V\|_{1,1},$$
where $\|A\|_{1,1}$ denotes $\|(\|A_{\cdot 1}\|_1, \|A_{\cdot 2}\|_1, \cdots, \|A_{\cdot m}\|_1)\|_1$ for a matrix $A \in \mathbb{R}^{n \times m}$.
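Because $\|V\|_{1,1}$ is an elementwise $\ell_1$ norm, the penalized problem separates over the columns of $\hat U$ into ordinary lasso regressions; a minimal scikit-learn sketch (the function name and the $\lambda \to$ `alpha` rescaling are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

def spcr_l1(U_hat, Sigma_x_hat, lam):
    """Solve (1/2)||U - Sigma_x V||_F^2 + lam*||V||_{1,1} column by column."""
    p, K = U_hat.shape
    V_hat = np.zeros((p, K))
    # sklearn's Lasso minimizes (1/(2*n_samples))||y - Xw||^2 + alpha*||w||_1;
    # here n_samples = p, so alpha = lam / p matches the objective above.
    model = Lasso(alpha=lam / p, fit_intercept=False, max_iter=10000)
    for k in range(K):
        model.fit(Sigma_x_hat, U_hat[:, k])
        V_hat[:, k] = model.coef_
    return V_hat
```

As a sanity check, with $\hat\Sigma_x = I_p$ each column reduces to soft-thresholding of the corresponding column of $\hat U$ at level $\lambda$.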

26–29. Algorithm and Tuning Parameter Selection:
  - LASSO.
  - Extended BIC (Chen and Chen, 2008) to select $\lambda_K$ for fixed $K$.
  - 5-fold CV to select $K$.
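The 5-fold CV step can be sketched generically; here `fit` and `score` are hypothetical stand-ins for the SPCR fitting routine and its held-out integrated prediction error (neither is specified on the slide):

```python
import numpy as np
from sklearn.model_selection import KFold

def select_K_by_cv(X, Y, fit, score, K_grid, n_splits=5, seed=0):
    """Pick the number of components K minimizing mean CV error.

    fit(X_train, Y_train, K) returns a fitted model; score(model, X_val,
    Y_val) returns its validation error. Both are hypothetical hooks.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    cv_errors = []
    for K in K_grid:
        fold_errs = [score(fit(X[tr], Y[tr], K), X[va], Y[va])
                     for tr, va in kf.split(X)]
        cv_errors.append(np.mean(fold_errs))
    return K_grid[int(np.argmin(cv_errors))]
```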

30. Theoretical Properties: To make the signal and residual separable with respect to $\Sigma_{xy}$, we need the separability condition:
$$\lambda_{\min}\big(\Sigma_{xy}^{-1/2}(UU^T)\Sigma_{xy}^{-1/2}\big) > \lambda_{\max}\big(\Sigma_{xy}^{-1/2}\Sigma_\epsilon\Sigma_{xy}^{-1/2}\big).$$
Theorem (Equivalence). When $p < n$, $\mathcal{V} = \mathrm{span}(V^*)$ can recover $\mathcal{W} = \mathrm{span}(W^*)$ exactly, that is $\mathcal{V} = \mathcal{W}$, or equivalently $V^* V^{*T} = W^* W^{*T}$, if the separability condition holds.

31. Theoretical Properties: Theorem (Estimation Error). Under proper conditions, with probability going to 1, $\hat V$ converges to $V^*$.

32–35. Numerical Results — Simulation I: $Y(t) = X\beta(t) + \epsilon(t)$, where
  - $X_i \overset{\mathrm{iid}}{\sim} N_p(0, \Sigma)$ with $\Sigma_{jj'} = 0.5^{|j - j'|}$ for $1 \le j, j' \le p$;
  - compact support $\mathcal{T} = [0, 1]$;
  - $\epsilon_i(t) \overset{\mathrm{iid}}{\sim}$ a Gaussian process with mean 0 and covariance function $K(s, t) = \exp\{-3(s - t)^2\}$ for $0 \le s, t \le 1$.
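The design above can be simulated directly; a minimal NumPy sketch (the coefficient function $\beta(t)$ is an illustrative assumption, since the slide does not specify it, and the small jitter on the error covariance is added purely for numerical stability):

```python
import numpy as np

def simulate_spcr_data(n=100, p=20, m=50, seed=0):
    """Draw one dataset from the Simulation I design."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)                     # grid on T = [0, 1]
    # Predictors: X_i ~ N_p(0, Sigma), Sigma_{jj'} = 0.5^{|j - j'|}.
    j = np.arange(p)
    Sigma = 0.5 ** np.abs(j[:, None] - j[None, :])
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    # Errors: mean-zero Gaussian process, K(s, t) = exp{-3 (s - t)^2}.
    K_st = np.exp(-3.0 * (t[:, None] - t[None, :]) ** 2)
    eps = rng.multivariate_normal(np.zeros(m), K_st + 1e-8 * np.eye(m), size=n)
    # Illustrative sparse functional coefficient beta(t) (not on the slide).
    beta = np.zeros((p, m))
    beta[0] = np.sin(2 * np.pi * t)
    beta[1] = t ** 2
    Y = X @ beta + eps                               # Y(t) = X beta(t) + eps(t)
    return X, Y, t
```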
