Large-Scale Sparse Kernel Canonical Correlation Analysis

  1. Large-Scale Sparse Kernel Canonical Correlation Analysis
  Viivi Uurtio¹, Sahely Bhadra², and Juho Rousu¹
  ¹ Department of Computer Science, Aalto University, and Helsinki Institute for Information Technology HIIT
  ² Indian Institute of Technology (IIT), Palakkad
  ICML 2019, June 11, 2019

  2. From large two-view datasets, it is not straightforward to identify which of the variables are related

  CCA maximizes $\dfrac{\langle Xu, Yv \rangle}{\lVert Xu \rVert_2 \, \lVert Yv \rVert_2}$ over the weight vectors u and v

  → In standard CCA, we identify the related variables from u and v
  → In the non-linear and/or large-scale variants, we cannot access u and v

               Scalability   u and v
  Kernel CCA        ✗           ✗
  RF KCCA           ✓           ✗
  KNOI              ✗           ✓
  Deep CCA          ✓           ✗
  SCCA-HSIC         ✗           ✓
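  As a concrete illustration of reading the related variables off u and v, here is a minimal sketch assuming scikit-learn's CCA and synthetic two-view data in which a single shared signal links column 0 of X to column 2 of Y; the toy data and variable names are illustrative assumptions, not from the presentation:

```python
# Minimal sketch (not from the presentation): with standard linear CCA the
# weight vectors u and v are accessible, so the related variables can be
# identified from their largest-magnitude entries.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 1000
z = rng.standard_normal(n)                        # shared latent signal
X = rng.standard_normal((n, 5)); X[:, 0] += z     # only column 0 of X carries the signal
Y = rng.standard_normal((n, 4)); Y[:, 2] += z     # only column 2 of Y carries the signal

cca = CCA(n_components=1).fit(X, Y)
u = cca.x_weights_[:, 0]                          # weight vector u for view X
v = cca.y_weights_[:, 0]                          # weight vector v for view Y

xs, ys = cca.transform(X, Y)                      # canonical scores (projections of the centered views)
xs, ys = xs[:, 0], ys[:, 0]
rho = xs @ ys / (np.linalg.norm(xs) * np.linalg.norm(ys))   # <Xu, Yv> / (||Xu||_2 ||Yv||_2)

print(f"canonical correlation ~ {rho:.2f}")
print("related variables:", np.abs(u).argmax(), "in X and", np.abs(v).argmax(), "in Y")
```

  With the non-linear or large-scale variants listed above, no analogous weight vectors are exposed, which is the gap the next slide addresses.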

  3. gradKCCA is a kernel matrix free method that efficiently optimizes u and v

  Let $k_x(u) = \big(k_x(x_i, u)\big)_{i=1}^{n}$ and $k_y(v) = \big(k_y(y_i, v)\big)_{i=1}^{n}$

  $$\max_{u, v}\; \rho_{\mathrm{gradKCCA}}(u, v) = \frac{k_x(u)^{\top} k_y(v)}{\lVert k_x(u) \rVert_2 \, \lVert k_y(v) \rVert_2} \quad \text{s.t.}\quad \lVert u \rVert_{P_x} \le s_u \ \text{and}\ \lVert v \rVert_{P_y} \le s_v$$

  The maximum is found through alternating projected gradient ascent. Optimization steps for u:
  → Compute the gradient $\nabla \rho_u = \partial \rho(u, v) / \partial u$
  → Choose the step size by line search: $\max_{\gamma} \rho(u + \gamma \nabla \rho_u)$
  → Take a gradient step towards the maximum: $u_{\mathrm{grad}} = u + \gamma^{*} \nabla \rho_u$
  → Project onto the $\ell_{P_x}$ ball: $u = \Pi_{\lVert \cdot \rVert_{P_x} \le s_u}\big(u_{\mathrm{grad}}\big)$
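  A minimal sketch of this alternating scheme follows. It assumes a degree-2 polynomial kernel $k(x_i, w) = (x_i^{\top} w)^2$, ℓ1 constraints ($P_x = P_y = 1$), finite-difference gradients, and a simple grid line search; these choices and all names (kvec, project_l1, gradkcca, ...) are illustrative assumptions, not the authors' implementation, which works with the derivative $\partial \rho(u, v) / \partial u$ directly.

```python
# Illustrative sketch of gradKCCA-style alternating projected gradient ascent.
# Assumptions (not from the presentation): polynomial kernel k(x_i, w) = (x_i^T w)^2,
# l1-norm constraints, finite-difference gradients, and a grid-based line search.
import numpy as np

def kvec(X, w, degree=2):
    """k(w) = (k(x_i, w))_{i=1}^n for the assumed polynomial kernel."""
    return (X @ w) ** degree

def rho(X, Y, u, v):
    """Objective: cosine similarity between the kernel vectors k_x(u) and k_y(v)."""
    kx, ky = kvec(X, u), kvec(Y, v)
    return kx @ ky / (np.linalg.norm(kx) * np.linalg.norm(ky) + 1e-12)

def project_l1(w, s):
    """Euclidean projection onto the l1 ball of radius s (Duchi et al., 2008)."""
    if np.abs(w).sum() <= s:
        return w
    a = np.sort(np.abs(w))[::-1]
    cs = np.cumsum(a)
    k = np.nonzero(a - (cs - s) / np.arange(1, w.size + 1) > 0)[0][-1]
    theta = (cs[k] - s) / (k + 1.0)
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)

def numgrad(f, w, eps=1e-6):
    """Finite-difference gradient (a stand-in for the analytic derivative of rho)."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w); e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

def gradkcca(X, Y, s_u=1.0, s_v=1.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    u = project_l1(rng.standard_normal(X.shape[1]), s_u)
    v = project_l1(rng.standard_normal(Y.shape[1]), s_v)
    for _ in range(iters):
        # alternate between the two views, updating one while the other is fixed
        for w, s, f in ((u, s_u, lambda w_: rho(X, Y, w_, v)),
                        (v, s_v, lambda w_: rho(X, Y, u, w_))):
            g = numgrad(f, w)                                  # gradient of rho w.r.t. this view
            gammas = [2.0 ** -j for j in range(12)]
            gamma = max(gammas, key=lambda t: f(w + t * g))    # crude line search for the step size
            w[:] = project_l1(w + gamma * g, s)                # gradient step, then project onto the l1 ball
    return u, v, rho(X, Y, u, v)
```

  Run on toy data such as that in the earlier sketch, the returned u and v should be sparse, with their support indicating the related variables, and no n × n kernel matrix is ever formed.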

  4. Experiments demonstrate noise tolerance, scalability, and superior speed of gradKCCA

  [Figure: F1 score and AUC on train and test data versus the proportion of noise variables (0.6 to 0.98) for DCCA, KNOI, KCCA, KCCA preimage, SCCA-HSIC, and gradKCCA; F1 score and running time (1 s to 10 h) on train and test data versus sample size (10^3 to 10^6) for gradKCCA, DCCA, RCCA, KNOI, and SCCA-HSIC.]

  MediaMill     ρ_train          ρ_test           Time (s)
  gradKCCA      0.666 ± 0.004    0.657 ± 0.007    8 ± 4
  Deep CCA      0.643 ± 0.005    0.633 ± 0.003    1280 ± 112
  RF KCCA       0.633 ± 0.001    0.626 ± 0.005    23 ± 9
  KNOI          0.652 ± 0.001    0.645 ± 0.003    218 ± 73
  SCCA-HSIC     0.627 ± 0.004    0.625 ± 0.002    1804 ± 143
