Large-Scale Sparse Kernel Canonical Correlation Analysis Viivi Uurtio - PowerPoint PPT Presentation

Large-Scale Sparse Kernel Canonical Correlation Analysis Viivi Uurtio 1 , Sahely Bhadra 2 , and Juho Rousu 1 1 Department of Computer Science, Aalto University Helsinki Institute for Information Technology HIIT 2 Indian Institute of Technology (IIT), Palakkad June 11, 2019 Viivi Uurtio (Aalto, HIIT) ICML 2019 June 11, 2019 1 / 5

From large two-view datasets, it is not straightforward to identify which of the variables are related

From large two-view datasets, it is not straightforward to identify which of the variables are related � Xu , Yv � || Xu || 2 || Yv || 2

From large two-view datasets, it is not straightforward to identify which of the variables are related � Xu , Yv � → In standard CCA, we identify the related vari- || Xu || 2 || Yv || 2 ables from u and v

From large two-view datasets, it is not straightforward to identify which of the variables are related � Xu , Yv � → In standard CCA, we identify the related vari- || Xu || 2 || Yv || 2 ables from u and v → In the non-linear and/or large-scale variants, we cannot access the u and v

From large two-view datasets, it is not straightforward to identify which of the variables are related � Xu , Yv � → In standard CCA, we identify the related vari- || Xu || 2 || Yv || 2 ables from u and v → In the non-linear and/or large-scale variants, we cannot access the u and v Scalability u and v

From large two-view datasets, it is not straightforward to identify which of the variables are related � Xu , Yv � → In standard CCA, we identify the related vari- || Xu || 2 || Yv || 2 ables from u and v → In the non-linear and/or large-scale variants, we cannot access the u and v Scalability u and v ⊠ ⊠ Kernel CCA

From large two-view datasets, it is not straightforward to identify which of the variables are related � Xu , Yv � → In standard CCA, we identify the related vari- || Xu || 2 || Yv || 2 ables from u and v → In the non-linear and/or large-scale variants, we cannot access the u and v Scalability u and v ⊠ ⊠ Kernel CCA � � ⊠ RF KCCA

From large two-view datasets, it is not straightforward to identify which of the variables are related � Xu , Yv � → In standard CCA, we identify the related vari- || Xu || 2 || Yv || 2 ables from u and v → In the non-linear and/or large-scale variants, we cannot access the u and v Scalability u and v ⊠ ⊠ Kernel CCA � � ⊠ RF KCCA ⊠ � � KNOI

From large two-view datasets, it is not straightforward to identify which of the variables are related � Xu , Yv � → In standard CCA, we identify the related vari- || Xu || 2 || Yv || 2 ables from u and v → In the non-linear and/or large-scale variants, we cannot access the u and v Scalability u and v ⊠ ⊠ Kernel CCA � � ⊠ RF KCCA ⊠ � � KNOI � � ⊠ Deep CCA

From large two-view datasets, it is not straightforward to identify which of the variables are related � Xu , Yv � → In standard CCA, we identify the related vari- || Xu || 2 || Yv || 2 ables from u and v → In the non-linear and/or large-scale variants, we cannot access the u and v Scalability u and v ⊠ ⊠ Kernel CCA � � ⊠ RF KCCA ⊠ � � KNOI � � ⊠ Deep CCA ⊠ � � SCCA-HSIC Viivi Uurtio (Aalto, HIIT) ICML 2019 June 11, 2019 2 / 5

gradKCCA is a kernel matrix free method that efficiently optimizes u and v

gradKCCA is a kernel matrix free method that efficiently optimizes u and v Let k x ( u ) = ( k x ( x i , u )) n i =1 and k y ( v ) = ( k y ( y i , v )) n i =1

gradKCCA is a kernel matrix free method that efficiently optimizes u and v Let k x ( u ) = ( k x ( x i , u )) n i =1 and k y ( v ) = ( k y ( y i , v )) n i =1 k x ( u ) ⊤ k y ( v ) max ρ gradKCCA ( u , v ) = || k x ( u ) || 2 || k y ( v ) || 2 u , v s.t. || u || P x ≤ s u and || v || P y ≤ s v

gradKCCA is a kernel matrix free method that efficiently optimizes u and v Let k x ( u ) = ( k x ( x i , u )) n i =1 and k y ( v ) = ( k y ( y i , v )) n i =1 k x ( u ) ⊤ k y ( v ) max ρ gradKCCA ( u , v ) = || k x ( u ) || 2 || k y ( v ) || 2 u , v s.t. || u || P x ≤ s u and || v || P y ≤ s v Maximum through alternating projected gradient ascent

gradKCCA is a kernel matrix free method that efficiently optimizes u and v Let k x ( u ) = ( k x ( x i , u )) n i =1 and k y ( v ) = ( k y ( y i , v )) n i =1 k x ( u ) ⊤ k y ( v ) max ρ gradKCCA ( u , v ) = || k x ( u ) || 2 || k y ( v ) || 2 u , v s.t. || u || P x ≤ s u and || v || P y ≤ s v Maximum through alternating projected gradient ascent Optimization steps for u :

gradKCCA is a kernel matrix free method that efficiently optimizes u and v Let k x ( u ) = ( k x ( x i , u )) n i =1 and k y ( v ) = ( k y ( y i , v )) n i =1 k x ( u ) ⊤ k y ( v ) max ρ gradKCCA ( u , v ) = || k x ( u ) || 2 || k y ( v ) || 2 u , v s.t. || u || P x ≤ s u and || v || P y ≤ s v Maximum through alternating projected gradient ascent Optimization steps for u : → Compute the gradient ∇ ρ u = ∂ρ ( u , v ) ∂ u

gradKCCA is a kernel matrix free method that efficiently optimizes u and v Let k x ( u ) = ( k x ( x i , u )) n i =1 and k y ( v ) = ( k y ( y i , v )) n i =1 k x ( u ) ⊤ k y ( v ) max ρ gradKCCA ( u , v ) = || k x ( u ) || 2 || k y ( v ) || 2 u , v s.t. || u || P x ≤ s u and || v || P y ≤ s v Maximum through alternating projected gradient ascent Optimization steps for u : → Compute the gradient ∇ ρ u = ∂ρ ( u , v ) ∂ u → Step-size using line search: max γ ρ ( u + γ ∇ ρ u )

gradKCCA is a kernel matrix free method that efficiently optimizes u and v Let k x ( u ) = ( k x ( x i , u )) n i =1 and k y ( v ) = ( k y ( y i , v )) n i =1 k x ( u ) ⊤ k y ( v ) max ρ gradKCCA ( u , v ) = || k x ( u ) || 2 || k y ( v ) || 2 u , v s.t. || u || P x ≤ s u and || v || P y ≤ s v Maximum through alternating projected gradient ascent Optimization steps for u : → Compute the gradient ∇ ρ u = ∂ρ ( u , v ) ∂ u → Step-size using line search: max γ ρ ( u + γ ∇ ρ u ) → Gradient step towards maximum: u grad = u + γ ∗ ∇ ρ u

gradKCCA is a kernel matrix free method that efficiently optimizes u and v Let k x ( u ) = ( k x ( x i , u )) n i =1 and k y ( v ) = ( k y ( y i , v )) n i =1 k x ( u ) ⊤ k y ( v ) max ρ gradKCCA ( u , v ) = || k x ( u ) || 2 || k y ( v ) || 2 u , v s.t. || u || P x ≤ s u and || v || P y ≤ s v Maximum through alternating projected gradient ascent Optimization steps for u : → Compute the gradient ∇ ρ u = ∂ρ ( u , v ) ∂ u → Step-size using line search: max γ ρ ( u + γ ∇ ρ u ) → Gradient step towards maximum: u grad = u + γ ∗ ∇ ρ u → Project onto ℓ P ball: u = � � . � Px ≤ s x u grad Viivi Uurtio (Aalto, HIIT) ICML 2019 June 11, 2019 3 / 5

Experiments demonstrate noise tolerance, scalability, and superior speed of gradKCCA

Experiments demonstrate noise tolerance, scalability, and superior speed of gradKCCA train test 1 DCCA 0.8 KNOI KCCA 0.6 gradKCCA KCCA preimage SCCA-HSIC 0 F1 score AUC train test 1 0.8 0.6 0 0.6 0.8 0.96 0.98 0.6 0.8 0.96 0.98 0.6 0.8 0.96 0.98 0.6 0.8 0.96 0.98 Proportion of Noise Variables

Experiments demonstrate noise tolerance, scalability, and superior speed of gradKCCA train test 1 DCCA 0.8 KNOI KCCA 0.6 gradKCCA KCCA preimage gradKCCA DCCA RCCA KNOI SCCA-HSIC SCCA-HSIC 0 F1 score AUC train test train test F1 score Time (s) 10 h 1 1 1 h 0.8 0.6 1 min 1 s 0 0.9 0.6 0.8 0.96 0.98 0.6 0.8 0.96 0.98 0.6 0.8 0.96 0.98 0.6 0.8 0.96 0.98 10 3 10 4 10 5 10 6 10 3 10 4 10 5 10 6 10 3 10 4 10 5 10 6 10 3 10 4 10 5 10 6 Proportion of Noise Variables Sample Size

Experiments demonstrate noise tolerance, scalability, and superior speed of gradKCCA train test 1 DCCA 0.8 KNOI KCCA 0.6 gradKCCA KCCA preimage gradKCCA DCCA RCCA KNOI SCCA-HSIC SCCA-HSIC 0 F1 score AUC train test train test F1 score Time (s) 10 h 1 1 1 h 0.8 0.6 1 min 1 s 0 0.9 0.6 0.8 0.96 0.98 0.6 0.8 0.96 0.98 0.6 0.8 0.96 0.98 0.6 0.8 0.96 0.98 10 3 10 4 10 5 10 6 10 3 10 4 10 5 10 6 10 3 10 4 10 5 10 6 10 3 10 4 10 5 10 6 Proportion of Noise Variables Sample Size MediaMill ρ train ρ test Time (s) gradKCCA 0.666 ± 0.004 0.657 ± 0.007 8 ± 4 Deep CCA 0.643 ± 0.005 0.633 ± 0.003 1280 ± 112 RF KCCA 0.633 ± 0.001 0.626 ± 0.005 23 ± 9 KNOI 0.652 ± 0.001 0.645 ± 0.003 218 ± 73 SCCA-HSIC 0.627 ± 0.004 0.625 ± 0.002 1804 ± 143 Viivi Uurtio (Aalto, HIIT) ICML 2019 June 11, 2019 4 / 5

Large-Scale Sparse Kernel Canonical Correlation Analysis Viivi Uurtio - PowerPoint PPT Presentation

Large-Scale Sparse Kernel Canonical Correlation Analysis Viivi Uurtio 1 , Sahely Bhadra 2 , and Juho Rousu 1 1 Department of Computer Science, Aalto University Helsinki Institute for Information Technology HIIT 2 Indian Institute of Technology

Correlation Course Title Correlation Correlation coe ffi cient between -1 and 1 Sign

Sparse Canonical Correlation Analysis: Minimaxity, Algorithm, and Computational Barrier Harrison

Canonical Correlation Analysis In principal components analysis, we analyzed one set of variables

Canonical Correlation Analysis James H. Steiger Department of Psychology and Human Development

Kernel Exploitation via Uninitialized Stack http://people.canonical.com/~kees/defcon19/ Kees

Sparse Matrices Example Of Sparse Matrices diagonal tridiagonal sparse many elements are

Semi-supervised Kernel Canonical Correlation Analysis of Human Functional Magnetic Resonance

MACHINE LEARNING Kernel Canonical Correlation Analysis 1 ADVANCED MACHINE LEARNING ADVANCED

A large-scale International IPv6 Network A large-scale International IPv6 Network www.6net.org

Introducing... Benjamin Mako Hill GULEV: Ubuntu Canonical Ltd. Ubuntu A GNU/Linux Operating

Canonical Typology Danny Hieber Hieber, Daniel W. 2011. Canonical Typology. Talk given to the

A canonical martingale coupling Workshop on Optimal Transportation and Appplications Nicolas

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel -means Clustering Manuel

FINANCING LARGE SCALE SOLAR Large Scale Solar Conference - Sydney Gloria Chan Director, Large

Theory of correlation transfer and correlation structure in recurrent networks Ruben Moreno-Bote

Business Statistics CONTENTS The correlation coefficient The rank correlation coefficient

Large deviations and heterogeneities in driven or non-driven kinetically constrained models

Merging Judgments and the Problem of Truth-Tracking Gabriella Pigozzi and Stephan Hartmann

SUPPORTING HOSPITALS IN ADDRESSING SOCIAL DETERMINANTS OF HEALTH NYSHealth Foundation: A

Informational Webinar Community-based Prevention Services Grants December 1, 2015 Division of

IND-CCA-secure Key Encapsulation Mechanism in the Quantum Random Oracle Model, Revisited Haodong

TCP Part 3: Performance, Fairness, & Modern Congestion Controllers 15-441 Guest Lecture

Introduction to Statistical Learning and Kernel Machines Hichem SAHBI CNRS UPMC June 2018

Public-Key Cryptography Lecture 12 CCA Secure PKE Hybrid Encryption CCA Secure PKE In SKE, to get

Large-Scale Sparse Kernel Canonical Correlation Analysis Viivi Uurtio - PowerPoint PPT Presentation

Large-Scale Sparse Kernel Canonical Correlation Analysis Viivi Uurtio 1 , Sahely Bhadra 2 , and Juho Rousu 1 1 Department of Computer Science, Aalto University Helsinki Institute for Information Technology HIIT 2 Indian Institute of Technology

Correlation Course Title Correlation Correlation coe ffi cient between -1 and 1 Sign

Sparse Canonical Correlation Analysis: Minimaxity, Algorithm, and Computational Barrier Harrison

Canonical Correlation Analysis In principal components analysis, we analyzed one set of variables

Canonical Correlation Analysis James H. Steiger Department of Psychology and Human Development

Kernel Exploitation via Uninitialized Stack http://people.canonical.com/~kees/defcon19/ Kees

Sparse Matrices Example Of Sparse Matrices diagonal tridiagonal sparse many elements are

Semi-supervised Kernel Canonical Correlation Analysis of Human Functional Magnetic Resonance

MACHINE LEARNING Kernel Canonical Correlation Analysis 1 ADVANCED MACHINE LEARNING ADVANCED

A large-scale International IPv6 Network A large-scale International IPv6 Network www.6net.org

Introducing... Benjamin Mako Hill GULEV: Ubuntu Canonical Ltd. Ubuntu A GNU/Linux Operating

Canonical Typology Danny Hieber Hieber, Daniel W. 2011. Canonical Typology. Talk given to the

A canonical martingale coupling Workshop on Optimal Transportation and Appplications Nicolas

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel -means Clustering Manuel

FINANCING LARGE SCALE SOLAR Large Scale Solar Conference - Sydney Gloria Chan Director, Large

Theory of correlation transfer and correlation structure in recurrent networks Ruben Moreno-Bote

Business Statistics CONTENTS The correlation coefficient The rank correlation coefficient

Large deviations and heterogeneities in driven or non-driven kinetically constrained models

Merging Judgments and the Problem of Truth-Tracking Gabriella Pigozzi and Stephan Hartmann

SUPPORTING HOSPITALS IN ADDRESSING SOCIAL DETERMINANTS OF HEALTH NYSHealth Foundation: A

Informational Webinar Community-based Prevention Services Grants December 1, 2015 Division of

IND-CCA-secure Key Encapsulation Mechanism in the Quantum Random Oracle Model, Revisited Haodong

TCP Part 3: Performance, Fairness, &amp; Modern Congestion Controllers 15-441 Guest Lecture

Introduction to Statistical Learning and Kernel Machines Hichem SAHBI CNRS UPMC June 2018

Public-Key Cryptography Lecture 12 CCA Secure PKE Hybrid Encryption CCA Secure PKE In SKE, to get

TCP Part 3: Performance, Fairness, & Modern Congestion Controllers 15-441 Guest Lecture