

  1. ADVANCED MACHINE LEARNING: Kernel PCA

  2. Overview: Today's Lecture
     • Brief recap of classical Principal Component Analysis (PCA)
     • Derivation of kernel PCA
     • Exercises to develop a geometrical intuition of kernel PCA

  3. Principal Component Analysis: Overview
     Take samples of two classes (yellow and pink classes). Each image is a high-dimensional vector $x \in \mathbb{R}^{320 \times 240 \times 3} = \mathbb{R}^{230400}$.

  4. Principal Component Analysis: Overview
     Project the images onto a lower-dimensional space $y \in \mathbb{R}^2$ through a matrix $A \in \mathbb{R}^{2 \times 230400}$: $y = Ax$.
     (Figure: projected data with a separating line between the two classes.)

  5. Principal Component Analysis: Overview
     Project the images onto a lower-dimensional space $y \in \mathbb{R}^2$ through a matrix $A \in \mathbb{R}^{2 \times 230400}$: $y = Ax$. What is $A$? PCA discovers the matrix $A$.

  6. Principal Component Analysis: Overview
     There is an infinite number of choices for the projection matrix $A$ → we need criteria to reduce the choice.
     Criterion 1: minimum information loss (minimal reconstruction error).
     (Figure: data in the $(x_1, x_2)$ plane.) What is the 2D to 1D projection that minimizes the reconstruction error?

  7. Principal Component Analysis: Overview
     There is an infinite number of choices for the projection matrix $A$ → we need criteria to reduce the choice.
     Criterion 1: minimum information loss (minimal reconstruction error).
     Criterion 2: equivalent to finding the direction with maximum variance.
     (Figure: the largest breadth of the data is conserved, while the smallest breadth is lost in the reconstruction after projection.) What is the 2D to 1D projection that minimizes the reconstruction error?

  8. Principal Component Analysis: Overview
     • Dataset $X = \left[x^1\ x^2\ \ldots\ x^M\right]$ (data is centered, i.e. $E\{X\} = 0$).
     • Compute the covariance matrix of the dataset: $C_X = E\{XX^T\} = \frac{1}{M} XX^T$.
     • Find the eigenvalue decomposition $C_X = V \Lambda V^T$, with $V = \left[e^1 \ldots e^N\right]$ the matrix of eigenvectors and $\Lambda$ the diagonal matrix of eigenvalues.
     • Order $e^1, \ldots, e^N$ such that $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_N$.
     • The eigenvectors form a basis of the space; $e^1$ is aligned with the axis of maximum variance.
     • Project the data onto the eigenvectors and remove projections with low $\lambda$ (noise).
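A minimal NumPy sketch of the recipe above (the column-per-sample data layout and the helper name `pca` are illustrative assumptions, not part of the slides):

```python
import numpy as np

def pca(X, n_components):
    """Classical PCA following the slide's recipe.
    X has shape (N, M): one column per sample, N features, M samples."""
    # Center the data so that E{X} = 0
    X_mean = X.mean(axis=1, keepdims=True)
    Xc = X - X_mean
    # Covariance matrix C_X = (1/M) X X^T
    M = Xc.shape[1]
    C = (Xc @ Xc.T) / M
    # Eigenvalue decomposition C_X = V Lambda V^T (eigh returns ascending eigenvalues)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]          # reorder so lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Rows of A are the leading eigenvectors; projection y = A x
    A = eigvecs[:, :n_components].T
    Y = A @ Xc
    return Y, A, eigvals
```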

  9. PCA for Data Compression
     The original image is encoded in $x \in \mathbb{R}^N$. The compressed image is $y = A_p x$, with $y \in \mathbb{R}^p$ and $p \approx 0.1\,N$. The rows of $A_p$ contain the first $p$ eigenvectors.
     (Figure: original image vs. the image compressed by 90%.)
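A sketch of the compression step, reusing the hypothetical `pca` helper from the previous sketch; the synthetic `X_train` stand-in and the 10:1 ratio (mirroring $p \approx 0.1\,N$) are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.standard_normal((256, 100))   # stand-in for flattened training images: N = 256, M = 100
x = X_train[:, 0]                           # one image to compress

N = X_train.shape[0]
p = N // 10                                 # keep roughly 10% of the dimensions, p ~ 0.1 N
_, A_p, _ = pca(X_train, n_components=p)    # rows of A_p contain the first p eigenvectors

mean = X_train.mean(axis=1)
y = A_p @ (x - mean)                        # compressed code: p numbers instead of N
x_hat = A_p.T @ y + mean                    # approximate reconstruction of the original image
```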

  10. PCA for Feature Extraction
     Results of the decomposition with Principal Component Analysis: the eigenvectors encapsulate the main differences across groups of images (in the first eigenvectors); detailed features (e.g. glasses) get encapsulated next (in the following eigenvectors).

  11. Principal Component Analysis: Pros & Cons
     Advantages:
     a) The projection through PCA ensures minimal reconstruction error.
     b) The projection does not distort the space (it is a rotation in space) → ease of visualization/interpretation: the features that appear in the projections are often interpretable visually.
     Limitations:
     a) PCA assumes a linear transformation: with centering of the data, one can only do a rotation in space.
     b) It fails at finding directions that require a non-linear transformation.

  12. Revisiting the Hypotheses of PCA
     PCA assumed a linear transformation → non-linear PCA (kernel PCA): find a non-linear embedding of the data and then perform linear PCA.

  13. Recall: Principle of Kernel Methods
     Going back to linearity: find a non-linear transformation that sends the data to a space where linear computation is again feasible.

  14. Kernel PCA: Principle
     Determine a transformation which brings out features of the data so as to make subsequent computation easier.
     (Figure: original space $(x_1, x_2)$ vs. data in feature space after lifting, plotted along $(v_1, v_2)$.)
     Example above: the data becomes linearly separable when using an RBF kernel and projecting onto the first 2 principal components of kernel PCA.
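One way to reproduce this effect numerically, assuming scikit-learn is available; the concentric-circles dataset and the gamma value are illustrative choices, not the data shown on the slide:

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression

# Two classes that are not linearly separable in the original 2D space.
X, labels = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Lift the data with an RBF kernel and keep the first 2 kernel principal components.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
X_kpca = kpca.fit_transform(X)

# A linear classifier now separates the two classes almost perfectly.
clf = LogisticRegression().fit(X_kpca, labels)
print("training accuracy in the kPCA space:", clf.score(X_kpca, labels))
```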

  15. Kernel PCA: Principle
     Idea: send the data $X$ into a feature space $H$ through the nonlinear map $\phi$:
     $X = \{x^i\}_{i=1,\ldots,M},\ x^i \in \mathbb{R}^N \;\rightarrow\; \phi(X) = \{\phi(x^1), \ldots, \phi(x^M)\}$.
     Perform linear PCA in feature space and project onto the set of eigenvectors in feature space.
     (Figure: original space vs. feature space $H$, with the projection $P\,\phi(x)$.)
     Scholkopf et al., Neural Computation, 1998.

  16. Kernel PCA: Principle
     Idea: send the data $X$ into a feature space $H$ through the nonlinear map $\phi$:
     $X = \{x^i\}_{i=1,\ldots,M},\ x^i \in \mathbb{R}^N \;\rightarrow\; \phi(X) = \{\phi(x^1), \ldots, \phi(x^M)\}$.
     Perform linear PCA in feature space and project onto the set of eigenvectors in feature space.
     (Figure: original space $(x_1, x_2)$ vs. data projected onto the two first principal components $(v_1, v_2)$ in feature space.)
     Determining $\phi$ is difficult → Kernel Trick.

  17. Linear PCA in Feature Space
     Send the data to feature space through $\phi$: $\phi: X \rightarrow H,\ x \mapsto \phi(x)$.
     Assume that, in feature space $H$, the data are centered: $\sum_{i=1}^{M} \phi(x^i) = 0$.
     The covariance matrix in the feature space is $C_\phi = \frac{1}{M} F F^T$, where the columns of $F$ are the $\phi(x^i)$, $i = 1, \ldots, M$.

  18. Linear PCA in Feature Space
     As in the original space, the covariance matrix in feature space can be diagonalized, and we now have to find the eigenvalues $\lambda_i > 0$ and eigenvectors $v^i$ satisfying $C_\phi v^i = \lambda_i v^i$.
     Primal eigenvalue problem: finding the eigenvectors $v$ of $C_\phi$. This is not possible in feature space! → Formulate everything as a dot product and use the kernel trick.

  19. From Linear PCA to Kernel PCA
     Each eigenvector $v^1, \ldots, v^M$ can be expressed as a linear combination of the images of the datapoints. Rewriting PCA in terms of dot products, using $C_\phi = \frac{1}{M} \sum_{j=1}^{M} \phi(x^j)\,\phi(x^j)^T$ and $C_\phi v^i = \lambda_i v^i$, we obtain
     $v^i = \frac{1}{\lambda_i M} \sum_{j=1}^{M} \left\langle \phi(x^j), v^i \right\rangle \phi(x^j) = \sum_{j=1}^{M} \alpha^i_j\, \phi(x^j)$, where the $\alpha^i_j$ are scalars.

  20. Linear PCA in Feature Space
     Multiplying the equation $C_\phi v^i = \lambda_i v^i$ by $\phi(x^j)$ on both sides, we have
     $\left\langle \phi(x^j), C_\phi v^i \right\rangle = \lambda_i \left\langle \phi(x^j), v^i \right\rangle, \quad \forall\, i, j = 1, \ldots, M$,
     with $v^i = \sum_{l=1}^{M} \alpha^i_l\, \phi(x^l)$ and $C_\phi = \frac{1}{M} \sum_{k=1}^{M} \phi(x^k)\,\phi(x^k)^T$.

  21. Linear PCA in Feature Space
     Substituting gives
     $\frac{1}{M} \sum_{k=1}^{M} \sum_{l=1}^{M} \alpha^i_l \left\langle \phi(x^j), \phi(x^k) \right\rangle \left\langle \phi(x^k), \phi(x^l) \right\rangle = \lambda_i \sum_{l=1}^{M} \alpha^i_l \left\langle \phi(x^j), \phi(x^l) \right\rangle$.
     Use the kernel trick: $k(x^i, x^j) := \left\langle \phi(x^i), \phi(x^j) \right\rangle = K_{ij}$.
     This yields an eigenvalue problem of the form $K \alpha^i = \lambda_i M\, \alpha^i$, with $K$ the Gram matrix.
     Dual eigenvalue problem: finding the dual eigenvectors $\alpha^i$.
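A NumPy sketch of the dual problem (the RBF kernel choice, kernel width, and helper names are assumptions; per the slides' convention, the eigenvalues $\lambda_i$ of $C_\phi$ are the eigenvalues of $K$ divided by $M$):

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """One possible kernel: k(x, x') = exp(-||x - x'||^2 / sigma^2)."""
    return np.exp(-np.sum((a - b) ** 2) / sigma ** 2)

def gram_matrix(X, kernel):
    """K_ij = k(x^i, x^j) for the M datapoints stored as rows of X.
    Assumes the data are centered in feature space, as on slide 17."""
    M = X.shape[0]
    return np.array([[kernel(X[i], X[j]) for j in range(M)] for i in range(M)])

def dual_eigenvectors(K):
    """Solve the dual problem K alpha^i = lambda_i M alpha^i via the eigen-decomposition of K."""
    M = K.shape[0]
    eigvals_K, alphas = np.linalg.eigh(K)       # columns of `alphas` are the alpha^i
    order = np.argsort(eigvals_K)[::-1]         # sort by decreasing eigenvalue
    eigvals_K, alphas = eigvals_K[order], alphas[:, order]
    lambdas = eigvals_K / M                     # eigenvalues of C_phi in the slides' notation
    return alphas, lambdas
```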

  22. Linear PCA in Feature Space
     The solutions to the dual eigenvalue problem $K \alpha^i = \lambda_i M\, \alpha^i$ ($K$: Gram matrix) are given by all the eigenvectors $\alpha^1, \ldots, \alpha^M$ with non-zero eigenvalues $\lambda_1, \ldots, \lambda_M$.
     Kernel PCA finds at most $M$ eigenvectors, where $M$ is the number of datapoints and $M \gg N$, the dimension of each datapoint.

  23. Linear PCA in Feature Space
     Requesting that the eigenvectors $v^i$ of $C_\phi$ be normalized, i.e. $\left\langle v^i, v^i \right\rangle = 1,\ \forall\, i = 1, \ldots, M$, is equivalent to asking that the dual eigenvectors $\alpha^1, \ldots, \alpha^M$ satisfy $\left\langle \alpha^i, \alpha^i \right\rangle = 1/\tilde{\lambda}_i$, where $\tilde{\lambda}_i = \lambda_i M$ is the corresponding eigenvalue of $K$.

  24. Constructing the kPCA Projections
     We cannot see the projection in feature space; we can only compute the projection of each point onto each eigenvector.
     Projection of a query point $x$ onto eigenvector $v^i$:
     $\left\langle v^i, \phi(x) \right\rangle = \frac{1}{\lambda_i M} \sum_{j=1}^{M} \alpha^i_j \left\langle \phi(x^j), \phi(x) \right\rangle = \frac{1}{\lambda_i M} \sum_{j=1}^{M} \alpha^i_j\, k(x^j, x)$ (a sum over all training points).
     Isolines group points with equal projection: all points $x$ such that $\left\langle v^i, \phi(x) \right\rangle = \text{const}$.
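A sketch of these projections, building on the helpers from the previous sketch; the function name and the default of keeping only the two leading components are assumptions. Depending on how the $\alpha^i$ are normalized (slide 23), the $1/(\lambda_i M)$ factor can be absorbed into them; that only rescales each component and leaves the isolines unchanged.

```python
import numpy as np

def kpca_projection(x_query, X, alphas, lambdas, kernel, n_components=2):
    """Projection of a query point onto the leading eigenvectors v^i in feature space:
    <v^i, phi(x)> = 1/(lambda_i M) * sum_j alpha^i_j k(x^j, x)  (the formula above).
    Only components with non-zero lambda are meaningful (slide 22), hence n_components."""
    M = X.shape[0]
    k_vec = np.array([kernel(X[j], x_query) for j in range(M)])   # k(x^j, x), j = 1..M
    return (k_vec @ alphas[:, :n_components]) / (lambdas[:n_components] * M)

# Illustrative usage with the helpers sketched after slide 21:
# X = np.random.default_rng(0).standard_normal((20, 2))
# alphas, lambdas = dual_eigenvectors(gram_matrix(X, rbf_kernel))
# print(kpca_projection(X[0], X, alphas, lambdas, rbf_kernel))
```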

  25. kPCA Projections: Exercise
     Recall the projection of a query point $x$ onto eigenvector $v^i$:
     $\left\langle v^i, \phi(x) \right\rangle = \frac{1}{\lambda_i M} \sum_{j=1}^{M} \alpha^i_j\, k(x^j, x)$,
     where the $\alpha^i$ are the dual eigenvectors, solutions of the eigenvalue decomposition of $K$.
     Consider a 2-dimensional data space with two datapoints and the RBF kernel $k(x, x') = e^{-\frac{\left\| x - x' \right\|^2}{\sigma^2}}$.
     a) How many dual eigenvectors do you have and what is their dimension?
     b) Compute the eigenvectors and draw the isolines for the projections on each eigenvector.
     c) Repeat (b) for a homogeneous polynomial kernel with $p = 2$: $k(x, x') = \left\langle x, x' \right\rangle^2$.
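A small numeric sketch that can be used to check the hand calculation in (a) and (b); the two datapoints and the kernel width $\sigma = 1$ are arbitrary illustrative choices:

```python
import numpy as np

sigma = 1.0
X = np.array([[0.0, 0.0],
              [1.0, 0.0]])                    # two datapoints in a 2D data space, so M = 2

# 2 x 2 Gram matrix for the RBF kernel k(x, x') = exp(-||x - x'||^2 / sigma^2)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / sigma ** 2)

# (a) There are M = 2 dual eigenvectors, each of dimension M = 2.
eigvals, alphas = np.linalg.eigh(K)
print("dual eigenvectors (columns):\n", alphas)
print("eigenvalues of K:", eigvals)

# (b) The isolines of the projection on eigenvector i are the points x with
#     sum_j alpha^i_j k(x^j, x) = const; evaluate this on a grid of x values to draw them.
```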
