On the Eigenspectrum of the Gram Matrix and the Generalisation Error of Kernel PCA (Shawe-Taylor, et al. 2005) - PowerPoint PPT Presentation


  1. On the Eigenspectrum of the Gram Matrix and the Generalisation Error of Kernel PCA (Shawe-Taylor, et al. 2005). Ameet Talwalkar, 02/13/07.

  2. Outline. Background: Motivation; PCA, MDS; (Isomap); Kernel PCA. Generalisation Error of Kernel PCA.

  3. Dimensional Reduction: Motivation. Lossy: computational efficiency; visualization of data requires 2D or 3D representations; curse of dimensionality: learning algorithms require "reasonably" good sampling. [Diagram: dimensionality reduction x -> x' turns an intractable learning problem A(x) into a tractable learning problem A(x').] Lossless ("Manifold Learning"): assumes existence of an "intrinsic dimension," i.e. a reduced representation containing all independent variables.

  4. Linear Dimensional Reduction. Assumes input data is a linear function of the independent variables. Common methods: Principal Component Analysis (PCA); Multidimensional Scaling (MDS).

  5. PCA - Big Picture. Linearly transform input data in a way that: maximizes signal (variance); minimizes redundancy of signal (covariance).

  6. PCA - Simple Example. Original data points: e.g. shoe size measured in ft and in cm; y = x provides a good approximation of the data.

  7. PCA - Simple Example (cont). Original data restored using only the first principal component.
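A minimal NumPy sketch of this example, with made-up shoe-size data (the numbers on the slide are not reproduced here): centre the data, project onto the first principal component, and restore it.

```python
import numpy as np

# Hypothetical "shoe size" data: the same quantity measured in feet and in cm,
# so the two coordinates are (up to noise) perfectly correlated.
rng = np.random.default_rng(0)
feet = rng.uniform(0.6, 1.1, size=50)
cm = feet * 30.48 + rng.normal(scale=0.5, size=50)
X = np.column_stack([feet, cm])            # 50 x 2, rows are observations

Xc = X - X.mean(axis=0)                    # centre the data
C = Xc.T @ Xc / len(Xc)                    # 2 x 2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order
p1 = eigvecs[:, -1]                        # first principal component

# Restore the data using only the first principal component
X_restored = Xc @ np.outer(p1, p1) + X.mean(axis=0)
print(eigvals)    # one eigenvalue dominates: the data is essentially one-dimensional
```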

  8. PCA - Covariance. Covariance is a measure of how much two variables vary together: cov(x, y) = E[(x - x̄)(y - ȳ)], and cov(x, x) = var(x). If x and y are independent, then cov(x, y) = 0.
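A quick numerical check of these definitions, assuming two hypothetical NumPy arrays x and y (population form, dividing by m):

```python
import numpy as np

x = np.array([2.1, 2.5, 3.6, 4.0])
y = np.array([8.0, 10.0, 12.0, 14.0])

# cov(x, y) = E[(x - x_bar)(y - y_bar)]
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

assert np.isclose(cov_xy, np.cov(x, y, bias=True)[0, 1])        # matches np.cov
assert np.isclose(np.mean((x - x.mean()) ** 2), np.var(x))      # cov(x, x) = var(x)
```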

  9. PCA - Covariance Matrix. Stores pairwise covariances of the variables; diagonals are variances; symmetric, positive semi-definite. Start with m column vector observations of n variables; the covariance matrix is n x n: C = E[(X - E[X])(X - E[X])^T], which for mean-centred data is C = (1/m) X X^T = (1/m) Σ_{i=1}^{m} x_i x_i^T.
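A small NumPy sketch of the covariance matrix in the slide's column-observation convention, on hypothetical random data:

```python
import numpy as np

# X: n x m matrix whose m columns are observations of n variables.
rng = np.random.default_rng(1)
n, m = 3, 200
X = rng.normal(size=(n, m))

Xc = X - X.mean(axis=1, keepdims=True)     # subtract E[X] from each variable
C = Xc @ Xc.T / m                          # n x n covariance matrix

assert C.shape == (n, n)
assert np.allclose(C, C.T)                          # symmetric
assert np.all(np.linalg.eigvalsh(C) >= -1e-10)      # positive semi-definite
assert np.allclose(np.diag(C), Xc.var(axis=1))      # diagonals are variances
```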

  10. Eigendecomposition. Eigenvectors (v) and eigenvalues (λ) of an n x n matrix A are pairs (v, λ) such that Av = λv. If A is a real symmetric matrix, it can be diagonalized as A = E D E^T, where E holds A's orthonormal eigenvectors and D is the diagonal matrix of A's eigenvalues. If A is positive semi-definite, its eigenvalues are non-negative.
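A short NumPy check of these properties on a hypothetical symmetric positive semi-definite matrix:

```python
import numpy as np

# Any real symmetric PSD matrix, e.g. a covariance matrix C = (1/m) X X^T.
rng = np.random.default_rng(2)
X = rng.normal(size=(4, 100))
A = X @ X.T / 100

eigvals, E = np.linalg.eigh(A)     # eigh is intended for symmetric matrices
D = np.diag(eigvals)

assert np.allclose(A @ E, E @ D)           # A v = lambda v, column by column
assert np.allclose(A, E @ D @ E.T)         # A = E D E^T
assert np.allclose(E.T @ E, np.eye(4))     # eigenvectors are orthonormal
assert np.all(eigvals >= -1e-10)           # PSD => non-negative eigenvalues
```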

  11. PCA - Goal (x3). Linearly transform input data in a way that: maximizes signal (variance); minimizes redundancy of signal (covariance). Algorithm: select the variance-maximizing direction in input space; find the next variance-maximizing direction that is orthogonal to all previously selected directions; repeat k-1 times. Find a transformation P such that Y = PX and C_Y is diagonalized. Solution: project the data onto the eigenvectors of C_X.

  12. PCA - Algorithm. Goal: find P where Y = PX such that C_Y is diagonalized. Select P = E^T, i.e. a matrix whose rows are the eigenvectors of C_X. Writing A = (1/m) X X^T = E D E^T:
C_Y = (1/m) Y Y^T = (1/m) (PX)(PX)^T = (1/m) P X X^T P^T = P A P^T = P (P^T D P) P^T = D.
(Inverse = transpose for an orthonormal matrix, since the eigenvectors in E are orthonormal.) So C_Y is diagonalized, the principal components are the eigenvectors of C_X, and the i-th diagonal value of C_Y is the variance of X along p_i.
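A NumPy sketch of the derivation: with P = E^T (rows are eigenvectors of C_X), the transformed covariance C_Y comes out diagonal. Data here are hypothetical and assumed already centred.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 500
X = rng.normal(size=(n, m))
X = X - X.mean(axis=1, keepdims=True)      # assume centred data

Cx = X @ X.T / m                           # A = (1/m) X X^T = E D E^T
eigvals, E = np.linalg.eigh(Cx)
E = E[:, ::-1]                             # sort PCs by decreasing variance

P = E.T                                    # rows of P are eigenvectors of C_X
Y = P @ X
Cy = Y @ Y.T / m

# C_Y = P Cx P^T = E^T (E D E^T) E = D: the off-diagonal covariances vanish
assert np.allclose(Cy, np.diag(np.diag(Cy)))
# i-th diagonal of C_Y is the variance of X along the i-th principal direction
print(np.diag(Cy))
```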

  13. Gram Matrix (Kernel Matrix). Given X, a collection of m column vector observations of n variables, the Gram matrix of X is the matrix of dot products of the inputs: K = X^T X, with K_ij = x_i · x_j. It is m x m, real, symmetric, and positive semi-definite: a "similarity matrix".
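A minimal NumPy sketch of the Gram matrix and the properties listed above, on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 6
X = rng.normal(size=(n, m))        # m column-vector observations of n variables

K = X.T @ X                        # Gram matrix: K_ij = x_i . x_j

assert K.shape == (m, m)
assert np.allclose(K, K.T)                          # real, symmetric
assert np.all(np.linalg.eigvalsh(K) >= -1e-10)      # positive semi-definite
assert np.isclose(K[0, 1], X[:, 0] @ X[:, 1])       # entry = dot product of inputs
```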

  14. Classical Multidimensional Scaling. Given m objects and a dissimilarity δ_ij for each pair, find a space in which δ_ij ≈ Euclidean distance. If δ_ij is the Euclidean distance: we can convert the dissimilarity matrix to a Gram matrix (or we can just start with a Gram matrix), and MDS yields the same answer as PCA.

  15. Classical Multidimensional Scaling. Convert the dissimilarity matrix to a Gram matrix K. Eigendecomposition of K: K = E D E^T = E D^{1/2} D^{1/2} E^T = (E D^{1/2})(E D^{1/2})^T. Since K = X^T X, take X = (E D^{1/2})^T. Reduce dimension: construct X from a subset of the eigenvectors/eigenvalues. Identical to PCA.
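A NumPy sketch of the procedure on hypothetical points, assuming we start from squared Euclidean dissimilarities. The double-centring step used to convert them to a Gram matrix is the standard classical-MDS construction, which the slide does not spell out.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 20, 3
points = rng.normal(size=(m, n))                    # original (hidden) coordinates

# Squared Euclidean dissimilarities delta_ij^2
D2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)

# Dissimilarity matrix -> Gram matrix by double centring:
# K = -1/2 H D2 H with H = I - (1/m) 1 1^T  (standard classical-MDS step)
H = np.eye(m) - np.ones((m, m)) / m
K = -0.5 * H @ D2 @ H

# Eigendecomposition K = E D E^T = (E D^{1/2})(E D^{1/2})^T
eigvals, E = np.linalg.eigh(K)
eigvals, E = eigvals[::-1], E[:, ::-1]              # largest eigenvalues first

k = 3                                               # keep k eigenvectors/eigenvalues
X_embed = (E[:, :k] * np.sqrt(np.clip(eigvals[:k], 0, None))).T   # X = (E D^{1/2})^T

# With k equal to the intrinsic dimension, pairwise distances are recovered exactly;
# a smaller k gives the reduced-dimension embedding instead.
D2_embed = ((X_embed.T[:, None, :] - X_embed.T[None, :, :]) ** 2).sum(-1)
assert np.allclose(D2, D2_embed)
```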

  16. Limitations of Linear Methods. Linear methods cannot account for a non-linear relationship of the data in input space; the data may still have a linear relationship in some feature space. Isomap: use geodesic distance to recover the manifold. Geodesic distance: the length of the shortest curve on a manifold connecting two points on the manifold. [Figure: two points on a curved manifold with a small Euclidean distance but a large geodesic distance.]
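A small NumPy illustration of the figure's point, using a hypothetical circular arc as the manifold: the two endpoints are close in Euclidean distance but far apart in geodesic distance measured along the curve.

```python
import numpy as np

# Hypothetical 1-D manifold in 2-D: three quarters of a unit circle
t = np.linspace(0, 1.5 * np.pi, 400)
curve = np.column_stack([np.cos(t), np.sin(t)])

a, b = curve[0], curve[-1]                       # the two endpoints of the arc

# Straight-line (Euclidean) distance between the endpoints: sqrt(2) ~ 1.41
euclidean = np.linalg.norm(a - b)

# Geodesic distance: length of the shortest curve on the manifold between them,
# approximated by summing the lengths of the sampled segments: ~ 3*pi/2 ~ 4.71
geodesic = np.linalg.norm(np.diff(curve, axis=0), axis=1).sum()

print(euclidean, geodesic)    # small Euclidean distance vs. large geodesic distance
```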

  17. Local Estimation of Manifolds. Small patches on a non-linear manifold look linear. Locally linear neighborhoods are defined in two ways: k-nearest neighbors (find the k nearest points to a given point); ε-ball (find all points that lie within ε of a given point).
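A minimal NumPy sketch of the two neighborhood definitions, with hypothetical values of k and ε:

```python
import numpy as np

rng = np.random.default_rng(6)
points = rng.normal(size=(100, 3))               # m points sampled near a manifold

# Pairwise Euclidean distances (m x m)
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

i = 0                                            # index of a given point
k, eps = 5, 1.0                                  # hypothetical neighborhood sizes

# k-nearest neighbors: the k closest points to point i (excluding i itself)
knn = np.argsort(dist[i])[1:k + 1]

# epsilon-ball: all points lying within eps of point i (excluding i itself)
eps_ball = np.flatnonzero((dist[i] < eps) & (np.arange(len(points)) != i))
```

These neighborhoods are the locally linear patches; in Isomap they define the graph over which geodesic distances are approximated by shortest paths.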
