Unsupervised learning: General introduction to unsupervised learning



  1. Unsupervised learning • General introduction to unsupervised learning

  2. PCA

  3. Special directions • These are the special directions we will try to find.

  4. Best direction u, with |u|^2 = 1:
      1. Minimize: Σ d_i^2, where d_i is the distance of x_i from the line along u, and x_i^T u is the projection length.
      2. Maximize: Σ (x_i^T u)^2, i.e. u is the direction that maximizes the variance.

  5. Finding the best projection • Find u that maximizes Σ (x_i^T u)^2. Since (x_i^T u)^2 = (u^T x_i)(x_i^T u), this is max_u Σ (u^T x_i)(x_i^T u) = max_u u^T [V] u, where [V] = Σ (x_i x_i^T).
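
A quick numeric check of the identity above (a minimal sketch with made-up random data; all array names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))   # columns are the data points x_i (5-dim, 100 samples)
u = rng.standard_normal(5)
u /= np.linalg.norm(u)              # enforce |u| = 1

V = X @ X.T                         # [V] = sum_i x_i x_i^T
lhs = np.sum((X.T @ u) ** 2)        # sum_i (x_i^T u)^2
rhs = u @ V @ u                     # u^T [V] u
assert np.isclose(lhs, rhs)
```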

  6. The data matrix X • [V] = Σ (x_i x_i^T) = X X^T, where the columns of X are the data points x_i.

  7. Best direction u
      • Will minimize the distances from it
      • Will maximize the variance along it
      • Maximize u^T [V] u subject to |u| = 1. With a Lagrange multiplier, maximize u^T [V] u − λ(u^T u − 1).
      • (Useful derivatives: d/dx (x^T U x) = 2Ux for symmetric U, and d/dx (x^T x) = 2x.)
      • Setting the derivative with respect to the vector u to zero: [V]u − λu = 0, i.e. [V]u = λu.
      • The best direction will be the first eigenvector of [V].

  8. Best direction u
      • The best direction will be the first eigenvector of [V], u_1, with variance λ_1.
      • The next direction will be the second eigenvector of [V], u_2, with variance λ_2.
      • The principal components are the eigenvectors of the data (covariance) matrix [V] = X X^T.
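
A minimal numpy sketch of this construction, assuming the data points are stored as (already centered) columns of X; the function name is illustrative:

```python
import numpy as np

def principal_directions(X):
    """Columns of X are centered data points x_i.
    Returns eigenvalues (variances) and eigenvectors of [V] = X X^T,
    sorted by decreasing eigenvalue."""
    V = X @ X.T                           # [V] = sum_i x_i x_i^T
    eigvals, eigvecs = np.linalg.eigh(V)  # eigh: [V] is symmetric
    order = np.argsort(eigvals)[::-1]     # largest variance first
    return eigvals[order], eigvecs[:, order]

# Usage: variances, U = principal_directions(X); u_1 = U[:, 0], lambda_1 = variances[0].
```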

  9. PCs, Variance and Least-Squares
      • The first PC retains the greatest amount of variation in the sample.
      • The k-th PC retains the k-th greatest fraction of the variation in the sample.
      • The k-th largest eigenvalue of the correlation matrix C is the variance in the sample along the k-th PC.
      • The least-squares view: PCs are a series of linear least-squares fits to a sample, each orthogonal to all previous ones.

  10. Dimensionality Reduction
      • We can ignore the components of lesser significance.
      [Scree plot: variance (%) explained by PC1 through PC10]
      • You do lose some information, but if the eigenvalues are small, you don't lose much:
        – n dimensions in the original data
        – calculate n eigenvectors and eigenvalues
        – choose only the first k eigenvectors, based on their eigenvalues
        – the final data set has only k dimensions
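
A sketch of the reduction step (self-contained, with the eigen-decomposition inlined; names are illustrative):

```python
import numpy as np

def reduce_dimension(X, k):
    """Project centered data (columns of X) onto the first k principal directions."""
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)  # eigen-decomposition of [V] = X X^T
    order = np.argsort(eigvals)[::-1]           # sort PCs by decreasing variance
    explained = eigvals[order] / eigvals.sum()  # per-PC variance fractions (the scree plot)
    U_k = eigvecs[:, order[:k]]                 # keep only the first k eigenvectors
    return U_k.T @ X, explained                 # k x m reduced data set, variance fractions
```

Choosing k where the scree plot flattens out keeps most of the variance while dropping the low-eigenvalue directions.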

  11. PC dimensionality reduction • In the linear case only

  12. PCA and correlations • We can think of our data points as k points from a distribution p(x) • We have k samples (x_1, y_1), (x_2, y_2), …, (x_k, y_k)

  13. PCA and correlations • We have k samples (x_1, y_1), (x_2, y_2), …, (x_k, y_k) • The correlation between (x, y) is E[(x − x_0)(y − y_0)] / (σ_x σ_y) • For centered variables, x and y are uncorrelated if E(xy) = 0

  14. [Figure: data cloud with principal axes v_1 and v_2] • Correlation depends on the coordinates: (x, y) are correlated, (v_1, v_2) are not

  15. In the PC coordinates, the variables are uncorrelated
      • The projection of a point x_i on v_1 is x_i^T v_1 (or v_1^T x_i).
      • The projection of a point x_i on v_2 is x_i^T v_2.
      • For the correlation, we take the sum Σ_i (v_1^T x_i)(x_i^T v_2)
      • = Σ_i v_1^T x_i x_i^T v_2 = v_1^T C v_2, where C = X X^T (the data matrix).
      • Since the v_i are eigenvectors of C, C v_2 = λ_2 v_2,
      • so v_1^T C v_2 = λ_2 v_1^T v_2 = 0.
      • The variables are uncorrelated.
      • This is a result of using as coordinates the eigenvectors of the correlation matrix C = X X^T.
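
This can be verified numerically; a small sketch with synthetic correlated data (the mixing matrix A is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2.0, 0.0],
              [1.5, 0.5]])
X = A @ rng.standard_normal((2, 500))   # correlated (x, y) samples as columns
X -= X.mean(axis=1, keepdims=True)      # center the variables

print(np.corrcoef(X)[0, 1])             # clearly nonzero: x and y are correlated
_, U = np.linalg.eigh(X @ X.T)          # eigenvectors of C = X X^T
proj = U.T @ X                          # coordinates of each point in the PC basis
print(np.corrcoef(proj)[0, 1])          # ~0 (up to floating point): uncorrelated
```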

  16. In the PC coordinates the variables are uncorrelated • The correlation depends on the coordinate system: we can start with variables (x, y) which are correlated and transform them to (x', y') which are uncorrelated • If we use the coordinates defined by the eigenvectors of X X^T, the variables (the vectors of the n projections on the i-th axis) will be uncorrelated.

  17. Properties of the PCA • The subspace spanned by the first k PCs retains the maximal variance • This subspace minimizes the distance of the points from the subspace • The transformed variables, which are linear combinations of the original ones, are uncorrelated.

  18. Best plane, minimizing perpendicular distance over all planes

  19. Eigenfaces: PCs of face images • Turk, M., Pentland, A.: Eigenfaces for Recognition. J. Cognitive Neuroscience 3 (1991) 71–86.

  20. Image Representation • A training set of m images of size N×N is represented by vectors of size N^2: x_1, x_2, x_3, …, x_m • [Example: a small pixel matrix flattened into an N^2 × 1 column vector] • The faces need to be well aligned

  21. Average Image and Difference Images • The average of the training set is defined by m = (1/m) Σ_{i=1}^{m} x_i • Each face differs from the average by the vector r_i = x_i − m
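
A minimal sketch of these two steps, with random stand-in data in place of real face images (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 32, 10
faces = rng.random((N * N, m))                 # stand-in for m flattened N*N training images
mean_face = faces.mean(axis=1, keepdims=True)  # average image: (1/m) * sum_i x_i
R = faces - mean_face                          # difference images r_i = x_i - mean, as columns
```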

  22. Covariance Matrix • The covariance matrix is constructed as C = A A^T, where A = [r_1, …, r_m] • The size of this matrix is N^2 × N^2 • Finding the eigenvectors of an N^2 × N^2 matrix is intractable. Hence, use the matrix A^T A of size m × m and find the eigenvectors of this small matrix.

  23. Face data matrix • X is N^2 × m (one face per column) and X^T is m × N^2 • X X^T is N^2 × N^2, X^T X is m × m

  24. Eigenvectors of the Covariance Matrix • Consider the eigenvectors v_i of A^T A, such that A^T A v_i = μ_i v_i • Pre-multiplying both sides by A, we have A A^T (A v_i) = μ_i (A v_i) • So A v_i is an eigenvector of our original A A^T • Find the eigenvectors v_i of the small A^T A, then get the 'eigenfaces' as A v_i
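
A sketch of this small-matrix trick, assuming A is the N^2 × m matrix of difference images (e.g. the R array from the sketch above); the function name is illustrative:

```python
import numpy as np

def eigenfaces(A, k):
    """A: N^2 x m matrix whose columns are difference images r_i.
    Returns the first k (unit-norm) eigenfaces as columns."""
    small = A.T @ A                        # m x m instead of N^2 x N^2
    eigvals, V = np.linalg.eigh(small)     # eigenvectors v_i of A^T A
    order = np.argsort(eigvals)[::-1][:k]  # top-k eigenvalues
    U = A @ V[:, order]                    # A v_i are eigenvectors of A A^T
    U /= np.linalg.norm(U, axis=0)         # normalize each eigenface
    return U
```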

  25. Face Space • The eigenvectors u_i resemble ghostly-looking facial images, hence they are called eigenfaces

  26. Projection into Face Space
      • A face image can be projected into this face space by p_k = U^T (x_k − m)
      • The rows of U^T are the eigenfaces
      • p_k holds the m coefficients of face x_k; this is the representation of the face using eigenfaces
      • This representation can then be used for recognition with different recognition algorithms
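
In code, the projection is a single matrix product (a sketch, assuming U holds the eigenfaces as columns and mean_face is the average image from the earlier sketches):

```python
def project_to_face_space(x, U, mean_face):
    """p = U^T (x - mean): coefficients of face x in the eigenface basis."""
    return U.T @ (x - mean_face)
```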

  27. Recognition in 'face space'
      • Turk and Pentland used 16 faces and 7 PCs
      • In this case the face representation p_k = U^T (x_k − m) is a 7-long vector
      • Face classification: several images per face class
      • For a new test image I: obtain the representation p_I
      • Turk and Pentland used simple nearest neighbour: find the NN in each class, take the nearest, subject to distance < ε; otherwise the result is 'unknown'
      • Other algorithms are possible, e.g. SVM
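
A minimal nearest-neighbour classifier over face-space coefficients in the spirit of this slide (a sketch; the gallery structure and the eps threshold are assumptions, not the original implementation):

```python
import numpy as np

def classify(p_test, gallery, eps):
    """gallery: dict mapping class label -> list of coefficient vectors p.
    Returns the label of the nearest stored face, or 'unknown' if it is farther than eps."""
    best_label, best_dist = None, np.inf
    for label, coeffs in gallery.items():
        d = min(np.linalg.norm(p_test - p) for p in coeffs)  # NN within this class
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist < eps else "unknown"
```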

  28. Face detection by 'face space'
      • Turk and Pentland used a 'faceness' measure: within a window, compare the original image with its reconstruction from face space
      • Find the distance ε between the original image x and its reconstruction from the eigenface space, x_f: ε^2 = ||x − x_f||^2, where x_f = U p + μ (the reconstructed face)
      • If ε < θ for a threshold θ, a face is detected in the window
      • Not state-of-the-art and not fast enough
      • Eigenfaces in the brain?
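
A sketch of the 'faceness' measure: project the window into face space, reconstruct, and measure the residual (assuming U and mean_face from the earlier sketches):

```python
import numpy as np

def faceness_error(x, U, mean_face):
    """Distance between a window x and its reconstruction from face space."""
    p = U.T @ (x - mean_face)        # face-space coefficients
    x_f = U @ p + mean_face          # reconstructed face x_f = U p + mean
    return np.linalg.norm(x - x_f)   # small value -> the window looks like a face
```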

  29. Next: PCA by Neurons
