Principal Component Analysis


  1. Principal Component Analysis 4/7/17

  2. PCA: the setting
     Unsupervised learning
     • Unlabeled data
     Dimensionality reduction
     • Simplify the data representation
     What does the algorithm do?
     • Performs an affine change of basis.
     • Rotates and translates the data set so that most of the variance lies along the axes.
     • Eliminates dimensions with little variation.
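     Not from the slides, but as a concrete illustration of this pipeline: a minimal sketch using scikit-learn's PCA. The library choice, the toy data, and keeping two components are all assumptions made for the example.

         # A minimal sketch, assuming scikit-learn is available; data and the
         # number of components are made up for illustration.
         import numpy as np
         from sklearn.decomposition import PCA

         rng = np.random.default_rng(0)
         X = rng.normal(size=(100, 5))          # toy data: 100 samples, 5 features

         pca = PCA(n_components=2)              # keep only the 2 highest-variance axes
         X_reduced = pca.fit_transform(X)       # center, rotate, and project the data

         print(X_reduced.shape)                 # (100, 2)
         print(pca.explained_variance_ratio_)   # fraction of variance kept per new axis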

  3. Change of Basis Examples So Far
     Support vector machines
     • Data that's not linearly separable in the standard basis may be (approximately) linearly separable in a transformed basis.
     • The kernel trick sometimes lets us work with high-dimensional bases.
     Approximate Q-learning
     • When the state space is too large for Q-learning, we may be able to extract features that summarize the state space well.
     • We then learn values as a linear function of the transformed representation.

  4. Change of Basis in PCA
     This looks like the change of basis from linear algebra.
     • PCA performs an affine transformation of the original basis.
     • Affine ≣ linear plus a constant
     The goal:
     • Find a new basis where most of the variance in the data is along the axes.
     • Hopefully only a small subset of the new axes will be important.

  5. PCA Change of Basis Illustrated

  6. PCA: step one
     First step: center the data.
     • From each dimension, subtract the mean value of that dimension.
     • This is the "plus a constant" part; afterwards we'll perform a linear transformation.
     • The centroid is now a vector of zeros.

     Original Data (rows are dimensions, columns are the points x0..x4):
            x0    x1    x2    x3    x4    mean
             4     3    -4     1     2     1.2
             8     0    -1    -2    -5     0
            -2     6    -7    -6    -3    -2.4

     Centered Data:
            x0    x1    x2    x3    x4
           2.8   1.8  -5.2  -0.2   0.8
             8     0    -1    -2    -5
           0.4   8.4  -4.6  -3.6  -0.6
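     A small NumPy sketch of this centering step, using the toy data from the table above:

         import numpy as np

         # Rows are dimensions, columns are the points x0..x4 (the Original Data above).
         X = np.array([[ 4.0, 3.0, -4.0,  1.0,  2.0],
                       [ 8.0, 0.0, -1.0, -2.0, -5.0],
                       [-2.0, 6.0, -7.0, -6.0, -3.0]])

         means = X.mean(axis=1, keepdims=True)   # per-dimension means: 1.2, 0, -2.4
         X_centered = X - means                  # subtract each dimension's mean

         print(X_centered)                       # matches the Centered Data table
         print(X_centered.mean(axis=1))          # ~0 in every dimension: centroid is 0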

  7. PCA: step two
     The hard part: find an orthogonal basis that's a linear transformation of the original, where the variance in the data is explained by as few dimensions as possible.
     Basis: a set of vectors that spans the space.
     Orthogonal: all vectors are perpendicular.
     Linear transformation: rotate all vectors by the same amount.
     Explaining the variance: low covariance across dimensions.

  8. PCA: step three
     Last step: reduce the dimension.
     • Sort the dimensions of the new basis by how much the data varies along them.
     • Throw away some of the less-important dimensions.
       • Could keep a specific number of dimensions.
       • Could keep all dimensions with variance above some threshold.
     • This results in a projection onto the subspace of the remaining axes.
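     A rough NumPy sketch of both ways of choosing which dimensions to keep, continuing with the centered toy data. The eigendecomposition it relies on is explained on the next slides, and the values of k and the threshold are made up for illustration.

         import numpy as np

         # Centered toy data from slide 6 (rows = dimensions, columns = points).
         X_centered = np.array([[ 2.8, 1.8, -5.2, -0.2,  0.8],
                                [ 8.0, 0.0, -1.0, -2.0, -5.0],
                                [ 0.4, 8.4, -4.6, -3.6, -0.6]])

         # New basis from the covariance matrix (explained on the next slides).
         C = (X_centered @ X_centered.T) / X_centered.shape[1]
         eigvals, eigvecs = np.linalg.eigh(C)     # eigvecs[:, i] pairs with eigvals[i]

         # Sort the dimensions of the new basis by how much the data varies along them.
         order = np.argsort(eigvals)[::-1]
         eigvals, eigvecs = eigvals[order], eigvecs[:, order]

         # Option 1: keep a specific number of dimensions.
         k = 2                                    # made-up choice
         W = eigvecs[:, :k]

         # Option 2: keep all dimensions with variance above some threshold.
         W = eigvecs[:, eigvals > 5.0]            # made-up threshold (keeps two axes here)

         # Project the data into the subspace of the remaining axes.
         X_reduced = W.T @ X_centered             # shape: (kept dims, number of points)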

  9. Computing PCA (step two)
     • Construct the covariance matrix.
       • An m × m matrix (m is the number of dimensions).
       • Diagonal entries give the variance along each dimension.
       • Off-diagonal entries give cross-dimension covariance.
     • Perform eigenvalue decomposition on the covariance matrix.
       • Compute the eigenvectors/eigenvalues of the covariance matrix.
       • Use the eigenvectors as the new basis.
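     A NumPy sketch of these two steps on the centered toy data. np.linalg.eigh is used here only because the covariance matrix is symmetric, and the checks at the end are just for illustration.

         import numpy as np

         # Centered data from slide 6 (rows = dimensions, columns = points).
         X = np.array([[ 2.8, 1.8, -5.2, -0.2,  0.8],
                       [ 8.0, 0.0, -1.0, -2.0, -5.0],
                       [ 0.4, 8.4, -4.6, -3.6, -0.6]])
         n = X.shape[1]

         C = (X @ X.T) / n                        # m x m covariance matrix
         eigvals, eigvecs = np.linalg.eigh(C)     # eigh: C is symmetric

         order = np.argsort(eigvals)[::-1]        # most variance first
         eigvals, eigvecs = eigvals[order], eigvecs[:, order]

         W = eigvecs                              # columns are the new basis vectors
         print(np.allclose(W.T @ W, np.eye(3)))   # True: the new basis is orthonormal
         print(np.round(W.T @ C @ W, 6))          # diagonal = variance along each new axis,
                                                  # off-diagonals ~0: no cross-covariance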

  10. Covariance Matrix Example
     C_X = (1/n) X Xᵀ; here n = 5 points, so C_X = (1/5) X Xᵀ.

     X (the centered data from slide 6):
           2.8   1.8  -5.2  -0.2   0.8
           8     0    -1    -2    -5
           0.4   8.4  -4.6  -3.6  -0.6

     Xᵀ (its transpose):
           2.8   8     0.4
           1.8   0     8.4
          -5.2  -1    -4.6
          -0.2  -2    -3.6
           0.8  -5    -0.6

     C_X = (1/5) X Xᵀ:
           7.76   4.8    8.08
           4.8   18.8    3.6
           8.08   3.6   21.04
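     A quick NumPy check of these numbers (the slide's 1/n normalization corresponds to np.cov with bias=True):

         import numpy as np

         # Centered data X from slide 6 (rows = dimensions, columns = points).
         X = np.array([[ 2.8, 1.8, -5.2, -0.2,  0.8],
                       [ 8.0, 0.0, -1.0, -2.0, -5.0],
                       [ 0.4, 8.4, -4.6, -3.6, -0.6]])

         C = (X @ X.T) / 5                        # C_X = (1/n) X X^T with n = 5
         print(np.round(C, 2))                    # [[ 7.76  4.8   8.08]
                                                  #  [ 4.8  18.8   3.6 ]
                                                  #  [ 8.08  3.6  21.04]]

         # np.cov normalizes by n-1 by default; bias=True gives the slide's 1/n convention.
         print(np.allclose(C, np.cov(X, bias=True)))   # True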

  11. Linear Algebra Review: Eigenvectors
     Eigenvectors are vectors that the matrix doesn't rotate.
     If X is a matrix and v is a vector, then v is an eigenvector of X iff there is some constant λ such that:
     • Xv = λv
     λ, the amount by which X stretches the eigenvector, is the eigenvalue.
     np.linalg.eig gives eigenvalues and eigenvectors.
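     A tiny NumPy illustration of this definition, using a made-up symmetric matrix:

         import numpy as np

         A = np.array([[2.0, 1.0],
                       [1.0, 2.0]])               # made-up example matrix

         eigvals, eigvecs = np.linalg.eig(A)      # eigvecs[:, i] pairs with eigvals[i]

         for lam, v in zip(eigvals, eigvecs.T):
             # A only stretches its eigenvectors: A v equals lambda v
             print(np.allclose(A @ v, lam * v))   # True for each eigenvector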

  12. Linear Algebra Review: Eigenvalue Decomposition
     If the matrix X Xᵀ has eigenvectors vᵢ with eigenvalues λᵢ for i ∈ {1, …, m}, then the unit-length eigenvectors v₁, …, vₘ form an orthonormal basis.
     The key point: computing the eigenvectors of the covariance matrix gives us the optimal (linear) basis for explaining the variance in our data.
     • Sorting by eigenvalue tells us the relative importance of each dimension.

  13. PCA Change of Basis Illustrated
     • Center the data by subtracting the mean in each dimension.
     • Re-align the data by finding an orthonormal basis for the covariance matrix.

  14. When is/isn’t PCA helpful?

  15. Compare Hypothesis Spaces
     • What other dimensionality reduction algorithm(s) have we seen before?
     Compare auto-encoders with PCA:
     • What sorts of transformation can each perform?
     • What are advantages/disadvantages of each?

  16. Auto-Encoders
     Idea: train a network for data compression/dimensionality reduction by throwing away outputs.
     • The training target is the input itself (in the figure, input [2, -1, 0, 1, 3] maps to output [2, -1, 0, 1, 3]).
     • After training, the hidden layer becomes the output: its activations are the compressed representation.
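     A minimal sketch of this idea, assuming PyTorch is available; the layer sizes, toy data, and training settings are all made up for illustration.

         import torch
         import torch.nn as nn

         X = torch.randn(100, 5)                  # toy data: 100 samples, 5 features

         encoder = nn.Linear(5, 2)                # compress 5 features down to 2
         decoder = nn.Linear(2, 5)                # reconstruct the original 5 features
         model = nn.Sequential(encoder, decoder)

         optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
         loss_fn = nn.MSELoss()

         for _ in range(200):                     # training target = input
             optimizer.zero_grad()
             loss = loss_fn(model(X), X)
             loss.backward()
             optimizer.step()

         # Throw away the decoder: the hidden layer's activations are the
         # compressed, dimensionality-reduced representation of the data.
         with torch.no_grad():
             codes = encoder(X)                   # shape: (100, 2)

     With a purely linear network and squared-error loss, as in this sketch, the learned hidden subspace is essentially the same subspace PCA finds; adding nonlinear activations is what lets an auto-encoder perform transformations PCA cannot.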
