  1. Unsupervised Learning: Principal Component Analysis. CMSC 422, Marine Carpuat (marine@cs.umd.edu). Slides credit: Maria-Florina Balcan

  2. Unsupervised Learning • Discovering hidden structure in data • Last time: K-Means clustering – What objective does it optimize? – How can we improve initialization? – What is the right value of K? • Today: how can we learn better representations of our data points?

  3. Dimensionality Reduction • Goal: extract hidden lower-dimensional structure from high-dimensional datasets • Why? – To visualize data more easily – To remove noise from data – To lower resource requirements for storing/processing data – To improve classification/clustering

  4. Examples of data points in D-dimensional space that can be effectively represented in a d-dimensional subspace (d < D)

  5. Principal Component Analysis • Goal: find a projection of the data onto directions that maximize the variance of the original data set – Intuition: those are the directions in which most of the information is encoded • Definition: principal components are orthogonal directions that capture most of the variance in the data

  6. PCA: finding principal components • 1st PC – the projection of the data points along the 1st PC has greater variance than along any other single direction • 2nd PC – the next orthogonal direction of greatest variability • And so on…

  7. PCA: notation • Data points are represented by a matrix $X$ of size $D \times n$ – Let's assume the data is centered • Principal components are $d$ vectors: $v_1, v_2, \ldots, v_d$ – $v_i \cdot v_j = 0$ for $i \neq j$, and $v_i \cdot v_i = 1$ • The sample variance of the data projected on a vector $v$ is $\frac{1}{n} \sum_{i=1}^{n} (v^T x_i)^2 = \frac{1}{n} v^T X X^T v$
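
To make the notation concrete, here is a minimal NumPy sketch (the matrix X, direction v, and sizes D, n are all synthetic and illustrative) checking that the two expressions for the projected sample variance agree:

    import numpy as np

    rng = np.random.default_rng(0)
    D, n = 5, 200
    X = rng.normal(size=(D, n))
    X = X - X.mean(axis=1, keepdims=True)   # center the data

    v = rng.normal(size=D)
    v = v / np.linalg.norm(v)               # unit-norm direction

    # sample variance of the projected data, computed two ways
    lhs = np.mean((v @ X) ** 2)             # (1/n) sum_i (v^T x_i)^2
    rhs = (v @ X @ X.T @ v) / n             # (1/n) v^T X X^T v
    assert np.isclose(lhs, rhs)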

  8. PCA formally • Finding the vector that maximizes the sample variance of the projected data: $\arg\max_v \; v^T X X^T v$ such that $v^T v = 1$ (the constant $\frac{1}{n}$ does not change the maximizer) • A constrained optimization problem – the Lagrangian folds the constraint into the objective: $\arg\max_v \; v^T X X^T v - \lambda v^T v$ • Solutions are the vectors $v$ such that $X X^T v = \lambda v$, i.e. the eigenvectors of $X X^T$ (a scaled sample covariance matrix, since the data is centered)
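
Continuing the sketch above (reusing X, rng, D, n), the maximizer can be read off an eigendecomposition; numpy.linalg.eigh applies because $X X^T$ is symmetric:

    C = X @ X.T / n                          # scaled sample covariance
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    v1 = eigvecs[:, -1]                      # 1st principal component

    # spot-check: no random unit direction beats v1 on projected variance
    W = rng.normal(size=(1000, D))
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    assert all(w @ C @ w <= v1 @ C @ v1 + 1e-12 for w in W)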

  9. PCA formally • The eigenvalue $\lambda$ denotes the amount of variability captured along direction $v$ – the sample variance of the projection is $v^T X X^T v = \lambda$ • If we rank eigenvalues from largest to smallest – the 1st PC is the eigenvector of $X X^T$ associated with the largest eigenvalue – the 2nd PC is the eigenvector associated with the 2nd largest eigenvalue – …
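
Again continuing the sketch (reusing eigvals, eigvecs, X, D from above), sorting the eigenpairs gives the PCs in order, and the projected sample variance along each PC equals its eigenvalue:

    order = np.argsort(eigvals)[::-1]        # rank eigenvalues large to small
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    for k in range(D):
        v = eigvecs[:, k]
        # sample variance along the k-th PC equals the k-th eigenvalue
        assert np.isclose(np.mean((v @ X) ** 2), eigvals[k])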

  10. Alternative interpretation of PCA • PCA finds the vectors $v$ such that projecting the data onto these vectors minimizes the reconstruction error
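
A sketch of this equivalence (still reusing X, rng, D, and the sorted eigvecs from above; reconstruction_error is an illustrative helper, not from the slides): projecting onto the top-d PCs reconstructs the data at least as well as a random orthonormal basis of the same size.

    def reconstruction_error(X, W):
        # mean squared error after projecting the columns of X onto the
        # columns of W (D x d) and mapping back to D dimensions
        X_hat = W @ (W.T @ X)
        return np.mean(np.sum((X - X_hat) ** 2, axis=0))

    d = 2
    top = eigvecs[:, :d]                             # top-d PCs (sorted above)
    rand = np.linalg.qr(rng.normal(size=(D, d)))[0]  # random orthonormal basis
    assert reconstruction_error(X, top) <= reconstruction_error(X, rand)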

  11. Resulting PCA algorithm
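
The algorithm itself is not in the transcript (it was presumably a figure on the slide), but a minimal self-contained sketch consistent with the preceding slides would be:

    import numpy as np

    def pca(X, d):
        # X: data matrix of shape (D, n), one column per data point
        # d: number of principal components to keep
        # returns (components W of shape (D, d), projections of shape (d, n))
        X = X - X.mean(axis=1, keepdims=True)          # 1. center the data
        C = X @ X.T / X.shape[1]                       # 2. sample covariance
        eigvals, eigvecs = np.linalg.eigh(C)           # 3. eigendecompose
        W = eigvecs[:, np.argsort(eigvals)[::-1][:d]]  # 4. top-d eigenvectors
        return W, W.T @ X                              # 5. project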

  12. How to choose the hyperparameter K? • i.e., the number of dimensions to keep • We can ignore the components of smaller significance, i.e. those with the smallest eigenvalues, since they capture the least variance
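
One common rule of thumb (an illustrative sketch, not from the slides; it reuses eigvals from the sketch above, and the 95% threshold is an arbitrary choice): keep the smallest K whose components together explain a fixed fraction of the total variance.

    eigvals = np.sort(eigvals)[::-1]                  # largest first
    explained = np.cumsum(eigvals) / np.sum(eigvals)  # cumulative variance ratio
    K = int(np.searchsorted(explained, 0.95) + 1)     # smallest K reaching 95%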

  13. An example: Eigenfaces
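
The eigenface images are not in the transcript, but a sketch of how such an experiment could be run today (using scikit-learn's LFW face loader; an illustration, not the pipeline from the slides): each principal component of a collection of face images, reshaped back to image dimensions, is an "eigenface".

    from sklearn.datasets import fetch_lfw_people
    from sklearn.decomposition import PCA

    faces = fetch_lfw_people(min_faces_per_person=50)  # downloads on first use
    X = faces.data                                     # (n_samples, n_pixels)

    pca = PCA(n_components=100).fit(X)
    # each component, reshaped to the image size, is an "eigenface"
    eigenfaces = pca.components_.reshape((100, *faces.images.shape[1:]))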

  14. PCA pros and cons • Pros – Eigenvector method – No parameters to tune – No local optima • Cons – Based only on covariance (2nd-order statistics) – Limited to linear projections

  15. What you should know • How to formulate K-Means clustering as an optimization problem • How to choose initialization strategies for K-Means • The impact of K on the optimization objective • Why and how to perform Principal Component Analysis
