Unsupervised Learning
Principal Component Analysis
CMSC 422 MARINE CARPUAT
marine@cs.umd.edu
Slides credit: Maria-Florina Balcan
Unsupervised learning: discovering hidden structure in data

Last time: K-Means Clustering
– What objective does K-Means optimize?
– How can we improve initialization?
– What is the right value of K?
Why reduce dimensionality?
– To visualize data more easily
– To remove noise in the data
– To lower resource requirements for storing/processing data
– To improve classification/clustering
Examples: data points in D-dimensional space that can be effectively represented in a d-dimensional subspace (d < D)
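A minimal NumPy sketch of this situation (the data and variable names are my own illustration, not from the slides): 3-D points that actually live near a 2-D plane, so the singular values of the centered data matrix show two large values and one near zero.

```python
import numpy as np

# Illustrative data: D = 3 ambient dimensions, but the points lie
# near a d = 2 subspace (a plane), plus a little off-plane noise.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(2, n))             # latent 2-D coordinates
A = rng.normal(size=(3, 2))             # embeds the plane into 3-D
X = A @ z + 0.01 * rng.normal(size=(3, n))

# Singular values of the centered data reveal the effective
# dimensionality: two large values, one tiny.
Xc = X - X.mean(axis=1, keepdims=True)
s = np.linalg.svd(Xc, compute_uv=False)
print(s)
```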
Principal components
– Intuition: principal components are the directions along which most of the information (variance) in the data is encoded
– 1st PC: projecting the data points onto the 1st PC spreads them out more than projecting onto any other single direction
– 2nd PC: the next orthogonal direction of greatest variability
The data
– Represented by a matrix X of size D x N (one data point per column)
– Let's assume the data is centered (each feature has mean zero)
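The setup above can be sketched in NumPy (a toy example with data of my own choosing): X is D x N with one point per column, and centering subtracts each row's (feature's) mean.

```python
import numpy as np

# Toy data matrix: D x N, one data point per column (values are arbitrary).
rng = np.random.default_rng(1)
D, N = 4, 100
X = rng.normal(loc=3.0, size=(D, N))   # deliberately not centered

mean = X.mean(axis=1, keepdims=True)   # D x 1 vector of feature means
Xc = X - mean                          # centered data

print(np.allclose(Xc.mean(axis=1), 0.0))  # each row now has mean ~0
```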
– Principal components $w_1, w_2, \dots$ are orthonormal: $w_j \cdot w_k = 0$ for $j \neq k$, and $w_j \cdot w_j = 1$
Finding the 1st PC
– The sample variance of the data projected along a unit direction $w$ is
$$\frac{1}{n}\sum_{j=1}^{n} (w^T x_j)^2 = \frac{1}{n}\, w^T X X^T w$$
– The constant $1/n$ does not affect the maximizer, so maximize $w^T X X^T w$ subject to $w^T w = 1$ via the Lagrangian
$$\arg\max_w \; w^T X X^T w - \lambda\, w^T w$$
– Setting the gradient to zero gives $X X^T w = \lambda w$: the solution $w$ is an eigenvector of $X X^T$, and the objective value is the sample variance captured along dimension $w$:
$$w^T X X^T w = \lambda$$
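The derivation above can be checked numerically (a sketch on synthetic data of my own making): the unit vector maximizing $w^T X X^T w$ is the top eigenvector of $X X^T$, and the value attained equals its eigenvalue $\lambda$.

```python
import numpy as np

# Numeric check of the eigenvector derivation on synthetic centered data.
rng = np.random.default_rng(2)
X = rng.normal(size=(3, 200))
X = X - X.mean(axis=1, keepdims=True)       # center

C = X @ X.T                                  # 3 x 3, symmetric PSD
eigvals, eigvecs = np.linalg.eigh(C)         # eigenvalues in ascending order
w = eigvecs[:, -1]                           # top eigenvector (unit norm)

# Objective value at the top eigenvector equals the top eigenvalue.
print(np.isclose(w @ C @ w, eigvals[-1]))    # True

# No other unit vector does better (Rayleigh quotient bound).
v = rng.normal(size=3)
v /= np.linalg.norm(v)
print(v @ C @ v <= eigvals[-1] + 1e-9)       # True
```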
– The 1st PC is the eigenvector of $X X^T$ associated with the largest eigenvalue
– The 2nd PC is the eigenvector of $X X^T$ associated with the 2nd largest eigenvalue
– …
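Putting the pieces together, a minimal PCA sketch following the recipe on these slides (the function name and toy data are my own; this is not a production implementation):

```python
import numpy as np

def pca(X, k):
    """Sketch of PCA as described above: X is D x N, one point per
    column. Returns the top-k PCs (columns of W) and the projections."""
    Xc = X - X.mean(axis=1, keepdims=True)         # center the data
    eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # sort descending
    W = eigvecs[:, order[:k]]                      # D x k: top-k PCs
    return W, W.T @ Xc                             # PCs, k x N projections

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 100))
W, Z = pca(X, k=2)
print(W.shape, Z.shape)                  # (5, 2) (2, 100)
print(np.allclose(W.T @ W, np.eye(2)))   # PCs are orthonormal: True
```

For large D, computing the eigenvectors of the D x D matrix $X X^T$ is expensive; in practice a truncated SVD of the centered data gives the same components more efficiently.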
Strengths of PCA
– Eigenvector method: exact, closed-form solution
– No tuning of parameters
– No local optima
Limitations of PCA
– Based only on the covariance (2nd-order statistics)
– Limited to linear projections