SLIDE 1

Unsupervised Learning

Principal Component Analysis

CMSC 422 MARINE CARPUAT

marine@cs.umd.edu

Slides credit: Maria-Florina Balcan

SLIDE 2

Unsupervised Learning

  • Discovering hidden structure in data
  • Last time: K-Means Clustering
    – What is the objective optimized?
    – How can we improve initialization?
    – What is the right value of K?
  • Today: how can we learn better representations of our data points?

SLIDE 3

Dimensionality Reduction

  • Goal: extract hidden lower-dimensional structure from high-dimensional datasets
  • Why?
    – To visualize data more easily
    – To remove noise in data
    – To lower resource requirements for storing/processing data
    – To improve classification/clustering

SLIDE 4

Examples of data points in D-dimensional space that can be effectively represented in a d-dimensional subspace (d < D)

SLIDE 5

Principal Component Analysis

  • Goal: Find a projection of the data onto directions that maximize the variance of the original data set
    – Intuition: those are the directions in which most information is encoded
  • Definition: Principal Components are orthogonal directions that capture most of the variance in the data

SLIDE 6

PCA: finding principal components

  • 1st PC
    – Projection of data points along the 1st PC discriminates the data most along any one direction
  • 2nd PC
    – Next orthogonal direction of greatest variability
  • And so on…
SLIDE 7

PCA: notation

  • Data points
    – Represented by a matrix X of size D x N
    – Let's assume the data is centered
  • Principal components are d vectors: w_1, w_2, …, w_d
    – w_j · w_k = 0 for j ≠ k, and w_j · w_j = 1
  • The sample variance of the data projected on a vector v is

    (1/N) Σ_{j=1}^{N} (v^T x_j)^2 = v^T X X^T v
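A quick way to check this identity numerically (a minimal NumPy sketch; the random data matrix, its dimensions, and the explicit 1/N normalization of X X^T are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 5, 1000
X = rng.standard_normal((D, N))          # data matrix: one column per data point
X = X - X.mean(axis=1, keepdims=True)    # center the data, as the slide assumes

v = rng.standard_normal(D)
v = v / np.linalg.norm(v)                # unit-length projection direction

proj_var = np.mean((v @ X) ** 2)         # (1/N) * sum_j (v^T x_j)^2
quad_form = v @ (X @ X.T / N) @ v        # v^T (X X^T / N) v
print(np.isclose(proj_var, quad_form))   # prints True: the two expressions agree
```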

SLIDE 8

PCA formally

  • Finding the vector that maximizes the sample variance of the projected data:

    argmax_v  v^T X X^T v   such that   v^T v = 1

  • A constrained optimization problem
  • The Lagrangian folds the constraint into the objective:

    argmax_v  v^T X X^T v − λ v^T v

  • Solutions are vectors v such that X X^T v = λ v
  • i.e. eigenvectors of X X^T (the sample covariance matrix)
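As a sketch of that eigenvector characterization (NumPy, assuming a centered D x N matrix X as above; dividing by N only rescales the eigenvalues, not the eigenvectors):

```python
import numpy as np

def first_principal_component(X):
    """Unit eigenvector of the sample covariance with the largest eigenvalue (the 1st PC)."""
    cov = X @ X.T / X.shape[1]              # sample covariance; X assumed centered, D x N
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric matrix, eigenvalues in ascending order
    return eigvecs[:, -1]                   # eigenvector for the largest eigenvalue
```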
SLIDE 9

PCA formally

  • The eigenvalue λ denotes the amount of variability captured along the direction of its eigenvector v
    – Sample variance of the projection: v^T X X^T v = λ
  • If we rank the eigenvalues from largest to smallest
    – The 1st PC is the eigenvector of X X^T associated with the largest eigenvalue
    – The 2nd PC is the eigenvector of X X^T associated with the 2nd largest eigenvalue
    – …

SLIDE 10

Alternative interpretation of PCA

  • PCA finds vectors v such that projection onto these vectors minimizes reconstruction error
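A small illustration of this view (a sketch, again assuming a centered D x N data matrix X; the function name is illustrative): project onto the top-d eigenvectors, map back to the original space, and measure the squared error.

```python
import numpy as np

def pca_reconstruction_error(X, d):
    """Mean squared error of reconstructing X from its projection onto the top-d PCs."""
    cov = X @ X.T / X.shape[1]          # sample covariance; X assumed centered, D x N
    _, eigvecs = np.linalg.eigh(cov)    # eigenvalues ascending, so last columns are top PCs
    W = eigvecs[:, -d:]                 # D x d matrix of the top-d principal components
    X_hat = W @ (W.T @ X)               # project onto the subspace, then map back to R^D
    return np.mean(np.sum((X - X_hat) ** 2, axis=0))
```

Among all d-dimensional subspaces, the one spanned by the top-d principal components yields the smallest value of this error, which is the sense in which PCA "minimizes reconstruction error".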

SLIDE 11

Resulting PCA algorithm
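The algorithm itself appears as a figure on the slide; a minimal NumPy sketch of the standard steps (center, form the covariance, eigendecompose, keep the top-d directions, project) is given below. Variable names and the D x N data layout are assumptions.

```python
import numpy as np

def pca(X, d):
    """PCA on a D x N data matrix; returns top-d components, projections, sorted eigenvalues."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                              # 1. center the data
    cov = Xc @ Xc.T / Xc.shape[1]              # 2. D x D sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # 3. eigendecomposition (ascending eigenvalues)
    order = np.argsort(eigvals)[::-1]          # 4. rank eigenvalues from largest to smallest
    W = eigvecs[:, order[:d]]                  #    top-d eigenvectors = principal components
    Z = W.T @ Xc                               # 5. d x N matrix of projected data
    return W, Z, eigvals[order]
```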

SLIDE 12

How to choose the hyperparameter K?

  • i.e. the number of dimensions
  • We can ignore the components of smaller significance
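One common heuristic (a sketch under the same assumptions as the earlier snippets; the 95% threshold is an illustrative choice, not from the slides): keep the smallest number of components whose eigenvalues cover most of the total variance.

```python
import numpy as np

def choose_k(sorted_eigvals, threshold=0.95):
    """Smallest k such that the top-k components capture at least `threshold` of the variance."""
    ratios = sorted_eigvals / sorted_eigvals.sum()        # fraction of variance per component
    return int(np.searchsorted(np.cumsum(ratios), threshold)) + 1
```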

SLIDE 13

An example: Eigenfaces
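The Eigenfaces example is shown as images on the slide. As a hedged sketch of the idea: run PCA on face images flattened into vectors, and reshape the top principal components ("eigenfaces") back into images. The placeholder data, image size, and the pca() helper from the algorithm sketch above are assumptions.

```python
import numpy as np

# Suppose `faces` holds grayscale face images of shape (num_images, 64, 64).
faces = np.random.rand(200, 64, 64)        # placeholder data, just for illustration
X = faces.reshape(len(faces), -1).T        # D x N matrix: each column is a flattened face

W, Z, eigvals = pca(X, d=20)               # reuse the pca() sketch from the algorithm slide
eigenfaces = W.T.reshape(20, 64, 64)       # each principal component, viewed as an image
```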

SLIDE 14

PCA pros and cons

  • Pros
    – Eigenvector method
    – No tuning of the parameters
    – No local optima
  • Cons
    – Only based on covariance (2nd order statistics)
    – Limited to linear projections

SLIDE 15

What you should know

  • Formulate K-Means clustering as an optimization problem
  • Choose initialization strategies for K-Means
  • Understand the impact of K on the optimization objective
  • Why and how to perform Principal Component Analysis