

SLIDE 1

Principal Component Analysis

4/7/17

SLIDE 2

PCA: the setting

Unsupervised learning

  • Unlabeled data

Dimensionality reduction

  • Simplify the data representation

What does the algorithm do?

  • Performs an affine change of basis.
  • Rotates and translates the data set so that most of the variance lies along the axes.
  • Eliminates dimensions with little variation.
SLIDE 3

Change of Basis Examples So Far

Support vector machines

  • Data that's not linearly separable in the standard basis may be (approximately) linearly separable in a transformed basis.
  • The kernel trick sometimes lets us work with high-dimensional bases.

Approximate Q-learning

  • When the state space is too large for Q-learning, we may be able to extract features that summarize the state space well.
  • We then learn values as a linear function of the transformed representation.

SLIDE 4

Change of Basis in PCA

This looks like the change of basis from linear algebra.

  • PCA performs an affine transformation of the original basis.
  • Affine ≡ linear plus a constant.

The goal:

  • Find a new basis where most of the variance in the data is along the axes.
  • Hopefully only a small subset of the new axes will be important.

SLIDE 5

PCA Change of Basis Illustrated

SLIDE 6

PCA: step one

First step: center the data.

  • From each dimension, subtract the mean value of that dimension.
  • This is the "plus a constant" part; afterwards we'll perform a linear transformation.
  • The centroid is now a vector of zeros.

Original Data:

          x0   x1   x2   x3   x4   mean
  dim 1:   4    3   -4    1    2    1.2
  dim 2:   8    0   -1   -2   -5    0.0
  dim 3:  -2    6   -7   -6   -3   -2.4

Centered Data (each dimension's mean subtracted):

          x0    x1    x2    x3    x4
  dim 1:  2.8   1.8  -5.2  -0.2   0.8
  dim 2:  8.0   0.0  -1.0  -2.0  -5.0
  dim 3:  0.4   8.4  -4.6  -3.6  -0.6
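A minimal NumPy sketch of the centering step, using the data above (array layout: rows are dimensions, columns are points):

    import numpy as np

    # Original data: rows are dimensions, columns are the points x0..x4.
    X_orig = np.array([[ 4.0, 3.0, -4.0,  1.0,  2.0],
                       [ 8.0, 0.0, -1.0, -2.0, -5.0],
                       [-2.0, 6.0, -7.0, -6.0, -3.0]])

    means = X_orig.mean(axis=1, keepdims=True)   # [[1.2], [0.0], [-2.4]]
    X = X_orig - means                           # centered: every row now sums to zero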

SLIDE 7

PCA: step two

The hard part: find an orthogonal basis that's a linear transformation of the original, where the variance in the data is explained by as few dimensions as possible.

  • Basis: a set of vectors that spans the space.
  • Orthogonal: all vectors are perpendicular.
  • Linear transformation: here, a rotation of all vectors by the same amount.
  • Explaining the variance: low covariance across dimensions.

(Illustration: the standard basis vectors [1,0] and [0,1].)

SLIDE 8

PCA: step three

Last step: reduce the dimension.

  • Sort the dimensions of the new basis by how much the data varies.
  • Throw away some of the less-important dimensions.
  • Could keep a specific number of dimensions.
  • Could keep all dimensions with variance above some threshold.
  • This results in a projection into the subspace of the remaining axes (see the sketch below).
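A minimal sketch of this truncation, assuming the eigenvalues w and eigenvectors V of the covariance matrix are already in hand (they are computed on the next slides; the cutoff values are illustrative):

    import numpy as np

    # Assume: X is the centered data (dimensions x points), and w, V hold the
    # eigenvalues and eigenvectors (as columns) of its covariance matrix.
    order = np.argsort(w)[::-1]          # axes sorted by variance, largest first
    keep = order[:2]                     # option 1: keep a fixed number of dimensions
    # keep = order[w[order] > 1.0]       # option 2: keep axes above a variance threshold
    X_reduced = V[:, keep].T @ X         # project onto the subspace of the kept axes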

SLIDE 9

Computing PCA (step two)

  • Construct the covariance matrix.
  • It's an m x m matrix, where m is the number of dimensions.
  • Diagonal entries give the variance along each dimension.
  • Off-diagonal entries give cross-dimension covariance.
  • Perform eigenvalue decomposition on the covariance matrix.
  • Compute the eigenvectors/eigenvalues of the covariance matrix.
  • Use the eigenvectors as the new basis (see the sketch below).
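A sketch of both sub-steps in NumPy, assuming X holds centered data with m dimensions (rows) and n points (columns); np.linalg.eigh is the eigendecomposition routine for symmetric matrices such as the covariance matrix:

    import numpy as np

    # Assume X is the centered data matrix (m dimensions x n points).
    C = X @ X.T / X.shape[1]       # m x m covariance matrix
    w, V = np.linalg.eigh(C)       # eigenvalues (ascending) and orthonormal eigenvectors
    X_new = V.T @ X                # the data expressed in the eigenvector basis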
SLIDE 10

Covariance Matrix Example

C_X = (1/n) X Xᵀ

Using the centered data from the earlier example (n = 5 points):

  X =  [  2.8   1.8  -5.2  -0.2   0.8 ]
       [  8.0   0.0  -1.0  -2.0  -5.0 ]
       [  0.4   8.4  -4.6  -3.6  -0.6 ]

  C_X = ⅕ (X)(Xᵀ) =  [  7.76   4.80   8.08 ]
                     [  4.80  18.80   3.60 ]
                     [  8.08   3.60  21.04 ]
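A quick NumPy check of these numbers:

    import numpy as np

    # Centered data from the "step one" slide.
    X = np.array([[2.8, 1.8, -5.2, -0.2,  0.8],
                  [8.0, 0.0, -1.0, -2.0, -5.0],
                  [0.4, 8.4, -4.6, -3.6, -0.6]])

    C_X = X @ X.T / 5
    print(C_X)    # [[ 7.76  4.8   8.08]
                  #  [ 4.8  18.8   3.6 ]
                  #  [ 8.08  3.6  21.04]]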

SLIDE 11

Linear Algebra Review: Eigenvectors

Eigenvectors are vectors that the matrix doesn't rotate. If X is a matrix and v is a nonzero vector, then v is an eigenvector of X iff there is some constant λ such that:

  • Xv = λv

λ, the amount by which X stretches the eigenvector, is the eigenvalue. np.linalg.eig gives the eigenvalues and eigenvectors.
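A tiny demo of the definition on a hypothetical 2x2 matrix:

    import numpy as np

    X = np.array([[2.0, 0.0],
                  [0.0, 3.0]])
    vals, vecs = np.linalg.eig(X)     # eigenvalues, and eigenvectors as columns
    v = vecs[:, 0]                    # first eigenvector
    np.allclose(X @ v, vals[0] * v)   # True: X only stretches v, by the factor vals[0]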

SLIDE 12

Linear Algebra Review: Eigenvalue Decomposition

If the matrix (X)(Xᵀ) has eigenvectors v1, …, vm with eigenvalues λ1, …, λm, then the normalized vectors v1/|v1|, …, vm/|vm| form an orthonormal basis.

The key point: computing the eigenvectors of the covariance matrix gives us the optimal (linear) basis for explaining the variance in our data.

  • Sorting by eigenvalue tells us the relative importance of each dimension.
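One common way to use this ordering is the fraction of the total variance each new axis explains; a sketch, assuming w holds the covariance matrix's eigenvalues (the 95% target is illustrative):

    import numpy as np

    # Assume w holds the eigenvalues of the covariance matrix.
    w_sorted = np.sort(w)[::-1]             # most important dimension first
    explained = w_sorted / w_sorted.sum()   # fraction of variance per new axis
    # Number of axes needed to cover 95% of the variance:
    k = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1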

SLIDE 13

PCA Change of Basis Illustrated

Center the data by subtracting the mean in each dimension. Re-align the data by changing to an orthonormal basis of eigenvectors of the covariance matrix.

SLIDE 14

When is/isn’t PCA helpful?

SLIDE 15

Compare Hypothesis Spaces

  • What other dimensionality reduction algorithm(s) have we seen before?

Compare auto-encoders with PCA:

  • What sorts of transformation can each perform?
  • What are advantages/disadvantages of each?
SLIDE 16

Auto-Encoders

Idea: train a network for data compression/dimensionality reduction: train it to reproduce its input, then throw away the output layer so that the hidden layer becomes the output.

(Diagram: during training, the target is set equal to the input; after training, the hidden layer becomes the output.)
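A minimal linear auto-encoder sketch of this idea (the layer sizes, learning rate, and toy data are illustrative assumptions, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))                # toy data: 100 examples, 4 features
    W_enc = rng.normal(scale=0.1, size=(4, 2))   # encoder: 4 inputs -> 2 hidden units
    W_dec = rng.normal(scale=0.1, size=(2, 4))   # decoder: 2 hidden -> 4 outputs

    lr = 0.05
    for _ in range(2000):
        H = X @ W_enc                            # hidden code
        err = H @ W_dec - X                      # reconstruction error (target = input)
        W_dec -= lr * H.T @ err / len(X)         # gradient steps on the squared error
        W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

    codes = X @ W_enc                            # after training, keep only the hidden
                                                 # layer: a 2-D code for the 4-D data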