

SLIDE 1

Unsupervised learning

  • General introduction to unsupervised learning
SLIDE 2

PCA

SLIDE 3

Special directions

These are special directions we will try to find.

SLIDE 4

Best direction u:

The projection length of a point $x_i$ onto $u$ is $x_i^T u$; $d_i$ is its perpendicular distance from the line through $u$, with $|u|^2 = 1$.

  • 1. Minimize: $\sum_i d_i^2$
  • 2. Maximize: $\sum_i (x_i^T u)^2$

$u$ is the direction that maximizes the variance.
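
The two criteria above are equivalent; a short derivation (not spelled out on the slide, assuming centered data and $|u| = 1$) makes this explicit. By Pythagoras, each point splits into its projection onto $u$ and the perpendicular remainder:

$$|x_i|^2 = (x_i^T u)^2 + d_i^2 \quad\Longrightarrow\quad \sum_i d_i^2 = \sum_i |x_i|^2 - \sum_i (x_i^T u)^2 .$$

Since $\sum_i |x_i|^2$ does not depend on $u$, minimizing the summed squared distances is the same as maximizing the summed squared projections.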

SLIDE 5

Finding the best projection:

Find $u$ that maximizes $\sum_i (x_i^T u)^2$.

$(x_i^T u)^2 = (u^T x_i)(x_i^T u)$

$\max_u \sum_i (u^T x_i)(x_i^T u) = \max_u\, u^T [V]\, u$, where $[V] = \sum_i x_i x_i^T$.

SLIDE 6

The data matrix:

$[V] = \sum_i x_i x_i^T = X X^T$, where $X$ is the data matrix whose columns are the points $x_i$.

SLIDE 7

Best direction u

  • Will minimize the distances from it
  • Will maximize the variance along it

Maximize over $u$: $u^T [V] u$ subject to $|u| = 1$.

With Lagrange multipliers: maximize $u^T [V] u - \lambda (u^T u - 1)$.

Setting the derivative with respect to the vector $u$ to zero: $[V]u - \lambda u = 0$, i.e. $[V]u = \lambda u$.

The best direction will be the first eigenvector of $[V]$.

(Useful identities: $\frac{d}{dx}(x^T U x) = 2 U x$ for symmetric $U$, and $\frac{d}{dx}(x^T x) = 2x$.)
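
This result is easy to check numerically; the snippet below is a minimal NumPy sketch with made-up data (the names X, V, u1 are mine, not the lecture's):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 200))            # 2 x n data matrix; columns are the points x_i
X -= X.mean(axis=1, keepdims=True)       # center the data

V = X @ X.T                              # [V] = sum_i x_i x_i^T = X X^T
eigvals, eigvecs = np.linalg.eigh(V)     # eigh: eigenvalues in ascending order
u1 = eigvecs[:, -1]                      # eigenvector with the largest eigenvalue

best = u1 @ V @ u1                       # the maximal value of u^T [V] u
for _ in range(1000):
    u = rng.normal(size=2)
    u /= np.linalg.norm(u)               # any other unit vector ...
    assert u @ V @ u <= best + 1e-9      # ... does no better
print("largest eigenvalue:", eigvals[-1], "=", best)
```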

SLIDE 8

Best direction u:

The best direction will be the first eigenvector of $[V]$: $u_1$, with variance $\lambda_1$. The next direction will be the second eigenvector of $[V]$: $u_2$, with variance $\lambda_2$.

The Principal Components will be the eigenvectors of the data matrix.

SLIDE 9

PCs, Variance and Least-Squares

  • The first PC retains the greatest amount of variation in the sample
  • The kth PC retains the kth greatest fraction of the variation in the sample
  • The kth largest eigenvalue of the correlation matrix C is the variance in the sample along the kth PC
  • The least-squares view: PCs are a series of linear least-squares fits to a sample, each orthogonal to all previous ones
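
The "eigenvalue = variance along the PC" claim can be seen directly by comparing the eigenvalues of the covariance matrix with the variance of the projections; this is a minimal NumPy sketch with synthetic data, not code from the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)
# 3 features x 100 samples; columns are the data points, as on the earlier slides
X = np.diag([3.0, 1.0, 0.3]) @ rng.normal(size=(3, 100))
X -= X.mean(axis=1, keepdims=True)      # center each feature

C = X @ X.T / X.shape[1]                # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]       # sort PCs by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

proj = eigvecs.T @ X                    # coordinates of each point along each PC
print(eigvals)                          # k-th largest eigenvalue ...
print(proj.var(axis=1))                 # ... equals the variance along the k-th PC
```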

SLIDE 10

Dimensionality Reduction

Can ignore the components of lesser significance.

You do lose some information, but if the eigenvalues are small, you don't lose much:
  – n dimensions in the original data
  – calculate n eigenvectors and eigenvalues
  – choose only the first k eigenvectors, based on their eigenvalues
  – the final data set has only k dimensions

[Scree plot: variance (%) retained by PC1–PC10]
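
A minimal NumPy sketch of this recipe (function and variable names are mine, for illustration): keep the first k eigenvectors and project the data onto them.

```python
import numpy as np

def reduce_dimensions(X, k):
    """Project d x n data (columns are points) onto its first k principal components."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                             # center the data
    eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T)
    order = np.argsort(eigvals)[::-1]
    U = eigvecs[:, order[:k]]                 # d x k: the first k eigenvectors
    retained = eigvals[order[:k]].sum() / eigvals.sum()
    return U.T @ Xc, U, mean, retained        # k x n reduced data

rng = np.random.default_rng(2)
X = np.diag(np.linspace(3.0, 0.1, 10)) @ rng.normal(size=(10, 500))  # 10-D toy data
Z, U, mean, retained = reduce_dimensions(X, k=3)
print(Z.shape, f"variance retained: {retained:.1%}")
```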

SLIDE 11

PC dimensionality reduction

In the linear case only

SLIDE 12

PCA and correlations

  • We can think of our data points as k points from a distribution p(x)
  • We have k samples (x1, y1), (x2, y2), …, (xk, yk)
SLIDE 13

PCA and correlations

  • We have k samples (x1, y1), (x2, y2), …, (xk, yk)
  • The correlation between (x, y) is: $E[(x - x_0)(y - y_0)] / (\sigma_x \sigma_y)$, where $x_0, y_0$ are the means
  • For centered variables, x and y are uncorrelated if $E(xy) = 0$
SLIDE 14

[Figure: the data shown with the directions v1 and v2]

Correlation depends on the coordinates: (x, y) are correlated, (v1, v2) are not.

SLIDE 15

In the PC coordinates, the variables are uncorrelated

  • The projection of a point $x_i$ on $v_1$ is: $x_i^T v_1$ (or $v_1^T x_i$).
  • The projection of a point $x_i$ on $v_2$ is: $x_i^T v_2$
  • For the correlation, we take the sum: $\sum_i (v_1^T x_i)(x_i^T v_2)$
  • $= \sum_i v_1^T x_i x_i^T v_2 = v_1^T C\, v_2$
  • where $C = X X^T$ (the data matrix)
  • Since the $v_i$ are eigenvectors of C, $C v_2 = \lambda_2 v_2$
  • $v_1^T C v_2 = \lambda_2 v_1^T v_2 = 0$ (eigenvectors of the symmetric matrix C are orthogonal)
  • The variables are uncorrelated.
  • This is a result of using as coordinates the eigenvectors of the correlation matrix $C = X X^T$.
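
A quick numerical illustration of this slide (synthetic data, variable names of my choosing): the original coordinates are correlated, the PC coordinates are not.

```python
import numpy as np

rng = np.random.default_rng(3)
# correlated 2-D data; columns are the points x_i
X = np.linalg.cholesky(np.array([[2.0, 1.2], [1.2, 1.0]])) @ rng.normal(size=(2, 1000))
X -= X.mean(axis=1, keepdims=True)

C = X @ X.T                               # the data matrix C = X X^T
_, eigvecs = np.linalg.eigh(C)
v1, v2 = eigvecs[:, -1], eigvecs[:, -2]   # the two principal directions

p1, p2 = v1 @ X, v2 @ X                   # coordinates of every point along v1 and v2
print(np.corrcoef(X[0], X[1])[0, 1])      # original (x, y): clearly correlated
print(np.corrcoef(p1, p2)[0, 1])          # PC coordinates: ~0, uncorrelated
```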

SLIDE 16

In the PC coordinates, the variables are uncorrelated

  • The correlation depends on the coordinate system. We can start with variables (x, y) which are correlated and transform them to (x', y') that will be uncorrelated.
  • If we use the coordinates defined by the eigenvectors of $X X^T$, the variables (i.e. the vectors of the n projections onto each axis) will be uncorrelated.

SLIDE 17

Properties of the PCA

  • The subspace spanned by the first k PCs retains the maximal variance
  • This subspace minimizes the distance of the points from the subspace
  • The transformed variables, which are linear combinations of the original ones, are uncorrelated.
SLIDE 18

Best plane: of all planes, the one minimizing the perpendicular distances of the points.

SLIDE 19

Eigenfaces: PC of face images

  • Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cognitive Neuroscience 3 (1991) 71–86.

SLIDE 20

Image Representation

  • A training set of m images of size $N \times N$ is represented by vectors of size $N^2$: $x_1, x_2, x_3, \ldots, x_m$

Example: a $3 \times 3$ image whose pixels, read in a fixed order, are 1 5 4 2 1 3 3 2 1 becomes the length-9 vector $(1, 5, 4, 2, 1, 3, 3, 2, 1)^T$.

The images need to be well aligned.

SLIDE 21

Average Image and Difference Images

  • The average of the training set is defined by $\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$
  • Each face differs from the average by the vector $r_i = x_i - \mu$
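
A minimal NumPy sketch of these two steps; `faces` is a placeholder image stack, not data from the lecture:

```python
import numpy as np

m, N = 16, 64                                        # e.g. 16 aligned face images of 64 x 64
faces = np.random.default_rng(4).random((m, N, N))   # placeholder images

X = faces.reshape(m, N * N).T        # each column x_i is one image as an N^2-vector
mu = X.mean(axis=1, keepdims=True)   # average face: mu = (1/m) sum_i x_i
A = X - mu                           # columns r_i = x_i - mu (difference images)
print(A.shape)                       # (N^2, m)
```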

SLIDE 22

Covariance Matrix

  • The covariance matrix is constructed as $C = A A^T$, where $A = [r_1, \ldots, r_m]$. The size of this matrix is $N^2 \times N^2$.
  • Finding the eigenvectors of an $N^2 \times N^2$ matrix is intractable. Hence, use the matrix $A^T A$ of size $m \times m$ and find the eigenvectors of this small matrix.

SLIDE 23

Face data matrix:

The face data matrix $X$ is $N^2 \times m$ (one column per face), so $X X^T$ is $N^2 \times N^2$ while $X^T X$ is only $m \times m$.

SLIDE 24

Eigenvectors of Covariance Matrix

  • Consider the eigenvectors $v_i$ of $A^T A$, such that $A^T A v_i = \lambda_i v_i$
  • Pre-multiplying both sides by A, we have $A A^T (A v_i) = \lambda_i (A v_i)$
  • So $A v_i$ is an eigenvector of our original $A A^T$
  • Find the eigenvectors $v_i$ of the small $A^T A$
  • Get the 'eigenfaces' by $A v_i$
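
A sketch of this trick in NumPy, continuing the hypothetical `A` from the earlier snippet (each column of `A` is a difference image $r_i$):

```python
import numpy as np

def eigenfaces(A, k):
    """Return the first k eigenfaces of the N^2 x m difference matrix A."""
    eigvals, V = np.linalg.eigh(A.T @ A)       # work with the small m x m matrix A^T A
    order = np.argsort(eigvals)[::-1][:k]      # largest eigenvalues first
    U = A @ V[:, order]                        # A v_i are eigenvectors of A A^T
    U /= np.linalg.norm(U, axis=0)             # normalize each eigenface to unit length
    return U                                   # N^2 x k matrix; columns are eigenfaces

# U = eigenfaces(A, k=7)                       # e.g. with A from the previous sketch
```
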
SLIDE 25

Face Space

  • The eigenvectors $u_i$ resemble ghostly-looking facial images, hence they are called eigenfaces
SLIDE 26

Projection into Face Space

  • A face image can be projected into this face space by $p_k = U^T (x_k - \mu)$
  • The rows of $U^T$ are the eigenfaces; $p_k$ holds the coefficients of face $x_k$, one per eigenface
  • This is the representation of a face using eigenfaces; it can then be used for recognition with different recognition algorithms
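
Continuing the hypothetical snippets above, the projection is a single matrix product:

```python
import numpy as np

def project(x, U, mu):
    """Eigenface coefficients p = U^T (x - mu); x and mu are flat N^2-vectors."""
    return U.T @ (x - mu)

# p = project(X[:, 0], U, mu.ravel())   # coefficients of the first training face
```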

SLIDE 27

Recognition in 'face space'

  • Turk and Pentland used 16 faces and 7 PCs
  • In this case the face representation is $p_k = U^T (x_k - \mu)$, a 7-dimensional vector
  • Face classification:
  • Several images per face class.
  • For a new test image I: obtain its representation $p_I$
  • Turk and Pentland used a simple nearest-neighbor rule (sketched below):
  • find the NN in each class and take the nearest overall,
  • provided its distance is < ε; otherwise the result is 'unknown'
  • Other algorithms are possible, e.g. SVM
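
A minimal sketch of the nearest-neighbor rule over eigenface coefficients; the names `train_p`, `labels`, and `eps` are mine, not from Turk and Pentland:

```python
import numpy as np

def classify(p, train_p, labels, eps):
    """Nearest-neighbor recognition in face space.

    p: coefficient vector of the test face, train_p: (n_train, k) array of
    training coefficients, labels: identity of each training face,
    eps: rejection threshold for 'unknown'.
    """
    dists = np.linalg.norm(train_p - p, axis=1)   # distance to every training face
    nearest = int(np.argmin(dists))
    return labels[nearest] if dists[nearest] < eps else "unknown"
```
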
SLIDE 28

Face detection by 'face space'

  • Turk and Pentland used a 'faceness' measure (sketched below):
  • within a window, compare the original image with its reconstruction from face space
  • Find the distance $\varepsilon$ between the original image $x$ and its reconstruction from the eigenface space, $x_f$: $\varepsilon^2 = \|x - x_f\|^2$, where $x_f = U p + \mu$ (the reconstructed face)
  • If $\varepsilon < \theta$ for a threshold $\theta$, a face is detected in the window
  • Not state-of-the-art, and not fast enough
  • Eigenfaces in the brain?
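
A sketch of the reconstruction-error test, reusing the hypothetical `U` and `mu` from the snippets above:

```python
import numpy as np

def faceness_error(x, U, mu):
    """Squared distance between a window x (flat N^2-vector) and its face-space reconstruction."""
    p = U.T @ (x - mu)                 # coefficients in face space
    x_f = U @ p + mu                   # reconstructed face: x_f = U p + mu
    return np.sum((x - x_f) ** 2)      # eps^2 = ||x - x_f||^2

# a face is detected when faceness_error(x, U, mu.ravel()) < theta ** 2
# for a chosen threshold theta
```
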
SLIDE 29

Next: PCA by Neurons