SLIDE 1

Linear Discriminant Analysis

Debapriyo Majumdar
Data Mining – Fall 2014
Indian Statistical Institute Kolkata

August 28, 2014

SLIDE 2

The owning house data

[Scatter plot: Age (years), 30–80, vs. Income (thousand rupees), 50–200]

Can we separate the points with a line? Equivalently, project the points onto another line so that the projections of the points in the two classes are separated.

SLIDE 3

Linear Discriminant Analysis (LDA)

§ Reduce dimensionality while preserving as much of the class-discriminatory information as possible

[Figures: a projection with non-ideal separation vs. a projection with ideal separation]

The figures are from Ricardo Gutiérrez-Osuna's slides. Not the same as Latent Dirichlet Allocation (also abbreviated LDA).

SLIDE 4

Projection onto a line – basics

A 2×2 matrix holding two data points, (0.5, 0.7) and (1.1, 0.8), as its columns:

$$X = \begin{pmatrix} 0.5 & 1.1 \\ 0.7 & 0.8 \end{pmatrix}$$

Projection onto the x axis: the 1×2 vector $(1 \;\; 0)$ has norm 1 and represents the x axis; the results are distances from the origin:

$$\begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} 0.5 & 1.1 \\ 0.7 & 0.8 \end{pmatrix} = \begin{pmatrix} 0.5 & 1.1 \end{pmatrix}$$

Projection onto the y axis; again the results are distances from the origin:

$$\begin{pmatrix} 0 & 1 \end{pmatrix}\begin{pmatrix} 0.5 & 1.1 \\ 0.7 & 0.8 \end{pmatrix} = \begin{pmatrix} 0.7 & 0.8 \end{pmatrix}$$

slide-5
SLIDE 5

Projection onto a line – basics

Projection onto the line x = y: the 1×2 vector $\left(\tfrac{1}{\sqrt{2}} \;\; \tfrac{1}{\sqrt{2}}\right)$ has norm 1 and represents the x = y line; the results are distances from the origin:

$$\begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} 0.5 & 1.1 \\ 0.7 & 0.8 \end{pmatrix} = \begin{pmatrix} 0.85 & 1.34 \end{pmatrix}$$

In general, if $w$ is some unit vector and $x$ is any point, the distance from the origin of the projection of $x$ onto the line along $w$ is $w^Tx$, a scalar.
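As a quick numeric check, a minimal Python/NumPy sketch of these projections (the matrix and the three unit vectors are the ones from the slides):

```python
import numpy as np

# The two data points (0.5, 0.7) and (1.1, 0.8) as columns of a 2x2 matrix
X = np.array([[0.5, 1.1],
              [0.7, 0.8]])

# Unit vectors representing the x axis, the y axis, and the x = y line
w_x  = np.array([1.0, 0.0])
w_y  = np.array([0.0, 1.0])
w_xy = np.array([1.0, 1.0]) / np.sqrt(2)   # norm = 1

# For each point, w @ X gives the distance from the origin of its
# projection onto the line along w
print(w_x  @ X)   # [0.5 1.1]
print(w_y  @ X)   # [0.7 0.8]
print(w_xy @ X)   # ~[0.85 1.34]
```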

SLIDE 6

Projection vector for LDA

§ Define a measure of separation (discrimination)
§ Mean vectors µ1 and µ2 for the two classes c1 and c2, with N1 and N2 points:

$$\mu_i = \frac{1}{N_i}\sum_{x\in c_i} x$$

§ The mean vector projected onto a unit vector w:

$$\tilde\mu_i = \frac{1}{N_i}\sum_{x\in c_i} w^Tx = w^T\mu_i$$
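A minimal sketch checking the identity $\tilde\mu_i = w^T\mu_i$ numerically; the two small classes below are made-up toy data, not from the slides:

```python
import numpy as np

# Hypothetical toy classes: rows are points (age, income)
c1 = np.array([[35.0, 60.0], [40.0, 70.0], [45.0, 80.0]])
c2 = np.array([[55.0, 120.0], [60.0, 150.0], [65.0, 180.0]])

mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)   # mean vectors
w = np.array([1.0, 1.0]) / np.sqrt(2)         # some unit vector

# Projecting then averaging equals projecting the mean: mean(w^T x) = w^T mu_i
assert np.isclose((c1 @ w).mean(), w @ mu1)
assert np.isclose((c2 @ w).mean(), w @ mu2)
```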

SLIDE 7


Towards maximizing separation

§ One approach: find a line such that the distance between the projected means is maximized
§ Objective function J(w):

[Figure: projected means µ1 and µ2 for two candidate directions; one gives better separation of the means]

$$J(w) = \left|\tilde\mu_1 - \tilde\mu_2\right| = \left|w^T\left(\mu_1 - \mu_2\right)\right|$$

Example: if w is the unit vector along the x or the y axis, J(w) is simply the distance between the class means along that axis.
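A small sketch of this objective, using hypothetical mean vectors (not from the slides); with w along the y axis, J(w) just reads off the income gap between the class means:

```python
import numpy as np

# Hypothetical class means in (age, income) space
mu1 = np.array([40.0, 70.0])
mu2 = np.array([60.0, 150.0])

def J_means(w):
    """Distance between the projected means: |w^T (mu1 - mu2)|."""
    return abs(w @ (mu1 - mu2))

print(J_means(np.array([1.0, 0.0])))   # along x: |40 - 60|  = 20
print(J_means(np.array([0.0, 1.0])))   # along y: |70 - 150| = 80
```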

SLIDE 8

How much are the points scattered?

§ Scatter: within each class, the spread of the projected points (the variance, up to a normalizing factor)

[Figure: projected points of the two classes around their projected means µ1 and µ2]

$$\tilde s_i^2 = \sum_{x\in c_i}\left(w^Tx - \tilde\mu_i\right)^2$$

§ Within-class scatter of the projected samples: $\tilde s_1^2 + \tilde s_2^2$
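A minimal sketch of the projected scatter for one class, using hypothetical points; note the formula sums squared deviations without dividing by $N_i$:

```python
import numpy as np

# Hypothetical points of one class (rows are points)
c1 = np.array([[35.0, 60.0], [40.0, 70.0], [45.0, 80.0]])
w = np.array([1.0, 1.0]) / np.sqrt(2)    # a unit vector

p = c1 @ w                               # projections w^T x
s2_1 = np.sum((p - p.mean()) ** 2)       # scatter of the projections
print(s2_1)
```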

SLIDE 9

Fisher's discriminant

§ Maximize difference between the projected means, normalized by within-class scatter

[Figure: a projection that separates both the projected means µ1, µ2 and the points of the two classes]

$$J(w) = \frac{\left(\tilde\mu_1 - \tilde\mu_2\right)^2}{\tilde s_1^2 + \tilde s_2^2}$$

This objective captures separation of the means and of the points as well.
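A minimal sketch of Fisher's criterion computed directly from its definition, on hypothetical two-class data:

```python
import numpy as np

# Hypothetical two-class data (rows are points)
c1 = np.array([[35.0, 60.0], [40.0, 70.0], [45.0, 80.0]])
c2 = np.array([[55.0, 120.0], [60.0, 150.0], [65.0, 180.0]])

def fisher_J(w):
    """(difference of projected means)^2 / (sum of projected scatters)."""
    p1, p2 = c1 @ w, c2 @ w
    num = (p1.mean() - p2.mean()) ** 2
    den = np.sum((p1 - p1.mean()) ** 2) + np.sum((p2 - p2.mean()) ** 2)
    return num / den

print(fisher_J(np.array([1.0, 0.0])))   # project onto the x axis
print(fisher_J(np.array([0.0, 1.0])))   # project onto the y axis
```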

SLIDE 10

Formulation of the objective function

§ Measure of scatter in the feature space (x)

$$S_i = \sum_{x\in c_i}\left(x - \mu_i\right)\left(x - \mu_i\right)^T$$

§ The within-class scatter matrix is $S_W = S_1 + S_2$
§ The scatter of the projections, in terms of $S_W$:

$$\tilde s_i^2 = \sum_{x\in c_i}\left(w^Tx - \tilde\mu_i\right)^2 = \sum_{x\in c_i}\left(w^Tx - w^T\mu_i\right)^2 = w^T\left[\sum_{x\in c_i}\left(x-\mu_i\right)\left(x-\mu_i\right)^T\right]w = w^TS_iw$$

Hence:

$$\tilde s_1^2 + \tilde s_2^2 = w^TS_Ww$$
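A sketch verifying the identity $\tilde s_1^2 + \tilde s_2^2 = w^TS_Ww$ on hypothetical data:

```python
import numpy as np

# Hypothetical two-class data (rows are points)
c1 = np.array([[35.0, 60.0], [40.0, 70.0], [45.0, 80.0]])
c2 = np.array([[55.0, 120.0], [60.0, 150.0], [65.0, 180.0]])

def scatter(c):
    """S_i = sum over x in c_i of (x - mu_i)(x - mu_i)^T."""
    d = c - c.mean(axis=0)
    return d.T @ d

S_W = scatter(c1) + scatter(c2)          # within-class scatter matrix

w = np.array([1.0, 1.0]) / np.sqrt(2)
proj_scatter = sum(np.sum((c @ w - (c @ w).mean()) ** 2) for c in (c1, c2))
assert np.isclose(proj_scatter, w @ S_W @ w)
```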

SLIDE 11

Formulation of the objective function

§ Similarly, the difference of the projected means in terms of the µi's in the feature space:

$$\left(\tilde\mu_1 - \tilde\mu_2\right)^2 = \left(w^T\mu_1 - w^T\mu_2\right)^2 = w^T\underbrace{\left(\mu_1-\mu_2\right)\left(\mu_1-\mu_2\right)^T}_{S_B}\,w = w^TS_Bw$$

where $S_B$ is the between-class scatter matrix.

§ Fisher's objective function in terms of $S_B$ and $S_W$:

$$J(w) = \frac{w^TS_Bw}{w^TS_Ww}$$
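A sketch assembling $S_B$ and $S_W$ and evaluating J(w) in matrix form, on the same kind of hypothetical data:

```python
import numpy as np

# Hypothetical two-class data (rows are points)
c1 = np.array([[35.0, 60.0], [40.0, 70.0], [45.0, 80.0]])
c2 = np.array([[55.0, 120.0], [60.0, 150.0], [65.0, 180.0]])
mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)

S_B = np.outer(mu1 - mu2, mu1 - mu2)     # between-class scatter matrix
S_W = sum((c - c.mean(axis=0)).T @ (c - c.mean(axis=0)) for c in (c1, c2))

def J(w):
    """Fisher's objective in matrix form."""
    return (w @ S_B @ w) / (w @ S_W @ w)

print(J(np.array([1.0, 0.0])), J(np.array([0.0, 1.0])))
```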

SLIDE 12

Maximizing the objective function

§ Take the derivative and set it to zero

$$\frac{d}{dw}\left[J(w)\right] = \frac{d}{dw}\left[\frac{w^TS_Bw}{w^TS_Ww}\right] = 0$$

$$\Rightarrow \left(w^TS_Ww\right)\frac{d\left(w^TS_Bw\right)}{dw} - \left(w^TS_Bw\right)\frac{d\left(w^TS_Ww\right)}{dw} = 0$$

$$\Rightarrow \left(w^TS_Ww\right)2S_Bw - \left(w^TS_Bw\right)2S_Ww = 0$$

Dividing both terms by the same denominator, $2\,w^TS_Ww$:

$$\Rightarrow S_Bw - J(w)\,S_Ww = 0 \;\Rightarrow\; J(w)\,w = S_W^{-1}S_Bw$$

The generalized eigenvalue problem: the optimal w is an eigenvector of $S_W^{-1}S_B$, and the maximum value of J(w) is the corresponding (largest) eigenvalue.
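A sketch of solving for the optimal direction; for two classes $S_B = (\mu_1-\mu_2)(\mu_1-\mu_2)^T$ has rank 1, so the only eigenvector of $S_W^{-1}S_B$ with a nonzero eigenvalue is, up to scale, $S_W^{-1}(\mu_1-\mu_2)$. The data below is hypothetical:

```python
import numpy as np

# Hypothetical two-class data (rows are points)
c1 = np.array([[35.0, 60.0], [40.0, 70.0], [45.0, 80.0]])
c2 = np.array([[55.0, 120.0], [60.0, 150.0], [65.0, 180.0]])
mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)
S_W = sum((c - c.mean(axis=0)).T @ (c - c.mean(axis=0)) for c in (c1, c2))

# Closed form for the two-class case: w is parallel to S_W^{-1}(mu1 - mu2)
w = np.linalg.solve(S_W, mu1 - mu2)
w /= np.linalg.norm(w)                     # normalize to a unit vector

# Cross-check against the eigenvalue formulation J(w) w = S_W^{-1} S_B w
S_B = np.outer(mu1 - mu2, mu1 - mu2)
vals, vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
top = vecs[:, np.argmax(vals.real)].real   # eigenvector of the largest eigenvalue
assert np.isclose(abs(top @ w), 1.0)       # same direction up to sign
```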

SLIDE 13

Limitations of LDA

§ LDA is a parametric method

– Assumes a Gaussian (normal) distribution of the data
– What if the data is strongly non-Gaussian?

[Figures: non-Gaussian class distributions; in one case µ1 = µ2]

SLIDE 14

Limitations of LDA

§ LDA relies on the class means for the discriminatory information

– What if it is mainly in the variance?

[Figure: two classes with equal means (µ1 = µ2) but different variances]