SLIDE 1

Linear Discriminant Analysis

Debapriyo Majumdar
Data Mining – Fall 2014
Indian Statistical Institute Kolkata

August 28, 2014

SLIDE 2

The owning house data

[Scatter plot: Age (years), 30–80, vs. Income (thousand rupees), 50–200]

Can we separate the points with a line? Equivalently, project the points onto another line so that the projections of the points in the two classes are separated.

SLIDE 3

Linear Discriminant Analysis (LDA)

§ Reduce dimensionality while preserving as much of the class-discriminatory information as possible

[Figures: a projection with non-ideal separation vs. a projection with ideal separation]

The figures are from Ricardo Gutiérrez-Osuna's slides. Not the same as Latent Dirichlet Allocation (also abbreviated LDA).

SLIDE 4

Projection onto a line – basics

A 2×2 matrix holding two data points, (0.5, 0.7) and (1.1, 0.8), as its columns:

$$X = \begin{pmatrix} 0.5 & 1.1 \\ 0.7 & 0.8 \end{pmatrix}$$

Projection onto the x axis: the 1×2 vector $(1 \;\; 0)$ has norm 1 and represents the x axis; the results are distances from the origin:

$$\begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} 0.5 & 1.1 \\ 0.7 & 0.8 \end{pmatrix} = \begin{pmatrix} 0.5 & 1.1 \end{pmatrix}$$

Projection onto the y axis; again the results are distances from the origin:

$$\begin{pmatrix} 0 & 1 \end{pmatrix}\begin{pmatrix} 0.5 & 1.1 \\ 0.7 & 0.8 \end{pmatrix} = \begin{pmatrix} 0.7 & 0.8 \end{pmatrix}$$

slide-5
SLIDE 5

Projection onto a line – basics

Projection onto the line x = y: the 1×2 vector $\left(\tfrac{1}{\sqrt{2}} \;\; \tfrac{1}{\sqrt{2}}\right)$ has norm 1 and represents the x = y line; the results are distances from the origin:

$$\begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} 0.5 & 1.1 \\ 0.7 & 0.8 \end{pmatrix} = \begin{pmatrix} 0.85 & 1.34 \end{pmatrix}$$

In general, if $w$ is some unit vector and $x$ is any point, the distance from the origin of the projection of $x$ onto the line along $w$ is $w^Tx$, a scalar.
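As a quick numeric check, a minimal Python/NumPy sketch of these projections (the matrix and the three unit vectors are the ones from the slides):

```python
import numpy as np

# The two data points (0.5, 0.7) and (1.1, 0.8) as columns of a 2x2 matrix
X = np.array([[0.5, 1.1],
              [0.7, 0.8]])

# Unit vectors representing the x axis, the y axis, and the x = y line
w_x  = np.array([1.0, 0.0])
w_y  = np.array([0.0, 1.0])
w_xy = np.array([1.0, 1.0]) / np.sqrt(2)   # norm = 1

# For each point, w @ X gives the distance from the origin of its
# projection onto the line along w
print(w_x  @ X)   # [0.5 1.1]
print(w_y  @ X)   # [0.7 0.8]
print(w_xy @ X)   # ~[0.85 1.34]
```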

SLIDE 6

Projection vector for LDA

§ Define a measure of separation (discrimination)
§ Mean vectors µ1 and µ2 for the two classes c1 and c2, with N1 and N2 points:

$$\mu_i = \frac{1}{N_i}\sum_{x\in c_i} x$$

§ The mean vector projected onto a unit vector w:

$$\tilde\mu_i = \frac{1}{N_i}\sum_{x\in c_i} w^Tx = w^T\mu_i$$
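A minimal sketch checking the identity $\tilde\mu_i = w^T\mu_i$ numerically; the two small classes below are made-up toy data, not from the slides:

```python
import numpy as np

# Hypothetical toy classes: rows are points (age, income)
c1 = np.array([[35.0, 60.0], [40.0, 70.0], [45.0, 80.0]])
c2 = np.array([[55.0, 120.0], [60.0, 150.0], [65.0, 180.0]])

mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)   # mean vectors
w = np.array([1.0, 1.0]) / np.sqrt(2)         # some unit vector

# Projecting then averaging equals projecting the mean: mean(w^T x) = w^T mu_i
assert np.isclose((c1 @ w).mean(), w @ mu1)
assert np.isclose((c2 @ w).mean(), w @ mu2)
```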

SLIDE 7


Towards maximizing separation

§ One approach: find a line such that the distance between the projected means is maximized
§ Objective function J(w):

[Figure: projected means µ1 and µ2 for two candidate directions; one gives better separation of the means]

$$J(w) = \left|\tilde\mu_1 - \tilde\mu_2\right| = \left|w^T\left(\mu_1 - \mu_2\right)\right|$$

Example: if w is the unit vector along the x or the y axis, J(w) is simply the distance between the class means along that axis.
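A small sketch of this objective, using hypothetical mean vectors (not from the slides); with w along the y axis, J(w) just reads off the income gap between the class means:

```python
import numpy as np

# Hypothetical class means in (age, income) space
mu1 = np.array([40.0, 70.0])
mu2 = np.array([60.0, 150.0])

def J_means(w):
    """Distance between the projected means: |w^T (mu1 - mu2)|."""
    return abs(w @ (mu1 - mu2))

print(J_means(np.array([1.0, 0.0])))   # along x: |40 - 60|  = 20
print(J_means(np.array([0.0, 1.0])))   # along y: |70 - 150| = 80
```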

SLIDE 8

How much are the points scattered?

§ Scatter: within each class, the spread of the projected points (the variance, up to a normalizing factor)

[Figure: projected points of the two classes around their projected means µ1 and µ2]

$$\tilde s_i^2 = \sum_{x\in c_i}\left(w^Tx - \tilde\mu_i\right)^2$$

§ Within-class scatter of the projected samples: $\tilde s_1^2 + \tilde s_2^2$
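A minimal sketch of the projected scatter for one class, using hypothetical points; note the formula sums squared deviations without dividing by $N_i$:

```python
import numpy as np

# Hypothetical points of one class (rows are points)
c1 = np.array([[35.0, 60.0], [40.0, 70.0], [45.0, 80.0]])
w = np.array([1.0, 1.0]) / np.sqrt(2)    # a unit vector

p = c1 @ w                               # projections w^T x
s2_1 = np.sum((p - p.mean()) ** 2)       # scatter of the projections
print(s2_1)
```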

SLIDE 9

Fisher's discriminant

§ Maximize difference between the projected means, normalized by within-class scatter

[Figure: a projection that separates both the projected means µ1, µ2 and the points of the two classes]

$$J(w) = \frac{\left(\tilde\mu_1 - \tilde\mu_2\right)^2}{\tilde s_1^2 + \tilde s_2^2}$$

This objective captures separation of the means and of the points as well.
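A minimal sketch of Fisher's criterion computed directly from its definition, on hypothetical two-class data:

```python
import numpy as np

# Hypothetical two-class data (rows are points)
c1 = np.array([[35.0, 60.0], [40.0, 70.0], [45.0, 80.0]])
c2 = np.array([[55.0, 120.0], [60.0, 150.0], [65.0, 180.0]])

def fisher_J(w):
    """(difference of projected means)^2 / (sum of projected scatters)."""
    p1, p2 = c1 @ w, c2 @ w
    num = (p1.mean() - p2.mean()) ** 2
    den = np.sum((p1 - p1.mean()) ** 2) + np.sum((p2 - p2.mean()) ** 2)
    return num / den

print(fisher_J(np.array([1.0, 0.0])))   # project onto the x axis
print(fisher_J(np.array([0.0, 1.0])))   # project onto the y axis
```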

SLIDE 10

Formulation of the objective function

§ Measure of scatter in the feature space (x)

$$S_i = \sum_{x\in c_i}\left(x - \mu_i\right)\left(x - \mu_i\right)^T$$

§ The within-class scatter matrix is $S_W = S_1 + S_2$
§ The scatter of the projections, in terms of $S_W$:

$$\tilde s_i^2 = \sum_{x\in c_i}\left(w^Tx - \tilde\mu_i\right)^2 = \sum_{x\in c_i}\left(w^Tx - w^T\mu_i\right)^2 = w^T\left[\sum_{x\in c_i}\left(x-\mu_i\right)\left(x-\mu_i\right)^T\right]w = w^TS_iw$$

Hence:

$$\tilde s_1^2 + \tilde s_2^2 = w^TS_Ww$$
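A sketch verifying the identity $\tilde s_1^2 + \tilde s_2^2 = w^TS_Ww$ on hypothetical data:

```python
import numpy as np

# Hypothetical two-class data (rows are points)
c1 = np.array([[35.0, 60.0], [40.0, 70.0], [45.0, 80.0]])
c2 = np.array([[55.0, 120.0], [60.0, 150.0], [65.0, 180.0]])

def scatter(c):
    """S_i = sum over x in c_i of (x - mu_i)(x - mu_i)^T."""
    d = c - c.mean(axis=0)
    return d.T @ d

S_W = scatter(c1) + scatter(c2)          # within-class scatter matrix

w = np.array([1.0, 1.0]) / np.sqrt(2)
proj_scatter = sum(np.sum((c @ w - (c @ w).mean()) ** 2) for c in (c1, c2))
assert np.isclose(proj_scatter, w @ S_W @ w)
```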

SLIDE 11

Formulation of the objective function

§ Similarly, the difference of the projected means in terms of the µi's in the feature space:

$$\left(\tilde\mu_1 - \tilde\mu_2\right)^2 = \left(w^T\mu_1 - w^T\mu_2\right)^2 = w^T\underbrace{\left(\mu_1-\mu_2\right)\left(\mu_1-\mu_2\right)^T}_{S_B}\,w = w^TS_Bw$$

where $S_B$ is the between-class scatter matrix.

§ Fisher's objective function in terms of $S_B$ and $S_W$:

$$J(w) = \frac{w^TS_Bw}{w^TS_Ww}$$
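A sketch assembling $S_B$ and $S_W$ and evaluating J(w) in matrix form, on the same kind of hypothetical data:

```python
import numpy as np

# Hypothetical two-class data (rows are points)
c1 = np.array([[35.0, 60.0], [40.0, 70.0], [45.0, 80.0]])
c2 = np.array([[55.0, 120.0], [60.0, 150.0], [65.0, 180.0]])
mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)

S_B = np.outer(mu1 - mu2, mu1 - mu2)     # between-class scatter matrix
S_W = sum((c - c.mean(axis=0)).T @ (c - c.mean(axis=0)) for c in (c1, c2))

def J(w):
    """Fisher's objective in matrix form."""
    return (w @ S_B @ w) / (w @ S_W @ w)

print(J(np.array([1.0, 0.0])), J(np.array([0.0, 1.0])))
```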

SLIDE 12

Maximizing the objective function

§ Take the derivative and set it to zero

$$\frac{d}{dw}\left[J(w)\right] = \frac{d}{dw}\left[\frac{w^TS_Bw}{w^TS_Ww}\right] = 0$$

$$\Rightarrow \left(w^TS_Ww\right)\frac{d\left(w^TS_Bw\right)}{dw} - \left(w^TS_Bw\right)\frac{d\left(w^TS_Ww\right)}{dw} = 0$$

$$\Rightarrow \left(w^TS_Ww\right)2S_Bw - \left(w^TS_Bw\right)2S_Ww = 0$$

Dividing both terms by the same denominator, $2\,w^TS_Ww$:

$$\Rightarrow S_Bw - J(w)\,S_Ww = 0 \;\Rightarrow\; J(w)\,w = S_W^{-1}S_Bw$$

The generalized eigenvalue problem: the optimal w is an eigenvector of $S_W^{-1}S_B$, and the maximum value of J(w) is the corresponding (largest) eigenvalue.
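A sketch of solving for the optimal direction; for two classes $S_B = (\mu_1-\mu_2)(\mu_1-\mu_2)^T$ has rank 1, so the only eigenvector of $S_W^{-1}S_B$ with a nonzero eigenvalue is, up to scale, $S_W^{-1}(\mu_1-\mu_2)$. The data below is hypothetical:

```python
import numpy as np

# Hypothetical two-class data (rows are points)
c1 = np.array([[35.0, 60.0], [40.0, 70.0], [45.0, 80.0]])
c2 = np.array([[55.0, 120.0], [60.0, 150.0], [65.0, 180.0]])
mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)
S_W = sum((c - c.mean(axis=0)).T @ (c - c.mean(axis=0)) for c in (c1, c2))

# Closed form for the two-class case: w is parallel to S_W^{-1}(mu1 - mu2)
w = np.linalg.solve(S_W, mu1 - mu2)
w /= np.linalg.norm(w)                     # normalize to a unit vector

# Cross-check against the eigenvalue formulation J(w) w = S_W^{-1} S_B w
S_B = np.outer(mu1 - mu2, mu1 - mu2)
vals, vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
top = vecs[:, np.argmax(vals.real)].real   # eigenvector of the largest eigenvalue
assert np.isclose(abs(top @ w), 1.0)       # same direction up to sign
```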

SLIDE 13

Limitations of LDA

§ LDA is a parametric method

– Assumes a Gaussian (normal) distribution of the data
– What if the data is strongly non-Gaussian?

[Figures: non-Gaussian class distributions; in one case µ1 = µ2]

SLIDE 14

Limitations of LDA

§ LDA relies on the class means for the discriminatory information

– What if it is mainly in the variance?

[Figure: two classes with equal means (µ1 = µ2) but different variances]