Local Fisher Discriminant Analysis for Supervised Dimensionality Reduction - PowerPoint PPT Presentation

ICML2006, Pittsburgh, USA, June 25-29, 2006. Masashi Sugiyama, Tokyo Institute of Technology.


SLIDE 1

June 25-29, 2006 ICML2006, Pittsburgh, USA

Local Fisher Discriminant Analysis for Supervised Dimensionality Reduction

Masashi Sugiyama

Tokyo Institute of Technology, Japan

SLIDE 2

Dimensionality Reduction

High-dimensional data is not easy to handle: we need to reduce dimensionality. We focus on:

- Linear dimensionality reduction: embed samples $x_i \in \mathbb{R}^D$ as $z_i = T^\top x_i \in \mathbb{R}^r$ with $r < D$
- Supervised dimensionality reduction: class labels $\{y_i\}$ are available

SLIDE 3

Within-Class Multimodality

Medical checkup: hormone imbalance (high/low) vs. normal. Digit recognition: even (0, 2, 4, 6, 8) vs. odd (1, 3, 5, 7, 9). Multi-class classification: one vs. rest.

In such tasks, one of the classes has several modes.

[Figure: two-class data where Class 1 (blue) is unimodal and Class 2 (red) is multimodal]

SLIDE 4

Goal of This Research

We want to embed multimodal data so that between-class separability is maximized and within-class multimodality is preserved.

[Figure: three candidate embeddings of the same data]
- FDA: separable, but within-class multimodality is lost
- LFDA: separable and within-class multimodality is preserved
- LPP: within-class multimodality is preserved, but classes are non-separable

SLIDE 5

Fisher Discriminant Analysis (FDA)

Within-class scatter matrix: $S^{(w)} = \sum_{i=1}^{n} (x_i - \mu_{y_i})(x_i - \mu_{y_i})^\top$

Between-class scatter matrix: $S^{(b)} = \sum_{c=1}^{C} n_c\, (\mu_c - \mu)(\mu_c - \mu)^\top$

FDA criterion: $T_{\mathrm{FDA}} = \arg\max_{T}\, \mathrm{tr}\big( (T^\top S^{(w)} T)^{-1}\, T^\top S^{(b)} T \big)$

Within-class scatter is made small; between-class scatter is made large.

Fisher (1936)
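The criterion above can be sketched in a few lines of NumPy; the function name `fda` and the small ridge term are our own choices, not from the slides.

```python
import numpy as np
from scipy.linalg import eigh

def fda(X, y, r=1):
    """Sketch of FDA: returns a d x r transform maximizing the FDA criterion."""
    n, d = X.shape
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter S^(w)
    Sb = np.zeros((d, d))  # between-class scatter S^(b)
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # maximizing tr((T' Sw T)^{-1} T' Sb T) reduces to the generalized
    # eigenproblem Sb v = lambda Sw v; a tiny ridge keeps Sw positive definite
    w, V = eigh(Sb, Sw + 1e-9 * np.eye(d))
    return V[:, np.argsort(w)[::-1][:r]]
```

On two well-separated classes, the leading direction aligns with the line through the class means, as expected.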

SLIDE 6

Interpretation of FDA

$n_c$: number of samples in class $c$; $n$: total number of samples.

Pairwise expressions: $S^{(w)} = \frac{1}{2} \sum_{i,j=1}^{n} W^{(w)}_{ij} (x_i - x_j)(x_i - x_j)^\top$ and $S^{(b)} = \frac{1}{2} \sum_{i,j=1}^{n} W^{(b)}_{ij} (x_i - x_j)(x_i - x_j)^\top$, where $W^{(w)}_{ij} = 1/n_c$ if $y_i = y_j = c$ (0 otherwise) and $W^{(b)}_{ij} = 1/n - 1/n_c$ if $y_i = y_j = c$ ($1/n$ otherwise).

Samples in the same class are made close; samples in different classes are made apart.
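As a sanity check on this pairwise interpretation, the weights can be plugged in directly; with the weights shown, the result coincides with the usual class-mean definitions of the scatter matrices (a naive sketch, names ours):

```python
import numpy as np

def pairwise_scatters(X, y):
    """Pairwise form: S = 0.5 * sum_ij W_ij (x_i - x_j)(x_i - x_j)^T."""
    n, d = X.shape
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for i in range(n):
        for j in range(n):
            dij = np.outer(X[i] - X[j], X[i] - X[j])
            if y[i] == y[j]:
                nc = np.sum(y == y[i])                   # class size n_c
                Sw += 0.5 * (1.0 / nc) * dij             # W^(w)_ij = 1/n_c
                Sb += 0.5 * (1.0 / n - 1.0 / nc) * dij   # W^(b)_ij = 1/n - 1/n_c
            else:
                Sb += 0.5 * (1.0 / n) * dij              # W^(b)_ij = 1/n
    return Sw, Sb
```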

SLIDE 7

Examples of FDA

[Figure: FDA embeddings of three 2-D data sets (simple, label-mixed cluster, multimodal); arrows mark pairs made close and pairs made apart]

FDA does not take within-class multimodality into account.

NOTE: FDA can extract only $C-1$ features, since $\mathrm{rank}(S^{(b)}) \le C-1$ ($C$: number of classes).

SLIDE 8

Locality Preserving Projection (LPP)

Locality matrix: $S^{(l)} = \frac{1}{2} \sum_{i,j=1}^{n} A_{ij} (x_i - x_j)(x_i - x_j)^\top$

Affinity matrix: e.g., $A_{ij} = \exp(-\|x_i - x_j\|^2 / \sigma^2)$, or a nearest-neighbor indicator.

LPP criterion: minimize $\frac{1}{2} \sum_{i,j=1}^{n} A_{ij} \|T^\top x_i - T^\top x_j\|^2$ subject to $T^\top X D X^\top T = I$, where $D_{ii} = \sum_{j} A_{ij}$.

Nearby samples in the original space are made close; the constraint is to avoid the trivial solution in which all samples collapse to a single point.

He & Niyogi (NIPS2003)

SLIDE 9

Examples of LPP

[Figure: LPP embeddings of the same three data sets (simple, label-mixed cluster, multimodal); nearby pairs are made close in each]

LPP does not take between-class separability into account (it is unsupervised).

SLIDE 10

Our Approach

Nearby samples in the same class are made close. Far-apart samples in the same class are not made close ("don't care"). Samples in different classes are made apart.

[Figure: schematic marking "close", "don't care", and "apart" sample pairs]

We combine FDA and LPP

SLIDE 11

Local Fisher Discriminant Analysis

Local within-class scatter matrix: $S^{(lw)} = \frac{1}{2} \sum_{i,j=1}^{n} W^{(lw)}_{ij} (x_i - x_j)(x_i - x_j)^\top$, where $W^{(lw)}_{ij} = A_{ij}/n_c$ if $y_i = y_j = c$ (0 otherwise).

Local between-class scatter matrix: $S^{(lb)} = \frac{1}{2} \sum_{i,j=1}^{n} W^{(lb)}_{ij} (x_i - x_j)(x_i - x_j)^\top$, where $W^{(lb)}_{ij} = A_{ij}(1/n - 1/n_c)$ if $y_i = y_j = c$ ($1/n$ otherwise).
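A naive O(n²) sketch of these local scatter matrices, taking a precomputed affinity matrix `A` as input (names ours):

```python
import numpy as np

def lfda_scatters(X, y, A):
    """Local within/between-class scatters from their pairwise definitions."""
    n, d = X.shape
    Slw = np.zeros((d, d))
    Slb = np.zeros((d, d))
    for i in range(n):
        for j in range(n):
            dij = np.outer(X[i] - X[j], X[i] - X[j])
            if y[i] == y[j]:
                nc = np.sum(y == y[i])                        # class size n_c
                Slw += 0.5 * (A[i, j] / nc) * dij             # W^(lw)_ij = A_ij / n_c
                Slb += 0.5 * A[i, j] * (1.0 / n - 1.0 / nc) * dij
            else:
                Slb += 0.5 * (1.0 / n) * dij                  # W^(lb)_ij = 1/n
    return Slw, Slb
```

With `A` set to the all-ones matrix, these reduce to the plain FDA scatters, which makes the "localized FDA" interpretation concrete.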

SLIDE 12

How to Obtain Solution

Since LFDA has a similar form to FDA, its solution can be obtained just by solving a generalized eigenvalue problem: $S^{(lb)} \varphi = \lambda S^{(lw)} \varphi$; $T_{\mathrm{LFDA}}$ consists of the leading generalized eigenvectors.
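The eigenstep itself is a few lines with SciPy; the ridge term is our own guard against a singular $S^{(lw)}$, not part of the slides:

```python
import numpy as np
from scipy.linalg import eigh

def lfda_embed(S_lb, S_lw, r):
    """Top-r generalized eigenvectors of S^(lb) v = lambda S^(lw) v."""
    d = S_lb.shape[0]
    # tiny ridge keeps S^(lw) positive definite (our own safeguard)
    w, V = eigh(S_lb, S_lw + 1e-9 * np.eye(d))
    order = np.argsort(w)[::-1][:r]               # largest eigenvalues first
    return V[:, order], w[order]
```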

SLIDE 13

Examples of LFDA

[Figure: LFDA embeddings of the three data sets (simple, label-mixed cluster, multimodal)]

LFDA works well for all three cases!

Note: usually $\mathrm{rank}(S^{(lb)})$ is much larger than $C-1$, so LFDA can extract more than $C$ features (cf. FDA).

SLIDE 14

Neighborhood Component Analysis (NCA)

NCA minimizes the leave-one-out error of a stochastic k-nearest-neighbor classifier, and the obtained embedding is separable. However, NCA involves non-convex optimization: there are local optima, no analytic solution is available, and the iterative algorithm is slow. LFDA, in contrast, has an analytic form of the global solution.

Goldberger, Roweis, Hinton & Salakhutdinov (NIPS2004)

SLIDE 15

Maximally Collapsing Metric Learning (MCML)

The idea is similar to FDA: samples in the same class are made close (collapsed to "one point"), and samples in different classes are made apart. MCML involves non-convex optimization; there exists a nice convex approximation, but it gives a non-global solution, no analytic solution is available, and the iterative algorithm is slow.

Globerson & Roweis (NIPS2005)

SLIDE 16

Simulations

Visualization of UCI data sets:

- Letter recognition (D=16)
- Segment (D=18)
- Thyroid disease (D=5)
- Iris (D=4)

From each original data set we extract 3 classes and merge 2 of them, giving a unimodal Class 1 (blue) and a multimodal Class 2 (red).

SLIDE 17

Summary of Simulation Results

Each method was graded per data set (Lett, Segm, Thyr, Iris) as: separable with multimodality preserved; separable but no multimodality; or multimodality preserved but no separability.

- FDA: no multi-modality
- LPP: no label-separability
- LFDA: separable and multimodality preserved
- NCA: slow, local optima
- MCML: slow, no multi-modality

(The per-data-set grades were symbols in the original table.)

SLIDE 18

Letter Recognition

[Figure: 2-D embeddings of the Letter Recognition data (Class 1, blue, vs. Class 2, red) by FDA, LPP, LFDA, MCML, and NCA]

SLIDE 19

Segment

[Figure: 2-D embeddings of the Segment data (Brickface, Sky, and Foliage classes; Class 1, blue, vs. Class 2, red) by FDA, LPP, LFDA, MCML, and NCA]

SLIDE 20

Thyroid Disease

[Figure: 2-D embeddings of the Thyroid Disease data (Hyper, Hypo, and Normal classes; Class 1, blue, vs. Class 2, red) by FDA, LPP, LFDA, MCML, and NCA]

SLIDE 21

Iris

[Figure: 2-D embeddings of the Iris data (Setosa, Virginica, and Versicolour classes; Class 1, blue, vs. Class 2, red) by FDA, LPP, LFDA, MCML, and NCA]

SLIDE 22

Kernelization

LFDA can be non-linearized by the kernel trick, as with the related methods:

- FDA: kernel FDA (Mika et al., NNSP1999)
- LPP: Laplacian eigenmap (Belkin & Niyogi, NIPS2001)
- MCML: kernel MCML (Globerson & Roweis, NIPS2005)
- NCA: not available yet?

SLIDE 23

Conclusions

LFDA effectively combines FDA and LPP. LFDA is suitable for embedding multimodal data. Like FDA, LFDA has an analytic optimal solution and is thus computationally efficient. Like LPP, LFDA needs a pre-specified affinity matrix; we used the local scaling method for computing affinity, which does not include any tuning parameter.

Zelnik-Manor & Perona (NIPS2004)
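The local scaling affinity referenced above (Zelnik-Manor & Perona, NIPS2004) can be sketched as follows; the function name is ours, and k = 7 is the heuristic suggested in that paper:

```python
import numpy as np

def local_scaling_affinity(X, k=7):
    """Local-scaling affinity: A_ij = exp(-||x_i - x_j||^2 / (s_i * s_j)),
    where s_i is the distance from x_i to its k-th nearest neighbor."""
    n = len(X)
    D = np.sqrt(np.maximum(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2), 0.0))
    s = np.sort(D, axis=1)[:, min(k, n - 1)]   # k-th nearest-neighbor distance
    return np.exp(-D ** 2 / np.outer(s, s))    # symmetric, ones on the diagonal
```

Because each scale s_i adapts to the local sample density, no global bandwidth needs to be tuned, which is what makes LFDA tuning-parameter-free in practice.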