
slide-1
SLIDE 1

WHEN “LESS IS MORE” TECHNIQUES AND APPLICATIONS

Pittsburgh, February 24th, 2010

Esteban García-Cuesta

Researcher at Universidad Carlos III - Spain

slide-2
SLIDE 2

Esteban García-Cuesta, Universidad Carlos III de Madrid

“Less is More”

3D → 2D

slide-3
SLIDE 3

Summary

This talk is about:
  • High dimensional datasets
  • Two proposals developed during my PhD studies
  • How each proposal's point of view can help in a robotics context
  • Data mining

It is not specifically about:
  • A machine learning algorithm
  • Computer vision

slide-4
SLIDE 4

Outline

• Introduction to dimensionality reduction
• Feature selection using eigenvector coefficients (Part I)
  • Introduction: Principal Component Analysis
  • How to use the PCA coefficients for feature selection
  • Application to a remote sensing scenario
• Feature extraction models (Part II)
  • Graphs and embedding graphs
  • Homogeneous structures
  • Remote sensing application
  • Facial motion feature points selection
  • Map building without localization by DR
• Recent trends in dimensionality reduction

slide-5
SLIDE 5
Introduction to Dimensionality Reduction
  • Motivation
  • Problems related with high dimensional data

[Image: Hough transform]

slide-6
SLIDE 6

Motivation

• Modern technologies routinely produce massive amounts of data
• Scientific progress now heavily depends on the ability to process and analyze high dimensional data
• The heart of these analyses is the reduction of dimensionality, by selecting a subset of the original features or by obtaining a well-chosen combination of them

slide-7
SLIDE 7

Problems Related with HD

High dimensionality:
• Most machine learning and data mining techniques are not effective with high dimensional datasets
• Irrelevant features
• Redundant features
• The so-called "curse of dimensionality" (CoD) [Bellman'61]

slide-8
SLIDE 8

CoD

• The number of training instances needed to 'populate' a space grows exponentially with dimensionality
• Unexpected properties:
  • Euclidean distances tend to zero
  • Gaussian behavior of uniformly sampled points

Laurens van de Maaten, DM Summer-School 2008, Maastricht

What can we do?
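The distance concentration effect mentioned above can be checked numerically. A minimal sketch, assuming NumPy; the sample size, dimensions, and seed are illustrative choices, not from the talk:

```python
import numpy as np

def relative_contrast(d, n=100, seed=0):
    """(max - min) / min over all pairwise Euclidean distances."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, d))                       # n points in the unit hypercube
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T     # squared pairwise distances
    d2 = d2[np.triu_indices(n, k=1)]                   # keep each pair once
    dist = np.sqrt(np.maximum(d2, 0.0))
    return (dist.max() - dist.min()) / dist.min()

# the contrast between nearest and farthest neighbor collapses as d grows
for d in (2, 20, 200):
    print(d, round(relative_contrast(d), 3))
```

As the dimensionality grows, the nearest and farthest neighbors become nearly indistinguishable, which is one concrete face of the curse of dimensionality.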

slide-9
SLIDE 9

Dimensionality Reduction

The "intrinsic" dimensionality may be smaller than the number of features
• Def: the minimum number of features necessary to preserve the data properties

Other reasons for dimensionality reduction:
• To compress data
• To visualize high dimensional data

• Feature Selection
  • Only a subset of the original features is selected
  • Discrete
  • Comprehensibility
• Feature Extraction
  • All features are used
  • Continuous

slide-10
SLIDE 10

Remote Sensing Application

[Diagram: an infrared sensor measures a spectrum of energy, intensity (a.u.) vs. wavelength (cm-1), with CO2 and H2O bands. The FORWARD MODEL (the RTE) maps the flame length and temperature to the spectrum; the INVERSE MODEL retrieves them from the measured spectrum.]

slide-11
SLIDE 11

Machine Learning Approach

We have gathered a dataset X:
• N data samples (different flame observations)
• D features / variables / dimensions (each one of the wavelengths)

We want to 'learn' from this data:
• The inverse of the RTE
• A regression problem

[Diagram: LEARN a mapping from the spectra of energy, intensity (a.u.) vs. wavelength (cm-1), to the flame length and temperature profile]

slide-12
SLIDE 12

Why It Is Important to Solve the IRTE

To have automatic control and diagnosis of combustion, in order to obtain energy efficiently and minimize pollutant emissions

[Images: combustion — healthy vs. dangerous — and global warming]

slide-13
SLIDE 13
Feature selection using the eigenvector coefficients
  • Introduction: Principal Component Analysis
  • How to use the PCA to select a subset of original features
  • Applied to remote sensing data

Wrapper selection [Kohavi'97]

slide-14
SLIDE 14

Feature Selection

• Def: a process that chooses an optimal subset of features according to an objective function
• Objectives:
  • To reduce dimensionality and remove noise
  • To improve mining performance
    • Speed of learning
    • Accuracy
    • Simplicity and comprehensibility

Supervised:
• Exploits input-output relations
• Unstable due to multicollinearity
• Wrapper approach
• There are many subsets

Unsupervised:
• Feature ranking based on a quality metric
• Based on the variance and separability of the data (PCA)

slide-15
SLIDE 15

Subset Search Problem

[Kohavi & John ‘97]

slide-16
SLIDE 16

Feature Selection

• In high dimensional data:
  • A large number of features to work with
  • Many irrelevant features and, more importantly, many redundant ones
• Individual feature evaluation (filter approach):
  • Focuses on identifying relevant features without handling feature redundancy or feature relations
• Feature subset selection (wrapper approach):
  • Relies on the evaluation of the subset to handle redundancy (too many possibilities)

slide-17
SLIDE 17

Multicollinearity

slide-18
SLIDE 18

PCA

Its main objective is to reduce the dimensionality while conserving the total variance.

• Σ : the [p x p] covariance matrix
• α : the [p x k] eigenvector matrix
• Λ : the [k x k] diagonal eigenvalue matrix
• the k-dimensional projection
• αk : the column vector of the k-th eigenvector
• the input data column matrix (one observation per column)
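A minimal numerical sketch of this decomposition, assuming NumPy; the variable names mirror the slide's Σ, α, Λ notation, and the toy data (with one near-degenerate direction) is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.1])  # N = 100 samples, p = 3 features
Xc = X - X.mean(axis=0)                 # center the data

Sigma = Xc.T @ Xc / len(Xc)             # [p x p] covariance matrix
Lam, alpha = np.linalg.eigh(Sigma)      # eigenvalues Λ and eigenvectors α of Σ
order = np.argsort(Lam)[::-1]           # reorder by descending variance
Lam, alpha = Lam[order], alpha[:, order]

k = 2
Y = Xc @ alpha[:, :k]                   # k-dimensional projection

# the full decomposition conserves the total variance
print(Lam.sum(), np.trace(Sigma))
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, hence the explicit reordering before truncating to k components.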

slide-19
SLIDE 19

PCA Coefficients

[Figure: eigenvector 1 and the coefficients of feature i]

slide-20
SLIDE 20

PCA Coefficients

The key idea is that high absolute value coefficients mean more influence:

relevant features → high absolute value coefficients

slide-21
SLIDE 21

B4 Method [Jolliffe'02]

• Very appealing because of its simplicity
• It lacks redundancy control

[Plot: eigenvector coefficient αk (a.u.) vs. feature number]
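A hedged sketch of the B4 selection rule described above: for each of the k leading eigenvectors, keep the original feature with the largest absolute coefficient. The toy data and k are assumptions; note the slide's caveat that nothing prevents the chosen features from being redundant:

```python
import numpy as np

def b4_select(X, k=3):
    """One feature per leading principal component: the highest |coefficient|."""
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / len(Xc)                     # covariance matrix
    Lam, alpha = np.linalg.eigh(Sigma)              # eigenvalues in ascending order
    leading = alpha[:, np.argsort(Lam)[::-1][:k]]   # k leading eigenvectors
    return np.abs(leading).argmax(axis=0)           # one feature index per component

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
print(b4_select(X, k=3))  # indices of the 3 selected features
```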

slide-22
SLIDE 22

Analysis of PCA Coefficients

• The key idea is that similar absolute value coefficients mean high correlation between their associated features
• At the other extreme, very independent features are at maximum distance

Irrelevant features → coefficients → 0
Redundant features → similar coefficients
Different eigenvectors → uncorrelated bases

slide-23
SLIDE 23

Redundancy Control

• Select the feature with the highest value within each of several ranges
• It is difficult to choose the threshold

[Plot: eigenvector coefficient αk (a.u.) vs. feature number]

slide-24
SLIDE 24

Using A Priori Specific Knowledge

[Diagram: an infrared sensor, with the wavelength (cm-1) axis mapped to the X-space showing where each gas emits or absorbs]

Adjacent wavelengths/features have similar spatial information

slide-25
SLIDE 25

Guided Feature Selection [garcia-cuesta'08]

• Similar features have similar information
• Locally find features with high coefficient values
• Select features with high and mutually different coefficient values

[Plot: eigenvector coefficient αk (a.u.) vs. feature number]

"Multilayer perceptron as inverse model in a ground-based remote sensing temperature retrieval problem", J. Eng. Appl. Artif. Intell., Vol. 21, Issue 1, pp. 26-34, February 2008.

slide-26
SLIDE 26

Algorithm

PCA:
1. Calculate the input covariance matrix Σ = XXᵀ
2. Obtain the eigenvectors α and the eigenvalues Λ of Σ, and select αq

Guided feature selection:
3. Select a subset of features by applying a maximum value algorithm to αq
4. Use the selected subset of features as input to a machine learning algorithm
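The steps above can be sketched as follows, assuming NumPy. The windowed maximum rule is an assumption on my part: the talk only says to pick, locally, features with high eigenvector coefficients (adjacent wavelengths being redundant); the window count, q = 1, and the toy data are illustrative:

```python
import numpy as np

def guided_feature_selection(X, n_windows=5):
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / len(Xc)             # step 1: covariance matrix
    Lam, alpha = np.linalg.eigh(Sigma)      # step 2: eigendecomposition
    alpha_q = alpha[:, np.argmax(Lam)]      # leading eigenvector (q = 1 here)
    # step 3: within each window of adjacent features, keep the feature with
    # the largest absolute coefficient (adjacent wavelengths are assumed
    # redundant, as on the a priori knowledge slide)
    selected = []
    for window in np.array_split(np.arange(X.shape[1]), n_windows):
        selected.append(window[np.argmax(np.abs(alpha_q[window]))])
    return np.array(selected)               # step 4: feed these into a learner

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
print(guided_feature_selection(X))  # one selected wavelength index per window
```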

slide-27
SLIDE 27

Guided Feature Selection

[Plot: eigenvector coefficients (a.u.) vs. wavelength number (cm-1), marking the subset of selected original features]

slide-28
SLIDE 28

Remote Sensing Application Results

• An MLP neural network has been used for the estimation
• Cross-validation
• Trials with different numbers of hidden neurons
• The proposed GFS improves on B4 and converges faster
• The error increases when adding more features

[Plot: error (K), from 3.5 to 7, vs. number of selected features (20 to 100), for B4 and GFS]

slide-29
SLIDE 29

Remote Sensing Application Results

slide-30
SLIDE 30

Feature Selection

• We developed a feature selection method based on PCA that reveals the dependencies between features
• It allows introducing a priori knowledge
• Selecting original features makes it possible to design specific sensors
  • Reducing the cost of the equipment
  • Reducing the cost of massive data storage

slide-31
SLIDE 31

• Note to the users of the provided slides: we would be delighted if you found this material useful in giving your own lectures or talks. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture or talk, please include this message and authorship according to the CC BY license (the PowerPoint is available on demand via email: egarciacuesta@gmail.com).

This work has been funded by

slide-32
SLIDE 32

WHEN “LESS IS MORE” TECHNIQUES AND APPLICATIONS (PART II)

Pittsburgh, February 24th, 2010

Esteban García-Cuesta

Researcher at Universidad Carlos III - Spain

slide-33
SLIDE 33
Feature extraction
  • Homogeneous structures
  • Graphs and embedding graphs
  • Graph analysis

Non-linear DR

slide-34
SLIDE 34

Feature Extraction

• A process of creating a new set of features by a general transformation of the original high dimensional data
• Unsupervised: PCA, ICA, KPCA, manifold learning, etc.
• Supervised: CCA, KCCA, RRR, PLS, etc.
• Semi-supervised: based on graphs

slide-35
SLIDE 35

Feature Extraction: Motivation

• We have observed that there is locally linear behavior vs. globally non-linear behavior
• Divide the problem into n subproblems
• Get better accuracy by modeling these n subproblems

[Diagram: LEARN mappings from the spectra of energy to the temperature profiles, split into subproblems 1 … n]

slide-36
SLIDE 36

Homogeneous Structures Analysis

[Diagram: points 1-7 in the original (input and output) space and their arrangement in the projected space]

slide-37
SLIDE 37

Embedding Graphs

[Diagram: a similarity matrix; each entry holds the similarity degree between sample 1 and sample 2]

slide-38
SLIDE 38

Laplacian Eigenmaps

Based on LLE (Locally Linear Embedding) [Roweis et al. '00] and ISOMAP [Tenenbaum et al. '00]. "Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering" [M. Belkin, P. Niyogi, NIPS 2001].

slide-39
SLIDE 39

Graph Analysis

[Diagram: graphs G and H, with matrices U and W, built over the inputs (X) and outputs (Y); the structure W is combined with a linear model, yielding the scatter matrices Sw and Sb]

slide-40
SLIDE 40

Algorithm [garcia-cuesta'09]

1. Define the adjacency graph (ε-neighbors or K-nearest neighbors)
2. Choose the similarity function and compute the affinity matrices U and V
3. Solve the eigenproblem
4. Project the data into the new subspace
5. Apply a density clustering technique (DBSCAN)
6. Build different models for the different discovered groups

"Discriminant regression analysis to find homogeneous structures". International Conference on Intelligent Data Engineering and Automated Learning, IDEAL'2009.
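Steps 1-4 above can be sketched in a Laplacian-eigenmap style, assuming NumPy. The heat-kernel affinity, the symmetric normalization of the eigenproblem, and all parameter values are assumptions for illustration; DBSCAN (step 5) and the per-group models (step 6) are omitted:

```python
import numpy as np

def graph_embedding(X, k=5, sigma=1.0, dim=2):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.zeros_like(d2)
    for i in range(len(X)):                               # step 1: kNN adjacency graph
        nn = np.argsort(d2[i])[1:k + 1]                   # skip self (distance 0)
        W[i, nn] = np.exp(-d2[i, nn] / (2 * sigma ** 2))  # step 2: heat-kernel affinity
    W = np.maximum(W, W.T)                                # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                             # graph Laplacian
    # step 3: generalized eigenproblem L y = λ D y via D^{-1/2} normalization
    Dinv = np.diag(1.0 / np.sqrt(np.diag(D)))
    vals, vecs = np.linalg.eigh(Dinv @ L @ Dinv)
    return Dinv @ vecs[:, 1:dim + 1]                      # step 4: drop the trivial eigenvector

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
Y = graph_embedding(X)  # one 2D point per sample, ready for clustering
```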

slide-41
SLIDE 41

Remote Sensing Application Results

[Plot: the discovered groups 1-4 and the test data]

"Recursive discriminant regression analysis to find homogeneous structures". Int. J. Neur. Process., 2010 (submitted as invited paper) [garcia-cuesta'10]

slide-42
SLIDE 42

Remote Sensing Application Results

slide-43
SLIDE 43
Applications to Robotics
  • Facial Motion Feature Points Selection
  • Map Building Without Localization by DR

[Images: a replicant (Blade Runner, 1982); the Mars rover project]

slide-44
SLIDE 44

Feature Selection Using Principal Feature Analysis [Ira Cohen et al. '07]

PCA:
1. Calculate the input covariance matrix Σ = XXᵀ
2. Obtain the eigenvectors α and the eigenvalues Λ of Σ, and select αq

Feature selection:
3. Cluster the rows of αq with K-means
4. Find the centroid of each cluster of feature vectors and choose the feature closest to it
5. Use the selected subset of features as input to a machine learning algorithm
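A hedged sketch of the Principal Feature Analysis steps above, assuming NumPy. A minimal Lloyd's K-means stands in for library clustering, and q, k, and the toy data are illustrative assumptions:

```python
import numpy as np

def kmeans(P, k, iters=20, seed=0):
    """Minimal Lloyd's K-means on the rows of P."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(iters):
        labels = ((P[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([P[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels, centers

def principal_feature_analysis(X, q=3, k=3):
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / len(Xc)                       # step 1: covariance matrix
    Lam, alpha = np.linalg.eigh(Sigma)
    alpha_q = alpha[:, np.argsort(Lam)[::-1][:q]]     # step 2: q leading eigenvectors
    labels, centers = kmeans(alpha_q, k)              # step 3: cluster the rows
    selected = []
    for j in range(k):                                # step 4: feature nearest each centroid
        members = np.flatnonzero(labels == j)
        if len(members) == 0:
            continue
        dists = ((alpha_q[members] - centers[j]) ** 2).sum(1)
        selected.append(int(members[dists.argmin()]))
    return sorted(selected)                           # step 5: feed these to a learner

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 12))
print(principal_feature_analysis(X))
```

Unlike B4, features from different clusters are distinct by construction, which is what gives this scheme its redundancy control.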

slide-45
SLIDE 45

Feature Selection Using Principal Feature Analysis

[Figure: the coefficients of features 1 and 2 across the k leading eigenvectors]

slide-46
SLIDE 46

Facial Motion Feature Points Selection

[Images from Ira Cohen et al. '07]

slide-47
SLIDE 47

Map Building without Localization by Dimensionality Reduction Techniques [Takehisa Yairi '07]

• Are mapping and localization really inseparable?
• Are the motion and measurement models necessary?
• What about map building as an abstraction of the world?
• Is there another map building framework?

• Reconsider robot map building from the viewpoint of dimensionality reduction and propose an alternative framework
• Heuristic: closely located objects tend to share similar histories of being observed by a robot

slide-48
SLIDE 48

LFMDR

An observation about an object is roughly dependent only on its location, given the map and the robot's position
slide-49
SLIDE 49

LFMDR

"If two objects are closely located, their histories of observation are similar."

Imagine a mapping between an object's position and its observation history

[Figure: XY coordinates and the historical observation matrix]

slide-50
SLIDE 50

LFMDR Algorithm

1. Explore the environment and obtain the observation history data
2. Apply a DR method based on embedding graphs and obtain a set of 2D vectors
3. Perform the optimal affine transformation with respect to the original positions and obtain the estimates

Errors of ≈ 0.13 m in a 2.5 m environment
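Step 3 above — the optimal affine alignment of the 2D embedding with the reference positions — reduces to a least squares problem on homogeneous coordinates. A minimal sketch, assuming NumPy; the synthetic positions and distortion are illustrative only:

```python
import numpy as np

def optimal_affine(Y, X):
    """A (2x2) and b (2,) minimizing ||Y @ A.T + b - X||^2 (least squares)."""
    H = np.hstack([Y, np.ones((len(Y), 1))])   # homogeneous coordinates
    M, *_ = np.linalg.lstsq(H, X, rcond=None)  # solves H @ M ≈ X
    return M[:2].T, M[2]

# synthetic check: if the embedding is an exact affine image of the true
# positions, the fit recovers them
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.5, size=(30, 2))        # "true" object positions (m)
B = np.array([[0.0, -2.0], [2.0, 0.0]])        # some distortion of the embedding
c = np.array([1.0, -0.5])
Y = X @ B + c                                  # the 2D vectors from step 2
A, b = optimal_affine(Y, X)
X_hat = Y @ A.T + b                            # the position estimates (step 3)
```

An affine fit is the natural choice here because graph embeddings are only defined up to rotation, scaling, and translation of the coordinates.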

slide-51
SLIDE 51

Other Applications to Robotics

• Face recognition
• Planning
• Image segmentation (spectral clustering techniques)
• To improve any machine learning algorithm applied to a robotics problem

slide-52
SLIDE 52

Feature Selection and Extraction: New Trends

• On-line dimensionality reduction methods
• Incorporating prior knowledge
  • Semi-supervised dimensionality reduction ["Semi-Supervised Learning Literature Survey", Xiaojin Zhu '08]
• Combining feature selection with extraction
• Methods that consider feature interaction among all the original features
  • INTERACT [Zhao & Liu '07]

slide-53
SLIDE 53

Conclusion

In high dimensional data we first need to reduce the data to "less" by removing the irrelevant/redundant parts, and thus achieve better results; that is, "less is more".

slide-54
SLIDE 54
  • Dr. Esteban García-Cuesta

Universidad Carlos III

IEEE / IEEE Computer Society Member
E-mail (University): esteban.garcia@uc3m.es (old)
E-mail (Personal): egarciacuesta@gmail.com

Thank you!
