WHEN “LESS IS MORE”: TECHNIQUES AND APPLICATIONS
Pittsburgh, February 24th, 2010
Esteban García-Cuesta
Researcher at Universidad Carlos III - Spain
Esteban García-Cuesta, Universidad Carlos III de Madrid
3D → 2D
This talk is about:
- High-dimensional datasets
- Two proposals developed during my PhD studies
- How each of the proposals fits into the data mining context

It is not specifically about:
- A machine learning algorithm
- Computer vision
Outline:
- Introduction to dimensionality reduction
- Feature selection using eigenvector coefficients (Part I)
  - Introduction: Principal Component Analysis
  - How to use the PCA coefficients for feature selection
  - Application to a remote sensing scenario
- Feature extraction models (Part II)
  - Graphs and embedding graphs
  - Homogeneous structures
  - Remote sensing application
  - Facial motion feature points selection
  - Map building without localization by DR
- Recent trends in dimensionality reduction
Modern technologies routinely produce massive amounts of data.
Scientific progress now heavily depends on the ability to analyze these data.
At the heart of these analyses is the reduction of the data's dimensionality.
High dimensionality:
- Most machine learning and data mining techniques are not effective on high-dimensional datasets:
  - Irrelevant features
  - Redundant features
- The so-called “curse of dimensionality” (CoD) [Bellman’61]
- The number of training samples needed grows rapidly with the dimension
- Unexpected properties:
  - The relative difference between Euclidean distances tends to zero
  - Gaussian behavior of uniformly sampled points
Laurens van der Maaten, DM Summer-School 2008, Maastricht
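The distance-concentration property above can be seen in a few lines of numpy. This is an illustrative sketch, not material from the slides: for uniformly sampled points, the relative spread between the farthest and nearest neighbor of a point shrinks as the dimension grows.

```python
import numpy as np

# As the dimension d grows, pairwise Euclidean distances between
# uniformly sampled points concentrate: "near" and "far" neighbors
# become nearly indistinguishable.
rng = np.random.default_rng(0)
contrast = {}
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    dists = np.linalg.norm(X[0] - X[1:], axis=1)
    contrast[d] = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative contrast={contrast[d]:.2f}")
```

The relative contrast printed for d=1000 is far smaller than for d=2, which is what makes nearest-neighbor-based methods struggle in high dimensions.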
The “intrinsic” dimensionality may be smaller than the number of features.
Def: the minimum number of features necessary to preserve the data properties.

Other reasons for dimensionality reduction:
- Compress data
- Visualize high-dimensional data

Feature Selection:
- Only a subset of the original features is selected
- Discrete
- Comprehensibility

Feature Extraction:
- All features are used
- Continuous
[Figure: spectra of energy — intensity (a.u.) vs. wavelength (cm-1), with CO2 and H2O bands]

FORWARD MODEL (RTE, radiative transfer equation): from the temperature along the path length to the spectrum of energy.
INVERSE MODEL: retrieve the temperature from the measured spectrum.
We have gathered a dataset X:
- N data samples (different flame conditions)
- D features/variables/dimensions (each one of the wavelengths)

We want to ‘learn’ from this data the inverse of the RTE: a regression problem.

[Figure: spectra of energy (intensity (a.u.) vs. wavelength (cm-1)) mapped to temperature profiles along the flame length]
Goal: automatic control and diagnosis of combustion, in order to obtain energy efficiently and minimize pollutant emissions (healthy vs. dangerous combustion, global warming).
Feature selection using the eigenvector coefficients
Wrapper selection [Kohavi’97]
Def: a process that chooses an optimal subset of features according to an objective function.

Objectives:
- To reduce dimensionality and remove noise
- To improve mining performance:
  - Speed of learning
  - Accuracy
  - Simplicity and comprehensibility

Supervised:
- Exploits input-output relations
- Unstable due to multicollinearity
- Wrapper approach: there are many possible subsets

Unsupervised:
- Feature ranking based on a quality metric
- Based on variance and separability
[Kohavi & John ‘97]
In high-dimensional data:
- A large number of features to work with
- Many irrelevant features and, more importantly, many redundant ones

Individual feature evaluation (filter approach):
- Focuses on identifying relevant features without handling feature redundancy or feature relations

Feature subset selection (wrapper approach):
- Relies on the evaluation of the subset to handle the redundancy (too many possibilities)
Its main objective is to reduce the dimensionality while preserving the total variance.

Notation:
- Σ: [p x p] covariance matrix
- α: [p x k] eigenvector matrix
- Λ: [k x k] diagonal eigenvalue matrix
- y: k-dimensional projection
- αk: column vector of the k-th eigenvector
- X: input data matrix (one observation per column)
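A toy numerical check of this notation, as an illustrative sketch (the variable names mirror the symbols above; the data here is random and only the dimensions matter):

```python
import numpy as np

# N observations, p features, k retained principal components
rng = np.random.default_rng(0)
N, p, k = 100, 5, 2
X = rng.normal(size=(N, p))
Xc = X - X.mean(axis=0)                         # center the data
Sigma = Xc.T @ Xc / (N - 1)                     # [p x p] covariance matrix
Lam, alpha = np.linalg.eigh(Sigma)              # eigenvalues and eigenvectors
alpha_k = alpha[:, np.argsort(Lam)[::-1][:k]]   # [p x k] leading eigenvectors
Y = Xc @ alpha_k                                # [N x k] projection
assert Y.shape == (N, k)
# the eigenvalues account for the total variance (trace of Sigma)
assert np.isclose(Lam.sum(), np.trace(Sigma))
```

The full eigendecomposition preserves the total variance; keeping only the k leading eigenvectors keeps the largest share of it.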
[Figure: coefficients of feature i in eigenvector 1]
Key idea: relevant features have high absolute-value coefficients in the eigenvectors.
- Very appealing because of its simplicity
- But it lacks redundancy control

[Figure: eigenvector coefficient αk (a.u.) vs. feature number]
Key idea: similar absolute-value coefficients imply high correlation between the associated features. At the other extreme, near-zero coefficients indicate irrelevant features.
- Irrelevant features: coefficients ≈ 0
- Redundant features: similar coefficients
- Different eigenvectors: uncorrelated bases
[Figure: eigenvector coefficient αk (a.u.) vs. feature number]

- Select the feature with the highest value within different ranges
- Difficult to choose the ranges
[Figure: X-space vs. wavelength (cm-1), showing emission and absorption bands]

Adjacent wavelengths/features carry similar spatial information.
[Figure: eigenvector coefficient αk (a.u.) vs. feature number, with locally selected maxima]

- Selection of features with high and distinct coefficient values
- Similar features carry similar information
- Locally find features with high coefficient values
“Multilayer perceptron as inverse model in a ground-based remote sensing temperature retrieval problem” J. Eng. Appl. Artif. Intell., Vol.21:26-34, Issue 1, February 2008.
PCA-guided feature selection (GFS):
1. Calculate the input covariance matrix Σ = XXᵀ
2. Obtain the eigenvectors α and the eigenvalues Λ of Σ, and select αq
3. Select a subset of features by applying a maximum-value algorithm to αq
4. Use the selected subset of features as input to a machine learning algorithm
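The steps above can be sketched in numpy. This is an illustrative reading, not the authors' implementation: in particular, the local "maximum value algorithm" is interpreted here as splitting the feature axis into contiguous ranges and keeping the feature with the largest absolute eigenvector coefficient in each range (the function name and that interpretation are assumptions).

```python
import numpy as np

def pca_guided_selection(X, k, n_select):
    """Sketch of PCA-guided feature selection: pick, per contiguous
    feature range, the feature with the locally maximal |coefficient|.
    Assumes X has at least n_select features."""
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / (len(X) - 1)             # step 1: covariance matrix
    Lam, alpha = np.linalg.eigh(Sigma)           # step 2: eigendecomposition
    alpha_q = alpha[:, np.argsort(Lam)[::-1][:k]]
    scores = np.abs(alpha_q).max(axis=1)         # one score per feature
    p = X.shape[1]
    bounds = np.linspace(0, p, n_select + 1, dtype=int)
    # step 3: local maxima of the coefficients, one per range
    selected = [int(lo + np.argmax(scores[lo:hi]))
                for lo, hi in zip(bounds[:-1], bounds[1:])]
    return sorted(set(selected))                 # step 4: feed these to a learner
```

Because the ranges are disjoint, the selected features are distinct and spread over the whole feature axis, which is what controls redundancy between adjacent wavelengths.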
[Figure: eigenvector coefficients (a.u.) vs. wavelength number (cm-1), highlighting the subset of selected original features]
An MLP neural network has been used for estimation purposes:
- Cross-validation
- Trials with different numbers of selected features

Results:
- The proposed GFS improves on B4 and converges faster
- The error increases when adding more features

[Figure: error (K) vs. number of selected features (20–100) for B4 and GFS]
- We developed a feature selection method based on the PCA eigenvector coefficients
- It allows introducing a priori known knowledge
- The selection of original features allows designing simpler acquisition systems:
  - Reduces the cost of the equipment
  - Reduces the cost of massive data storage
Note to the users of the provided slides: We would be delighted if you found this material useful in giving your own lectures or talks. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture or talk, please include this message and authorship according to the CC BY license (the PowerPoint is available on demand via email: egarciacuesta@gmail.com).
This work has been funded by
Non-linear DR
Feature extraction is a process of creating a new set of features by a general transformation of the original high-dimensional data.
- Unsupervised: PCA, ICA, KPCA, manifold learning, etc.
- Supervised: CCA, KCCA, RRR, PLS, etc.
- Semi-supervised: based on graphs
We have observed that there exists a local linear behavior vs. a global non-linear one:
- Divide the problem into n subproblems
- Get better accuracy by modeling these n subproblems

[Figure: spectra of energy mapped to temperature profiles, learned separately for subproblems 1 … n]
[Figure: samples 1–7 shown in the original input/output space and in the projected space]
[Figure: similarity degree between sample 1 and sample 2]
Based on LLE (Locally Linear Embedding) [Roweis et al. ’00] and ISOMAP [Tenenbaum et al. ’00]; “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering” [M. Belkin, P. Niyogi, NIPS 2001].
[Diagram: inputs (X) and outputs (Y) with their graphs G and H and affinity matrices U and W; the learned structure W feeds a linear model]
1. Define the adjacency graph (ε-neighbors or k-nearest neighbors)
2. Choose the similarity function and compute the affinity matrices U and V
3. Solve the eigenproblem
4. Project the data onto the new subspace
5. Apply a density clustering technique (DBSCAN)
6. Build different models for the different discovered groups
“Discriminant regression analysis to find homogeneous structures”. International Conference on Intelligent Data Engineering and Automated Learning, IDEAL’2009
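Steps 1–4 above correspond to a Laplacian-eigenmaps-style embedding. The following is a minimal numpy sketch of those steps only (the function name, the heat-kernel similarity choice, and the use of the symmetric normalized Laplacian to solve the generalized eigenproblem L y = λ D y are assumptions; the DBSCAN clustering and per-group models of steps 5–6 are left out):

```python
import numpy as np

def laplacian_eigenmap(X, n_neighbors=5, n_components=2, sigma=1.0):
    """Minimal sketch: kNN graph, heat-kernel affinities, and the
    smallest non-trivial eigenvectors of the graph Laplacian."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.zeros((n, n))
    for i in range(n):                          # step 1: adjacency graph (kNN)
        nbrs = np.argsort(d2[i])[1:n_neighbors + 1]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))  # step 2: affinity
    W = np.maximum(W, W.T)                      # symmetrize the graph
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(n) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)          # step 3: eigenproblem
    order = np.argsort(vals)
    # step 4: project, skipping the trivial constant eigenvector
    return (vecs * inv_sqrt[:, None])[:, order[1:n_components + 1]]
```

The embedding places graph-connected samples close together, so a density clustering on the projected points can then discover the homogeneous groups.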
[Figure: discovered groups 1–4 and test samples]
“Recursive discriminant regression analysis to find homogeneous structures”. Int. J. Neur. Process.,2010 (submitted as invited paper) [garcia-cuesta’10]
Replicant (Blade Runner, 1982)
Mars rover project
Feature Selection Using Principal Feature Analysis [Ira Cohen et al. ’07]
PCA-based feature selection (Principal Feature Analysis):
1. Calculate the input covariance matrix Σ = XXᵀ
2. Obtain the eigenvectors α and the eigenvalues Λ of Σ, and select αq
3. Cluster the rows of αq with K-means
4. Find the centroid of each cluster of feature vectors and choose the feature closest to it
5. Use the selected subset of features as input to a machine learning algorithm
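The PFA steps can be sketched as follows. This is an illustrative reading of [Cohen et al. ’07], not their implementation: a small Lloyd's-iteration k-means is inlined (the function name and the iteration/seed parameters are assumptions):

```python
import numpy as np

def principal_feature_analysis(X, q, n_select, iters=50, seed=0):
    """Sketch of PFA: cluster the rows of the leading-eigenvector
    matrix and keep, per cluster, the feature whose coefficient
    vector is closest to the centroid."""
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / (len(X) - 1)            # steps 1-2: covariance + eig
    Lam, alpha = np.linalg.eigh(Sigma)
    A = alpha[:, np.argsort(Lam)[::-1][:q]]     # rows = feature coefficient vectors
    rng = np.random.default_rng(seed)
    centers = A[rng.choice(len(A), n_select, replace=False)]
    for _ in range(iters):                      # step 3: K-means on the rows
        labels = np.argmin(((A[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(n_select):
            if np.any(labels == c):
                centers[c] = A[labels == c].mean(axis=0)
    selected = []                               # step 4: feature nearest each centroid
    for c in range(n_select):
        selected.append(int(np.argmin(np.linalg.norm(A - centers[c], axis=1))))
    return sorted(set(selected))                # step 5: feed these to a learner
```

Unlike ranking features by a single eigenvector, clustering the coefficient vectors groups redundant features together, so only one representative per group is kept.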
Feature Selection Using Principal Feature Analysis
Coefficients of feature 1 and 2
[Ira Cohen et al. ’07]
- Are mapping and localization really inseparable?
- Are the motion and measurement models necessary?
- What about map building as an abstraction of the world?
- Is there another map building framework?

Proposal: reconsider robot map building from the viewpoint of dimensionality reduction and propose an alternative framework.

Heuristic: closely located objects have similar histories of being observed by a robot.
An observation about an object is roughly dependent on the relative position between the robot and the object.
XY coordinates ↔ historical observation matrix: “If two objects are closely located, their histories of observation are similar.”
Imagine a mapping between an object's position and its observation history.
Errors of ≈ 0.13m in a 2.5m environment
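The heuristic above can be turned into a tiny sketch: if each row of a matrix H is one object's observation history, then any distance-preserving dimensionality reduction of the rows yields a relative map. The choice of plain PCA via SVD and the function name are assumptions for illustration; the actual framework in the slides may differ:

```python
import numpy as np

def map_from_histories(H, n_dims=2):
    """Project objects' observation histories (rows of H) to n_dims,
    so that objects with similar histories land close together --
    a relative map, up to rotation/scale, with no explicit localization."""
    Hc = H - H.mean(axis=0)                    # center the history matrix
    U, S, Vt = np.linalg.svd(Hc, full_matrices=False)
    return U[:, :n_dims] * S[:n_dims]          # low-dimensional "map"
```

For instance, two objects with near-identical observation histories end up closer in the recovered map than a third object with a very different history.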
Other applications:
- Face recognition
- Planning
- Image segmentation (spectral clustering technique)
- Improving any machine learning algorithm applied to high-dimensional data
- On-line dimensionality reduction methods
- Incorporating prior knowledge
- Semi-supervised dimensionality reduction [“Semi-Supervised Learning Literature Survey”, Xiaojin Zhu ’08]
- Combining feature selection with feature extraction
- Methods which consider feature interaction among all the features, e.g. INTERACT [Zhao & Liu ’07]
We can conclude that in high-dimensional data we can benefit from reducing the dimensionality: less is more.
IEEE / IEEE Computer Society Member
E-mail (University): esteban.garcia@uc3m.es (old)
E-mail (Personal): egarciacuesta@gmail.com