 
WHEN “LESS IS MORE”: TECHNIQUES AND APPLICATIONS
Esteban García-Cuesta, Researcher at Universidad Carlos III, Spain
Pittsburgh, February 24th, 2010

“Less is More”
[Figure: 3D versus 2D representation]
Esteban García-Cuesta, Universidad Carlos III de Madrid

Summary
This talk is about:
• High dimensional datasets
• Two proposals developed during my PhD studies
• How the point of view of each proposal can help in a robotics context
It is not specifically about:
• A machine learning algorithm
• Computer vision
• Data mining

Outline
• Introduction to dimensionality reduction
  • Remote sensing application
• Feature selection using eigenvector coefficients (Part I)
  • Introduction: Principal Components Analysis
  • How to use the PCA coefficients for feature selection
  • Application to a remote sensing scenario
• Feature extraction models (Part II)
  • Graphs and embedding graphs
  • Homogeneous structures
  • Facial motion feature points selection
  • Map building without localization by DR
• Recent trends in dimensionality reduction

Hough transform
Introduction to Dimensionality Reduction
• Motivation
• Problems related to high dimensional data

Motivation
• Modern technologies routinely produce massive amounts of data
• Scientific progress now depends heavily on the ability to process and analyze high dimensional data
• At the heart of these analyses is the reduction of dimensionality, either by selecting a subset of the original features or by obtaining a well-chosen combination of them

Problems Related with HD
High dimensionality:
• Most machine learning and data mining techniques are not effective on high dimensional datasets
• Irrelevant features
• Redundant features
• The so-called “curse of dimensionality” (CoD) [Bellman’61]

CoD
• The number of training instances needed to “populate” a space grows exponentially with dimensionality
• Unexpected properties
  • Euclidean distances concentrate: the relative gap between the nearest and the farthest point tends to zero
  • Gaussian behavior of uniformly sampled points
What can we do?
Source: Laurens van der Maaten, DM Summer-School 2008, Maastricht

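The distance property listed on this slide can be checked numerically. The following is a minimal sketch, not taken from the talk: the function name `relative_contrast`, the sample size, and the choice of a uniform unit cube are all illustrative assumptions.

```python
import numpy as np

def relative_contrast(dim, n_points=500, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances
    from one random query point to n_points uniform points in [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# Hypothetical set of dimensionalities chosen for illustration.
contrasts = {d: relative_contrast(d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"dim={d:5d}  relative contrast={c:.3f}")
```

Under these assumptions the contrast shrinks as the dimension grows, which is why nearest-neighbor style reasoning tends to degrade on raw high dimensional data.
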
Dimensionality Reduction
• Feature Selection
  • Only a subset of the original features is selected
  • Discrete
  • Comprehensibility
• Feature Extraction
  • All features are used
  • Continuous
• The “intrinsic” dimensionality may be smaller than the number of features
  • Def: the minimum number of features necessary to preserve the data properties
• Other reasons for dimensionality reduction:
  • Compress data
  • We want to visualize high dimensional data

Remote Sensing Application
[Figure, FORWARD MODEL: spectrum of energy, intensity (a.u.) versus wavelength (cm^-1), of CO2 and H2O as seen by an infrared sensor]
[Figure, INVERSE MODEL: from the spectrum of energy, the RTE is used to retrieve temperature versus length]

Machine Learning Approach
• We have gathered a dataset X:
  • N data samples (different flame observations)
  • D features / variables / dimensions (each one of the wavelengths)
• We want to “learn” from this data:
  • Inverse of the RTE
  • Regression problem
[Figure: spectra of energy, intensity (a.u.) versus wavelength (cm^-1), LEARN, temperature profile versus length]

Why It Is Important to Solve the IRTE
[Figure: COMBUSTION, global warming, health hazards]
• To enable automatic control and diagnosis of combustion, so that energy is obtained efficiently and pollutant emissions are minimized

Wrapper selection [Kohavi’97]
Feature selection using the eigenvector coefficients
• Introduction: Principal Component Analysis
• How to use the PCA to select a subset of original features
• Applied to remote sensing data

Feature Selection
• Def: a process that chooses an optimal subset of features according to an objective function
• Objectives
  • To reduce dimensionality and remove noise
  • To improve mining performance
    • Speed of learning
    • Accuracy of the data
    • Simplicity and comprehensibility
• Supervised
  • Exploits input-output relations
  • Unstable due to multicollinearity
  • Wrapper approach
  • There are many subsets
• Unsupervised
  • Feature ranking based on a quality metric
  • Based on variance and separability (PCA)

Subset Search Problem [Kohavi & John ’97]

Feature Selection
• In high dimensional data:
  • Large number of features to work with
  • Many irrelevant features and, more importantly, many redundant ones
• Individual feature evaluation (filter approach)
  • Focuses on identifying relevant features without handling feature redundancy or feature relations
• Feature subset selection (wrapper approach)
  • Relies on the evaluation of the subset to handle the redundancy (too many possibilities)

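The limitation of the filter approach noted on this slide can be seen on a toy dataset. This is only an illustrative sketch: the synthetic features, the target, and the helper `filter_rank` (which scores each feature by its absolute Pearson correlation with the target) are assumptions made here, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)                # relevant feature
x1 = x0 + 0.05 * rng.normal(size=n)    # near-duplicate of x0 (redundant)
x2 = rng.normal(size=n)                # relevant, but with a weaker effect
x3 = rng.normal(size=n)                # irrelevant feature
X = np.column_stack([x0, x1, x2, x3])
y = x0 + 0.5 * x2                      # hypothetical regression target

def filter_rank(X, y):
    """Score every feature on its own by |corr(feature, target)|."""
    scores = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )
    order = np.argsort(scores)[::-1]   # best feature first
    return order, scores

order, scores = filter_rank(X, y)
top2 = set(order[:2].tolist())
print("ranking:", order.tolist())
print("scores :", np.round(scores, 3).tolist())
```

In this setup the two best-ranked features are the redundant pair, so a top-2 filter would miss the independent information carried by the third feature; evaluating subsets, as a wrapper does, avoids this at the cost of a much larger search.
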
Multicollinearity

PCA
• Its main objective is to reduce the dimensionality while conserving as much of the total variance as possible
• Quantities involved:
  • Σ: [p × p] covariance matrix
  • the k-dimensional projection
  • the [p × k] eigenvector matrix
  • α_k: column vector of the k-th eigenvector
  • Λ: [k × k] diagonal eigenvalue matrix
  • the input data observation (column vector)
[Figure: principal axes 1 and 2]

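The equations on this slide did not survive extraction. The standard PCA relations are written out below as a reference, using the symbols Σ, Λ and α_k that appear in this deck; the symbols A (eigenvector matrix), z (projection), x (observation), μ (mean) and λ_j (diagonal entries of Λ) are assumed here and may differ from the original slide.

```latex
\begin{aligned}
\Sigma\,\alpha_j &= \lambda_j\,\alpha_j, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 \\
A &= [\alpha_1, \dots, \alpha_k] \in \mathbb{R}^{p \times k}, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_k) \\
\Sigma A &= A \Lambda, \qquad A^{T} A = I_k \\
z &= A^{T} (x - \mu) \in \mathbb{R}^{k} \\
\text{fraction of variance retained} &= \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{p} \lambda_j}
\end{aligned}
```

The last line makes the trade-off explicit: keeping the k eigenvectors with the largest eigenvalues retains the largest possible share of the total variance for a k-dimensional linear projection.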
PCA Coefficients
[Figure: eigenvector 1, with the coefficient of feature i marked]

PCA Coefficients
• Key idea: a coefficient with a high absolute value means the feature has more influence
• Relevant features → high absolute value coefficients

B4 Method [Jolliffe,02]
• Very appealing because of its simplicity
• It lacks redundancy control
[Figure: eigenvector coefficient α_k (a.u.) versus feature number]

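The slide does not spell out the B4 procedure. The sketch below follows the usual description of Jolliffe's method B4 (one feature is associated with each of the first k principal components, namely the not-yet-chosen feature with the largest absolute coefficient); the function name `b4_select` and the toy data are assumptions for illustration, and the variant used in the talk may differ.

```python
import numpy as np

def b4_select(X, k):
    """B4-style selection: X is (samples x features); returns k feature indices."""
    cov = np.cov(X, rowvar=False)              # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder: largest variance first
    eigvecs = eigvecs[:, order]
    selected = []
    for j in range(k):
        coeffs = np.abs(eigvecs[:, j])         # copy of |coefficients| of PC j
        if selected:
            coeffs[selected] = -np.inf         # never pick the same feature twice
        selected.append(int(np.argmax(coeffs)))
    return selected

# Toy data: independent features with decreasing standard deviation.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
selected = b4_select(X, k=2)
print("selected features:", selected)
```

Nothing in this loop compares the chosen features with each other, which is the missing redundancy control the slide points out.
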
Analysis of PCA Coefficients
• Key idea: coefficients with similar absolute values indicate high correlation between their associated features
• At the other extreme, very independent features have maximum distance
• Irrelevant features → coefficients ≈ 0
• Redundant features → similar coefficients
• Different eigenvectors → uncorrelated bases

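A small synthetic check of these rules of thumb is sketched below. The data are an assumption made for illustration: two features driven by the same latent signal and one unrelated, low-variance feature. It shows the behavior in this toy setup only, not a general proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
s = rng.normal(size=n)                     # shared latent signal
X = np.column_stack([
    s + 0.05 * rng.normal(size=n),         # feature 0
    s + 0.05 * rng.normal(size=n),         # feature 1: redundant with feature 0
    0.1 * rng.normal(size=n),              # feature 2: unrelated, low variance
])

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
alpha_1 = eigvecs[:, np.argmax(eigvals)]   # eigenvector of the largest eigenvalue
corr_01 = np.corrcoef(X[:, 0], X[:, 1])[0, 1]

print("|alpha_1| =", np.round(np.abs(alpha_1), 3).tolist())
print("corr(feature 0, feature 1) =", round(float(corr_01), 4))
```

Here the two correlated features receive almost identical coefficients in the first eigenvector, while the unrelated low-variance feature gets a coefficient close to zero.
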
Redundancy Control
• Select the feature with the highest value in each of several ranges
• Difficult to choose the threshold
[Figure: eigenvector coefficient α_k (a.u.) versus feature number]

Using A Priori Specific Knowledge
• Adjacent wavelengths/features have similar spatial information
[Figure: infrared sensor; emits and absorbs; wavelength (cm^-1); X-space]

Guided Feature Selection [garcia-cuesta’08]
“Multilayer perceptron as inverse model in a ground-based remote sensing temperature retrieval problem”, J. Eng. Appl. Artif. Intell., Vol. 21, Issue 1, pp. 26-34, February 2008.
• Selection of features with high and different coefficient values
• Similar features have similar information
• Locally find features with high coefficient values
[Figure: eigenvector coefficient α_k (a.u.) versus feature number]

Algorithm
PCA
1. Calculate the covariance matrix of the input: Σ = XX^T
2. Obtain the eigenvectors α and the eigenvalues Λ of Σ and select α_q
Guided feature selection
3. Select a subset of features by applying a maximum value algorithm to α_q
4. Use the selected subset of features as input to a machine learning algorithm

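The four steps can be sketched as follows. This is a hedged illustration, not the published implementation: it assumes that the "maximum value algorithm" means picking local maxima of |α_q| along the feature (wavelength) axis, that α_q defaults to the first eigenvector, and that X is stored as samples × features (so the covariance is computed as X^T X on centered data). The function name `guided_feature_selection` and the synthetic spectra are hypothetical.

```python
import numpy as np

def guided_feature_selection(X, n_select, q=0):
    """Sketch of steps 1-3. X is (samples x features); returns sorted indices."""
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / (X.shape[0] - 1)         # step 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(sigma)     # step 2: eigen-decomposition
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    alpha_q = np.abs(eigvecs[:, order[q]])       # |coefficients| of eigenvector q
    # Step 3 (assumed): interior local maxima of |alpha_q| over neighboring features.
    inner = alpha_q[1:-1]
    is_peak = (inner > alpha_q[:-2]) & (inner >= alpha_q[2:])
    peaks = np.flatnonzero(is_peak) + 1
    best = peaks[np.argsort(alpha_q[peaks])[::-1][:n_select]]
    return np.sort(best)

# Synthetic "spectra": two smooth bands driven by one latent variable, plus noise.
rng = np.random.default_rng(0)
idx = np.arange(100)
band_1 = np.exp(-0.5 * ((idx - 20) / 5.0) ** 2)
band_2 = np.exp(-0.5 * ((idx - 70) / 5.0) ** 2)
t = rng.normal(size=(500, 1))
X = t * (band_1 + 0.6 * band_2) + 0.01 * rng.normal(size=(500, 100))

selected = guided_feature_selection(X, n_select=2)
X_reduced = X[:, selected]                       # step 4: input for the learner
print("selected features:", selected.tolist())
```

Because neighboring features carry similar information, taking one local maximum per band yields features that are both influential and mutually different, whereas a plain top-n ranking of |α_q| would return several adjacent indices from the strongest band.
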
Guided Feature Selection
[Figure: subset of selected original features; eigenvector (a.u.) versus wavelength number (cm^-1)]

Remote Sensing Application Results
• An MLP neural network has been used for estimation purposes
  • Cross-validation
  • Tests with different numbers of hidden neurons
• The proposed GFS improves on B4 and converges faster
• The error increases when more features are added
[Figure: error (K), axis from 3.5 to 7, versus number of selected features, axis from 20 to 100, for B4 and GFS]

Remote Sensing Application Results

Feature Selection
• We developed a feature selection method based on PCA to reveal the dependency between features
• It allows a priori knowledge to be introduced
• Selecting original features makes it possible to design specific sensors
  • Reduces the cost of the equipment
  • Reduces the cost of massive data storage