Feature selection and extraction
Petr Pošík, Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics


SLIDE 1

CZECH TECHNICAL UNIVERSITY IN PRAGUE
Faculty of Electrical Engineering, Department of Cybernetics

Feature selection and extraction

Petr Pošík
SLIDE 2

Feature selection

SLIDE 3–4

Motivation

Why? To reduce overfitting, which arises

■ when we have a regular data set (|T| > D) but a model that is too flexible, and/or
■ when we have a high-dimensional data set with not enough data (|T| < D).

Data sets with thousands or millions of variables (features) are quite common these days: we want to choose only those needed to construct simpler, faster, and more accurate models.

[Figure: data matrices with cases as rows and features as columns]

SLIDE 5

Example: Fisher’s Iris data

Complete enumeration of all combinations of features (LOO Xval Error: leave-one-out crossvalidation error):

Input features   SL  SW  PL  PW    Dec. tree    3-NN
No inputs                           100.0 %    100.0 %
1 input          x                   26.7 %     28.7 %
                     x               41.3 %     47.3 %
                         x            6.0 %      8.0 %
                             x        5.3 %      4.0 %
2 inputs         x   x               23.3 %     24.0 %
                 x       x            6.7 %      5.3 %
                 x           x        5.3 %      4.0 %
                     x   x            6.0 %      6.0 %
                     x       x        5.3 %      4.7 %
                         x   x        4.7 %      5.3 %
3 inputs         x   x   x            6.7 %      7.3 %
                 x   x       x        5.3 %      5.3 %
                 x       x   x        4.7 %      3.3 %
                     x   x   x        4.7 %      4.7 %
All inputs       x   x   x   x        4.7 %      4.7 %

■ The decision tree reaches its lowest error (4.7 %) whenever PL and PW are among the inputs; it is able to choose them for decision making, so extra features do not harm it.
■ 3-NN itself does not contain any feature selection method; it uses all the features available. The lowest error is usually not achieved when using all inputs!
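The enumeration above fits in a few lines of code. A minimal sketch, assuming scikit-learn and its bundled Iris data; the exact error percentages will differ slightly across library versions and tie-breaking rules:

from itertools import combinations
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
names = ["SL", "SW", "PL", "PW"]  # sepal/petal length and width

for k in range(1, 5):                                  # subset sizes 1..4
    for subset in combinations(range(4), k):
        cols = list(subset)
        for label, model in [("tree", DecisionTreeClassifier(random_state=0)),
                             ("3-NN", KNeighborsClassifier(n_neighbors=3))]:
            # LOO crossvalidation: each fold scores 0 or 1, the mean is accuracy.
            acc = cross_val_score(model, X[:, cols], y, cv=LeaveOneOut()).mean()
            print([names[i] for i in cols], label, f"LOO error {1 - acc:.1%}")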
SLIDE 6–7

Classification of feature selection methods

Classification based on the number of variables considered together:

■ Univariate methods, variable ranking: consider the input variables (features, attributes) one by one.
■ Multivariate methods, variable subset selection: consider whole groups of variables together.

Classification based on the use of the ML model in the feature selection process:

■ Filter: selects a subset of variables independently of the model that shall subsequently use them.
■ Wrapper: selects a subset of variables taking into account the model that shall use them.
■ Embedded method: the feature selection is built into the ML model (or rather its training algorithm) itself (e.g. decision trees).
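As a small illustration of an embedded method (a sketch, assuming scikit-learn): a decision tree selects features as a side effect of training, and its importance scores reveal which inputs it actually used.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
# Importance ~0 means the trained tree effectively ignored that feature.
for name, imp in zip(["SL", "SW", "PL", "PW"], tree.feature_importances_):
    print(f"{name}: importance {imp:.3f}")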

slide-8
SLIDE 8

Univariate methods of feature selection

  • P. Poˇ

s´ ık c 2015 Artificial Intelligence – 6 / 18

SLIDE 9–10

Variable ranking

■ The main or auxiliary technique in many more complex methods.
■ Simple and scalable; often works well in practice.

An incomplete list of methods usable for various combinations of input and output variables:

                      Output variable Y
Input variable X      Nominal                          Continuous
Nominal               Confusion matrix analysis        T-test, ANOVA
                      p(Y) vs. p(Y|X)                  ROC (AUC)
                      χ²-test of independence          discretize Y (see the left column)
                      Inf. gain (see decision trees)
Continuous            T-test, ANOVA                    correlation
                      ROC (AUC)                        regression
                      logistic regression              discretize Y (see the left column)
                      discretize X (see the top row)   discretize X (see the top row)

■ All the methods provide a score which can be used to rank the input variables according to the “size of relationship” with the output variable.
■ Statistical tests provide the so-called p-values (attained level of significance); these may serve to judge the absolute “importance” of an attribute.

However, we can make many mistakes when relying on univariate methods!
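A minimal sketch of univariate ranking for continuous inputs and a nominal output, assuming scikit-learn: the ANOVA F-test gives a score and a p-value per feature, and mutual information gives an information-gain-style score.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = load_iris(return_X_y=True)
F, p = f_classif(X, y)                            # ANOVA F-score and p-value
mi = mutual_info_classif(X, y, random_state=0)    # information-gain-style score

for i in np.argsort(-F):                          # best-ranked feature first
    print(f"feature {i}: F = {F[i]:7.1f}, p = {p[i]:.2e}, MI = {mi[i]:.2f}")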

SLIDE 11–13

Redundant variables?

Redundant variable

■ does not bring any new information about the dependent variable.

Are we able to judge the redundancy of a variable by looking just at its 1D projections?

[Figure: two 2D scatter plots of two Gaussian classes with 1D projections onto the X and Y axes; the right-hand plot is a rotation of the left-hand one]

■ Based on the 1D projections, it seems that both variables on the left have a similar relationship with the class. (So one of them is redundant, right?) On the right, one variable (Y) seems to be useless, while the other (X) seems to carry more information about the class than either of the variables on the left (the “peaks” are better separated).
■ The situation on the right is the same as the situation on the left, only rotated. If we decided to throw away one of the variables on the left, we would not be able to create the situation on the right.
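The rotation argument can be checked numerically. A sketch with assumed synthetic parameters (two Gaussian clouds on the diagonal, unit variance; not the lecture’s exact figure), assuming scikit-learn: the 1D projections of X and Y look alike, yet using both features beats either one alone, so neither is redundant.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
cloud0 = rng.normal([-1.0, -1.0], 1.0, size=(n, 2))   # class 0 on the diagonal
cloud1 = rng.normal([+1.0, +1.0], 1.0, size=(n, 2))   # class 1 on the diagonal
X = np.vstack([cloud0, cloud1])
y = np.repeat([0, 1], n)

for cols, label in [([0], "X only"), ([1], "Y only"), ([0, 1], "X and Y")]:
    acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, cols], y, cv=5).mean()
    print(f"{label}: CV accuracy {acc:.1%}")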

SLIDE 14–16

Correlation influence on redundancy?

On the previous slide:

■ for a given class, the variables were not correlated, but
■ the variables were correlated due to the positions of the Gaussian clouds.

How does correlation inside the classes affect redundancy?

[Figure: two 2D scatter plots with identical 1D projections onto the X and Y axes; the within-class correlation differs between the two plots]

■ The 1D projections onto the X and Y axes are the same in both figures.
■ On the left, the variables are highly correlated; one is almost a linear function of the other, i.e. one of them is indeed redundant. On the right, the situation is completely different: both classes are nicely separated. If we decided to throw away one of the variables, we could not build a perfect classifier.
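A sketch of this contrast with assumed synthetic parameters (assuming scikit-learn): both datasets have identical 1D projections, but with positive within-class correlation the second variable adds nothing, while with negative within-class correlation the two variables together separate the classes almost perfectly.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
means = np.array([[-1.0, -1.0], [1.0, 1.0]])
cov_pos = [[1.0, 0.95], [0.95, 1.0]]     # left figure: strong positive correlation
cov_neg = [[1.0, -0.95], [-0.95, 1.0]]   # right figure: same marginals, negative

for cov, label in [(cov_pos, "correlated within class"),
                   (cov_neg, "anti-correlated within class")]:
    X = np.vstack([rng.multivariate_normal(m, cov, size=n) for m in means])
    y = np.repeat([0, 1], n)
    one = cross_val_score(LinearDiscriminantAnalysis(), X[:, :1], y, cv=5).mean()
    both = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
    print(f"{label}: X only {one:.1%}, X and Y {both:.1%}")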

SLIDE 17–19

Useless variables?

Useless variable

■ does not carry any information about the dependent variable; the output is independent of it.

Can we judge whether a variable is useless just from its 1D projections? Can a seemingly useless variable be useful in combination with other variables?

[Figure: two 2D scatter plots with 1D projections onto the X and Y axes; on the left only X looks informative, on the right neither variable looks informative on its own]

■ On the left, based on the 1D projections, it seems that variable X carries some information about the class, while variable Y does not. On the right, seemingly, neither variable carries any information about the class.
■ On the left, the seemingly useless variable Y is useful in combination with X! On the right, although both variables seem useless on their own, together they allow us to build quite a good classifier!
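The right-hand situation can be sketched with an XOR-like layout of Gaussian blobs (assumed synthetic data, assuming scikit-learn): each variable alone scores near chance, while together they classify well.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 250
centers = [(-2, -2), (2, 2), (-2, 2), (2, -2)]        # XOR layout of four blobs
X = np.vstack([rng.normal(c, 1.0, size=(n, 2)) for c in centers])
y = np.repeat([0, 0, 1, 1], n)                        # class depends on both axes

for cols, label in [([0], "X only"), ([1], "Y only"), ([0, 1], "X and Y")]:
    acc = cross_val_score(DecisionTreeClassifier(random_state=0),
                          X[:, cols], y, cv=5).mean()
    print(f"{label}: CV accuracy {acc:.1%}")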

SLIDE 20

Multivariate methods of feature selection

SLIDE 21–22

Multivariate methods of feature selection

Univariate methods may fail:

■ They need not recognize that a feature is important (in combination with other variables).
■ They can select a group of variables which are dependent and carry similar (or the same) information about the output, i.e. it is sufficient to use only one (or a few) of these variables.

Multivariate feature selection is complex: with D variables, there are 2^D variable subsets!
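A quick illustration of that growth: the number of candidate subsets doubles with every added variable, so exhaustive search becomes hopeless beyond a handful of features.

for D in (4, 10, 20, 50, 100):
    print(f"D = {D:3d}: {2**D:.2e} candidate feature subsets")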

SLIDE 23–24

Filter vs. Wrapper

Filter: selects a subset of variables independently of the model that shall use them.

■ It is a one-shot process (not iterative).
■ It provides a set of “the most important” variables as the resulting subset, independently of the employed model.

Wrapper: selects a subset of variables taking into account the model that will use them.

■ It is an iterative process.
■ In each iteration, several subsets of input variables are generated and tested with the particular model type.
■ According to the success of the model on the individual feature subsets, it is decided which subsets will be tested in the next iteration.
■ Feature selection is here part of the model training; we need separate testing data to evaluate the final model error.
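A hedged sketch of the two approaches in scikit-learn (SelectKBest standing in for a filter, SequentialFeatureSelector for a wrapper; both may well pick the same features on this easy data set):

from sklearn.datasets import load_iris
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                       f_classif)
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Filter: a one-shot, model-independent ranking (here the ANOVA F-score).
filt = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print("filter keeps:  ", filt.get_support())

# Wrapper: iteratively grows the subset, scoring each candidate subset with
# the actual model (3-NN) via cross-validation.
knn = KNeighborsClassifier(n_neighbors=3)
wrap = SequentialFeatureSelector(knn, n_features_to_select=2,
                                 direction="forward", cv=5).fit(X, y)
print("wrapper keeps: ", wrap.get_support())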

SLIDE 25

Wrappers

Wrapper:

■ A general method of feature selection.
■ The model type and its learning algorithm are treated as a black box.

Before applying wrapper feature selection, we have to specify:

■ What model type and which learning algorithm shall be used?
■ How to evaluate the model accuracy?
  ■ Based on testing data, or using k-fold crossvalidation?
■ How to search the space of possible feature subsets?
  ■ This is an NP-hard problem.
  ■ Enumerative search is possible only for a small number of features (e.g. the example with the Iris data set).
  ■ Greedy search is often used (forward selection or backward elimination).
  ■ Branch and bound, simulated annealing, genetic algorithms, . . .
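A from-scratch sketch of greedy forward selection, one possible answer to the choices above (assuming scikit-learn for the model and crossvalidation; the helper name forward_selection is ours, not from the lecture):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def forward_selection(X, y, model, k, cv=5):
    """Greedily add the feature whose addition gives the best CV accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        scores = {j: cross_val_score(model, X[:, selected + [j]], y, cv=cv).mean()
                  for j in remaining}
        best = max(scores, key=scores.get)       # best candidate this round
        selected.append(best)
        remaining.remove(best)
        print(f"added feature {best}, CV accuracy {scores[best]:.1%}")
    return selected

X, y = load_iris(return_X_y=True)
forward_selection(X, y, KNeighborsClassifier(n_neighbors=3), k=2)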

SLIDE 26

Feature extraction

SLIDE 27

Feature extraction

Better prediction models are often built using features derived from the original ones (aggregations, transformations, etc.):

■ When judging the overall health of a patient, is it better to know that she visited her physician in September 2008, October 2009, January 2010, February 2010, and April 2010, or is it better to know that in 2008 and 2009 she visited the doctor only once each year, while in the first half of 2010 she already made 3 visits?
■ When estimating the result of a chess game, is it better to know that the black king is at D1 while the white queen is at H4, or is it better to know that the white queen threatens the black king?

New variables are often derived (constructed) which are

■ linear or non-linear functions of
■ one, more, or all of the input variables,

with the hope that they will have

■ a larger and cleaner relationship with the output variable.

Domain knowledge is used very often.

Two different goals of feature extraction:

■ data reconstruction (unsupervised methods)
■ prediction improvement (supervised methods)

Methods:

■ clustering (a group of similar variables is replaced with a single centroid)
■ principal component analysis (PCA/SVD), projection pursuit, linear discriminant analysis (LDA), kernel PCA, . . .
■ spectral transformations (Fourier, wavelet), . . .
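A minimal sketch of unsupervised feature extraction with PCA (assuming scikit-learn): the new features are linear combinations of the originals, chosen to preserve as much variance, i.e. data-reconstruction ability, as possible.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)               # 4 original features -> 2 extracted features
print("explained variance ratio:", pca.explained_variance_ratio_)
print("each new feature mixes all 4 originals:")
print(pca.components_)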

SLIDE 28

Conclusions

SLIDE 29

Summary

■ The selection of the optimal subset of input variables is an NP-hard problem.
■ Univariate methods
  ■ are simple and allow us to rank the input variables according to some measure of usefulness for prediction,
  ■ work well in practice, but
  ■ can make fatal errors.
■ Multivariate methods
  ■ are more resistant to mistakes during selection, but
  ■ are computationally much more demanding.
■ We distinguish
  ■ filters,
  ■ wrappers, and
  ■ embedded methods.
■ Feature selection merely selects a subset of the original variables, while feature extraction constructs new variables from the original ones.