SLIDE 1

Data Warehousing and Machine Learning

Feature Selection
Thomas D. Nielsen

Aalborg University Department of Computer Science

Spring 2008

DWML Spring 2008 1 / 16

SLIDE 2

Feature Selection

When features don’t help
Data generated by a process described by the following Bayesian network:

[Figure: Bayesian network with class node Class and attributes A1, A2, A3; Class is a parent of A1 and A3, and A1 is the parent of A2.]

P(Class):          ⊕ = 0.5, ⊖ = 0.5

P(A1 | Class):     A1 = 0   A1 = 1
    Class = ⊕:      0.4      0.6
    Class = ⊖:      0.5      0.5

P(A2 | A1):        A2 = 0   A2 = 1
    A1 = 0:         1.0      0.0
    A1 = 1:         0.0      1.0

P(A3 | Class):     A3 = 0   A3 = 1
    Class = ⊕:      0.5      0.5
    Class = ⊖:      0.7      0.3

Attribute A2 is just a duplicate of A1. Conditional class probability, for example:
P(⊕ | A1 = 1, A2 = 1, A3 = 0) = 0.461
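Reading the tables above with the attribute states ordered (0, 1), the quoted probability can be reproduced directly; A2 contributes no extra factor because it equals A1 with probability 1:

P(⊕ | A1 = 1, A2 = 1, A3 = 0)
= P(⊕) P(A1 = 1 | ⊕) P(A3 = 0 | ⊕) / [P(⊕) P(A1 = 1 | ⊕) P(A3 = 0 | ⊕) + P(⊖) P(A1 = 1 | ⊖) P(A3 = 0 | ⊖)]
= (0.5 · 0.6 · 0.5) / (0.5 · 0.6 · 0.5 + 0.5 · 0.5 · 0.7) = 0.15 / 0.325 ≈ 0.461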

DWML Spring 2008 2 / 16

SLIDE 3

Feature Selection

The Naive Bayes model learned from data:

[Figure: Naive Bayes structure; Class is the parent of A1, A2 and A3.]

P(Class):          ⊕ = 0.5, ⊖ = 0.5

P(A1 | Class):     A1 = 0   A1 = 1
    Class = ⊕:      0.4      0.6
    Class = ⊖:      0.5      0.5

P(A2 | Class):     A2 = 0   A2 = 1
    Class = ⊕:      0.4      0.6
    Class = ⊖:      0.5      0.5

P(A3 | Class):     A3 = 0   A3 = 1
    Class = ⊕:      0.5      0.5
    Class = ⊖:      0.7      0.3

In the Naive Bayes model: P(⊕ | A1 = 1, A2 = 1, A3 = 0) = 0.507
Intuitively: the NB model double counts the information provided by A1, A2.
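Under the Naive Bayes factorisation every attribute contributes its own factor, so the evidence from A1 enters a second time through its copy A2 (again reading the tables with states ordered (0, 1)):

P(⊕ | A1 = 1, A2 = 1, A3 = 0) ∝ P(⊕) P(A1 = 1 | ⊕) P(A2 = 1 | ⊕) P(A3 = 0 | ⊕) = 0.5 · 0.6 · 0.6 · 0.5 = 0.0900
P(⊖ | A1 = 1, A2 = 1, A3 = 0) ∝ 0.5 · 0.5 · 0.5 · 0.7 = 0.0875
⇒ P(⊕ | A1 = 1, A2 = 1, A3 = 0) = 0.0900 / (0.0900 + 0.0875) ≈ 0.507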

DWML Spring 2008 3 / 16

SLIDE 4

Feature Selection

The Naive Bayes model with selected features A1 and A3:

[Figure: Naive Bayes structure over the selected features; Class is the parent of A1 and A3.]

P(Class):          ⊕ = 0.5, ⊖ = 0.5

P(A1 | Class):     A1 = 0   A1 = 1
    Class = ⊕:      0.4      0.6
    Class = ⊖:      0.5      0.5

P(A3 | Class):     A3 = 0   A3 = 1
    Class = ⊕:      0.5      0.5
    Class = ⊖:      0.7      0.3

In this Naive Bayes model: P(⊕ | A1 = 1, A3 = 0) = 0.461
(and all other posterior class probabilities are also the same as for the true model).

DWML Spring 2008 4 / 16

SLIDE 5

Feature Selection

Decision Tree
Decision trees learned from the same data:

[Figure: two decision trees learned from the data; the internal nodes test A1, A2 and A3, and the leaf class distributions ⊕/⊖ are 0.66/0.33, 0.36/0.64, 0.57/0.43 and 0.46/0.54.]

Decision tree does not test two equivalent variables twice on one branch (but might pick one or the other on different branches).

DWML Spring 2008 5 / 16

SLIDE 7

Feature Selection

Problems

  • Correlated features can skew prediction
  • Irrelevant features (not correlated to class variable) cause unnecessary blowup of model space (search space)
  • Irrelevant features can drown the information provided by informative features in noise (e.g. distance function dominated by random values of many uninformative features)
  • Irrelevant features in a model reduce its explanatory value (also when predictive accuracy is not reduced).

Methods of feature selection

  • Define relevance of features, and filter out irrelevant features before learning (relevance independent of the model used).
  • Filter features based on model-specific criteria (e.g. eliminate highly correlated features for Naive Bayes); a minimal sketch of this follows below.
  • Wrapper approach: evaluate feature subsets by model performance.
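As a minimal illustration of the model-specific filtering idea (second item above), the sketch below drops a feature whenever it is almost perfectly correlated with one already kept, the kind of redundancy that hurts Naive Bayes. The function name and the 0.95 threshold are illustrative assumptions, not from the slides.

```python
import numpy as np

def drop_redundant_features(X, threshold=0.95):
    """Keep a column only if its absolute correlation with every already-kept column
    stays below the threshold (a simple pairwise redundancy filter)."""
    kept = []
    for j in range(X.shape[1]):
        redundant = any(abs(np.corrcoef(X[:, j], X[:, i])[0, 1]) >= threshold for i in kept)
        if not redundant:
            kept.append(j)
    return kept   # indices of the retained feature columns
```

On the duplicated-attribute example above, A2 would be filtered out because it is perfectly correlated with A1.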

DWML Spring 2008 6 / 16

SLIDE 9

Feature Selection

Relevance

A possible definition: A feature Ai is irrelevant if for all a ∈ states(Ai) and c ∈ states(C):
P(C = c | Ai = a) = P(C = c).

Limitations of relevance-based filtering:

  • Even if Ai is irrelevant, it may become relevant in the presence of another feature, i.e. for some Aj and a′ ∈ states(Aj): P(C = c | Ai = a, Aj = a′) ≠ P(C = c | Aj = a′).
  • For Naive Bayes: irrelevant features neither help nor hurt
  • Irrelevance does not capture redundancy

Generally: it is difficult to say in a data- and method-independent way which features are useful.
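A minimal sketch of the definition applied to data: estimate P(C = c | Ai = a) and P(C = c) from counts and call Ai (marginally) irrelevant when they agree within a tolerance. As the first bullet warns, this test can miss features that only become relevant jointly with others; the function name and tolerance are illustrative assumptions.

```python
import numpy as np

def is_marginally_irrelevant(a, c, tol=0.01):
    """True if the empirical P(C = c | Ai = a) matches P(C = c) for every value a and class c."""
    classes = np.unique(c)
    p_c = np.array([np.mean(c == cls) for cls in classes])                    # P(C = c)
    for value in np.unique(a):
        subset = c[a == value]
        p_c_given_a = np.array([np.mean(subset == cls) for cls in classes])   # P(C = c | Ai = a)
        if np.max(np.abs(p_c_given_a - p_c)) > tol:
            return False
    return True
```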

DWML Spring 2008 7 / 16

SLIDE 10

Feature Selection

The Wrapper Approach [Kohavi,John 97]

  • Search over possible feature subsets
  • Candidate feature subsets v are evaluated as follows (a sketch of the score f(v) appears below):
      • Construct a model from the features in v using the given learning method (= induction algorithm).
      • Evaluate the performance of the model using cross-validation.
      • Assign score f(v): average predictive accuracy in cross-validation.
  • The best feature subset found is used to learn the final model.
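A minimal sketch of the wrapper score f(v), assuming scikit-learn is available and using a decision tree as the induction algorithm; the function name and the majority-class fallback for the empty subset are illustrative choices, not from [Kohavi,John 97].

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def wrapper_score(X, y, v, cv=10):
    """f(v): average predictive accuracy of the model induced from feature subset v,
    estimated by cv-fold cross-validation."""
    v = sorted(v)
    if not v:                                   # empty subset: score of majority-class prediction
        return max(np.mean(y == cls) for cls in np.unique(y))
    model = DecisionTreeClassifier(random_state=0)
    return cross_val_score(model, X[:, v], y, cv=cv, scoring="accuracy").mean()
```

The subset with the best f(v) found by the search is then used to learn the final model on the full training data.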

DWML Spring 2008 8 / 16

SLIDE 11

Feature Selection

Feature Selection Search

The feature subset lattice for 4 attributes: e.g. (1, 0, 1, 0) represents the feature subset {A1, A3}.
The search space (2^n subsets for n attributes) is too big for exhaustive search!
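A tiny illustration of the bit-vector encoding of lattice nodes and of why exhaustive search does not scale (assumed example code, not from the slides):

```python
from itertools import product

n = 4                                                 # number of attributes
nodes = list(product((0, 1), repeat=n))               # each bit vector is one feature subset
subset = {f"A{i+1}" for i, bit in enumerate((1, 0, 1, 0)) if bit}
print(len(nodes), subset)                             # 16 nodes; (1, 0, 1, 0) -> {'A1', 'A3'}
```

With 2^n nodes, the lattice doubles with every additional attribute, so exhaustive evaluation quickly becomes infeasible.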

DWML Spring 2008 9 / 16

SLIDE 12

Feature Selection

Greedy Search

[Figure: greedy search on the subset lattice; f(v) values shown: 0.5]

DWML Spring 2008 10 / 16

SLIDE 13

Feature Selection

Greedy Search

[Figure: greedy search on the subset lattice; f(v) values shown: 0.5, 0.6, 0.7, 0.6, 0.5]

DWML Spring 2008 10 / 16

SLIDE 14

Feature Selection

Greedy Search

[Figure: greedy search on the subset lattice; f(v) values shown: 0.5, 0.7, 0.63, 0.72, 0.65]

DWML Spring 2008 10 / 16

SLIDE 15

Feature Selection

Greedy Search

[Figure: greedy search on the subset lattice; f(v) values shown: 0.5, 0.7, 0.72, 0.7, 0.69]

DWML Spring 2008 10 / 16

SLIDE 16

Feature Selection

Greedy Search

[Figure: greedy search on the subset lattice; f(v) values shown: 0.5, 0.7, 0.72]

Search terminates when no score improvement is obtained by expansion.
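A sketch of the greedy (hill-climbing) search pictured in the preceding figures, assuming forward selection from the empty subset and the `wrapper_score` sketch from the wrapper slide; that the search runs forward from the empty set is an assumption, since the slides only show the score progression.

```python
def greedy_forward_selection(X, y, score):
    """Repeatedly add the single feature whose inclusion improves f(v) the most;
    stop when no expansion improves the current score."""
    n_features = X.shape[1]
    current = frozenset()
    best_score = score(X, y, current)
    while True:
        expansions = [current | {i} for i in range(n_features) if i not in current]
        if not expansions:
            break                                   # all features already selected
        scores = [(score(X, y, v), v) for v in expansions]
        top_score, top_v = max(scores, key=lambda t: t[0])
        if top_score <= best_score:                 # no score improvement by expansion: terminate
            break
        current, best_score = top_v, top_score
    return current, best_score
```

Usage, continuing the earlier sketch: `selected, acc = greedy_forward_selection(X, y, wrapper_score)`.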

DWML Spring 2008 10 / 16

SLIDE 17

Feature Selection

Best First Search

[Figure: best first search on the subset lattice, with open, closed and best lists; f(v) values shown: 0.5]

DWML Spring 2008 11 / 16

SLIDE 18

Feature Selection

Best First Search

[Figure: best first search on the subset lattice, with open, closed and best lists; f(v) values shown: 0.5, 0.7, 0.6, 0.6, 0.5]

DWML Spring 2008 11 / 16

SLIDE 19

Feature Selection

Best First Search

[Figure: best first search on the subset lattice, with open, closed and best lists; f(v) values shown: 0.5, 0.7, 0.63, 0.72, 0.65, 0.6, 0.6, 0.5]

DWML Spring 2008 11 / 16

SLIDE 20

Feature Selection

Best First Search

[Figure: best first search on the subset lattice, with open, closed and best lists; f(v) values shown: 0.5, 0.7, 0.63, 0.72, 0.65, 0.6, 0.6, 0.5, 0.69, 0.7]

DWML Spring 2008 11 / 16

SLIDE 21

Feature Selection

Best First Search

[Figure: best first search on the subset lattice, with open, closed and best lists; f(v) values shown: 0.5, 0.7, 0.63, 0.72, 0.65, 0.6, 0.6, 0.5, 0.69, 0.7, 0.75]

DWML Spring 2008 11 / 16

SLIDE 24

Feature Selection

Best First Search

[Figure: best first search on the subset lattice, with open, closed and best lists; f(v) values shown: 0.5, 0.7, 0.63, 0.72, 0.65, 0.6, 0.6, 0.5, 0.82, 0.69, 0.7, 0.75]

DWML Spring 2008 11 / 16

SLIDE 26

Feature Selection

Best First Search

Search continues until k consecutive expansions have not generated any improvement in the best feature-subset score. Can also be used as an anytime algorithm: the search continues indefinitely, and the current 'best' subset is always available.
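A sketch of best first search over feature subsets, again assuming the `wrapper_score` sketch from earlier; `open_list` and `closed` mirror the lists in the figures, and the loop stops after k consecutive expansions without an improvement of the best score. The details (priority queue, tie-breaking counter, default k) are illustrative, not from the slides.

```python
import heapq
from itertools import count

def best_first_selection(X, y, score, k=5):
    """Best first search: always expand the highest-scoring open node; stop after k
    consecutive expansions that do not improve the best score seen so far."""
    n_features = X.shape[1]
    tie = count()                                   # tie-breaker so the heap never compares sets
    start = frozenset()
    best_v = start
    best_score = score(X, y, start)
    open_list = [(-best_score, next(tie), start)]   # max-heap via negated scores
    closed = set()
    stale = 0                                       # expansions since the last improvement
    while open_list and stale < k:
        _, _, v = heapq.heappop(open_list)
        if v in closed:
            continue                                # already expanded
        closed.add(v)
        improved = False
        for i in range(n_features):
            if i in v:
                continue
            child = v | {i}
            if child in closed:
                continue
            s = score(X, y, child)
            heapq.heappush(open_list, (-s, next(tie), child))
            if s > best_score:
                best_v, best_score, improved = child, s, True
        stale = 0 if improved else stale + 1
    return best_v, best_score                       # 'best' is available at any time (anytime behaviour)
```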

DWML Spring 2008 11 / 16

SLIDE 27

Feature Selection

Experimental Results: Accuracy

Results from [Kohavi,John 97]. The table gives accuracy in 10-fold cross-validation, comparing each algorithm using all features with the same algorithm using selected features (-FSS), here with greedy search.

DWML Spring 2008 12 / 16

SLIDE 28

Feature Selection

Experimental Results: Number of Features

Results from [Kohavi,John 97]. Average number of features selected in 10-fold cross-validation.

DWML Spring 2008 13 / 16

SLIDE 29

Feature Selection

Experimental Results: Accuracy

Results from [Kohavi,John 97]. The table gives accuracy in 10-fold cross-validation, comparing feature subset selection using hill climbing and best first search.

DWML Spring 2008 14 / 16

SLIDE 30

Feature Selection

Experimental Results: Number of Features

Results from [Kohavi,John 97]. Average number of features selected in 10-fold cross-validation.

DWML Spring 2008 15 / 16

SLIDE 31

Feature Generation

Building new features (see the sketch below):

  • Discretization of continuous attributes
  • Value grouping: e.g. reduce date of sale to month of sale
  • Synthesize new features: e.g. from A1, A2 (continuous) compute Anew := A1/A2
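A minimal sketch of the three constructions using pandas; the column names (income, expenses, date_of_sale) and the bin/label choices are assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "income":       [21000, 58000, 43000, 120000],
    "expenses":     [18000, 30000, 45000,  60000],
    "date_of_sale": pd.to_datetime(["2008-01-15", "2008-03-02", "2008-03-28", "2008-07-09"]),
})

# Discretization of a continuous attribute into three equal-width bins
df["income_band"] = pd.cut(df["income"], bins=3, labels=["low", "medium", "high"])

# Value grouping: reduce date of sale to month of sale
df["month_of_sale"] = df["date_of_sale"].dt.month

# Synthesize a new feature from two continuous ones: Anew := A1 / A2
df["income_to_expenses"] = df["income"] / df["expenses"]
```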

DWML Spring 2008 16 / 16