

SLIDE 1

COMP61011 : Machine Learning

Feature Selection

Gavin Brown

www.cs.man.ac.uk/~gbrown

SLIDE 2

The Usual Supervised Learning Approach

data + labels → Learning Algorithm → Model
Testing data → Model → Predicted label

SLIDE 3

Predicting Recurrence of Lung Cancer

Only a few genes actually matter! Need small, interpretable subset to help doctors!

SLIDE 4

Text classification… is this news story “interesting”?

“Bag-of-Words” representation: x = {0, 3, 0, 0, 1, ..., 2, 3, 0, 0, 0, 1}

  • One entry per word!

Easily 50,000 words! Very sparse – easy to overfit! Need accuracy, otherwise we lose visitors to our news website!
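The bag-of-words idea above can be sketched in a few lines. This is a minimal illustration, not the lecture's code; the vocabulary and news story here are made up for the example.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text.

    Returns one entry per vocabulary word -- most entries are zero,
    which is why the representation is so sparse.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Toy vocabulary and story -- real vocabularies easily reach 50,000 words.
vocab = ["election", "goal", "market", "shares", "match"]
story = "shares up as market rallies market confidence grows"
x = bag_of_words(story, vocab)
print(x)  # [0, 0, 2, 1, 0]
```

With 50,000 vocabulary words, almost every entry of x is zero for any single story, which is exactly the sparsity that makes over-fitting so easy.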

SLIDE 5

The Usual Supervised Learning Approach ?????

data + labels → Learning Algorithm (OVERWHELMED!) → Model
Testing data → Model → Predicted label

SLIDE 6

With big data…

Feature selection

  • Time complexity
  • Computational cost
  • Cost in data collection
  • Over-fitting
  • Lack of interpretability
SLIDE 7

Some things matter, some do not.

Relevant features

  • those that we need to perform well

Irrelevant features

  • those that are simply unnecessary

Redundant features

  • those that become irrelevant in the presence of others
SLIDE 8

3 main categories of Feature Selection techniques: Wrappers, Filters, Embedded methods

SLIDE 9

Wrappers: Evaluation method

Feature set → Trains a model → Outputs accuracy

Pros:

  • Model-oriented
  • Usually gets good performance for the model you choose.

Cons:

  • Hugely computationally expensive.

SLIDE 10

Wrappers: Search strategy

A candidate feature set can be encoded as a binary string over the features, e.g. 101110000001000100001000000000100101010

With an exhaustive search:

  • 20 features … 1 million feature sets to check
  • 25 features … 33.5 million sets
  • 30 features … 1.1 billion sets

⇒ Need for a search strategy:

  • Sequential forward selection
  • Recursive backward elimination
  • Genetic algorithms
  • Simulated annealing
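The subset counts above are just 2^d for d features. A quick check, with a comparison against sequential forward selection, which (selecting all the way to d features) evaluates at most d + (d−1) + … + 1 = d(d+1)/2 candidate subsets:

```python
# Exhaustive search must score every subset of d features: 2**d of them.
for d in (20, 25, 30):
    print(f"{d} features -> {2 ** d:,} feature sets")
# 20 features -> 1,048,576 feature sets
# 25 features -> 33,554,432 feature sets
# 30 features -> 1,073,741,824 feature sets

# Sequential forward selection instead tries d + (d-1) + ... + 1
# candidate subsets, i.e. d*(d+1)//2 -- quadratic, not exponential.
d = 30
print(d * (d + 1) // 2)  # 465 evaluations instead of ~1.1 billion
```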

SLIDE 11

Wrappers: Sequential Forward Selection
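The slide shows this procedure as a diagram; a minimal from-scratch sketch follows. The `evaluate` function is a hypothetical stand-in for the wrapper's evaluation method, which in practice would train a model on the candidate subset and return its validation accuracy.

```python
def sequential_forward_selection(n_features, evaluate, k):
    """Greedily grow a feature subset, one feature at a time.

    `evaluate(subset)` is the wrapper's black-box scorer: in a real
    wrapper it trains a model on `subset` and returns accuracy.
    """
    selected = []
    remaining = set(range(n_features))
    for _ in range(k):
        # Try adding each remaining feature; keep the best addition.
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy scorer: pretend features 2 and 5 are the relevant ones, with a
# small penalty per feature so smaller subsets are preferred on ties.
relevant = {2, 5}
score = lambda subset: len(relevant & set(subset)) - 0.01 * len(subset)
print(sequential_forward_selection(8, score, k=2))  # -> [2, 5]
```

Note the greediness: once a feature is added it is never reconsidered, which is why SFS can miss feature combinations that only work together.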

SLIDE 12

Search Complexity for Sequential Forward Selection

SLIDE 13

Feature Selection (2): Filters

SLIDE 14

Search Complexity for Filter Methods

Pros:

  • A lot less expensive!

Cons:

  • Not model-oriented
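The slides do not fix the scoring criterion, but a typical filter scores each feature independently against the label and keeps the top k, with no model training at all. A sketch using absolute Pearson correlation as the (assumed, illustrative) score:

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def filter_select(X, y, k):
    """Rank features by |correlation with the label|, keep the top k.

    X is a list of rows.  One cheap pass per feature, no model trained
    -- hence much less expensive than a wrapper, but model-agnostic.
    """
    d = len(X[0])
    scores = [abs(pearson_r([row[j] for row in X], y)) for j in range(d)]
    return sorted(range(d), key=lambda j: -scores[j])[:k]

# Feature 0 tracks the label, feature 1 is noise, feature 2 is
# anti-correlated with the label (still informative!).
X = [[1, 5, 0], [2, 1, -1], [3, 4, -2], [4, 2, -3]]
y = [1, 2, 3, 4]
print(filter_select(X, y, k=2))  # -> [0, 2]
```

Because each feature is scored on its own, a filter like this cannot see redundancy: two perfectly redundant copies of feature 0 would both be kept.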
SLIDE 15

Feature Selection (3): Embedded methods

Pros:

  • Performs feature selection as part of the learning procedure

Cons:

  • Computationally demanding

Principle: the classifier performs feature selection as part of the learning procedure.

Example: the logistic LASSO (Tibshirani, 1996), with error function

E(w) = −Σn [ yn ln ŷn + (1 − yn) ln(1 − ŷn) ] + λ Σj |wj|

(the first term is the cross-entropy error, the second the regularizing term).
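A minimal pure-Python sketch of this idea, minimising cross-entropy plus an L1 penalty by proximal gradient descent (the learning rate, λ, and toy data are assumptions for illustration, not from the slides). The soft-threshold step is what the L1 term contributes: it can set weights exactly to zero, so selection happens during training.

```python
import math

def fit_logistic_lasso(X, y, lam=0.1, lr=0.1, iters=2000):
    """Minimise mean cross-entropy + lam * sum(|w_j|) by proximal gradient."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Gradient of the (mean) cross-entropy error.
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        # Gradient step, then soft-threshold: the proximal operator of L1.
        for j in range(d):
            wj = w[j] - lr * grad[j]
            w[j] = math.copysign(max(abs(wj) - lr * lam, 0.0), wj)
    return w

# Feature 0 separates the classes; feature 1 is uncorrelated noise.
X = [[1, 1], [2, -1], [-1, -1], [-2, 1]]
y = [1, 1, 0, 0]
w = fit_logistic_lasso(X, y)
print(w)  # the noise feature's weight is driven to zero
```

The irrelevant feature ends with weight exactly 0.0, illustrating why embedded methods yield a feature subset for free, at the price of a more demanding optimisation.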

SLIDE 16

Conclusions on Feature Selection

Potential benefits: reduced computational cost, less over-fitting, more interpretable models.

Wrappers: generally infeasible on the modern “big data” problem.

Filters: mostly heuristics, but can be formalized in some cases.

  • Manchester MLO group works on this challenge.
SLIDE 17

This is the End of the Course Unit…

That’s it. We’re done. Exam in January – past papers on website.

MSc students: projects due Friday, 4pm. CDT/MRes students: 1 week later.

You need to submit a hardcopy to SSO:

  • your 6 page (maximum) report

You need to send by email to Gavin:

  • the report as PDF, and a ZIP file of your code.