

SLIDE 1

COMP61011 : Machine Learning

Feature Selection

Gavin Brown

www.cs.man.ac.uk/~gbrown

SLIDE 2

The Usual Supervised Learning Approach

data + labels → Learning Algorithm → Model
Testing data → Model → Predicted label

SLIDE 3

Predicting Recurrence of Lung Cancer

Only a few genes actually matter! Need small, interpretable subset to help doctors!

SLIDE 4

Text classification… is this news story “interesting”?

“Bag-of-Words” representation: x = {0, 3, 0, 0, 1, ..., 2, 3, 0, 0, 0, 1}

  • One entry per word!

Easily 50,000 words! Very sparse – easy to overfit! Need accuracy, otherwise we lose visitors to our news website!
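The bag-of-words idea above can be sketched in a few lines. This is a minimal illustration, not the lecture's code; the vocabulary and news story here are made up for the example.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text.

    Returns one entry per vocabulary word -- most entries are zero,
    which is why the representation is so sparse.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Toy vocabulary and story -- real vocabularies easily reach 50,000 words.
vocab = ["election", "goal", "market", "shares", "match"]
story = "shares up as market rallies market confidence grows"
x = bag_of_words(story, vocab)
print(x)  # [0, 0, 2, 1, 0]
```

With 50,000 vocabulary words, almost every entry of x is zero for any single story, which is exactly the sparsity that makes over-fitting so easy.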

SLIDE 5

The Usual Supervised Learning Approach ?????

data + labels → Learning Algorithm (OVERWHELMED!) → Model
Testing data → Model → Predicted label

SLIDE 6

With big data…

Feature selection

  • Time complexity
  • Computational cost
  • Cost in data collection
  • Over-fitting
  • Lack of interpretability
SLIDE 7

Some things matter, some do not.

Relevant features

  • those that we need to perform well

Irrelevant features

  • those that are simply unnecessary

Redundant features

  • those that become irrelevant in the presence of others
SLIDE 8

3 main categories of Feature Selection techniques: Wrappers, Filters, Embedded methods

SLIDE 9

Wrappers: Evaluation method

Feature set → Trains a model → Outputs accuracy

Pros:

  • Model-oriented
  • Usually gets good performance for the model you choose.

Cons:

  • Hugely computationally expensive.

SLIDE 10

Wrappers: Search strategy

A candidate feature set can be encoded as a binary string over the features, e.g. 101110000001000100001000000000100101010

With an exhaustive search:

  • 20 features … 1 million feature sets to check
  • 25 features … 33.5 million sets
  • 30 features … 1.1 billion sets

⇒ Need for a search strategy:

  • Sequential forward selection
  • Recursive backward elimination
  • Genetic algorithms
  • Simulated annealing
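The subset counts above are just 2^d for d features. A quick check, with a comparison against sequential forward selection, which (selecting all the way to d features) evaluates at most d + (d−1) + … + 1 = d(d+1)/2 candidate subsets:

```python
# Exhaustive search must score every subset of d features: 2**d of them.
for d in (20, 25, 30):
    print(f"{d} features -> {2 ** d:,} feature sets")
# 20 features -> 1,048,576 feature sets
# 25 features -> 33,554,432 feature sets
# 30 features -> 1,073,741,824 feature sets

# Sequential forward selection instead tries d + (d-1) + ... + 1
# candidate subsets, i.e. d*(d+1)//2 -- quadratic, not exponential.
d = 30
print(d * (d + 1) // 2)  # 465 evaluations instead of ~1.1 billion
```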

SLIDE 11

Wrappers: Sequential Forward Selection
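The slide shows this procedure as a diagram; a minimal from-scratch sketch follows. The `evaluate` function is a hypothetical stand-in for the wrapper's evaluation method, which in practice would train a model on the candidate subset and return its validation accuracy.

```python
def sequential_forward_selection(n_features, evaluate, k):
    """Greedily grow a feature subset, one feature at a time.

    `evaluate(subset)` is the wrapper's black-box scorer: in a real
    wrapper it trains a model on `subset` and returns accuracy.
    """
    selected = []
    remaining = set(range(n_features))
    for _ in range(k):
        # Try adding each remaining feature; keep the best addition.
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy scorer: pretend features 2 and 5 are the relevant ones, with a
# small penalty per feature so smaller subsets are preferred on ties.
relevant = {2, 5}
score = lambda subset: len(relevant & set(subset)) - 0.01 * len(subset)
print(sequential_forward_selection(8, score, k=2))  # -> [2, 5]
```

Note the greediness: once a feature is added it is never reconsidered, which is why SFS can miss feature combinations that only work together.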

SLIDE 12

Search Complexity for Sequential Forward Selection

SLIDE 13

Feature Selection (2): Filters

SLIDE 14

Search Complexity for Filter Methods

Pros:

  • A lot less expensive!

Cons:

  • Not model-oriented
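The slides do not fix the scoring criterion, but a typical filter scores each feature independently against the label and keeps the top k, with no model training at all. A sketch using absolute Pearson correlation as the (assumed, illustrative) score:

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def filter_select(X, y, k):
    """Rank features by |correlation with the label|, keep the top k.

    X is a list of rows.  One cheap pass per feature, no model trained
    -- hence much less expensive than a wrapper, but model-agnostic.
    """
    d = len(X[0])
    scores = [abs(pearson_r([row[j] for row in X], y)) for j in range(d)]
    return sorted(range(d), key=lambda j: -scores[j])[:k]

# Feature 0 tracks the label, feature 1 is noise, feature 2 is
# anti-correlated with the label (still informative!).
X = [[1, 5, 0], [2, 1, -1], [3, 4, -2], [4, 2, -3]]
y = [1, 2, 3, 4]
print(filter_select(X, y, k=2))  # -> [0, 2]
```

Because each feature is scored on its own, a filter like this cannot see redundancy: two perfectly redundant copies of feature 0 would both be kept.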
SLIDE 15

Feature Selection (3): Embedded methods

Pros:

  • Performs feature selection as part of the learning procedure

Cons:

  • Computationally demanding

Principle: the classifier performs feature selection as part of the learning procedure.

Example: the logistic LASSO (Tibshirani, 1996), with error function

E(w) = −Σn [ yn ln ŷn + (1 − yn) ln(1 − ŷn) ] + λ Σj |wj|

(the first term is the cross-entropy error, the second the regularizing term).
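A minimal pure-Python sketch of this idea, minimising cross-entropy plus an L1 penalty by proximal gradient descent (the learning rate, λ, and toy data are assumptions for illustration, not from the slides). The soft-threshold step is what the L1 term contributes: it can set weights exactly to zero, so selection happens during training.

```python
import math

def fit_logistic_lasso(X, y, lam=0.1, lr=0.1, iters=2000):
    """Minimise mean cross-entropy + lam * sum(|w_j|) by proximal gradient."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Gradient of the (mean) cross-entropy error.
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        # Gradient step, then soft-threshold: the proximal operator of L1.
        for j in range(d):
            wj = w[j] - lr * grad[j]
            w[j] = math.copysign(max(abs(wj) - lr * lam, 0.0), wj)
    return w

# Feature 0 separates the classes; feature 1 is uncorrelated noise.
X = [[1, 1], [2, -1], [-1, -1], [-2, 1]]
y = [1, 1, 0, 0]
w = fit_logistic_lasso(X, y)
print(w)  # the noise feature's weight is driven to zero
```

The irrelevant feature ends with weight exactly 0.0, illustrating why embedded methods yield a feature subset for free, at the price of a more demanding optimisation.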

SLIDE 16

Conclusions on Feature Selection

Potential benefits: reduced computational cost, less over-fitting, more interpretable models.

Wrappers: generally infeasible on the modern “big data” problem.

Filters: mostly heuristics, but can be formalized in some cases.

  • Manchester MLO group works on this challenge.
SLIDE 17

This is the End of the Course Unit…

That’s it. We’re done. Exam in January – past papers on website.

MSc students: projects due Friday, 4pm. CDT/MRes students: 1 week later.

You need to submit a hardcopy to SSO:

  • your 6 page (maximum) report

You need to send by email to Gavin:

  • the report as PDF, and a ZIP file of your code.