Class #10: Feature Engineering

Machine Learning (COMP 135): M. Allen, 20 Feb. 2020


Feature Engineering

• As we saw with polynomial regression, we often want to transform our data in order to get better results from a machine learning algorithm
• We often get better results by:
  1. Changing how features are represented
  2. Adding new features
  3. Deleting/ignoring some features


Example: Higher-Order Polynomial Features

• As seen in Assignment 02, transforming data by mapping to higher-degree polynomials, and then fitting a linear regression, can reduce error
• Gains are most significant at first, and then error starts to level off
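A minimal sketch of this pipeline (the toy sine data and the degree range are illustrative choices, not from the assignment):

```python
# Fit linear regression on polynomial expansions of increasing degree
# and watch the test error drop, then level off.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, size=(100, 1))
y_train = np.sin(x_train).ravel() + rng.normal(0, 0.1, size=100)
x_test = rng.uniform(-3, 3, size=(50, 1))
y_test = np.sin(x_test).ravel() + rng.normal(0, 0.1, size=50)

for degree in range(1, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    err = mean_squared_error(y_test, model.predict(x_test))
    print(f"degree {degree}: test MSE = {err:.4f}")
```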


The Cost of Feature Transformation

• Not every transformation is equally useful
• The polynomial degrees above 3 from the previous slide also start to show some evidence of over-fitting, as revealed by cross-validation


The Cost of Feature Transformation

• Not every transformation is useful: at very high polynomial degrees, some of the mathematics of the linear regression libraries in sklearn breaks down
• Mathematically, we expect better and better fits
• In practice, the method ceases working effectively, and the models are generally useless
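One hedged way to see this numerically: the condition number of the polynomial design matrix explodes with degree, so the least-squares solve behind linear regression becomes unreliable. The input grid below is an illustrative choice:

```python
# Condition number of the polynomial feature matrix vs. degree.
# Once it approaches ~1e16 (double-precision limit), fits are untrustworthy.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(0, 1, 50).reshape(-1, 1)
for degree in (3, 10, 20, 30):
    X = PolynomialFeatures(degree).fit_transform(x)
    print(f"degree {degree:2d}: condition number = {np.linalg.cond(X):.2e}")
```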


Feature Rescaling

Input: each numeric feature has an arbitrary min/max
• Some in [0, 1], some in [−5, 5], some in [−3333, −2222]

Transformed feature vector
• Set each feature value f to have range [0, 1]
• min_f = minimum observed in training set
• max_f = maximum observed in training set

φ(x_n)_f = (x_nf − min_f) / (max_f − min_f)
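A short sketch of this transform using scikit-learn's MinMaxScaler; the example matrix is made up to echo the ranges above. Note that min_f and max_f are learned from the training set only, then reused on new data:

```python
# Min-max rescaling: each column mapped to [0, 1] using training min/max.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[0.5, -4.0, -3000.0],
                    [0.9,  5.0, -2222.0],
                    [0.1,  0.0, -3333.0]])
X_new = np.array([[0.3, 2.0, -2500.0]])

scaler = MinMaxScaler().fit(X_train)   # learns min_f, max_f per feature
print(scaler.transform(X_train))       # training features now in [0, 1]
print(scaler.transform(X_new))         # new data reuses the *training* min/max
```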


Feature Standardization

Input: each feature is numeric, with an arbitrary scale

Transformed feature vector
• Set each feature value f to have zero mean, unit variance
• µ_f = empirical mean observed in training set
• σ_f = empirical standard deviation observed in training set

φ(x_n)_f = (x_nf − µ_f) / σ_f


Feature Standardization

• Treats each feature as Normal(0, 1)
• Typical range will be −3 to +3, if the original data is approximately normal
• Also called the z-score transform

φ(x_n)_f = (x_nf − µ_f) / σ_f
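A minimal sketch of the z-score transform with scikit-learn's StandardScaler (the synthetic data is illustrative):

```python
# Standardization: subtract training mean, divide by training std, per feature.
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.random.default_rng(0).normal(5.0, 2.0, size=(200, 3))

scaler = StandardScaler().fit(X_train)   # learns mu_f, sigma_f per feature
Z = scaler.transform(X_train)
print(Z.mean(axis=0))                    # approximately 0 for each feature
print(Z.std(axis=0))                     # approximately 1 for each feature

# Equivalent by hand:
Z_manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
```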


Best Subset Selection

Best subset selection evaluates every possible subset of the features and keeps whichever scores best on validation data.

• Main issue: too many subsets
• There are O(2^p) such collections of features
• For problems with large feature sets, this quickly becomes infeasible
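A quick illustration of the combinatorial blow-up (the enumeration helper is hypothetical, just listing index subsets):

```python
# Number of non-empty feature subsets is 2^p - 1, which explodes with p.
from itertools import combinations

def all_feature_subsets(p):
    """Yield every non-empty subset of feature indices 0..p-1."""
    for k in range(1, p + 1):
        yield from combinations(range(p), k)

print(sum(1 for _ in all_feature_subsets(4)))   # 15 == 2**4 - 1
for p in (10, 20, 30):
    print(f"p = {p}: {2**p - 1:,} subsets to evaluate")
```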


Forward Stepwise Selection

1. Start with the zero-feature model (just guess the mean); store it as M_0
2. Add the best-scoring single feature (among all F); store the result as M_1
3. For each size k = 2, …, F:
   • Try each possible not-yet-included feature (there are F − k + 1 of them)
   • Add the best-scoring feature to the model M_(k−1)
   • Store the result as M_k
4. Pick the best among M_0, M_1, …, M_F, based upon the validation data (a runnable sketch follows below)
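A sketch of the procedure above, assuming a linear regression scored by 5-fold cross-validated R²; the helper names and the synthetic dataset are illustrative choices, not from the slides:

```python
# Greedy forward stepwise selection over feature columns.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def cv_score(X, y, cols):
    """Mean 5-fold cross-validated R^2 of a linear model on the given columns."""
    if not cols:                 # M_0: always predict the mean, R^2 = 0
        return 0.0
    return cross_val_score(LinearRegression(), X[:, cols], y, cv=5).mean()

def forward_stepwise(X, y):
    p = X.shape[1]
    chosen, models = [], [((), 0.0)]              # start from M_0
    for _ in range(p):
        remaining = [c for c in range(p) if c not in chosen]
        best = max(remaining, key=lambda c: cv_score(X, y, chosen + [c]))
        chosen = chosen + [best]
        models.append((tuple(chosen), cv_score(X, y, chosen)))
    return max(models, key=lambda m: m[1])        # best M_k on validation score

X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)
print(forward_stepwise(X, y))
```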


Best vs. Forward Stepwise

• It is easy to find cases where forward stepwise's greedy approach doesn't deliver the best possible subset


Backwards Stepwise Selection

The basic forward model can also be run backwards:

1. Start with all features
2. Gradually test all models with one feature removed from each
3. Repeat to remove 2, 3, … features, down to single-feature versions
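In practice, scikit-learn ships this as SequentialFeatureSelector with direction="backward" (available in scikit-learn 0.24+); a brief sketch, with an illustrative synthetic dataset:

```python
# Backward elimination: repeatedly drop the least useful feature.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)

selector = SequentialFeatureSelector(LinearRegression(),
                                     n_features_to_select=3,
                                     direction="backward", cv=5)
selector.fit(X, y)
print(selector.get_support())   # boolean mask of the surviving features
```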


Next Week

• Special schedule: class Wednesday & Thursday
• Topics: clustering methods
• Readings linked from the class schedule page
• Assignments:
  • Homework 03: due Wednesday, 26 Feb., 9:00 AM (logistic regression & decision trees)
  • Project 01: due Monday, 09 March, 5:00 PM (feature engineering and classification for image data)
• Midterm Exam: Wednesday, 11 March
• Office Hours: 237 Halligan
  • Monday, 10:30 AM – Noon
  • Tuesday, 9:00 AM – 1:00 PM
• TA hours can be found on the class website
