Feature Engineering in Machine Learning by Zdeněk Žabokrtský - PowerPoint PPT Presentation



  1. Feature Engineering in Machine Learning Zdeněk Žabokrtský Institute of Formal and Applied Linguistics, Charles University in Prague

  2. Used resources: http://www.cs.princeton.edu/courses/archive/spring10/cos424/slides/18-feat.pdf, http://stackoverflow.com/questions/2674430/how-to-engineer-features-for-machine-learning, https://facwiki.cs.byu.edu/cs479/index.php/Feature_engineering, the scikit-learn documentation, Wikipedia

  3. Human's role when applying Machine Learning: machine learning provides you with extremely powerful tools for decision making... but until a breakthrough in AI, the developer's decisions will remain crucial. Your responsibilities: setting up the correct problem to be optimized (far from straightforward in the real world), choosing a model, choosing a learning algorithm (or a family of algorithms), finding relevant data, designing features, feature representation, feature selection...

  4. Feature: a feature is a piece of information that is potentially useful for prediction.

  5. Feature engineering: not a formally defined term, just a vaguely agreed space of tasks related to designing feature sets for ML applications. Two components: first, understanding the properties of the task you are trying to solve and how they might interact with the strengths and limitations of the model you are going to use; second, experimental work where you test your expectations and find out what actually works and what doesn't.

  6. Feature engineering in real life: typically a cycle: (1) design a set of features, (2) run an experiment and analyze the results on a validation dataset, (3) change the feature set, (4) go to step 1. Don't expect any elegant answers today.

  7. Causes of feature explosion. Feature templates: when designing a feature set, you usually quickly turn from coding individual features (such as 'this word is preceded by a preposition and a determiner') to implementing feature templates (such as 'the two preceding POS tags are X and Y'). Feature combination: linear models cannot handle some dependencies between features (e.g. XOR of binary features, polynomial dependencies between real-valued features), so feature combinations might work better; a sketch follows below. Both lead to rapid growth of the number of features.
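To make the feature-combination point concrete, here is a minimal sketch (not from the slides) using scikit-learn's PolynomialFeatures to expand a toy real-valued feature matrix with squares and pairwise products; the data and the degree are illustrative only.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy data: 2 samples x 2 real-valued features (made up for illustration).
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# A degree-2 expansion adds squares and pairwise products, giving a linear
# model a chance to capture interactions it cannot express on the raw features.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_comb = poly.fit_transform(X)   # columns: x1, x2, x1^2, x1*x2, x2^2
print(X_comb)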

  8. Stop the explosion. There must be some limits, because: given the limited size of the training data, the number of features that can be used effectively is bounded (overfitting); and sooner or later speed becomes a problem.

  9. Possible solutions to avoid the explosion: feature selection, regularization, kernels.

  10. Feature selection. Central assumption: we can identify features that are redundant or irrelevant. Let's just use the best-working subset: arg max_f acc(f), where acc(f) evaluates prediction quality on held-out data. Rings a bell? Yes, this is a set-of-all-subsets problem (NP-hard), so exhaustive search is clearly intractable. (The former is implicitly used e.g. in top-down induction of decision trees.) A side effect of feature reduction: improved model interpretability.

  11. Feature selection. Basic approaches: wrapper - search through the space of subsets: train a model for the current subset, evaluate it on held-out data, iterate... Simple greedy search heuristics: forward selection - start with an empty set and gradually add the "strongest" features; backward selection - start with the full set and gradually remove the "weakest" features. Computationally expensive (a sketch of forward selection follows below). filter - use the N most promising features according to a ranking obtained from a proxy measure, e.g. mutual information or the Pearson correlation coefficient. embedded methods - feature selection is a part of model construction.
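As a concrete illustration of the wrapper approach, the following sketch implements greedy forward selection with cross-validation; the dataset (iris) and the estimator (logistic regression) are arbitrary choices, not part of the slides.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
while remaining:
    # Train and evaluate a model for each candidate feature added to the
    # current subset; keep the candidate that improves held-out accuracy most.
    scores = {j: cross_val_score(model, X[:, selected + [j]], y, cv=5).mean()
              for j in remaining}
    j_best, score = max(scores.items(), key=lambda kv: kv[1])
    if score <= best_score:      # stop when no remaining feature helps
        break
    selected.append(j_best)
    remaining.remove(j_best)
    best_score = score

print("selected feature indices:", selected, "CV accuracy:", round(best_score, 3))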

  12. Regularisation: regularisation = introducing a penalty for complexity. The more features that matter in the model, the bigger the complexity; in other words, try to concentrate the weight mass, don't scatter it too much. An application of Occam's razor: the model should be simple. Bayesian view: regularization = imposing this prior knowledge ("the world is simple") on the parameters.

  13. Regularisation. In practice, regularisation is enforced by adding to the cost function a term that takes high values for complex parameter settings, typically added to the negative log-likelihood: cost(f) = -l(f) + regularizer(f). L0 norm: ... + λ count(w_j ≠ 0) - minimize the number of features with non-zero weight, the fewer the better. L1 norm: ... + λ Σ_j |w_j| - minimize the sum of the absolute values of the weights. L2 norm: ... + λ ||w|| - minimize the length of the weight vector. Other norms, e.g. L1/2 or L∞, can be used analogously. (An illustration of L1 vs. L2 follows below.)
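The sparsity effect of the L1 penalty can be seen directly on a linear classifier's weights. The sketch below is my addition, not from the slides; it uses scikit-learn's LogisticRegression on synthetic data, where C is the inverse of the regularization strength λ.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, only 5 of them informative (values are arbitrary).
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2 = LogisticRegression(penalty="l2", solver="liblinear", C=0.1).fit(X, y)

# L1 tends to drive many weights exactly to zero (a sparse model),
# while L2 only shrinks them towards zero.
print("non-zero weights with L1:", int(np.sum(l1.coef_ != 0)))
print("non-zero weights with L2:", int(np.sum(l2.coef_ != 0)))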

  14. Experimenting with features in scikit-learn

  15. Encoding categorical features. Turning (a dictionary of) categorical features into a fixed-length vector: estimators can be fed only with numbers, not with strings, so turn each categorical feature into a one-of-K vector of features. Use preprocessing.OneHotEncoder for one feature after another, or sklearn.feature_extraction.DictVectorizer for the whole dataset at once, in two steps: fit_transform and transform. (A sketch follows below.)
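A minimal sketch of the DictVectorizer workflow described above; the toy records (POS-like features) are invented for illustration.

from sklearn.feature_extraction import DictVectorizer

train = [{"pos": "NOUN", "prev_pos": "DET"},
         {"pos": "VERB", "prev_pos": "NOUN"}]
test  = [{"pos": "NOUN", "prev_pos": "ADP"}]

vec = DictVectorizer(sparse=False)
X_train = vec.fit_transform(train)   # step 1: learn the feature space and transform
X_test  = vec.transform(test)        # step 2: reuse the same one-of-K mapping

print(vec.feature_names_)            # columns such as 'pos=NOUN', 'prev_pos=DET'
print(X_train)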

  16. Feature binarization: thresholding numerical features to get boolean values; preprocessing.Binarizer.
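A one-line illustration of Binarizer, assuming an arbitrary threshold of 1.0:

import numpy as np
from sklearn.preprocessing import Binarizer

X = np.array([[0.2, 5.0],
              [1.5, 0.0]])
print(Binarizer(threshold=1.0).fit_transform(X))   # -> [[0. 1.], [1. 0.]]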

  17. Feature discretization: converting continuous features to discrete features. Typically the data is discretized into K partitions of equal length/width (equal intervals), or into partitions each holding K% of the total data (equal frequencies). Is there a ready-made tool for this in sklearn?
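The two schemes can be sketched with plain NumPy as below (K and the data are made up); newer scikit-learn releases also provide preprocessing.KBinsDiscretizer for this.

import numpy as np

x = np.array([1.0, 2.0, 2.5, 3.0, 7.0, 8.0, 9.0, 20.0])
K = 4

# Equal intervals: K bins of equal width between min and max.
width_edges = np.linspace(x.min(), x.max(), K + 1)
equal_width = np.digitize(x, width_edges[1:-1])

# Equal frequencies: K bins each holding roughly the same number of samples.
freq_edges = np.quantile(x, np.linspace(0, 1, K + 1))
equal_freq = np.digitize(x, freq_edges[1:-1])

print(equal_width)   # bin index per sample (equal widths)
print(equal_freq)    # bin index per sample (roughly equal counts)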

  18. Dataset standardization: some estimators might work badly if the distributions of values of different features are radically different (e.g. differ by orders of magnitude). Solution: transform the data by moving the center (towards zero mean) and scaling (towards unit variance); preprocessing.scale.
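A short sketch of standardization on made-up values whose scales differ by orders of magnitude:

import numpy as np
from sklearn import preprocessing

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 900.0]])

X_scaled = preprocessing.scale(X)    # one-off scaling of a fixed dataset
print(X_scaled.mean(axis=0))         # approximately [0. 0.]
print(X_scaled.std(axis=0))          # approximately [1. 1.]

# StandardScaler learns the mean/std on training data so that the very same
# transformation can be reapplied to test data later.
scaler = preprocessing.StandardScaler().fit(X)
print(scaler.transform(X))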

  19. Vector normalization: normalization is the process of scaling individual samples to have unit norm. Solution: scale each sample (row) so that its norm (e.g. L1 or L2) equals one; preprocessing.normalize.
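A short sketch of per-sample normalization (the vectors are made up):

import numpy as np
from sklearn import preprocessing

X = np.array([[3.0, 4.0],
              [1.0, 0.0]])
X_norm = preprocessing.normalize(X, norm="l2")
print(X_norm)                            # -> [[0.6 0.8], [1. 0.]]
print(np.linalg.norm(X_norm, axis=1))    # each row now has unit norm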

  20. Feature selection. Scikit-learn exposes feature selection routines as objects that implement the transform method: SelectKBest removes all but the k highest-scoring features; SelectPercentile removes all but a user-specified highest-scoring percentile of features; SelectFpr, SelectFdr and SelectFwe select using common univariate statistical tests for each feature (false positive rate, false discovery rate, and family-wise error, respectively). These objects take as input a scoring function that returns univariate p-values: for regression, f_regression; for classification, chi2 or f_classif.
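A minimal sketch of the univariate filter route with SelectKBest and chi2 (the dataset and k are arbitrary illustrative choices):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)        # keep the 2 highest-scoring features

print(X.shape, "->", X_new.shape)           # (150, 4) -> (150, 2)
print(selector.get_support(indices=True))   # indices of the kept features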
