Machine Learning
Feature Space, Feature Selection
Hamid R. Rabiee
Jafar Muhammadi
Spring 2015
http://ce.sharif.edu/courses/93-94/2/ce717-1/
Agenda
Features and Patterns
The Curse of Size and Dimensionality
Sampling
Dimensionality Reduction
Feature Selection Methods
  Univariate feature selection
  Multivariate feature selection
A feature is any distinctive aspect, quality, or characteristic of an object.
Features may be symbolic (e.g. color) or numeric (e.g. height).
Feature vector: the combination of d features is represented as a d-dimensional column vector, called a feature vector.
Feature space: the d-dimensional space defined by the feature vector is called the feature space.
Scatter plot: objects are represented as points in the feature space; this representation is called a scatter plot.
[Figure: a feature vector, a 3D feature space, and a 2D scatter plot]
In classification, a pattern is a pair: a feature vector and its label.
The quality of a feature vector is related to its ability to discriminate samples from different classes:
Samples from the same class should have similar feature values.
Samples from different classes should have different feature values.
[Figure: scatter plots contrasting bad features with good features]
Representative: provide a concise description.
Characteristic: different values for different classes, and almost identical values for very similar objects.
Interpretable: easily translated into the object characteristics used by human experts.
Suitable: a natural choice for the task at hand.
Independent: dependent features are redundant.
[Figure: feature distributions illustrating linear separability, non-linear separability, highly correlated features, and multi-modal classes]
The performance of a classifier depends on the interrelationship between:
sample size
number of features
classifier complexity
In theory, the probability of misclassification of a decision rule does not increase as the number of features increases.
This is true as long as the class-conditional densities are completely known.
Peaking phenomenon
In practice, with finite training data, adding features may actually degrade the performance of a classifier.
Case 1 (left): drug screening (Weston et al., Bioinformatics, 2002). Case 2 (right): text filtering (Bekkerman et al., JMLR, 2003).
[Plots: classification performance vs. number of features for each case]
Face recognition: for 1024*768 images, each sample has 1024*768 = 786,432 features (pixels)!
Bioinformatics (gene and microarray data): few samples (about 100) with high dimension (6,000 to 60,000).
Text categorization: in a 50,000-word vocabulary language, each document is represented by a 50,000-dimensional vector.
Remedies: data reduction, and dimensionality reduction (feature selection or feature extraction).
Data reduction goal
Obtain a reduced representation of the data set that is much smaller in volume, yet produces the same (or almost the same) analytical results.
Data reduction methods
Regression: data are modeled to fit a chosen model (e.g., a line or an AR model).
Sufficient statistics: a function of the data that retains all the statistical information of the original population.
Histograms: divide the data into buckets and store the average (or sum) for each bucket; partitioning rules include equal-width, equal-frequency, equal-variance, etc. (see the sketch below).
Clustering: partition the data set into clusters based on similarity, and store only a representative per cluster (clustering methods will be discussed later).
Sampling: obtain small samples to represent the whole data set D.
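As a concrete illustration of the histogram method, here is a minimal numpy sketch; the data values and the bucket count are made up for illustration, and each bucket stores only its mean:

import numpy as np

# Equal-width histogram reduction: replace the raw values by one
# representative (the mean) per bucket.
data = np.array([2.0, 3.1, 3.5, 7.2, 8.8, 9.1, 15.0, 16.4])
n_buckets = 3

edges = np.linspace(data.min(), data.max(), n_buckets + 1)
bucket_ids = np.clip(np.digitize(data, edges) - 1, 0, n_buckets - 1)
bucket_means = np.array([data[bucket_ids == b].mean() for b in range(n_buckets)])
print(bucket_means)  # 8 raw values reduced to 3 stored means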
Sampling strategies (a code sketch of all three follows below):
Simple random sampling: there is an equal probability of selecting any particular item.
Sampling without replacement: as each item is selected, it is removed from the population, so the same object cannot be picked more than once.
Sampling with replacement: objects are not removed from the population as they are selected, so the same object can be picked more than once.
Stratified sampling: group (split) the population into relatively homogeneous subgroups, then draw random samples from each subgroup according to its size.
[Figure: stratified sampling]
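A minimal numpy sketch of these strategies on a toy population; the group sizes, the 10% stratified fraction, and the sample sizes are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
population = np.arange(100)            # toy population of item ids
labels = np.repeat([0, 1], [80, 20])   # two imbalanced strata

# Simple random sampling without replacement: each item appears at most once.
srs_without = rng.choice(population, size=10, replace=False)

# Simple random sampling with replacement: the same item can recur.
srs_with = rng.choice(population, size=10, replace=True)

# Stratified sampling: draw from each stratum in proportion to its size
# (10% of each group here), then pool the per-group draws.
stratified = np.concatenate([
    rng.choice(population[labels == g],
               size=max(1, int(0.1 * (labels == g).sum())),
               replace=False)
    for g in np.unique(labels)
])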
Pattern representation is easy for 2D and 3D features. How can patterns with high-dimensional features be made viewable? (Refer to HW 1.)
Feature selection (discussed today): select the best subset of a given feature set.
Feature extraction (e.g., PCA; discussed next time): create new features based on the original feature set; transforms are usually involved.
Feature selection: keep a subset of m < d of the original features,

$$ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{bmatrix} \longrightarrow \begin{bmatrix} x_{i_1} \\ x_{i_2} \\ \vdots \\ x_{i_m} \end{bmatrix} $$

Feature extraction: create m new features as a function of all d original features,

$$ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{bmatrix} \longrightarrow \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} = f(\mathbf{x}) $$
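To make the contrast concrete, a small numpy sketch; the random data and the random linear map W are placeholders (a real method such as PCA or LDA would learn W from the data):

import numpy as np

X = np.random.default_rng(0).normal(size=(100, 5))  # 100 samples, d = 5

# Feature selection: keep an index subset i1..im of the original columns.
selected = X[:, [0, 2]]   # m = 2 original features, values unchanged

# Feature extraction: build m new features y = f(x); here f is a linear
# map given by a d-by-m matrix W, so every new feature mixes all inputs.
W = np.random.default_rng(1).normal(size=(5, 2))
extracted = X @ W         # shape (100, 2), new feature values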
Univariate methods: consider one variable (feature) at a time.
Multivariate methods: consider subsets of variables (features) together.
Filter methods: rank features or feature subsets independently of the classifier.
Wrapper methods: use a classifier to assess feature subsets.
Embedded methods: feature selection is part of the training procedure of a classifier (e.g., decision trees).
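The slides do not tie these families to any library, but scikit-learn (assumed installed) ships one representative of each, which makes the taxonomy concrete; the iris data and the choices of k, estimator, and subset size are illustrative only:

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Filter: rank features by a univariate score, independent of any classifier.
filt = SelectKBest(f_classif, k=2).fit(X, y)

# Wrapper: search feature subsets using a classifier to score them.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)

# Embedded: selection happens inside training (tree split choices).
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

print(filt.get_support(), wrap.support_, tree.feature_importances_)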
Criteria for univariate feature selection:
Significant difference
Independence (non-correlated feature selection)
Discrimination power
[Figure: three scatter plots of X1 vs. X2: X1 is more significant than X2; X1 and X2 are both significant but correlated; X1 results in more discrimination than X2]
Select feature A over feature B if Gain(A) > Gain(B). Gain can be calculated using several methods, such as information gain, gain ratio, or the Gini index (these methods will be discussed in the TA session; a minimal sketch of information gain follows below).
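A minimal sketch of one of these criteria, information gain, for discrete-valued numpy arrays; the function names are mine, and the exact definitions are left to the TA session:

import numpy as np

def entropy(labels):
    # H(Y) = -sum_y p(y) log2 p(y): information content of the labels.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    # Gain(A) = H(Y) - sum_v P(A = v) * H(Y | A = v).
    values, counts = np.unique(feature, return_counts=True)
    conditional = sum(c / len(feature) * entropy(labels[feature == v])
                      for v, c in zip(values, counts))
    return entropy(labels) - conditional

Select A over B when information_gain(A, y) > information_gain(B, y).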
Continuous data with a normal distribution:
  Two classes: t-test (discussed here!)
  Multiple classes: ANOVA
Continuous data with a non-normal distribution, or rank data:
  Two classes: Mann-Whitney test
  Multiple classes: Kruskal-Wallis test
Categorical data:
  Chi-square test
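These tests are all available in scipy.stats (assumed installed); a sketch on made-up feature samples, where a and b hold one feature's values in two classes:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 30)   # feature values in class 1 (toy data)
b = rng.normal(0.8, 1.0, 30)   # feature values in class 2 (toy data)
c = rng.normal(1.6, 1.0, 30)   # a third class, for the multi-class tests

t, p_t = stats.ttest_ind(a, b)        # two classes, normal data
u, p_u = stats.mannwhitneyu(a, b)     # two classes, non-normal / rank data
f, p_f = stats.f_oneway(a, b, c)      # ANOVA: three or more classes
h, p_h = stats.kruskal(a, b, c)       # Kruskal-Wallis: non-parametric analogue
# Categorical data: stats.chi2_contingency on a feature-by-class count table.

A small p-value suggests the feature's distribution differs across classes, i.e. the feature is informative.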
Based on the correlation between a feature and a class label:
Independence^2 = 1 - Correlation^2
If a feature is heavily dependent on another, then it is redundant.
How to calculate the correlation?
Continuous data with a normal distribution: Pearson correlation (discussed here!)
Continuous data with a non-normal distribution, or rank data: Spearman rank correlation
Categorical data: Pearson contingency coefficient
Select significantly different features one by one, using the difference in class means (hypothesis testing). Let x_ij denote feature i in class j, with m_ij and s_ij as estimates of its mean and standard deviation.
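This setup leads to the standard two-sample t statistic; a plausible reconstruction, consistent with the worked example below (where df = 4 and the critical value is t_{4,0.975} = 2.78):

$$ t_i = \frac{m_{i1} - m_{i2}}{\sqrt{s_{i1}^2/n_1 + s_{i2}^2/n_2}}, \qquad df = n_1 + n_2 - 2 $$

Feature i is significant at the 5% level when $|t_i| > t_{df,\,0.975}$.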
Is X1 a good feature? Is X2 a good feature? (The scatter plot of X1 vs. X2 is omitted here.)
For X1: $t = (m_{11} - m_{12}) / \sqrt{s_{11}^2/n_1 + s_{12}^2/n_2} = 1.224$ with df = 4, and $t_{4,0.975} = 2.78 > 1.224$, so the class means do not differ significantly. Then, X1 is not a good feature.
For X2: $t = (m_{21} - m_{22}) / \sqrt{s_{21}^2/n_1 + s_{22}^2/n_2} = 3.674 > 2.78 = t_{4,0.975}$, so the class means differ significantly. Then, X2 is a good feature.
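The slide's numbers can be checked in Python; the per-class means, standard deviations, and n = 3 per class below are assumptions (the raw data is not recoverable from the slide), chosen so that the X1 statistic matches:

import numpy as np
from scipy import stats

m1, m2 = 2.0, 1.0          # assumed class means for feature X1
s1, s2 = 1.0, 1.0          # assumed class standard deviations
n = 3                      # assumed samples per class (df = 2n - 2 = 4)

t = (m1 - m2) / np.sqrt(s1**2 / n + s2**2 / n)
critical = stats.t.ppf(0.975, df=2 * n - 2)    # t_{4, 0.975}
print(round(t, 3), round(critical, 2))         # 1.225 < 2.78 -> not significant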
Pearson correlation
The most common measure of correlation is the Pearson product-moment correlation (Pearson's correlation for short): $\rho_{XY} = \mathrm{cov}(X, Y) / (\sigma_X \sigma_Y)$.
How do we use the Pearson correlation to decide whether a feature is a good one or not? Compute the Pearson correlation between this feature and the currently selected ones.
Choose this feature only if there is no highly correlated feature among the selected ones (a code sketch follows below).
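A greedy sketch of this rule with numpy; the 0.9 threshold and the assumption that features arrive already ranked by individual quality are mine, not from the slides:

import numpy as np

def select_uncorrelated(X, ranked_idx, threshold=0.9):
    """Walk features in rank order; keep one only if its absolute Pearson
    correlation with every already-kept feature stays below the threshold."""
    kept = []
    for i in ranked_idx:
        corrs = [abs(np.corrcoef(X[:, i], X[:, j])[0, 1]) for j in kept]
        if all(c < threshold for c in corrs):
            kept.append(i)
    return kept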
Consider the following two example datasets (figures omitted; Guyon-Elisseeff, JMLR 2004; Springer 2006). They illustrate why multivariate selection matters: features that look weak in isolation can become informative when considered together.
In the next slides, we'll introduce common "Generation" and "Evaluation" methods.
[Flow diagram] The generic feature selection process:
Starting from the set of all possible features, the Generation step produces a candidate subset of features.
The Evaluation step scores that subset.
If the stopping criterion is not met (no), generation continues with the next candidate subset.
If it is met (yes), the selected subset of features is passed to a Validation process.
(A code skeleton of this loop follows below.)
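A minimal Python skeleton of that loop; generate and evaluate are placeholders for whichever generation and evaluation methods the following slides describe, and the iteration budget stands in for the stopping criterion:

def feature_selection(all_features, generate, evaluate, max_iters=100):
    best_subset, best_score = None, float("-inf")
    for _ in range(max_iters):                        # stopping criterion: budget
        subset = generate(all_features, best_subset)  # propose a candidate subset
        score = evaluate(subset)                      # filter score or classifier
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset                                # then validate on held-out data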
Complete/exhaustive:
  Examines all combinations of feature subsets (too expensive if the feature space is large).
  The optimal subset is achievable.
Heuristic:
  Selection is directed by a certain guideline; often uses incremental generation of subsets.
  May miss highly important features.
Random:
  No pre-defined way to select feature candidates; features are picked at random.
  Finding the optimal subset depends on the number of tries.
  Requires more user-defined input parameters, and the optimality of the result depends on how these parameters are defined.
Filter methods:
Distance (Euclidean, Mahalanobis, etc.):
  Select features that keep instances of the same class within the same proximity: instances of the same class should be closer in distance than instances of different classes.
Consistency (min-features bias):
  Select features that guarantee no inconsistency in the data: two instances are inconsistent if they have matching feature values but fall under different class labels. Prefers the smallest consistent subset (a sketch of the check follows below).
Information measures (entropy, information gain, etc.):
  Entropy is a measurement of information content.
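A minimal sketch of the consistency check for discrete-valued features; the function name is mine:

import numpy as np

def is_consistent(X_subset, y):
    """Two instances are inconsistent if they share the same values on the
    selected features but carry different class labels."""
    seen = {}
    for row, label in zip(map(tuple, X_subset), y):
        if row in seen and seen[row] != label:
            return False                 # matching features, different labels
        seen[row] = label
    return True

Min-features bias then prefers the smallest feature subset for which is_consistent(X[:, subset], y) still holds.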
Filter methods (continued):
Dependency (correlation coefficient):
  The correlation between a feature and a class label: how closely is the feature related to the outcome of the class label? The dependence between two features equals their degree of redundancy.
Wrapper method:
Classifier error rate (CER):
  The evaluation function is the classifier itself (this loses generality; a sketch follows below).
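A sketch of wrapper-style evaluation with scikit-learn (assumed installed): every feature pair of the iris data is scored by the cross-validated accuracy of an actual classifier; the k-NN choice and the restriction to pairs are illustrative:

from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Exhaustive wrapper over feature pairs: feasible only because d is small.
best = max(
    (cross_val_score(KNeighborsClassifier(), X[:, list(pair)], y, cv=5).mean(),
     pair)
    for pair in combinations(range(X.shape[1]), 2)
)
print(best)  # (cross-validated accuracy, best feature pair)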
Generality: how general is the method with respect to different classifiers?
Time: how complex is the method in terms of time?
Accuracy: how accurate is the resulting classification?
M. Dash and H. Liu, "Feature Selection for Classification," Intelligent Data Analysis Journal, Elsevier, Vol. 1, No. 3, pp. 131-156, 1997.
Evaluation Method       Generality   Time       Accuracy
Distance                Yes          Low        -
Information             Yes          Low        -
Dependency              Yes          Low        -
Consistency             Yes          Moderate   -
Classifier error rate   No           High       Very High

(A dash means the accuracy depends on the classifier applied after selection.)