
Reducing Dimensionality: Feature Selection and Feature Extraction - Steven J Zeil - PowerPoint PPT Presentation



  1. Outline
     1 Feature Selection
     2 Feature Extraction: Principal Components Analysis (PCA), Factor Analysis (FA), Multidimensional Scaling (MDS), Linear Discriminants Analysis (LDA)
     Steven J Zeil, Old Dominion Univ., Fall 2010

     Motivation
     - Reduction in complexity of prediction and training
     - Reduction in cost of data extraction
     - Simpler models: reduced variance
     - Easier to visualize and analyze results, identify outliers, etc.

     Basic Approaches
     Given an input population characterized by $d$ attributes:
     - Feature Selection: find the $k < d$ dimensions that give the most information and discard the other $d - k$ (subset selection).
     - Feature Extraction: find $k \le d$ dimensions that are linear combinations of the original $d$: Principal Components Analysis (unsupervised) and the related Factor Analysis and Multidimensional Scaling, or Linear Discriminants Analysis (supervised). The text also mentions nonlinear methods, Isometric feature mapping and Locally Linear Embedding, but there is not enough information here to really justify them. (A sketch contrasting the two approaches follows below.)
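To make the selection/extraction distinction concrete, here is a minimal sketch added for illustration (not taken from the slides); the data matrix, the chosen indices, and the projection matrix are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # hypothetical data: N = 100 samples, d = 5 attributes

# Feature selection: keep k < d of the original columns, discard the other d - k.
selected = [0, 2]               # indices chosen by some selection procedure
X_sel = X[:, selected]          # shape (100, 2); columns are original attributes

# Feature extraction: build k <= d new dimensions as linear combinations z = A x.
A = rng.normal(size=(2, 5))     # hypothetical k x d projection matrix (e.g., from PCA or LDA)
X_ext = X @ A.T                 # shape (100, 2); each new column mixes all original attributes
```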

  2. Subset Selection
     - Assume we have a suitable error function and can evaluate it for a variety of models (cross-validation): misclassification error for classification problems, mean-squared error for regression.
     - We can't evaluate all $2^d$ subsets of $d$ features.
     - Forward selection: start with an empty feature set. Repeatedly add the feature that reduces the error the most. Stop when the decrease is insignificant. (See the sketch after this slide group.)
     - Backward selection: start with all features. Remove the feature that decreases the error the most (or increases it the least). Stop when any further removal increases the error significantly.
     - Both directions are $O(d^2)$.
     - Hill-climbing: not guaranteed to find the global optimum.

     Notes
     - A variant, floating search, adds multiple features at once, then backtracks to see which features can be removed.
     - Selection is less useful in very high-dimension problems where individual features are of limited use but clusters of features are significant.

     Outline (section divider)
     2 Feature Extraction: Principal Components Analysis (PCA), Factor Analysis (FA), Multidimensional Scaling (MDS), Linear Discriminants Analysis (LDA)

     Principal Components Analysis (PCA)
     - Find a mapping $\vec{z} = A\vec{x}$ onto a lower-dimension space.
     - Unsupervised method: seeks to maximize variance. Intuitively: try to spread the points apart as far as possible.
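A minimal sketch of the greedy forward-selection loop just described, assuming a caller-supplied error(subset) function (hypothetical) that returns a cross-validated error estimate; backward selection would mirror it by starting from the full set and removing features.

```python
def forward_selection(d, error, tol=1e-3):
    """Greedy forward selection over d candidate features.

    error(subset) is assumed to return a cross-validated error estimate
    (misclassification rate or mean-squared error) for that feature subset.
    """
    selected = []
    remaining = set(range(d))
    best_err = error(selected)                 # baseline: the empty feature set
    while remaining:
        # Try adding each remaining feature; keep the one that reduces error most.
        trials = {j: error(selected + [j]) for j in remaining}
        j_best = min(trials, key=trials.get)
        if best_err - trials[j_best] < tol:    # decrease is insignificant: stop
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_err = trials[j_best]
    return selected, best_err
```

Each pass scores at most d candidate subsets and there are at most d passes, which matches the O(d^2) cost noted on the slide.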

  3. 1st Principal Component
     - Assume $\vec{x} \sim N(\vec{\mu}, \Sigma)$. Then $\vec{w}^T\vec{x} \sim N(\vec{w}^T\vec{\mu}, \vec{w}^T\Sigma\vec{w})$.
     - Find $z_1 = \vec{w}_1^T\vec{x}$, with $\vec{w}_1^T\vec{w}_1 = 1$, that maximizes $\mathrm{Var}(z_1) = \vec{w}_1^T\Sigma\vec{w}_1$.
     - Maximize $\vec{w}_1^T\Sigma\vec{w}_1 - \alpha(\vec{w}_1^T\vec{w}_1 - 1)$ over $\vec{w}_1$, with $\alpha \ge 0$.
     - Solution: $\Sigma\vec{w}_1 = \alpha\vec{w}_1$. This is an eigenvalue problem on $\Sigma$; we want the solution (eigenvector) corresponding to the largest eigenvalue $\alpha$.

     2nd Principal Component
     - Next find $z_2 = \vec{w}_2^T\vec{x}$, with $\vec{w}_2^T\vec{w}_2 = 1$ and $\vec{w}_2^T\vec{w}_1 = 0$, that maximizes $\mathrm{Var}(z_2) = \vec{w}_2^T\Sigma\vec{w}_2$.
     - Solution: $\Sigma\vec{w}_2 = \alpha_2\vec{w}_2$. Choose the solution (eigenvector) corresponding to the 2nd largest eigenvalue $\alpha_2$.
     - Because $\Sigma$ is symmetric, its eigenvectors are mutually orthogonal.

     Visualizing PCA
     - $\vec{z} = W^T(\vec{x} - \vec{m})$
     - (Figure omitted.)

     Is Spreading the Space Enough?
     - Although we can argue that spreading the points leads to a better-conditioned problem: what does this have to do with reducing dimensionality?
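A minimal numeric sketch of the eigenvalue view of PCA derived above, under the assumption that the data sit in a NumPy array X (hypothetical): center the data, eigendecompose the sample covariance, and project onto the top k eigenvectors.

```python
import numpy as np

def pca_project(X, k):
    """Project rows of X onto the k leading principal components."""
    m = X.mean(axis=0)                   # sample mean (the m in z = W^T (x - m))
    Sigma = np.cov(X, rowvar=False)      # sample covariance matrix
    vals, vecs = np.linalg.eigh(Sigma)   # eigh: Sigma is symmetric
    order = np.argsort(vals)[::-1]       # eigenvalues sorted largest first
    W = vecs[:, order[:k]]               # columns are w_1, ..., w_k (unit length, orthogonal)
    Z = (X - m) @ W                      # z = W^T (x - m), one row per sample
    return Z, vals[order]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated toy data
Z, eigvals = pca_project(X, k=2)
```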

  4. Detecting Linear Dependencies
     - Suppose that some subset of the inputs is linearly correlated: $\exists \vec{q}$ such that $\vec{q}^T\vec{x} = 0$.
     - Then $\Sigma$ is singular: $\vec{q}^T(\vec{x} - E[\vec{x}]) = 0$, so $\Sigma\vec{q} = 0$.
     - $\vec{q}$ is an eigenvector of the problem $\Sigma\vec{w} = \alpha\vec{w}$ with $\alpha = 0$: the last eigenvector(s) we would consider using.

     When to Stop?
     - Proportion of Variance (PoV) for eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_d$:
       $\mathrm{PoV}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}$
     - Plot PoV against $k$ and look for an elbow.
     - Typically stop around PoV = 0.9.
     - Flip side: PCA can be overly sensitive to scaling issues [normalize] and to outliers.

     PoV
     - (Figure: plot of the proportion of variance against the number of eigenvectors retained.)

     PCA & Visualization
     - If the 1st two eigenvectors account for the majority of the variance, plot the data, using symbols for classes or other features.
     - Visually search for patterns.
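A minimal sketch of the PoV stopping rule, reusing the sorted eigenvalues returned by the PCA sketch above; the 0.9 threshold is the rule of thumb from the slide, and the sample eigenvalues are made up.

```python
import numpy as np

def choose_k_by_pov(eigvals, threshold=0.9):
    """Smallest k with sum_{i<=k} lambda_i / sum_{i<=d} lambda_i >= threshold."""
    lam = np.sort(eigvals)[::-1]          # eigenvalues, largest first
    pov = np.cumsum(lam) / lam.sum()      # PoV(k) for k = 1, ..., d
    return int(np.argmax(pov >= threshold)) + 1, pov

k, pov_curve = choose_k_by_pov(np.array([4.0, 2.5, 1.0, 0.3, 0.2]))
# k == 3 here: the first three eigenvalues carry 7.5 / 8.0 = 93.75% of the total variance
```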

  5. PCA Visualization
     - (Figure: data plotted in the space of the first two principal components.)

     Factor Analysis (FA)
     - A kind of "inverted" PCA: find a set of factors $\vec{z}$ that can be combined to generate $\vec{x}$:
       $x_i - \mu_i = \left( \sum_{j=1}^{k} v_{ij} z_j \right) + \epsilon_i$
     - $z_i$ are latent factors: $E[z_i] = 0$, $\mathrm{Var}(z_i) = 1$, and $\mathrm{Cov}(z_i, z_j) = 0$ for $i \ne j$.
     - $\epsilon_i$ are noise sources: $E[\epsilon_i] = 0$, $\mathrm{Var}(\epsilon_i) = \phi_i$, $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \ne j$, and $\mathrm{Cov}(\epsilon_i, z_j) = 0$.
     - $v_{ij}$ are factor loadings.

     PCA vs FA
     - (Figure: PCA maps the observed $\vec{x}$ to $\vec{z}$, while FA generates the observed $\vec{x}$ from the latent $\vec{z}$.)

     Multidimensional Scaling (MDS)
     - Given the pairwise distances $d_{ij}$ between $N$ points, place those points on a low-dimension map, preserving the distances.
     - View the map as a mapping $\vec{z} = \vec{g}(\vec{x} \mid \theta)$ and choose $\theta$ to minimize the Sammon stress:
       $E(\theta \mid \mathcal{X}) = \sum_{r,s} \frac{\left( \|\vec{z}^r - \vec{z}^s\| - \|\vec{x}^r - \vec{x}^s\| \right)^2}{\|\vec{x}^r - \vec{x}^s\|} = \sum_{r,s} \frac{\left( \|\vec{g}(\vec{x}^r \mid \theta) - \vec{g}(\vec{x}^s \mid \theta)\| - \|\vec{x}^r - \vec{x}^s\| \right)^2}{\|\vec{x}^r - \vec{x}^s\|}$
     - Use regression methods for $\vec{g}$, using the above as the error function to be minimized.
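A minimal sketch of evaluating the Sammon stress as written above for a candidate low-dimensional placement Z of the points in X; both arrays and the small epsilon guard are assumptions for illustration, and an actual MDS fit would minimize this value over the placement (or over the parameters of g).

```python
import numpy as np

def sammon_stress(X, Z, eps=1e-12):
    """Sum over pairs of (||z^r - z^s|| - ||x^r - x^s||)^2 / ||x^r - x^s||."""
    n = X.shape[0]
    stress = 0.0
    for r in range(n):
        for s in range(r + 1, n):               # each unordered pair once
            dx = np.linalg.norm(X[r] - X[s])    # distance in the original space
            dz = np.linalg.norm(Z[r] - Z[s])    # distance on the low-dimension map
            stress += (dz - dx) ** 2 / (dx + eps)
    return stress

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))                    # hypothetical 5-D points
Z = rng.normal(size=(30, 2))                    # a candidate 2-D placement
print(sammon_stress(X, Z))                      # value an optimizer would drive down
```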

  6. Linear Discriminants Analysis (LDA)
     - Supervised method.
     - Find a projection of $\vec{x}$ onto a low-dimension space where classes are well-separated.
     - Find $\vec{w}$ maximizing
       $J(\vec{w}) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2}$

     Scatter
     - $m_i = \vec{w}^T\vec{m}_i$ and $s_i^2 = \sum_t \left( \vec{w}^T\vec{x}^t - m_i \right)^2 r_i^t$ (with $r_i^t$ indicating whether $\vec{x}^t$ belongs to class $i$), so
       $J(\vec{w}) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2}$
     - $(m_1 - m_2)^2 = (\vec{w}^T\vec{m}_1 - \vec{w}^T\vec{m}_2)^2 = \vec{w}^T S_B \vec{w}$, where $S_B = (\vec{m}_1 - \vec{m}_2)(\vec{m}_1 - \vec{m}_2)^T$ is the between-class scatter.
     - Similarly, $s_1^2 + s_2^2 = \vec{w}^T S_W \vec{w}$, where $S_W = S_1 + S_2$ is the within-class scatter.
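A minimal two-class sketch of the criterion above; it uses the standard closed-form maximizer of J(w), w proportional to S_W^{-1}(m_1 - m_2), which the visible slide text does not state explicitly, and the class arrays X1, X2 are hypothetical.

```python
import numpy as np

def fisher_lda_direction(X1, X2):
    """Direction w maximizing J(w) = (m1 - m2)^2 / (s1^2 + s2^2) for two classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)          # scatter of class 1
    S2 = (X2 - m2).T @ (X2 - m2)          # scatter of class 2
    SW = S1 + S2                          # within-class scatter
    w = np.linalg.solve(SW, m1 - m2)      # w proportional to S_W^{-1} (m1 - m2)
    return w / np.linalg.norm(w)

# Example: project two Gaussian classes onto the discriminant axis
rng = np.random.default_rng(3)
X1 = rng.normal(loc=0.0, size=(50, 4))
X2 = rng.normal(loc=1.0, size=(50, 4))
w = fisher_lda_direction(X1, X2)
z1, z2 = X1 @ w, X2 @ w                   # 1-D projections; the class means separate
```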
