

SLIDE 1

Feature Selection

Zhi Li, Fenyö's Lab, October 3, 2019

SLIDE 2

What is a Feature?

  • Synonyms: X, (independent) variable, predictor, covariate
  • Examples: face; leg; tail; hair texture/style; impeachment; trade war; whatever you can think of...

SLIDE 3

What methods are out there?

  • Filter methods: e.g., correlation
  • Embedded methods: e.g., regularization
  • Wrapper methods: e.g., forward selection

The categories differ in how the selection algorithm and the model building are combined; a minimal sketch of all three families follows.
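
To make the taxonomy concrete, here is a small sketch of all three families using scikit-learn (the library, calls, and example data are my choices, not from the slides):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SelectKBest, f_regression
from sklearn.linear_model import LassoCV, LinearRegression

X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=0)

# Filter: score each feature independently of any model (univariate F-test).
filt = SelectKBest(f_regression, k=5).fit(X, y)

# Embedded: L1 regularization zeroes out uninformative coefficients during fitting.
lasso = LassoCV(cv=5).fit(X, y)

# Wrapper: repeatedly refit the model, eliminating the weakest feature each round.
wrap = RFE(LinearRegression(), n_features_to_select=5).fit(X, y)

print(np.flatnonzero(filt.get_support()))
print(np.flatnonzero(lasso.coef_))
print(np.flatnonzero(wrap.support_))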

SLIDE 4

Filter methods

  • Numeric outcomes:
      • Numeric predictors: correlation (linear, nonlinear); distance; mutual information
      • Categorical predictors: t-statistics; ANOVA (multiple predictors); mutual information
  • Categorical outcomes:
      • Numeric predictors: ROC; Relief; mutual information
      • Categorical predictors: Chi-square; Fisher's exact; odds ratio
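
Mutual information appears in every cell of this taxonomy because it handles numeric and categorical variables alike and captures nonlinear dependence. A minimal sketch with scikit-learn (my choice of library and example data):

import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y_num = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)  # numeric outcome, nonlinear in feature 0
y_cat = (X[:, 1] > 0).astype(int)                      # categorical outcome driven by feature 1

print(mutual_info_regression(X, y_num))  # feature 0 scores highest
print(mutual_info_classif(X, y_cat))     # feature 1 scores highest
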
SLIDE 5

Minimizing the loss function L (the sum of squared errors) gives the ordinary least squares estimates:

$$\hat{w}_0 = \frac{1}{n}\sum_i y_i - \hat{w}_1\,\frac{1}{n}\sum_i x_i$$

$$\hat{w}_1 = \frac{n\sum_i x_i y_i - \left(\sum_i x_i\right)\left(\sum_i y_i\right)}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2} = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}$$

Slide courtesy of Wilson, with minor adaptation
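
These closed forms are easy to verify numerically; the sketch below (my own illustration) checks them against numpy's polynomial fit:

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + 0.3 * rng.normal(size=100)

w1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # cov(X, Y) / var(X)
w0 = y.mean() - w1 * x.mean()
print(w0, w1)               # close to the true (1.0, 2.0)
print(np.polyfit(x, y, 1))  # [slope, intercept] from numpy agrees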

SLIDE 6

Pearson’s correlation
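
The slide's plots are not reproduced here; as a companion, a minimal computation (my own example) showing why this linear measure can miss nonlinear structure:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = x ** 2  # strong, purely nonlinear dependence

r, p = stats.pearsonr(x, y)
print(r, p)  # r is near 0: Pearson's r only measures linear association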

SLIDE 7

Feature Importance Measures

Correlation-Based Feature Selection

SLIDE 8

Spearman’s Correlation

SLIDE 9

Spearman’s Correlation

[Figure: breakdown of the numerator and denominator of the Spearman formula.] Courtesy of Glen_b from Stack Exchange
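
For reference, the standard definitions (stated from common knowledge, not recovered from the slide image): Spearman's correlation is Pearson's correlation applied to ranks,

$$\rho = \frac{\operatorname{cov}(\operatorname{rg}_X, \operatorname{rg}_Y)}{\sigma_{\operatorname{rg}_X}\,\sigma_{\operatorname{rg}_Y}}$$

which, in the absence of ties, simplifies to

$$\rho = 1 - \frac{6\sum_i d_i^2}{n(n^2 - 1)}, \qquad d_i = \operatorname{rg}(x_i) - \operatorname{rg}(y_i).$$

In Python, scipy.stats.spearmanr computes this directly.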

SLIDE 10

Other Non-linear Measures

SLIDE 11

Other Non-linear Measures

  • MIC (maximal information coefficient)
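
One way to compute MIC in Python is the third-party minepy package (my suggestion, not from the slides; the calls below reflect its documented API as I recall it):

import numpy as np
from minepy import MINE  # pip install minepy

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 500)
y = np.cos(4 * np.pi * x) + 0.1 * rng.normal(size=500)  # nonlinear signal

mine = MINE(alpha=0.6, c=15)  # default parameters from the MINE paper
mine.compute_score(x, y)
print(mine.mic())  # near 1 for strong functional dependence, near 0 for noise
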
SLIDE 12

Other Non-linear Measures

SLIDE 13

Filter methods

SLIDE 14

Other Non-linear Measures

SLIDE 15

t-statistics (categorical predictor)
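
With a binary categorical predictor and a numeric outcome, a two-sample t-test scores the feature. A minimal sketch (library and example data are mine):

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group_a = rng.normal(loc=0.0, size=50)  # outcome where predictor = 0
group_b = rng.normal(loc=0.8, size=50)  # outcome where predictor = 1

t, p = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch's t-test
print(t, p)  # small p suggests the predictor separates the outcome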

SLIDE 16

Categorical Outcome

SLIDE 17

ROC

SLIDE 18

Evaluation of Binary Classification Models

              Actual 0          Actual 1
Predicted 0   True Negative     False Negative
Predicted 1   False Positive    True Positive

  • False positive rate = FP/(FP+TN) – fraction of label 0 predicted to be label 1
  • Accuracy = (TP+TN)/total – fraction of correct predictions
  • Precision = TP/(TP+FP) – fraction of correct among positive predictions
  • Sensitivity = TP/(TP+FN) – fraction of correct predictions among label 1; also called true positive rate and recall
  • Specificity = TN/(TN+FP) – fraction of correct predictions among label 0

Slide courtesy of David
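
All five metrics fall out of the four confusion-matrix counts. A small sketch (my own example; scikit-learn is used only to tally the matrix):

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("FPR         =", fp / (fp + tn))
print("Accuracy    =", (tp + tn) / (tp + tn + fp + fn))
print("Precision   =", tp / (tp + fp))
print("Sensitivity =", tp / (tp + fn))  # recall / true positive rate
print("Specificity =", tn / (tn + fp))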

SLIDE 19

Relief algorithm

SLIDE 20

Relief algorithm
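
The slides do not include code; the following compact sketch of the original Relief update (Kira & Rendell) is my own reconstruction, assuming numeric features scaled to [0, 1] and a binary outcome:

import numpy as np

def relief(X, y, n_iter=100, seed=0):
    """Original Relief: weight features by nearest hits and misses."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        i = rng.integers(n)
        d = np.abs(X - X[i]).sum(axis=1)  # L1 distance from sample i to all others
        d[i] = np.inf                     # exclude the sample itself
        hit = np.argmin(np.where(y == y[i], d, np.inf))   # nearest same-class sample
        miss = np.argmin(np.where(y != y[i], d, np.inf))  # nearest other-class sample
        # Features that differ at the miss gain weight; those differing at the hit lose weight.
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / n_iter
    return w

Large positive weights mark features that consistently separate the classes among near neighbors.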

SLIDE 21

More on categorical outcome with numeric predictors

SLIDE 22

Categorical Outcome with Categorical Variable

SLIDE 23

Categorical Outcome with Categorical Variable

  • Chi-square test: suits large counts; easy to calculate; an approximation
  • Fisher's exact test: for 2 x 2 contingency tables; suits small counts; hard to calculate; exact
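
Both tests are single calls in SciPy; a minimal sketch on a hypothetical 2 x 2 table (my own example data):

import numpy as np
from scipy import stats

# Rows: predictor level; columns: outcome class.
table = np.array([[8, 2],
                  [1, 5]])

chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
odds_ratio, p_fisher = stats.fisher_exact(table)
print(p_chi2, p_fisher)  # with counts this small, trust the exact test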

SLIDE 24

Fisher’s exact

SLIDE 25

Consequences of Using Non-informative Predictors

SLIDE 26

Embedded methods

!"##$ = &'' + ) *

+,- .

/+ &0123 &3243##0$5 = &'' + ) *

+,- .

/+

6 Slide courtesy of Anna
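
The L1 penalty is what makes the lasso a feature selector: it drives some coefficients exactly to zero, whereas the L2 (ridge) penalty only shrinks them. A quick sketch (my own example):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("lasso zeros:", np.sum(lasso.coef_ == 0))  # several exact zeros
print("ridge zeros:", np.sum(ridge.coef_ == 0))  # typically none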

SLIDE 27

Wrapper methods: Stepwise regression

  • Forward selection
  • Backward elimination
  • Bidirectional elimination

SLIDE 28

Forward selection

Applied Predictive Modeling Chapter 19
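
The book gives the algorithm in prose; a greedy forward-selection loop is only a few lines. A sketch (my own, not the book's code, with 5-fold cross-validated R^2 as the score):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=150, n_features=8, n_informative=3, random_state=0)

selected, remaining = [], list(range(X.shape[1]))
best_score = -np.inf
while remaining:
    # Score each candidate feature added to the current set.
    scores = {j: cross_val_score(LinearRegression(), X[:, selected + [j]], y,
                                 cv=5).mean() for j in remaining}
    j_best = max(scores, key=scores.get)
    if scores[j_best] <= best_score:  # stop when no candidate helps
        break
    best_score = scores[j_best]
    selected.append(j_best)
    remaining.remove(j_best)
print("selected features:", selected)

scikit-learn packages the same idea as SequentialFeatureSelector.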

SLIDE 29

Backward Selection

SLIDE 30

Filter Methods

  • Akaike information criterion (AIC)
  • Bayesian information criterion (BIC)
  • Cross-validation (covered earlier by Anna)

!"# = % log (*

+,- .

(/0 − 2 /0)4 ) + 6 ∗ 89:(%)
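
Under a Gaussian error model both criteria reduce to functions of the residual sum of squares, so they are easy to compute by hand. A sketch of the common convention (my own; texts differ by additive constants, which are dropped here):

import numpy as np

def aic_bic(y, y_hat, k):
    # k = number of fitted parameters.
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic

Lower is better; BIC penalizes extra parameters more heavily than AIC whenever log(n) > 2, i.e., n >= 8.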

SLIDE 31

Feature Importance Measures

Courtesy of Scott Lundberg

Often model dependent

SLIDE 32

A comparison of two methods for determining variable importance.

SLIDE 33

Feature Importance Measures

1. Tree SHAP: a newly proposed method.
2. Saabas: an individualized heuristic feature attribution method.
3. mean(|Tree SHAP|): a global attribution method based on the average magnitude of the individualized Tree SHAP attributions.
4. Gain: the same method used above in XGBoost, and also equivalent to the Gini importance measure used in scikit-learn tree models.
5. Split count: represents both the closely related "weight" and "cover" methods in XGBoost, but is computed using the "weight" method.
6. Permutation: the resulting drop in model accuracy when a single feature is randomly permuted in the test data set.

Courtesy of Scott Lundberg
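
Of the six, permutation importance (method 6) is the most model-agnostic and is a single call in scikit-learn; Tree SHAP is available in the shap package (shap.TreeExplainer). A sketch of the former (my own example):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=6, n_informative=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
print(result.importances_mean)  # mean drop in accuracy per permuted feature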

SLIDE 34
SLIDE 35

Controlling Procedures (FWER and FDR)
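
When each feature is tested for association with the outcome, the resulting p-values need multiplicity control: Bonferroni controls the family-wise error rate (FWER), Benjamini-Hochberg the false discovery rate (FDR). A sketch using statsmodels (library choice and data are mine):

import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(5)
pvals = np.concatenate([rng.uniform(0, 0.001, 5),  # 5 real effects
                        rng.uniform(0, 1, 95)])    # 95 null features

reject_fwer, _, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
reject_fdr, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(reject_fwer.sum(), reject_fdr.sum())  # FDR is typically less conservative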

SLIDE 36

Summary

  • Various tools are available to dissect the relationship between predictors and the outcome; the choice among them depends on the question at hand.
  • Except for filter methods, feature selection is largely tied to the choice of model.
  • Most of the time, simple plotting (scatterplots, boxplots, PCA) can save a ton of time in figuring out what is relevant and robust versus dubious and misleading.
  • Low complexity is usually preferred when the primary goal is interpreting a predictor's contribution to the outcome.
  • When multiple hypothesis tests are run, multiplicity should be controlled (FWER or FDR, as above).

SLIDE 37

Thank You