Feature Selection
ZHI LI Fenyö’s Lab October 3, 2019
What is a Feature?
- The X (independent) variable; also called a predictor or covariate
- Examples: face; leg; tail; hair texture/style; impeachment; trade war; whatever you can think of...
What methods are out there?
- Filter methods, e.g. correlation
- Embedded methods, e.g. regularization
- Wrapper methods, e.g. forward selection (see the sketch below)
The categories depend on how the selection algorithm is combined with model building.
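As a minimal sketch of the wrapper idea (not from the slides), scikit-learn's SequentialFeatureSelector can run forward selection, repeatedly adding the feature that most improves cross-validated performance; the estimator, dataset, and number of features below are illustrative choices:

```python
# Forward selection (wrapper method): greedily add the feature whose
# inclusion most improves the cross-validated score of the chosen model.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)  # illustrative dataset
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward", cv=5
)
selector.fit(X, y)
print(selector.get_support())  # boolean mask over the candidate features
```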
Minimizing the loss function $L$ (the sum of squared errors / mean squared residuals) gives the closed-form estimates

$\hat{w}_0 = \frac{1}{n}\sum_i y_i - \hat{w}_1 \frac{1}{n}\sum_i x_i$

$\hat{w}_1 = \dfrac{\frac{1}{n}\sum_i x_i y_i - \frac{1}{n^2}\left(\sum_i x_i\right)\left(\sum_i y_i\right)}{\frac{1}{n}\sum_i x_i^2 - \frac{1}{n^2}\left(\sum_i x_i\right)^2}$

Slide courtesy of Wilson, with minor adaptation
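A quick numpy check of these closed-form estimates (the simulated data and variable names are mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=100)  # true slope 2, intercept 1
n = len(x)

# Closed-form OLS estimates from the formulas above.
w1 = (np.sum(x * y) / n - np.sum(x) * np.sum(y) / n**2) / (
    np.sum(x**2) / n - np.sum(x) ** 2 / n**2
)
w0 = np.mean(y) - w1 * np.mean(x)
print(w1, w0)  # should be close to 2.0 and 1.0
```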
Correlation-Based Feature Selection
$r = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \; \sum_i (y_i - \bar{y})^2}}$

Numerator: the covariance of $x$ and $y$ (up to a factor of $1/n$). Denominator: the product of their standard deviations (up to the same factor), which scales $r$ to lie in $[-1, 1]$.

Courtesy of Glen_b from Stack Exchange
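A filter method needs nothing more than this statistic per feature; a minimal sketch (the simulated data and cutoff are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))                     # 5 candidate features
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)

# Pearson r between each feature and the outcome.
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
keep = np.abs(r) > 0.3                            # illustrative cutoff
print(np.round(r, 2), keep)                       # features 0 and 3 should pass
```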
              Actual 0         Actual 1
Predicted 0   True Negative    False Negative
Predicted 1   False Positive   True Positive
Sensitivity, TP / (TP + FN), is also called the true positive rate and recall.
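For example, with scikit-learn (note that its confusion_matrix puts actual labels in rows and predictions in columns, the transpose of the table above); the labels here are illustrative:

```python
from sklearn.metrics import confusion_matrix, recall_score

y_true = [0, 0, 1, 1, 1, 0, 1]  # illustrative labels
y_pred = [0, 1, 1, 1, 0, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp / (tp + fn))                # recall / true positive rate: 0.75
print(recall_score(y_true, y_pred))  # same value via the built-in metric
```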
Slide courtesy of David
Chi-square test: suited to large counts; easy to calculate; an approximation.
Fisher's exact test: the exact counterpart of the chi-squared test for 2 x 2 tables; suited to small counts; harder to calculate; exact.
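Both tests are one call in scipy; a sketch on an illustrative 2 x 2 table of feature level vs. class counts:

```python
from scipy.stats import chi2_contingency, fisher_exact

table = [[8, 2],  # illustrative 2 x 2 contingency table:
         [1, 5]]  # rows = feature present/absent, columns = class labels

chi2, p_approx, dof, expected = chi2_contingency(table)
odds_ratio, p_exact = fisher_exact(table)
print(p_approx, p_exact)  # approximate vs. exact p-value; they differ at small counts
```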
!"##$ = &'' + ) *
+,- .
/+ &0123 &3243##0$5 = &'' + ) *
+,- .
/+
6 Slide courtesy of Anna
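Because the $\ell_1$ penalty drives some coefficients exactly to zero, the lasso doubles as an embedded feature selector; a minimal sketch (alpha and the simulated data are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))                    # 10 candidate features
y = 4 * X[:, 0] - 3 * X[:, 1] + rng.normal(size=200)

# Standardize so the penalty treats all coefficients comparably.
model = Lasso(alpha=0.1).fit(StandardScaler().fit_transform(X), y)
selected = np.flatnonzero(model.coef_)            # features with nonzero beta_j
print(selected)                                   # expect roughly [0, 1]
```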
Applied Predictive Modeling Chapter 19
- Akaike information criterion (AIC)
- Bayesian information criterion (BIC)
- Cross-validation (covered earlier by Anna)
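For reference, the standard definitions, with $\hat{L}$ the maximized likelihood, $k$ the number of parameters, and $n$ the number of observations:

$\mathrm{AIC} = 2k - 2\ln\hat{L} \qquad \mathrm{BIC} = k \ln n - 2\ln\hat{L}$

Both reward fit through $\hat{L}$ and penalize model size through $k$; BIC penalizes extra parameters more heavily once $n > e^2 \approx 7$.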
Courtesy of Scott Lundberg
Feature importance measures are often model dependent.
1. Tree SHAP: a newly proposed method.
2. Saabas: an individualized heuristic feature attribution method.
3. mean(|Tree SHAP|): a global attribution method based on the average magnitude of the individualized Tree SHAP attributions.
4. Gain: the same method used above in XGBoost, also equivalent to the Gini importance measure used in scikit-learn tree models.
5. Split count: represents both the closely related "weight" and "cover" methods in XGBoost, but is computed using the "weight" method.
6. Permutation: the drop in model accuracy when a single feature is randomly permuted in the test data set (see the sketch below).
Courtesy of Scott Lundberg
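Method 6 is model agnostic and easy to reproduce; a sketch with scikit-learn's permutation_importance (the model and dataset are illustrative choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # illustrative dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Mean drop in test accuracy when each feature is shuffled, over 10 repeats.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean.argsort()[::-1][:5])  # indices of the top 5 features
```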
Summary
- Many tools are available for dissecting the relationship between predictors and the outcome; which to use depends on the question at hand.
- Except for filter methods, feature selection is largely tied to the choice of model.
- Most of the time, simple plotting (scatter plots, box plots, PCA) can save a ton of time in figuring out which features are relevant and robust versus dubious and misleading.
- Low model complexity is usually preferred when the primary goal is to interpret each predictor's contribution to the outcome.
- When multiple hypothesis tests are run, multiplicity should be controlled.