Party on! A new, conditional variable importance measure for random forests available in party
Carolin Strobl (LMU München) and Achim Zeileis (WU Wien)

Outline: Measuring variable importance · A new, conditional importance · Conclusion · References
Introduction
random forests
◮ have become increasingly popular in, e.g., genetics and
the neurosciences
◮ can deal with “small n large p”-problems, high-order
interactions, correlated predictor variables
◮ are used not only for prediction, but also to measure
variable importance (advantage: RF variable importance measures capture the effect of a variable through both main effects and interactions → better suited for screening than univariate measures)
(Small) random forest

[Figure: plots of the individual conditional inference trees in a small random forest; each tree splits on Start, Number, and Age (all split p-values < 0.001) and reports class proportions in its terminal nodes]
Measuring variable importance

◮ Gini importance
mean Gini gain produced by Xj over all trees (can be severely biased due to estimation bias and multiple testing; Strobl et al., 2007)

◮ permutation importance
mean decrease in classification accuracy after permuting Xj over all trees (unbiased when subsampling is used; Strobl et al., 2007)
The permutation importance
within each tree t:

VI^{(t)}(x_j) = \frac{\sum_{i \in \bar{B}^{(t)}} I\left(y_i = \hat{y}_i^{(t)}\right)}{|\bar{B}^{(t)}|} - \frac{\sum_{i \in \bar{B}^{(t)}} I\left(y_i = \hat{y}_{i,\pi_j}^{(t)}\right)}{|\bar{B}^{(t)}|}

where \bar{B}^{(t)} is the out-of-bag sample for tree t,
\hat{y}_i^{(t)} = f^{(t)}(x_i) = predicted class before permuting,
\hat{y}_{i,\pi_j}^{(t)} = f^{(t)}(x_{i,\pi_j}) = predicted class after permuting X_j,
x_{i,\pi_j} = (x_{i,1}, \ldots, x_{i,j-1}, x_{\pi_j(i),j}, x_{i,j+1}, \ldots, x_{i,p})

Note: VI^{(t)}(x_j) = 0 by definition, if X_j is not in tree t
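The per-tree computation can be sketched in a language-agnostic way. The following Python sketch is not the party implementation; `tree` (any object with a `predict` method), `oob_idx`, and the array arguments are hypothetical stand-ins:

```python
import numpy as np

def tree_permutation_importance(tree, X, y, oob_idx, j, rng):
    """VI^(t)(x_j): drop in out-of-bag accuracy after permuting column j."""
    X_oob, y_oob = X[oob_idx], y[oob_idx]
    acc_before = np.mean(tree.predict(X_oob) == y_oob)
    X_perm = X_oob.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # permute X_j, leave the rest intact
    acc_after = np.mean(tree.predict(X_perm) == y_oob)
    return acc_before - acc_after
```

The forest-level importance is then obtained by averaging these per-tree values over all trees.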
The permutation importance
averaged over all trees:

VI(x_j) = \frac{1}{ntree} \sum_{t=1}^{ntree} VI^{(t)}(x_j)
What null hypothesis does this permutation scheme correspond to?
Obs | Y   | X_j             | Z
1   | y_1 | x_{\pi_j(1),j}  | z_1
⋮   | ⋮   | ⋮               | ⋮
i   | y_i | x_{\pi_j(i),j}  | z_i
⋮   | ⋮   | ⋮               | ⋮
n   | y_n | x_{\pi_j(n),j}  | z_n

H_0: X_j ⊥ Y, Z  (i.e., X_j ⊥ Y ∧ X_j ⊥ Z)

Under H_0: P(Y, X_j, Z) = P(Y, Z) · P(X_j)
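This can be illustrated numerically: a small sketch (all variable names and distributions are made up for illustration) showing that unconditionally permuting X_j destroys its association with Z, not only with Y:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=1000)
xj = z + rng.normal(scale=0.3, size=1000)  # X_j strongly correlated with Z
xj_perm = rng.permutation(xj)              # unconditional permutation

corr_before = np.corrcoef(xj, z)[0, 1]     # strong correlation
corr_after = np.corrcoef(xj_perm, z)[0, 1] # close to zero
```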
What null hypothesis does this permutation scheme correspond to?
The current null hypothesis reflects independence of X_j from both Y and the remaining predictor variables Z ⇒ a high variable importance can result from a violation of either one!
Suggestion: Conditional permutation scheme
Obs | Y    | X_j                  | Z
1   | y_1  | x_{\pi_j|Z=a(1),j}   | z_1 = a
3   | y_3  | x_{\pi_j|Z=a(3),j}   | z_3 = a
27  | y_27 | x_{\pi_j|Z=a(27),j}  | z_27 = a
6   | y_6  | x_{\pi_j|Z=b(6),j}   | z_6 = b
14  | y_14 | x_{\pi_j|Z=b(14),j}  | z_14 = b
33  | y_33 | x_{\pi_j|Z=b(33),j}  | z_33 = b
⋮   | ⋮    | ⋮                    | ⋮

H_0: X_j ⊥ Y | Z

Under H_0: P(Y, X_j | Z) = P(Y | Z) · P(X_j | Z), or equivalently P(Y | X_j, Z) = P(Y | Z)
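The within-stratum permutation can be sketched as follows; `z_groups`, an array of labels identifying the cells of the conditioning partition, is a hypothetical input rather than a party API:

```python
import numpy as np

def conditional_permute(xj, z_groups, rng):
    """Permute x_j separately within each cell of the conditioning partition,
    preserving the association between X_j and Z."""
    out = xj.copy()
    for g in np.unique(z_groups):
        idx = np.where(z_groups == g)[0]
        out[idx] = rng.permutation(xj[idx])
    return out
```

Replacing the unconditional permutation in the importance computation by this grouped permutation yields the conditional importance.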
Technically

◮ use any partition of the feature space for conditioning
◮ here: use the binary partition already learned by the tree
Simulation study
◮ dgp: y_i = β_1 · x_{i,1} + ⋯ + β_12 · x_{i,12} + ε_i, with ε_i i.i.d. ∼ N(0, 0.5)
◮ X_1, …, X_12 ∼ N(0, Σ), where the first four predictors are block-correlated and the rest are uncorrelated:

Σ = [σ_kl] with σ_kk = 1, σ_kl = 0.9 for k ≠ l with k, l ≤ 4, and σ_kl = 0 otherwise

Regression coefficients:

X_j: X_1 X_2 X_3 X_4 X_5 X_6 X_7 X_8 ⋯ X_12
β_j:  5   5   2   0  −5  −5  −2   0  ⋯   0
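Under these settings the data-generating process can be sketched as follows (sample size, seed, and the reading of N(0, 0.5) as a standard deviation of 0.5 are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 200, 12

# block-correlated predictors: rho = 0.9 among X1..X4, independent otherwise
Sigma = np.eye(p)
Sigma[:4, :4] = 0.9
np.fill_diagonal(Sigma, 1.0)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

beta = np.array([5, 5, 2, 0, -5, -5, -2, 0, 0, 0, 0, 0], dtype=float)
y = X @ beta + rng.normal(0.0, 0.5, size=n)  # noise sd assumed to be 0.5
```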
Results

[Figure: distributions of unconditional and conditional permutation importance for variables X_1–X_12, shown for mtry = 1, mtry = 3, and mtry = 8]
Peptide-binding data
[Figure: unconditional vs. conditional permutation importance (scale up to 0.005) for the peptide-binding data; highlighted top variables include h2y8, flex8, and pol3]
R-Example

spurious correlation between shoe size and reading skills in school-children:

> mycf <- cforest(score ~ ., data = readingSkills,
+                 control = cforest_unbiased(mtry = 2))
> varimp(mycf)
nativeSpeaker           age      shoeSize
     12.62926      74.89542      20.01108
> varimp(mycf, conditional = TRUE)
nativeSpeaker           age      shoeSize
    11.808192     46.995336      2.092454

(from party 0.9-991)
Conclusion

◮ conditional permutation is expensive
◮ but gets us closer to the interpretation of importance that we (statisticians) are used to → beta coefficients, partial correlations
◮ choice of mtry has a high impact
General remarks

◮ default settings for mtry vary between implementations
e.g., for classification: randomForest uses mtry = √p, cforest uses mtry = 5
small values of mtry may often be a good choice - but not in the case of correlated predictors!
◮ make sure your results are stable before interpreting importance rankings
fit another forest with a different random seed - if the ranking changes, increase ntree
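The stability check boils down to comparing the variable rankings induced by two importance vectors (e.g., from forests grown with different seeds); a minimal sketch of that comparison, with made-up inputs:

```python
import numpy as np

def same_ranking(vi_a, vi_b):
    """True if two importance vectors order the variables identically."""
    order_a = np.argsort(-np.asarray(vi_a, dtype=float))
    order_b = np.argsort(-np.asarray(vi_b, dtype=float))
    return bool(np.array_equal(order_a, order_b))
```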