Party on! A new, conditional variable importance measure for random forests available in party


SLIDE 1

Outline: Measuring variable importance · A new, conditional importance · Conclusion · References

Party on! A new, conditional variable importance measure for random forests available in party

Carolin Strobl (LMU München) and Achim Zeileis (WU Wien)

useR! 2009

SLIDE 2

Introduction

Random forests

◮ have become increasingly popular in, e.g., genetics and the neurosciences

◮ can deal with "small n, large p" problems, high-order interactions, and correlated predictor variables

◮ are used not only for prediction, but also to measure variable importance (advantage: RF variable importance measures capture the effect of a variable in both main effects and interactions → smarter for screening than univariate measures)

SLIDE 3

(Small) random forest

[Figure: individual trees from a small random forest; each tree is a conditional inference tree splitting on the predictors Start, Number, and Age (all split p-values < 0.001), with node sample sizes n and class proportions y shown. Different bootstrap samples yield visibly different trees.]

SLIDE 4

Measuring variable importance


SLIDE 6

Measuring variable importance

◮ Gini importance: mean Gini gain produced by Xj over all trees (can be severely biased due to estimation bias and multiple testing; Strobl et al., 2007)

◮ permutation importance: mean decrease in classification accuracy after permuting Xj over all trees (unbiased when subsampling is used; Strobl et al., 2007)

SLIDE 7

The permutation importance

within each tree t:

VI^{(t)}(x_j) = \frac{\sum_{i \in \bar{B}^{(t)}} I\left(y_i = \hat{y}_i^{(t)}\right)}{|\bar{B}^{(t)}|} - \frac{\sum_{i \in \bar{B}^{(t)}} I\left(y_i = \hat{y}_{i,\pi_j}^{(t)}\right)}{|\bar{B}^{(t)}|}

where

\bar{B}^{(t)} = out-of-bag sample of tree t

\hat{y}_i^{(t)} = f^{(t)}(x_i) = predicted class before permuting

\hat{y}_{i,\pi_j}^{(t)} = f^{(t)}(x_{i,\pi_j}) = predicted class after permuting X_j

x_{i,\pi_j} = (x_{i,1}, \ldots, x_{i,j-1}, x_{\pi_j(i),j}, x_{i,j+1}, \ldots, x_{i,p})

Note: VI^{(t)}(x_j) = 0 by definition, if X_j is not in tree t
SLIDE 8

The permutation importance

over all trees:

VI(x_j) = \frac{\sum_{t=1}^{ntree} VI^{(t)}(x_j)}{ntree}
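The two formulas above can be sketched in code. The following is a Python/scikit-learn stand-in (the talk itself uses party in R): it permutes one column, records the per-tree accuracy drop, and averages over trees. For simplicity it evaluates every tree on one fixed data set rather than on each tree's out-of-bag sample, and all names are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def permutation_importance_trees(forest, X, y, j, seed=0):
    """Average over trees of the drop in accuracy after permuting
    column j -- the slide's VI(x_j). Simplification: one fixed
    evaluation set is used instead of each tree's OOB sample."""
    rng = np.random.default_rng(seed)
    drops = []
    for tree in forest.estimators_:
        acc_before = np.mean(tree.predict(X) == y)
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])  # permute X_j only
        acc_after = np.mean(tree.predict(X_perm) == y)
        drops.append(acc_before - acc_after)          # VI^(t)(x_j)
    return float(np.mean(drops))                      # VI(x_j)

# usage: the informative predictor should clearly outrank pure noise
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = (X[:, 0] > 0).astype(int)          # only column 0 carries signal
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
vi_signal = permutation_importance_trees(rf, X, y, j=0)
vi_noise = permutation_importance_trees(rf, X, y, j=1)
```

Note that VI^{(t)} can be negative for a single tree when a permuted variable happens to help; only the average over many trees is interpreted.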

SLIDE 9

What null hypothesis does this permutation scheme correspond to?

obs | Y   | X_j           | Z
1   | y_1 | x_{π_j(1),j}  | z_1
⋮   | ⋮   | ⋮             | ⋮
i   | y_i | x_{π_j(i),j}  | z_i
⋮   | ⋮   | ⋮             | ⋮
n   | y_n | x_{π_j(n),j}  | z_n

H_0: X_j ⊥ (Y, Z), i.e., X_j ⊥ Y ∧ X_j ⊥ Z

under H_0: P(Y, X_j, Z) = P(Y, Z) · P(X_j)

SLIDE 10

What null hypothesis does this permutation scheme correspond to?

The current null hypothesis reflects independence of X_j from both Y and the remaining predictor variables Z ⇒ a high variable importance can result from a violation of either one!

SLIDE 11

Suggestion: Conditional permutation scheme

obs | Y    | X_j                  | Z
1   | y_1  | x_{π_{j|Z=a}(1),j}   | z_1 = a
3   | y_3  | x_{π_{j|Z=a}(3),j}   | z_3 = a
27  | y_27 | x_{π_{j|Z=a}(27),j}  | z_27 = a
6   | y_6  | x_{π_{j|Z=b}(6),j}   | z_6 = b
14  | y_14 | x_{π_{j|Z=b}(14),j}  | z_14 = b
33  | y_33 | x_{π_{j|Z=b}(33),j}  | z_33 = b
⋮   | ⋮    | ⋮                    | ⋮

H_0: X_j ⊥ Y | Z

under H_0: P(Y, X_j | Z) = P(Y | Z) · P(X_j | Z), or equivalently P(Y | X_j, Z) = P(Y | Z)
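The conditional scheme above permutes X_j only within groups of observations that share the same value of the conditioning variables Z, so the dependence between X_j and Z is preserved while the link between X_j and Y is broken. A minimal Python illustration (names are not from party):

```python
import numpy as np

def conditional_permutation(xj, z_groups, seed=0):
    """Permute xj only *within* each group of the conditioning
    partition defined by z_groups."""
    rng = np.random.default_rng(seed)
    out = xj.copy()
    for g in np.unique(z_groups):
        idx = np.where(z_groups == g)[0]
        out[idx] = rng.permutation(out[idx])  # shuffle inside group g
    return out

# each value stays inside its own Z-group after permutation
xj = np.array([1., 2., 3., 10., 20., 30.])
z = np.array(["a", "a", "a", "b", "b", "b"])
perm = conditional_permutation(xj, z)
print(sorted(perm[:3].tolist()), sorted(perm[3:].tolist()))
# [1.0, 2.0, 3.0] [10.0, 20.0, 30.0]
```

Plugging such a conditionally permuted column into the per-tree accuracy comparison yields the conditional importance.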


SLIDE 13

Technically

◮ use any partition of the feature space for conditioning

◮ here: use the binary partition already learned by the tree

SLIDE 14

Simulation study

◮ dgp: y_i = β_1 · x_{i,1} + · · · + β_12 · x_{i,12} + ε_i, with ε_i i.i.d. ∼ N(0, 0.5)

◮ X_1, . . . , X_12 ∼ N(0, Σ), where Σ has unit variances, pairwise covariance 0.9 among X_1, . . . , X_4, and zeros elsewhere:

Σ = \begin{pmatrix}
1 & 0.9 & 0.9 & 0.9 & 0 & \cdots & 0 \\
0.9 & 1 & 0.9 & 0.9 & 0 & \cdots & 0 \\
0.9 & 0.9 & 1 & 0.9 & 0 & \cdots & 0 \\
0.9 & 0.9 & 0.9 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 0 & 1 & & \\
\vdots & & & & & \ddots & \\
0 & \cdots & & & & & 1
\end{pmatrix}

◮ coefficients:

X_j : X_1  X_2  X_3  X_4  X_5  X_6  X_7  X_8  · · ·  X_12
β_j :  5    5    2    0   −5   −5   −2    0   · · ·    0
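This data-generating process can be simulated in a few lines; a Python/numpy sketch, where the sample size n is illustrative and N(0, 0.5) is read as a variance of 0.5:

```python
import numpy as np

p = 12
# Sigma: unit variances; X1..X4 pairwise correlated at 0.9;
# the remaining predictors X5..X12 are independent
Sigma = np.eye(p)
Sigma[:4, :4] = 0.9
np.fill_diagonal(Sigma, 1.0)

# coefficient pattern from the table: X4 and X8..X12 have beta = 0
beta = np.array([5, 5, 2, 0, -5, -5, -2, 0, 0, 0, 0, 0], dtype=float)

rng = np.random.default_rng(0)
n = 500                                 # illustrative sample size
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
eps = rng.normal(0.0, np.sqrt(0.5), n)  # N(0, 0.5): 0.5 read as variance
y = X @ beta + eps
```

The point of the design: X_1 and X_2 matter and are correlated with the irrelevant X_4, so an unconditional importance measure will spuriously favor X_4.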

SLIDE 15

Results

[Figure: permutation importance of variables 1–12, shown in three panels for mtry = 1, mtry = 3, and mtry = 8.]

SLIDE 16

Peptide-binding data

[Figure: unconditional vs. conditional variable importance for the peptide-binding predictors, with h2y8, flex8, and pol3 highlighted.]

SLIDE 17

R-Example

spurious correlation between shoe size and reading skills in school-children

> mycf <- cforest(score ~ ., data = readingSkills,
+     control = cforest_unbiased(mtry = 2))
> varimp(mycf)
nativeSpeaker           age      shoeSize
     12.62926      74.89542      20.01108
> varimp(mycf, conditional = TRUE)
nativeSpeaker           age      shoeSize
    11.808192     46.995336      2.092454

(output from party 0.9-991)

SLIDE 18

Conclusion

◮ conditional permutation is computationally expensive

◮ but it gets us closer to the interpretation of importance that we (statisticians) are used to → beta coefficients, partial correlations

◮ the choice of mtry has a high impact

SLIDE 19

General remarks

◮ default settings for mtry vary between implementations, e.g., for classification: randomForest uses mtry = √p, cforest uses mtry = 5; small values of mtry may often be a good choice, but not in the case of correlated predictors!

◮ make sure your results are stable before interpreting importance rankings: fit another forest with a different random seed; if the ranking changes, increase ntree
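The seed-stability check in the last bullet can be sketched like this (a Python/scikit-learn stand-in with made-up data; the talk's setting would use cforest in R, and the impurity-based importances are used here only for brevity, since the stability logic is the same for any importance measure):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def importance_ranking(X, y, seed, n_trees):
    """Feature indices ranked by importance for one forest fit."""
    rf = RandomForestClassifier(n_estimators=n_trees,
                                random_state=seed).fit(X, y)
    return np.argsort(rf.feature_importances_)[::-1]  # descending

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # features 2-4 are noise

# fit two forests that differ only in their random seed; if the top of
# the ranking disagrees, ntree should be increased before interpreting
r1 = importance_ranking(X, y, seed=0, n_trees=500)
r2 = importance_ranking(X, y, seed=1, n_trees=500)
print(r1[0], r2[0])
```

Only the stable part of the ranking (here the leading features) should be interpreted; the order among near-zero importances fluctuates by construction.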


SLIDE 21

References

Strobl, C., A.-L. Boulesteix, T. Kneib, T. Augustin, and A. Zeileis (2008). Conditional variable importance for random forests. BMC Bioinformatics 9:307.

Strobl, C., A.-L. Boulesteix, A. Zeileis, and T. Hothorn (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 8:25.