iForest: Interpreting Random Forests via Visual Analytics (PowerPoint Presentation)



SLIDE 1

iForest: Interpreting Random Forests via Visual Analytics

Xun Zhao, Yanhong Wu, Dik Lun Lee, Weiwei Cui

SLIDE 2

  • Random Forest

Background

Fraud Detection Medical Diagnosis Churn Prediction

Icons created by Anatolii Babii, Atif Arshad, and Dinosoft Labs from the Noun Project.

SLIDE 3

Background – Decision Tree

SLIDE 4

Background – Decision Tree
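As a concrete companion to this background slide, a single decision tree can be trained and queried in a few lines. This is a minimal sketch assuming scikit-learn and the Iris dataset; both are illustrative choices, not part of the slides.

```python
# Minimal decision-tree sketch (assumes scikit-learn; dataset is illustrative).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A shallow tree: each internal node tests one feature against a threshold.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

print(tree.predict(X[:1]))   # class reached by following one root-to-leaf path
print(tree.get_depth())      # depth actually used (at most 3)
```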

SLIDE 5

Background – Random Forest

SLIDE 6

Background – Random Forest

SLIDE 7

Background – Random Forest
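The forest itself can be sketched the same way: an ensemble of decision trees, each trained on a bootstrap sample, whose votes are aggregated into one prediction. Again a scikit-learn sketch with an illustrative dataset, not the slides' own code.

```python
# Random-forest sketch (assumes scikit-learn; dataset is illustrative).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each estimator is an independent tree trained on a bootstrap sample;
# the forest prediction aggregates the individual votes.
votes = [int(est.predict(X[:1])[0]) for est in forest.estimators_]
print(forest.predict(X[:1])[0], "voted by", votes.count(int(forest.predict(X[:1])[0])), "of", len(votes), "trees")
```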

SLIDE 8

Motivation – Random Forest

SLIDE 9

Random Forests are A+ predictors on performance but rate an F on interpretability

  • L. Breiman, “Statistical Modeling: The Two Cultures,” Statistical Science, 2001.
SLIDE 10

Interpretability

Source: https://xkcd.com/1838/

SLIDE 11

Reveal the relationships between features and predictions

Interpretability

Icons created by Melvin, alrigel, and Dinosoft Labs from the Noun Project.

Uncover the underlying working mechanisms

Provide case-based reasoning

SLIDE 12

iForest: Interpreting Random Forests via Visual Analytics

SLIDE 13

iForest - Visual Components

Data Overview Feature View Decision Path View

SLIDE 14

Demo

SLIDE 15

iForest – Data Overview

Provide case-based reasoning

Data Overview Feature View Decision Path View

SLIDE 16

iForest – Data Overview

  • Methods: confusion matrix and t-SNE projection

                 Predicted True    Predicted False
Actual True      True Positive     False Negative
Actual False     False Positive    True Negative

SLIDE 17

iForest – Data Overview

  • Methods: confusion matrix and t-SNE projection

Default view: each circle represents a data item, colored positive or negative; panning & zooming are supported.
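The two Data Overview ingredients can be sketched with scikit-learn (the slides do not prescribe a library, so this is an assumption): a confusion matrix of the forest's predictions, and a t-SNE projection that gives each data item the 2-D position of its circle.

```python
# Data Overview sketch (assumes scikit-learn; dataset is illustrative).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

cm = confusion_matrix(y, forest.predict(X))   # rows: actual, cols: predicted
xy = TSNE(n_components=2, random_state=0).fit_transform(X)  # one 2-D point per item

print(cm)
print(xy.shape)   # (n_items, 2): the circle positions in the projection
```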

SLIDE 18

iForest – Feature View

Data Overview Feature View Decision Path View

Reveal the relationships between features and predictions

SLIDE 19

iForest – Feature View

  • Methods: data distribution and partial dependence plot

each cell illustrates the statistics and importance of a feature

SLIDE 20

iForest – Feature View

  • Methods: data distribution and partial dependence plot

Feature A (numerical)


SLIDE 21

iForest – Feature View

  • Methods: data distribution and partial dependence plot

Feature A (numerical), x = 60

SLIDE 22

iForest – Feature View

  • Methods: data distribution and partial dependence plot

Feature A (numerical): split point distribution
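The split point distribution can be recovered from a trained forest by collecting every threshold at which any tree splits on the feature. This is a sketch against scikit-learn's `tree_` internals, not iForest's own code; dataset and feature index are illustrative.

```python
# Split-point distribution sketch (assumes scikit-learn).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

feature = 0
# tree_.feature == feature selects internal nodes that split on this feature
# (leaf nodes are marked with -2 and are excluded automatically).
splits = np.concatenate([
    est.tree_.threshold[est.tree_.feature == feature]
    for est in forest.estimators_
])
print(len(splits), float(splits.min()), float(splits.max()))
```

A histogram of `splits` is the distribution the feature cell displays.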

SLIDE 23

iForest – Feature View

  • Methods: data distribution and partial dependence plot

Feature B (ordinal)


SLIDE 24

iForest – Feature View

Data Overview Feature View Decision Path View

Uncover the underlying working mechanisms

SLIDE 25

iForest – Decision Path View

  • Goal: audit the decision process of a particular data item
SLIDE 26

iForest – Decision Path View

  • Decision Path Projection

Each circle represents a decision path, colored positive or negative. The ratio between positive and negative decision paths is shown, and a lasso selects a specific set of paths for exploration.
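A sketch of the raw material behind this projection, assuming scikit-learn: for one data item, `decision_path` yields each tree's root-to-leaf path, and each tree's vote labels that path positive or negative.

```python
# Decision-path extraction sketch (assumes scikit-learn; dataset is illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

item = X[:1]
# Sparse indicator matrix: which nodes of which tree the item passes through.
indicator, n_nodes_ptr = forest.decision_path(item)

# Each tree contributes one path; its vote labels the path positive/negative.
votes = [int(est.predict(item)[0]) for est in forest.estimators_]
pos = sum(votes)
print(pos, len(votes) - pos)   # positive vs. negative path counts
```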

SLIDE 27

iForest – Decision Path View

  • Feature Summary

Feature Cell: summarizes the feature ranges of the selected paths
  • Vertical bar: feature value of the current data item
  • Pixel-based bar chart: feature range summary

SLIDE 28

iForest – Decision Path View

  • Feature Summary

Decision Path I: A < 0.5 → C > 1.5 → C < 3.5

Decision Path II: C > 2.5 → A < 0.5

Layers: Layer 1 (root), Layer 2, Layer 3
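The range intersection behind the feature summary can be sketched in plain Python, using this slide's two toy paths (features A and C, thresholds as shown). Intersecting a path's split conditions yields the per-feature interval that the summary bars visualize.

```python
# Sketch: collapse a path's split conditions into per-feature ranges.
# Conditions and feature names are the slide's illustrative toy values.
def path_ranges(conditions):
    ranges = {}  # feature -> [lower bound, upper bound]
    for feat, op, thr in conditions:
        lo, hi = ranges.get(feat, [float("-inf"), float("inf")])
        if op == "<":
            hi = min(hi, thr)   # tighten the upper bound
        else:                   # ">"
            lo = max(lo, thr)   # tighten the lower bound
        ranges[feat] = [lo, hi]
    return ranges

path1 = [("A", "<", 0.5), ("C", ">", 1.5), ("C", "<", 3.5)]
path2 = [("C", ">", 2.5), ("A", "<", 0.5)]
print(path_ranges(path1))   # {'A': [-inf, 0.5], 'C': [1.5, 3.5]}
print(path_ranges(path2))   # {'C': [2.5, inf], 'A': [-inf, 0.5]}
```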


SLIDE 32

iForest – Decision Path View

  • Decision Path Flow: layer-level feature ranges


SLIDE 33

Evaluation – Usage Scenario

  • Two usage scenarios using the Titanic shipwreck and German Credit datasets
  • Titanic shipwreck statistics:
  • 891 passengers and 6 features after pre-processing
  • German Credit statistics:
  • 1,000 bank accounts and 9 features
SLIDE 34

Usage Scenario – Titanic

SLIDE 35

Evaluation – User Study

  • Qualitative user study
  • 10 participants recruited from a local university and an industry research lab
  • 10 tasks covering all important aspects of random forest interpretation
  • 12 questions related to iForest usage in a post-session interview

[Bar chart: task completion time (seconds) for Tasks 1–10]

SLIDE 36

  • Support other tree-based models such as boosting trees
  • Support multi-class classification and regression
  • Support random forest diagnosis and debugging

Future Work

SLIDE 37

Q&A

iForest: Interpreting Random Forests via Visual Analytics

Yanhong Wu Email: yanwu@visa.com URL: http://yhwu.me