
  1. iForest: Interpreting Random Forests via Visual Analytics Xun Zhao, Yanhong Wu, Dik Lun Lee, Weiwei Cui

  2. Background • Random Forest • Applications: fraud detection, medical diagnosis, churn prediction • Icons created by Anatolii Babii, Atif Arshad, and Dinosoft Labs from the Noun Project.

  3. Background – Decision Tree

  4. Background – Decision Tree

  5. Background – Random Forest

  6. Background – Random Forest

  7. Background – Random Forest

  8. Motivation – Random Forest

  9. Random Forests are A+ predictors on performance but rate an F on interpretability. (L. Breiman, "Statistical Modeling: The Two Cultures")

  10. Interpretability (comic source: https://xkcd.com/1838/)

  11. Interpretability • Reveal the relationships between features and predictions • Uncover the underlying working mechanisms • Provide case-based reasoning • Icons created by Melvin, alrigel, and Dinosoft Labs from the Noun Project.

  12. iForest: Interpreting Random Forests via Visual Analytics

  13. iForest – Visual Components • Data Overview • Feature View • Decision Path View

  14. Demo

  15. iForest – Data Overview (one of the three views: Data Overview, Feature View, Decision Path View) • Goal: provide case-based reasoning

  16. iForest – Data Overview • Methods: confusion matrix and t-SNE projection [confusion matrix: actual vs. predicted values, split into true/false positives and true/false negatives]

  17. iForest – Data Overview • Methods: confusion matrix and t-SNE projection • In the projection, each circle represents a data item (Negative vs. Positive) • Interactions: default view, panning & zooming
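
The Data Overview slides above name two methods: a confusion matrix and a t-SNE projection in which each circle is a data item. Below is a minimal Python/scikit-learn sketch of both; the synthetic dataset, the variable names, and the choice to project each item's per-tree votes (rather than its raw features) are illustrative assumptions, not the paper's exact pipeline.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.manifold import TSNE
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in data (assumption): 500 items, 6 features, binary labels.
    X, y = make_classification(n_samples=500, n_features=6, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

    # Confusion matrix: rows are actual values, columns are predicted values.
    print(confusion_matrix(y_test, rf.predict(X_test)))

    # t-SNE projection: embed each item's vector of per-tree votes so that items
    # the forest treats similarly land near each other (one circle per data item).
    votes = np.stack([tree.predict(X_test) for tree in rf.estimators_], axis=1)
    embedding = TSNE(n_components=2, random_state=0).fit_transform(votes)
    print(embedding.shape)  # (n_test_items, 2)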

  18. iForest – Feature View (one of the three views: Data Overview, Feature View, Decision Path View) • Goal: reveal the relationships between features and predictions

  19. iForest – Feature View • Methods: data distribution and partial dependence plot • Each cell illustrates the statistics and importance of a feature

  20. iForest – Feature View • Methods: data distribution and partial dependence plot [figure: distribution and partial dependence for Feature A (numerical)]

  21. iForest – Feature View • Methods: data distribution and partial dependence plot [figure: partial dependence for Feature A (numerical), annotated at x = 60]

  22. iForest – Feature View • Methods: data distribution and partial dependence plot • Split point distribution for Feature A (numerical)

  23. iForest – Feature View • Methods: data distribution and partial dependence plot [figure: distribution and partial dependence for Feature B (ordinal)]
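
The Feature View slides above describe each feature cell through a data distribution, a partial dependence plot, and the distribution of split points. Here is a hedged sketch of the latter two, continuing from the forest rf and training data in the earlier sketch; using feature index 0 to stand in for "Feature A" is an assumption.

    import numpy as np
    from sklearn.inspection import partial_dependence

    # Partial dependence of the predicted outcome on "Feature A" (index 0 here).
    pd_result = partial_dependence(rf, X_train, features=[0], kind="average")
    grid = pd_result["grid_values"][0]   # key is "values" in older scikit-learn releases
    averaged = pd_result["average"][0]

    # Split point distribution: every threshold at which any tree splits on Feature A.
    split_points = np.concatenate([
        tree.tree_.threshold[tree.tree_.feature == 0]  # nodes testing feature 0 (leaves excluded)
        for tree in rf.estimators_
    ])
    counts, bin_edges = np.histogram(split_points, bins=20)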

  24. iForest – Decision Path View (one of the three views: Data Overview, Feature View, Decision Path View) • Goal: uncover the underlying working mechanisms

  25. iForest – Decision Path View • Goal: audit the decision process of a particular data item

  26. iForest – Decision Path View • Decision Path Projection • Each circle represents a decision path (Negative vs. Positive) • Shows the ratio between positive and negative decision paths • Lasso to select a specific set of paths for exploration
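
The projection above has one circle per decision path, i.e. per tree that the selected item falls through, with paths labelled positive or negative. A minimal sketch of how such paths and the positive-to-negative ratio can be pulled out of the forest from the earlier sketch; the choice of item index 0 is an arbitrary assumption.

    import numpy as np

    item = X_test[0:1]  # the data item being audited

    # One decision path per tree: the node ids the item visits from root to leaf.
    paths = []
    for tree in rf.estimators_:
        node_indicator = tree.decision_path(item)  # sparse indicator of visited nodes
        paths.append(node_indicator.indices)       # node ids, in root-to-leaf order

    # Each path ends in a leaf that votes positive (1) or negative (0).
    votes = np.array([int(tree.predict(item)[0]) for tree in rf.estimators_])
    print("positive paths:", votes.sum(), "negative paths:", (votes == 0).sum())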

  27. iForest – Decision Path View • Feature Summary • Feature cell: summarizes the feature ranges of the selected paths • Pixel-based bar chart: feature range summary • Vertical bar: feature value of the current data item

  28. iForest – Decision Path View • Feature Summary [example: decision paths aligned by tree layer (Layer 1 = root, Layer 2, Layer 3); Decision Path I with splits A < 0.5, C < 3.5, C > 1.5; Decision Path II with splits C > 2.5, A < 0.5]


  32. iForest – Decision Path View • Decision Path Flow: layer-level feature ranges along the selected paths, from the root layer down to the leaf nodes
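
The Feature Summary and the Decision Path Flow above both rest on the per-feature value ranges implied by the splits along a path. A sketch of that computation, continuing from the paths extracted in the previous sketch; the function and variable names are illustrative, not from the paper.

    import numpy as np

    def path_feature_ranges(tree, node_ids, n_features):
        """Tighten an unbounded [low, high] range for each feature at every split on the path."""
        t = tree.tree_
        ranges = {f: [-np.inf, np.inf] for f in range(n_features)}
        for node, child in zip(node_ids[:-1], node_ids[1:]):
            feature, threshold = t.feature[node], t.threshold[node]
            if child == t.children_left[node]:   # went left: feature <= threshold
                ranges[feature][1] = min(ranges[feature][1], threshold)
            else:                                # went right: feature > threshold
                ranges[feature][0] = max(ranges[feature][0], threshold)
        return ranges

    # Feature ranges for the audited item's path through the first tree.
    print(path_feature_ranges(rf.estimators_[0], paths[0], X_train.shape[1]))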

  33. Evaluation – Usage Scenario • Two usage scenarios using the Titanic shipwreck and German Credit data • Titanic shipwreck statistics: 891 passengers and 6 features after pre-processing • German Credit statistics: 1,000 bank accounts and 9 features

  34. Usage Scenario – Titanic
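
For readers who want to reproduce a comparable setup for the Titanic scenario, here is a hedged sketch using the OpenML titanic dataset. The slides do not state which six features survive pre-processing, so the column choice below is an assumption.

    from sklearn.datasets import fetch_openml
    from sklearn.ensemble import RandomForestClassifier

    titanic = fetch_openml("titanic", version=1, as_frame=True).frame
    cols = ["pclass", "sex", "age", "sibsp", "parch", "fare"]   # assumed 6 features
    data = titanic[cols + ["survived"]].dropna().copy()
    data["sex"] = (data["sex"] == "female").astype(int)         # encode the categorical feature

    X, y = data[cols], (data["survived"] == "1").astype(int)
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(rf.score(X, y))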

  35. Evaluation – User Study • Qualitative user study • 10 participants recruited from a local university and an industry research lab • 10 tasks covering all important aspects of random forest interpretation • 12 questions related to iForest usage in a post-session interview [chart: task completion time in seconds for Tasks 1–10]

  36. Future Work • Support other tree-based models such as boosted trees • Support multi-class classification or regression • Support random forest diagnosis and debugging

  37. Q&A iForest: Interpreting Random Forests via Visual Analytics Yanhong Wu Email: yanwu@visa.com URL: http://yhwu.me
