iForest: Interpreting Random Forests via Visual Analytics (PowerPoint Presentation)



SLIDE 1

iForest: Interpreting Random Forests via Visual Analytics

Xun Zhao, Yanhong Wu, Dik Lun Lee, Weiwei Cui

SLIDE 2

  • Random Forest

Background

Fraud Detection Medical Diagnosis Churn Prediction

Icons created by Anatolii Babii, Atif Arshad, and Dinosoft Labs from the Noun Project.

SLIDE 3

Background – Decision Tree

SLIDE 4

Background – Decision Tree
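As a concrete companion to this background slide, a single decision tree can be trained and queried in a few lines. This is a minimal sketch assuming scikit-learn and the Iris dataset; both are illustrative choices, not part of the slides.

```python
# Minimal decision-tree sketch (assumes scikit-learn; dataset is illustrative).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A shallow tree: each internal node tests one feature against a threshold.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

print(tree.predict(X[:1]))   # class reached by following one root-to-leaf path
print(tree.get_depth())      # depth actually used (at most 3)
```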

SLIDE 5

Background – Random Forest

SLIDE 6

Background – Random Forest

SLIDE 7

Background – Random Forest
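The forest itself can be sketched the same way: an ensemble of decision trees, each trained on a bootstrap sample, whose votes are aggregated into one prediction. Again a scikit-learn sketch with an illustrative dataset, not the slides' own code.

```python
# Random-forest sketch (assumes scikit-learn; dataset is illustrative).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each estimator is an independent tree trained on a bootstrap sample;
# the forest prediction aggregates the individual votes.
votes = [int(est.predict(X[:1])[0]) for est in forest.estimators_]
print(forest.predict(X[:1])[0], "voted by", votes.count(int(forest.predict(X[:1])[0])), "of", len(votes), "trees")
```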

SLIDE 8

Motivation – Random Forest

SLIDE 9

Random Forests are A+ predictors on performance but rate an F on interpretability

  • L. Breiman, “Statistical Modeling: The Two Cultures,” Statistical Science, 2001.
SLIDE 10

Interpretability

Source: https://xkcd.com/1838/

SLIDE 11

Reveal the relationships between features and predictions

Interpretability

Icons created by Melvin, alrigel, and Dinosoft Labs from the Noun Project.

Uncover the underlying working mechanisms

Provide case-based reasoning

SLIDE 12

iForest: Interpreting Random Forests via Visual Analytics

SLIDE 13

iForest - Visual Components

Data Overview Feature View Decision Path View

SLIDE 14

Demo

SLIDE 15

iForest – Data Overview

Provide case-based reasoning

Data Overview Feature View Decision Path View

SLIDE 16

iForest – Data Overview

  • Methods: confusion matrix and t-SNE projection

                 Predicted True    Predicted False
Actual True      True Positive     False Negative
Actual False     False Positive    True Negative

SLIDE 17

iForest – Data Overview

  • Methods: confusion matrix and t-SNE projection

Default view: each circle represents a data item, colored positive or negative; panning & zooming are supported.
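The two Data Overview ingredients can be sketched with scikit-learn (the slides do not prescribe a library, so this is an assumption): a confusion matrix of the forest's predictions, and a t-SNE projection that gives each data item the 2-D position of its circle.

```python
# Data Overview sketch (assumes scikit-learn; dataset is illustrative).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

cm = confusion_matrix(y, forest.predict(X))   # rows: actual, cols: predicted
xy = TSNE(n_components=2, random_state=0).fit_transform(X)  # one 2-D point per item

print(cm)
print(xy.shape)   # (n_items, 2): the circle positions in the projection
```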

SLIDE 18

iForest – Feature View

Data Overview Feature View Decision Path View

Reveal the relationships between features and predictions

SLIDE 19

iForest – Feature View

  • Methods: data distribution and partial dependence plot

each cell illustrates the statistics and importance of a feature

SLIDE 20

iForest – Feature View

  • Methods: data distribution and partial dependence plot

Feature A (numerical)


SLIDE 21

iForest – Feature View

  • Methods: data distribution and partial dependence plot

Feature A (numerical), x = 60

SLIDE 22

iForest – Feature View

  • Methods: data distribution and partial dependence plot

Feature A (numerical): split point distribution
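The split point distribution can be recovered from a trained forest by collecting every threshold at which any tree splits on the feature. This is a sketch against scikit-learn's `tree_` internals, not iForest's own code; dataset and feature index are illustrative.

```python
# Split-point distribution sketch (assumes scikit-learn).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

feature = 0
# tree_.feature == feature selects internal nodes that split on this feature
# (leaf nodes are marked with -2 and are excluded automatically).
splits = np.concatenate([
    est.tree_.threshold[est.tree_.feature == feature]
    for est in forest.estimators_
])
print(len(splits), float(splits.min()), float(splits.max()))
```

A histogram of `splits` is the distribution the feature cell displays.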

SLIDE 23

iForest – Feature View

  • Methods: data distribution and partial dependence plot

Feature B (ordinal)


SLIDE 24

iForest – Feature View

Data Overview Feature View Decision Path View

Uncover the underlying working mechanisms

SLIDE 25

iForest – Decision Path View

  • Goal: audit the decision process of a particular data item
SLIDE 26

iForest – Decision Path View

  • Decision Path Projection

Each circle represents a decision path, colored positive or negative. The ratio between positive and negative decision paths is shown, and a lasso selects a specific set of paths for exploration.
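A sketch of the raw material behind this projection, assuming scikit-learn: for one data item, `decision_path` yields each tree's root-to-leaf path, and each tree's vote labels that path positive or negative.

```python
# Decision-path extraction sketch (assumes scikit-learn; dataset is illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

item = X[:1]
# Sparse indicator matrix: which nodes of which tree the item passes through.
indicator, n_nodes_ptr = forest.decision_path(item)

# Each tree contributes one path; its vote labels the path positive/negative.
votes = [int(est.predict(item)[0]) for est in forest.estimators_]
pos = sum(votes)
print(pos, len(votes) - pos)   # positive vs. negative path counts
```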

SLIDE 27

iForest – Decision Path View

  • Feature Summary

Feature Cell: summarizes the feature ranges of the selected paths
  • Vertical bar: feature value of the current data item
  • Pixel-based bar chart: feature range summary

SLIDE 28

iForest – Decision Path View

  • Feature Summary

Decision Path I: A < 0.5 → C > 1.5 → C < 3.5

Decision Path II: C > 2.5 → A < 0.5

Layers: Layer 1 (root), Layer 2, Layer 3
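The range intersection behind the feature summary can be sketched in plain Python, using this slide's two toy paths (features A and C, thresholds as shown). Intersecting a path's split conditions yields the per-feature interval that the summary bars visualize.

```python
# Sketch: collapse a path's split conditions into per-feature ranges.
# Conditions and feature names are the slide's illustrative toy values.
def path_ranges(conditions):
    ranges = {}  # feature -> [lower bound, upper bound]
    for feat, op, thr in conditions:
        lo, hi = ranges.get(feat, [float("-inf"), float("inf")])
        if op == "<":
            hi = min(hi, thr)   # tighten the upper bound
        else:                   # ">"
            lo = max(lo, thr)   # tighten the lower bound
        ranges[feat] = [lo, hi]
    return ranges

path1 = [("A", "<", 0.5), ("C", ">", 1.5), ("C", "<", 3.5)]
path2 = [("C", ">", 2.5), ("A", "<", 0.5)]
print(path_ranges(path1))   # {'A': [-inf, 0.5], 'C': [1.5, 3.5]}
print(path_ranges(path2))   # {'C': [2.5, inf], 'A': [-inf, 0.5]}
```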


SLIDE 32

iForest – Decision Path View

  • Decision Path Flow: layer-level feature ranges


SLIDE 33

Evaluation – Usage Scenario

  • Two usage scenarios using the Titanic shipwreck and German Credit datasets
  • Titanic shipwreck statistics:
  • 891 passengers and 6 features after pre-processing
  • German Credit statistics:
  • 1,000 bank accounts and 9 features
SLIDE 34

Usage Scenario – Titanic

SLIDE 35

Evaluation – User Study

  • Qualitative user study
  • 10 participants recruited from a local university and an industry research lab
  • 10 tasks covering all important aspects of random forest interpretation
  • 12 questions related to iForest usage in a post-session interview

[Bar chart: task completion time (seconds) for Tasks 1–10]

SLIDE 36

  • Support other tree-based models such as boosting trees
  • Support multi-class classification and regression
  • Support random forest diagnosis and debugging

Future Work

SLIDE 37

Q&A

iForest: Interpreting Random Forests via Visual Analytics

Yanhong Wu Email: yanwu@visa.com URL: http://yhwu.me